From dmessina at wustl.edu  Sun Apr  1 22:54:58 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Apr 2007 21:54:58 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <6EFFF13A-66E7-418F-8B8E-A8AA8826DE83@wustl.edu>

We need more information to be able to help you. Could you please  
show us the actual output you see when trying to install Bioperl?

Also, we need to know:

- what operating system you have
- what version of Bioperl you are trying to install

See

http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance

and please read the rest of the document, too.
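
The details requested above can be collected with a few lines of Perl. This is only a sketch: it assumes that Bioperl, when installed, reports its version through Bio::Perl->VERSION, and it prints "not found" otherwise. The sub name report_environment is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: gather the details worth pasting into a help request.
sub report_environment {
    my %info = (
        os   => $^O,    # operating system name as Perl sees it
        perl => $],     # numeric Perl version
    );
    # Load Bio::Perl only if it is installed; never die if it is missing.
    my $version = eval { require Bio::Perl; Bio::Perl->VERSION };
    $info{bioperl} = defined $version ? $version : 'not found';
    return \%info;
}

my $info = report_environment();
print "$_: $info->{$_}\n" for sort keys %$info;
```

Pasting this output together with the full installer output usually gives the list enough to go on.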

Dave


From aharry2001 at yahoo.com  Mon Apr  2 06:09:25 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 03:09:25 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <B04E1B58-9BE1-407A-91D2-6EA9C0BA2A38@uiuc.edu>
Message-ID: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>

Hello All,
I am having problems parsing KEGG files with bioperl: the script runs out of memory. The machine currently has 1 GB of RAM. Can someone tell me why this is happening and how it can be solved? Is it because the objects passed to bioperl are too big?

Best regards,
Ambrose

 

From cjfields at uiuc.edu  Mon Apr  2 08:43:18 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 07:43:18 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
References: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
Message-ID: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>

This doesn't really explain much beyond stating you are having  
problems.  You need to post some code (to the mail list!) and let us  
know what version of BioPerl you are using.

chris



From aharry2001 at yahoo.com  Mon Apr  2 09:56:33 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 06:56:33 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>
Message-ID: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>


Hello ALL,

I have the code below, which parses my KEGG files. Many of the files are parsed and the information is inserted into my database, but unfortunately, after the program has run for some hours, it stops with the message "Out of memory". I assume this happens because the bioperl object is too big. Please check the code below.

best regards Ambrose


#!/usr/local/ActivePerl/bin/perl
#
#

use strict;
use Bio::SeqIO;
use Bio::FASTASequence;
use DBI;
use Benchmark  qw(:all) ;

my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1,$pathway_id1,$pathway_name1,$ec_num);
my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
my( @kg_id);
my $db="gbdb";
my $host="localhost";
my $userid="root";
my $passwd="ubuntu";
my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/mysqld.sock";
my ($t1,$t2);
my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
my $time_used;
 
 
 eval { $dbh->do("DROP TABLE kegginfo") };
 print "Dropping kegginfo failed: $@\n" if $@;
 $dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL AUTO_INCREMENT,
                                   up_id INT UNSIGNED REFERENCES uniprotinfo(up_id),
                                                                  filename VARCHAR(50),
                                                    kegg_id VARCHAR(50),
                                   keggaccn VARCHAR(50),
                                                                  description VARCHAR(250),
                                   ec_numbers VARCHAR(250),
                                              pathway_id VARCHAR(250),
                                              pathway_name VARCHAR(250),
                                              crc64 VARCHAR(50),
                                   ko_id VARCHAR(50),
                                   pfam_id VARCHAR(50),
                                   ncbigi_id VARCHAR(50),
                                   ncbigeneid_id VARCHAR(50),
                                   uniprot_id VARCHAR(50),
                                   prosite_id VARCHAR(50),
                                   PRIMARY KEY (kg_id)
                                 )");
                                 

eval { $dbh->do("DROP TABLE keggntsequence") };
print "Dropping keggntsequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                                    keggaccn VARCHAR(50),
                                  nucleotidesequence text
                                 )");

eval { $dbh->do("DROP TABLE keggaasequence") };
print "Dropping keggaasequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                                    keggaccn VARCHAR(50),
                                                    crc64 VARCHAR(50),
                                  aminoacidsequence text
                                 )");
eval { $dbh->do("DROP TABLE timestable") };
print "Dropping timestable failed: $@\n" if $@;
$dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT NULL AUTO_INCREMENT,
                                   genome VARCHAR(100),
                                    totaltime_seconds int(100),
                                                                  PRIMARY KEY(aut_id))");


open (LIST, "genomes.list") || die "Cannot open input kegg genomes file genomes.list\n $! \n";
$t1=new Benchmark;
my @genomelist = ();
while (my $line=<LIST>) {
    #ignore comment lines
    if ($line !~ /^#/) {
        chomp $line;
                
        push (@genomelist, $line); #store the filename
    }
}

close LIST;
my $count=0;
foreach my $genomefile (@genomelist) {

    #in case the user fails to remove some strange files from
    #the genomes.list file.. check for the KEGG format
    my $check=checkKeggFormat($genomefile);
    if ($check==0) {
        #if file is not kegg, start with the next one...
        print "ERROR: $genomefile doesn't look like a KEGG file to me! \n";
        #<stdin>;
        next;
    }
#print $genomefile,"\n";
    my $stream = Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG');

    while ( my $seq = $stream->next_seq() ) {

        my $primary_id = $seq->primary_id;
        my $display_id = $seq->display_id; #name
        my $keggaccn   = $seq->accession; #accn
        my @description = $seq->annotation->get_Annotations('description');
        
        my @dblinks     = $seq->annotation->get_Annotations('dblink');
        # KO orthologs are stored as dblink annotations, so filter those
        # (declared once; the original declared @orthologs twice)
        my @orthologs   = grep {$_->database eq 'KO'} $seq->annotation->get_Annotations('dblink');
        my @class       = $seq->annotation->get_Annotations('pathway');
         $ntseq{$keggaccn} = $seq->seq;
         $aaseq{$keggaccn} = $seq->translate->seq; 
         $aaseq{$keggaccn} =~s /\*$//;
                 my $fasta = ">".$count."\n".$aaseq{$keggaccn};
         my $newseq = Bio::FASTASequence->new($fasta);
         $crc64{$keggaccn}=$newseq->getCrc64();
#print $keggaccn,"crc64:$crc64{$keggaccn}\n";
        
        $count++;
        if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no keggaccn\n";
        next;}    

        if(@dblinks)
        {
                my @dblink_KO=();
                my @dblink_Pfam=();
                my @dblink_PROSITE=();
                my @dblink_NCBIGI=();
                my @dblink_NCBIGENEID=();
                my @dblink_UniProt=();
        
                foreach my $ele (@dblinks) {
                    if ($ele =~ /^KO:/){
                        $ele=~s/KO://;
                        push (@dblink_KO,$ele);
                        $dblink_KO{$keggaccn}=$ele;
                        next;
                    }
                        #parse Pfam: dblink
                    if ($ele =~ /^Pfam:/){
                        $ele=~s/Pfam://;
                        push (@dblink_Pfam,$ele);
                        $dblink_Pfam{$keggaccn}=$ele;
                        next;
                    }
                        #parse PROSITE: dblink
                    if ($ele =~ /^PROSITE:/){
                        $ele=~s/PROSITE://;
                        push (@dblink_PROSITE,$ele);
                        $dblink_PROSITE{$keggaccn}=$ele;
                        next;
                    }
                        #parse NCBI-GI: dblink
                    if ($ele =~ /^NCBI-GI:/){
                        $ele=~s/NCBI-GI://;
                        push (@dblink_NCBIGI,$ele);
                        $dblink_NCBIGI{$keggaccn}=$ele;
                        next;
                    }
                        #parse NCBI-GeneID: dblink
                    if ($ele =~ /^NCBI-GeneID:/){
                        $ele=~s/NCBI-GeneID://;
                        push (@dblink_NCBIGENEID,$ele);
                        $dblink_NCBIGENEID{$keggaccn}=$ele;
                        next;
                        }
                        #parse UniProt: dblink
                    if ($ele =~ /^UniProt:/){
                        $ele=~s/UniProt://;
                        push (@dblink_UniProt,$ele);
                        $dblink_UniProt{$keggaccn}=$ele;
                        next;
                    }
            
                }#end foreach     #finished parsing all dblinks    
        }#end if @dblinks
        if(@class)
        {
            foreach my $pathway (@class) {
    
                $pathway=~s/^\s+|\s+$//g; # /g so both ends are trimmed
                my @arr = split (/\s+/,$pathway);
                my $pathway_id = $arr[0];
                shift @arr;
                my $pathway_name = join(" ",@arr);
                $pathway_name{$keggaccn}=$pathway_name;
                $pathway_id{$keggaccn}=$pathway_id;
                #print $pathway_id{$keggaccn},"\t",$pathway_name{$keggaccn},"\n";
                                    
            }
            
        }
        
        my @ecnumbers=();
        @ecnumbers = extractECnumbers(@description);
        if(@ecnumbers)
        {
                if (@ecnumbers!=0) 
                {
                    foreach my $ecn (@ecnumbers) 
                    {
                       $ecnumbers{$keggaccn}=$ecn;
                    }#end foreach
                }
                else {
                    #print "ECnumbers:\n";
                     }
        }
        
        
#         print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t",$dblink_NCBIGENEID{$keggaccn},
#                 "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers{$keggaccn}","\t",
#                  "p1:$pathway_id{$keggaccn}","\t","p2:$pathway_name{$keggaccn}","\n";
#         
                $dbh->do("INSERT INTO kegginfo VALUES (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
         undef,"NULL","NULL",$genomefile,$display_id,$keggaccn,@description,$ecnumbers{$keggaccn},
                  $pathway_id{$keggaccn},$pathway_name{$keggaccn},$crc64{$keggaccn},$dblink_KO{$keggaccn},
                 $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn},$dblink_NCBIGENEID{$keggaccn},
                 $dblink_UniProt{$keggaccn},$dblink_PROSITE{$keggaccn});
         

        $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
            undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});
                        

        $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
            undef,"",$keggaccn,$ntseq{$keggaccn});
                
               
    }
     $t2=new Benchmark;
    $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
    $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
    undef,"NULL",$genomefile,$time_used);
 
}


$dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
print "Index created on kegginfo\n";

$dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence (kg_id,keggaccn)");
print "Index created on keggaasequence\n";

$dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence (kg_id,keggaccn)");
print "Index created on keggntsequence\n";


print"Updating the tables................\n";

    
$dbh->do("update kegginfo,keggaasequence set keggaasequence.kg_id=kegginfo.kg_id 
         where 
                kegginfo.keggaccn=keggaasequence.keggaccn");
        print " keggaasequence kg_id\n";

$dbh->do("update kegginfo,keggntsequence set keggntsequence.kg_id=kegginfo.kg_id 
         where 
                kegginfo.keggaccn=keggntsequence.keggaccn");
        print " keggntsequence kg_id\n";


sub extractECnumbers ($) {
    #sample description lines
     #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]
    #ATP synthase F0 subunit c [EC:3.6.3.14]

    my @desc=shift;
    my $description = join ("",@desc);
    my @ecnumbers=();
    #print "parsing ec for $description..\n";
    #check if EC number exists
    if ($description=~/\[EC:/) {
        
        my @array = split (/\[EC:/,$description);
        $array[1]=~s/]//g;
        shift @array; #remove the annotation , only EC numbers remain
        foreach my $ele (@array) {
            $ele=~s/^\s+|\s+$//g;
            $ele= "EC:".$ele;
            push (@ecnumbers,$ele);
        }    
        return @ecnumbers;
    }
    else {
        #return an empty value
        return ;

    }

}


sub checkKeggFormat ($) {
=head2

checkKeggFormat

make sure that the file is a valid KEGG file
function checks only the first line,
which must start with ENTRY
(the second-line check is commented out below)

returns 0 or 1

=cut
    my $genomefile=shift;

    open (TEST,$genomefile) || die "Cannot open file $genomefile for reading \n";
    my $testline=<TEST>;
#print "$testline\n";
    if ($testline=~/^ENTRY/) {
        #continue
        #$testline=<TEST>;#double check
        #if ($testline=~/^NAME/) {
            #this looks like a valid kegg file
            close TEST; # release the handle on the success path too
            return 1;
        #}
        #else {
        #    close TEST;
        #    return 0;
        #}
    }
    else {
        close TEST;    
        return 0;
    }

}

sub timeThis ($$$) 
{
    my ($start,$end,$message) = @_;
    my $td = timediff($end, $start);
    my $t = timestr($td);    
        print "$message : ",$t,"\n";
        my @array = split (/\s+/,$t);
#20 wallclock secs (14.23 usr +  0.84 sys = 15.07 CPU)
        return $array[0]; #return the no. of seconds.
}

   

From e-just at northwestern.edu  Mon Apr  2 10:12:33 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:12:33 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package
	"Bio::DB::GenBank"
Message-ID: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>

Hello,

I am getting this error while running a bioperl script that I had been using
with bioperl 1.4.  After upgrading to bioperl 1.5.2 I get the following fatal
error:

Can't locate object method "seq_start" via package "Bio::DB::GenBank"

My script is as follows:


use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $gb = new Bio::DB::GenBank();

my $query = Bio::DB::Query::GenBank->new(
      -query   =>'txid44689[Organism:noexp]',
      -reldate => 60,
      -db      => 'nucleotide'

);

my $in = $gb->get_Stream_by_query($query);

while ( my $seq = $in->next_seq()) {
      print "do something";
      #....
}


I noticed that seq_start is created in the BEGIN block of
Bio::DB::NCBIHelper (inherited by Bio::DB::GenBank), but I do not have
experience troubleshooting this kind of autoloaded method.  Any idea where
to start?

Thanks

Eric

From e-just at northwestern.edu  Mon Apr  2 10:15:28 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:15:28 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package
	"Bio::DB::GenBank"
In-Reply-To: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>
References: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>
Message-ID: <fa1fe35c0704020715u1f14f273n100d4e21f848603d@mail.gmail.com>

Sorry about that.

As soon as I sent the email I found my problem ( an old NCBIHelper in my
inheritance path ).   There is no bug here.
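
For anyone hitting the same symptom: one way to spot a stale copy of a module is to ask Perl which file it actually loaded and which @INC directories contain a copy. A minimal sketch follows. The helper names are invented for illustration, and it is demonstrated with the core module File::Spec so that it runs without bioperl; substitute Bio::DB::NCBIHelper to check the case described here.

```perl
use strict;
use warnings;
use File::Spec;

# Convert Some::Module to Some/Module.pm, the form used for %INC keys.
sub module_relpath {
    my ($module) = @_;
    (my $rel = $module) =~ s{::}{/}g;
    return "$rel.pm";
}

# Return every copy of a module found along @INC, in search order.
# The first entry is the one "use"/"require" would normally pick up.
sub module_copies {
    my ($module) = @_;
    my $rel = module_relpath($module);
    my @found;
    for my $dir (@INC) {
        next if ref $dir;    # skip code-ref hooks in @INC
        my $path = File::Spec->catfile($dir, split m{/}, $rel);
        push @found, $path if -f $path;
    }
    return @found;
}

my $module = 'File::Spec';   # try 'Bio::DB::NCBIHelper' on a bioperl install
my $loaded = $INC{ module_relpath($module) };
my @copies = module_copies($module);
print "Loaded from: ", ( defined $loaded ? $loaded : 'not loaded' ), "\n";
print "Copies on \@INC:\n", map { "  $_\n" } @copies;
```

More than one path in the list means an older copy may be shadowing the new install.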

Eric



From cjfields at uiuc.edu  Mon Apr  2 11:32:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 10:32:59 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
References: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
Message-ID: <38475C93-FB21-4BC4-BF5D-7F48493E8EE2@uiuc.edu>

Ambrose,

Data is persisting in your hashes (in particular DBLink objects),  
which is eating away at your memory.  If I take a sample KEGG gene  
file and simply parse it:

while (my $seq = $io->next_seq) {
     print $seq->accession,"\n";
}

there are no memory issues, but if I store the data in hashes  
declared outside the loop:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

while (my $seq = $io->next_seq) {
     # store Bio::Seq data in hashes
}

I see problems even with a single genome file of KEGG records.  You'll
definitely run into memory issues if you are parsing many genome
files, which you appear to be doing:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

for my $genomefile (@genomelist) {
     my $io = Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG');
     while (my $seq = $io->next_seq) {
         # store Bio::Seq data in hashes
     }
}

Localizing the hashes to the genome or sequence loops should prevent  
the memory problem.
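
To make the scoping change concrete, here is a minimal sketch of the loop with the per-record storage declared inside the while block, so each record's data is released before the next one is read. The FakeStream and FakeSeq packages are stand-ins invented only so the example runs without bioperl or a KEGG file. In the real script the stream would come from Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG'), and the callback would do the DBI inserts.

```perl
use strict;
use warnings;

# --- stand-ins so the sketch runs without bioperl (hypothetical) ---
package FakeSeq;
sub new       { my ($class, %args) = @_; return bless {%args}, $class }
sub accession { return $_[0]{accession} }
sub seq       { return $_[0]{seq} }

package FakeStream;
sub new      { my ($class, @seqs) = @_; return bless { queue => [@seqs] }, $class }
sub next_seq { return shift @{ $_[0]{queue} } }

package main;

# Process one stream; nothing from a record outlives its loop iteration.
sub process_stream {
    my ($stream, $handler) = @_;
    my $count = 0;
    while ( my $seq = $stream->next_seq ) {
        # Declared inside the loop: freed at the end of every iteration,
        # unlike hashes declared at file scope and keyed by accession.
        my %record = (
            accession => $seq->accession,
            ntseq     => $seq->seq,
        );
        $handler->(\%record);    # e.g. insert into the database here
        $count++;
    }
    return $count;               # only a counter survives the loop
}

my $stream = FakeStream->new(
    FakeSeq->new( accession => 'eco:b0001', seq => 'ATGAAACGC' ),
    FakeSeq->new( accession => 'eco:b0002', seq => 'ATGCGAGTG' ),
);
my $n = process_stream( $stream, sub { print "stored $_[0]{accession}\n" } );
print "processed $n records\n";
```

The design point is that the handler receives the record and writes it out immediately, so the script never needs a hash keyed by accession across the whole run.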

Note that the DBLink Annotation objects are overloaded so they act  
like a string ($ele =~ /^KO:/) but are actually  
Bio::Annotation::DBLink objects, something we will likely get rid of  
in the near future.
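
A small sketch of that distinction, using a made-up FakeDBLink class with overloaded stringification (it is not the real Bio::Annotation::DBLink, whose internals are not shown in this thread). It illustrates that a value can match a regex like a string while a hash still holds the whole object, and that forcing it into a string first stores only the text.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object that stringifies itself.
package FakeDBLink;
use overload
    '""'     => sub { my $self = shift; return $self->{database} . ':' . $self->{primary_id} },
    fallback => 1;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package main;

my $link = FakeDBLink->new( database => 'KO', primary_id => 'K00001' );

# The object matches a regex as if it were a string...
print "looks like a KO link\n" if $link =~ /^KO:/;

# ...but storing it as-is stores a reference to the whole object.
my %as_object = ( 'eco:b0001' => $link );
# Interpolating it first stores a plain string copy instead.
my %as_string = ( 'eco:b0001' => "$link" );

print "as_object holds: ", ( ref( $as_object{'eco:b0001'} ) || 'plain string' ), "\n";
print "as_string holds: ", ( ref( $as_string{'eco:b0001'} ) || 'plain string' ), "\n";
```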

chris

On Apr 2, 2007, at 8:56 AM, Ambrose wrote:

>
>
> Hello ALL,
>
> I have the code below,which parses my kegg files.A host of the  
> files are parsed and the information is inserted into my databases  
> but unfortunate after the program runs for some hours it stops  
> showing the message out of memory.I assume that this happens  
> because the bioperl object is too big.Please just check the code below
>
> best regards Ambrose
>
>
> #!/usr/local/ActivePerl/bin/perl
> #
> #
>
> use strict;
> use Bio::SeqIO;
> use Bio::FASTASequence;
> use DBI;
> use Benchmark  qw(:all) ;
>
> my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1, 
> $pathway_id1,$pathway_name1,$ec_num);
> my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,% 
> dblink_NCBIGENEID,%dblink_UniProt);
> my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
> my( @kg_id);
> my $db="gbdb";
> my $host="localhost";
> my $userid="root";
> my $passwd="ubuntu";
> my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/ 
> mysqld.sock";
> my ($t1,$t2);
> my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
> my $time_used;
>
>
>
>  eval { $dbh->do("DROP TABLE kegginfo") };
>  print "Dropping kegginfo failed: $@\n" if $@;
>  $dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL  
> AUTO_INCREMENT,
>                                    up_id INT UNSIGNED REFERENCES  
> uniprotinfo(up_id),
>                                                                    
> filename VARCHAR(50),
>                                                     kegg_id VARCHAR 
> (50),
>                                    keggaccn VARCHAR(50),
>                                                                    
> description VARCHAR(250),
>                                    ec_numbers VARCHAR(250),
>                                               pathway_id VARCHAR(250),
>                                               pathway_name VARCHAR 
> (250),
>                                               crc64 VARCHAR(50),
>                                    ko_id VARCHAR(50),
>                                    pfam_id VARCHAR(50),
>                                    ncbigi_id VARCHAR(50),
>                                    ncbigeneid_id VARCHAR(50),
>                                    uniprot_id VARCHAR(50),
>                                    prosite_id VARCHAR(50),
>                                    PRIMARY KEY (kg_id)
>                                  )");
>
>
> eval { $dbh->do("DROP TABLE keggntsequence") };
> print "Dropping keggntsequence failed: $@\n" if $@;
> $dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED  
> REFERENCES uniprotinfo(kg_id),
>                                                     keggaccn VARCHAR 
> (50),
>                                   nucleotidesequence text
>                                  )");
>
> eval { $dbh->do("DROP TABLE keggaasequence") };
> print "Dropping keggaasequence failed: $@\n" if $@;
> $dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED  
> REFERENCES uniprotinfo(kg_id),
>                                                     keggaccn VARCHAR 
> (50),
>                                                     crc64 VARCHAR(50),
>                                   aminoacidsequence text
>                                  )");
> eval { $dbh->do("DROP TABLE timestable") };
> print "Dropping timestable failed: $@\n" if $@;
> $dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT  
> NULL AUTO_INCREMENT,
>                                    genome VARCHAR(100),
>                                     totaltime_seconds int(100),
>                                                                    
> PRIMARY KEY(aut_id))");
>
>
>
> open (LIST, "genomes.list") || die "Cannot open input kegg genomes  
> file genomes.list\n $! \n";
> $t1=new Benchmark;
> my @genomelist = ();
> while (my $line=<LIST>) {
>     #ignore comment lines
>     if ($line !~ /^#/) {
>         chomp $line;
>
>         push (@genomelist, $line); #store the filename
>     }
> }
>
> close LIST;
> my $count=0;
> foreach my $genomefile (@genomelist) {
>
>     #in case the user fails to remove some strange files from
>     #the genomes.list file.. check for the KEGG format
>     my $check=checkKeggFormat($genomefile);
>     if ($check==0) {
>         #if file is not kegg, start with the next one...
>         print "ERROR: $genomefile doesn't look like a KEGG file to  
> me! \n";
>         #<stdin>;
>         next;
>     }
> #print $genomefile,"\n";
>     my $stream = Bio::SeqIO->new(-file => $genomefile, -format =>  
> 'KEGG');
>
>     while ( my $seq = $stream->next_seq() ) {
>
>         my $primary_id = $seq->primary_id;
>         my $display_id = $seq->display_id; #name
>         my $keggaccn   = $seq->accession; #accn
>         my @description = $seq->annotation->get_Annotations 
> ('description');
>
>         my @dblinks     = $seq->annotation->get_Annotations('dblink');
>         my @orthologs   = $seq->annotation->get_Annotations 
> ('ortholog');
>         my @orthologs   = grep {$_->database eq 'KO'} $seq- 
> >annotation->get_Annotations('dblink');
>         my @class       = $seq->annotation->get_Annotations 
> ('pathway');
>          $ntseq{$keggaccn} = $seq->seq;
>          $aaseq{$keggaccn} = $seq->translate->seq;
>          $aaseq{$keggaccn} =~s /\*$//;
>                  my $fasta = ">".$count."\n".$aaseq{$keggaccn};
>          my $newseq = Bio::FASTASequence->new($fasta);
>          $crc64{$keggaccn}=$newseq->getCrc64();
> #print $keggaccn,"crc64:$crc64{$keggaccn}\n";
>
>         $count++;
>         if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no  
> keggaccn\n";
>         next;}
>
>         if(@dblinks)
>         {
>                 my @dblink_KO=();
>                 my @dblink_Pfam=();
>                 my @dblink_PROSITE=();
>                 my @dblink_NCBIGI=();
>                 my @dblink_NCBIGENEID=();
>                 my @dblink_UniProt=();
>
>                 foreach my $ele (@dblinks) {
>                     if ($ele =~ /^KO:/){
>                         $ele=~s/KO://;
>                         push (@dblink_KO,$ele);
>                         $dblink_KO{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse Pfam: dblink
>                     if ($ele =~ /^Pfam:/){
>                         $ele=~s/Pfam://;
>                         push (@dblink_Pfam,$ele);
>                         $dblink_Pfam{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse PROSITE: dblink
>                     if ($ele =~ /^PROSITE:/){
>                         $ele=~s/PROSITE://;
>                         push (@dblink_PROSITE,$ele);
>                         $dblink_PROSITE{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse NCBI-GI: dblink
>                     if ($ele =~ /^NCBI-GI:/){
>                         $ele=~s/NCBI-GI://;
>                         push (@dblink_NCBIGI,$ele);
>                         $dblink_NCBIGI{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse NCBI-GeneID: dblink
>                     if ($ele =~ /^NCBI-GeneID:/){
>                         $ele=~s/NCBI-GeneID://;
>                         push (@dblink_NCBIGENEID,$ele);
>                         $dblink_NCBIGENEID{$keggaccn}=$ele;
>                         next;
>                         }
>                         #parse UniProt: dblink
>                     if ($ele =~ /^UniProt:/){
>                         $ele=~s/UniProt://;
>                         push (@dblink_UniProt,$ele);
>                         $dblink_UniProt{$keggaccn}=$ele;
>                         next;
>                     }
>
>                 }#end foreach     #finished parsing all dblinks
>         }#end if @dblinks
>         if(@class)
>         {
>             foreach my $pathway (@class) {
>
>                 $pathway=~s/^\s+|\s+$//;
>                 my @arr = split (/\s+/,$pathway);
>                 my $pathway_id = $arr[0];
>                 shift @arr;
>                 my $pathway_name = join(" ", at arr);
>                 $pathway_name{$keggaccn}=$pathway_name;
>                 $pathway_id{$keggaccn}=$pathway_id;
>                 #print $pathway_id{$keggaccn},"\t",$pathway_name 
> {$keggaccn},"\n";
>
>             }
>
>         }
>
>         my @ecnumbers=();
>         @ecnumbers = extractECnumbers(@description);
>         if(@ecnumbers)
>         {
>                 if (@ecnumbers!=0)
>                 {
>                     foreach my $ecn (@ecnumbers)
>                     {
>                        $ecnumbers{$keggaccn}=$ecn;
>                     }#end foreach
>                 }
>                 else {
>                     #print "ECnumbers:\n";
>                      }
>         }
>
>
> #         print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t", 
> $dblink_NCBIGENEID{$keggaccn},
> #                 "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers 
> {$keggaccn}","\t",
> #                  "p1:$pathway_id{$keggaccn}","\t","p2: 
> $pathway_name{$keggaccn}","\n";
> #
>                 $dbh->do("INSERT INTO kegginfo VALUES  
> (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
>          undef,"NULL","NULL",$genomefile,$display_id, 
> $keggaccn, at description,$ecnumbers{$keggaccn},
>                   $pathway_id{$keggaccn},$pathway_name{$keggaccn}, 
> $crc64{$keggaccn},$dblink_KO{$keggaccn},
>                  $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn}, 
> $dblink_NCBIGENEID{$keggaccn},
>                  $dblink_UniProt{$keggaccn},$dblink_PROSITE 
> {$keggaccn});
>
>
>         $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
>             undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});
>
>
>         $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
>             undef,"",$keggaccn,$ntseq{$keggaccn});
>
>
>     }
>      $t2=new Benchmark;
>     $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
>     $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
>     undef,"NULL",$genomefile,$time_used);
>
> }
>
>
> $dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
> print "Index created on kegginfo\n";
>
> $dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence  
> (kg_id,keggaccn)");
> print "Index created on keggaasequence\n";
>
> $dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence  
> (kg_id,keggaccn)");
> print "Index created on keggntsequence\n";
>
>
> print"Updating the tables................\n";
>
>
> $dbh->do("update kegginfo,keggaasequence set  
> keggaasequence.kg_id=kegginfo.kg_id
>          where
>                 kegginfo.keggaccn=keggaasequence.keggaccn");
>         print " keggaasequence kg_id\n";
>
> $dbh->do("update kegginfo,keggntsequence set  
> keggntsequence.kg_id=kegginfo.kg_id
>          where
>                 kegginfo.keggaccn=keggntsequence.keggaccn");
>         print " keggaasequence kg_id\n";
>
>
>
> sub extractECnumbers ($) {
>     #sample description lines
>      #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26  
> 2.7.7.2]
>     #ATP synthase F0 subunit c [EC:3.6.3.14]
>
>     my @desc=shift;
>     my $description = join ("", at desc);
>     my @ecnumbers=();
>     #print "parsing ec for $description..\n";
>     #check if EC number exists
>     if ($description=~/\[EC:/) {
>
>         my @array = split (/\[EC:/,$description);
>         $array[1]=~s/]//g;
>         shift @array; #remove the annotation , only EC numbers remain
>         foreach my $ele (@array) {
>             $ele=~s/^\s+|\s+$//g;
>             $ele= "EC:".$ele;
>             push (@ecnumbers,$ele);
>         }
>         return @ecnumbers;
>     }
>     else {
>         #return an empty value
>         return ;
>
>     }
>
> }
>
>
>
>
>
>
>
> sub checkKeggFormat ($) {
> =head2
>
> checkKeggFormat
>
> make sure that the file is a valid KEGG file
> function checks the first two lines,
> 1st must start with ENTRY
> 2nd must start with DEFINITION
>
> returns 0 or 1
>
> =cut
>     my $genomefile=shift;
>
>     open (TEST,$genomefile) || die "Cannot open file $genomefile  
> for reading \n";
>     my $testline=<TEST>;
> #print "$testline\n";
>     if ($testline=~/^ENTRY/) {
>         #continue
>         #$testline=<TEST>;#double check
>         #if ($testline=~/^NAME/) {
>             #this looks like a valid kegg file
>             close TEST;
>             return 1;
>         #}
>         #else {
>         #    close TEST;
>         #    return 0;
>         #}
>     }
>     else {
>         close TEST;
>         return 0;
>     }
>
> }
>
> sub timeThis ($$$)
> {
>     my ($start,$end,$message) = @_;
>     my $td = timediff($end, $start);
>     my $t = timestr($td);
>         print "$message : ",$t,"\n";
>         my @array = split (/\s+/,$t);
> #20 wallclock secs (14.23 usr +  0.84 sys = 15.07 CPU)
>         return $array[0]; #return the no. of seconds.
> }
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign
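
As a side note on the quoted extractECnumbers routine: as posted, it appears to return a description such as "[EC:2.7.1.26 2.7.7.2]" as one combined element rather than two EC numbers. The following is a minimal sketch of one way to split them, using only core Perl. The name extract_ec_numbers is made up for illustration and is not part of the original script or of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): return every EC
# number found in a KEGG description line, each prefixed with "EC:",
# or an empty list when the line has no [EC:...] group.
sub extract_ec_numbers {
    my ($description) = @_;
    return () unless defined $description;

    my @ecnumbers;
    # Each "[EC:...]" group may hold several space-separated numbers.
    while ($description =~ /\[EC:([^\]]*)\]/g) {
        push @ecnumbers, "EC:$_" for split ' ', $1;
    }
    return @ecnumbers;
}

print join(",", extract_ec_numbers(
    "riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]")), "\n";
```

Returning a plain list keeps the calling code unchanged: an empty list in list context behaves like the bare "return" in the original.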


From dmessina at wustl.edu  Mon Apr  2 12:19:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 2 Apr 2007 11:19:51 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>

Hi Fahmi,

Please include the list on the reply so that others can comment, too.

Yes, it appears the machine you are installing on does not have an  
internet connection. You probably will want to resolve that problem  
before dealing with Bioperl. Alternatively, you could simply install  
and use Bioperl  on the machine which does have an internet connection.

If you really need to get Bioperl installed on that machine, however,  
probably the easiest way would be to find a machine that does have an  
internet connection, install CPAN::Mini, and use it to make a local  
mirror of CPAN. You could then copy that local mirror over to the  
machine without the internet connection and point that machine's cpan  
at the local mirror (read the CPAN documentation to find out how to  
do this). Also, the BioPerl install instructions list several  
external packages that you will need to use some parts of Bioperl  
(e.g. GD). Again, you can download those distributions using the  
machine with the internet connection and copy them over.
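
A rough sketch of the mirroring step, in case it helps. The remote URL and local directory are placeholders, and CPAN::Mini must already be installed on the connected machine; the cpan shell lines it prints are one way to point the offline machine at the copy, so please check them against the CPAN documentation for your version.

```perl
use strict;
use warnings;

# Placeholder locations (assumptions); adjust to your own machines.
my $remote = 'http://www.cpan.org/';
my $local  = '/tmp/minicpan';

# Run this on the machine that has an internet connection.
# CPAN::Mini is loaded only when the sub is called, so the rest of the
# script still runs where the module is not installed.
sub build_mirror {
    my ($remote_url, $local_dir) = @_;
    require CPAN::Mini;
    CPAN::Mini->update_mirror(
        remote => $remote_url,
        local  => $local_dir,
        trace  => 1,
    );
}

# On the offline machine the copied mirror is addressed by a file:// URL.
sub mirror_url {
    my ($dir) = @_;
    return "file://$dir";
}

print "In the cpan shell on the offline machine:\n";
print "  o conf urllist unshift ", mirror_url($local), "\n";
print "  o conf commit\n";
```

Calling build_mirror($remote, $local) on the connected machine does the download; the directory can then be copied over by whatever means is available.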

Dave


On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:

> Thank you for your answer. I will give you as much information as I
> can so that you can diagnose the problem:
>
> I use Linux Mandriva 2006.
> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained
> from the URL:
> http://www.bioperl.org/wiki/Release_1.5.2
> After that I ran these commands, which I found at the URL
> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph
> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL')
>
> >gunzip bioperl-1.5.2_102.tar.gz
> >tar xvf bioperl-1.5.2_102.tar
> >cd bioperl-1.5.2_102
> After that I ran the command
> >perl Build.PL
> and obtained the text
> this package requires Module::Build v0.2805 or greater to install
> itself
> install Module::Build now from CPAN?[y]
> I pressed Enter and obtained many lines such as
> System call"/usr/bin/wget -0-"ftp://.perl.org/pub/CPAN/modules/ 
> modlist.data.gz">home/fahmi/.cpan/sources/modules/03modlist.data
> Not connected
> cant access URL ftp://ftp.perl.org/CPAN/modules/modlist.data.gz
> ...
> I'm trying to install Bioperl without an internet connection
> because I don't know why Linux didn't detect my ethernet card.
> Please tell me what I should do.
> Thank you for your help.


From cjfields at uiuc.edu  Mon Apr  2 14:10:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 13:10:30 -0500
Subject: [Bioperl-l] Fwd: BLAST beta, URLAPI, and BioPerl (RemoteBlast users)
References: <CD04BF03C87B6240A342461CDE1DEC0304091DB4@NIHCESMLBX8.nih.gov>
Message-ID: <002E7937-10DF-43CE-96F6-71DC743C1314@uiuc.edu>

This may be of interest to anyone using RemoteBlast.

For anyone who uses RemoteBlast, the new changes to NCBI's BLAST  
interface shouldn't affect anything (Scott tested it out).  If there  
are any abnormalities with RemoteBlast queries over the next few  
weeks let us know.

chris

Begin forwarded message:

> From: "Mcginnis, Scott \(NIH/NLM/NCBI\) [E]"  
> <mcginnis at ncbi.nlm.nih.gov>
> Date: April 2, 2007 12:53:33 PM CDT
> To: "Chris Fields" <cjfields at uiuc.edu>
> Subject: RE: BLAST beta, URLAPI, and BioPerl
>
> Hi Chris:
>
> We are ready to make the new pages the defaults come April 16th. An  
> announcement is going out shortly. There are some very minor  
> changes to the URL API and I have listed them below. They will be  
> part of the announcement. Please note we actually tested BioPerl  
> and it seems to be fine with the new pages. If you have a news page on  
> your site or a mailing list you might want to pass this on.
>
> A Note About URLAPI
>
> The new BLAST pages support URLAPI, a protocol that scripts and
> programs use to run BLAST searches and retrieve results over
> HTTP. (For more on URLAPI, see
> http://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html). The following
> information only applies to you if you develop or are responsible
> for software that uses URLAPI.
>
> The new pages have been tested and produce correct results with
> the following URLAPI client programs:
>
> * the BioPERL RemoteBlast module
> * the NCBI demo script http://ncbi.nlm.nih.gov/blast/docs/web_blast.pl
> * various scripts used in-house at NCBI
>
> Users of URLAPI should be aware of the following minor
> changes. In the new interface:
>
> 1. The Request ID (RID) format will be shorter.  The new format
>     is 11 alphanumeric characters (e.g. RDEFEA5012) and will have no
>     internal structure. The previous RID format was 36 or more
>     characters long, including punctuation (e.g.,
>     1175172712-21345-42512597310.BLASTQ3).
>
> 2. BLAST reports will show masked regions as lower-case letters
>     by default (see
>     http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W6,
>     figure 2). The current default behavior is to show masked
>     regions as N's or X's. Users may recover the current behavior
>     by adding &MASK_CHAR=0 to the query string for a URLAPI
>     request.
>
> 3. BLAST reports will show alignments for 100 database sequences
>     by default. The current reports show only 50 alignments by
>     default.
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Mon 3/5/2007 11:50 AM
> To: Mcginnis, Scott (NIH/NLM/NCBI) [E]
> Subject: BLAST beta, URLAPI, and BioPerl
>
> The BioPerl project has several modules and parsers
> which currently parse XML/text/tabular BLAST output, as well as a
> module which is capable of posting BLAST queries via the URLAPI
> interface.  Will any of the BLAST changes affect these (particularly
> URLAPI)?
>
> Thanks!
>
> chris
>
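
For anyone scripting against URLAPI directly rather than through RemoteBlast, here is a small sketch of how the MASK_CHAR=0 setting from item 2 could be added to a retrieval request. The helper name urlapi_query is made up for illustration. CMD, RID and FORMAT_TYPE are taken from my reading of the URLAPI documentation rather than from the announcement, so treat them as assumptions; the RID is just the example string quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper: build a URLAPI-style query string from name/value
# pairs. The values used here are plain alphanumerics, so no URL escaping
# is attempted.
sub urlapi_query {
    my (%params) = @_;
    return join '&', map { join '=', $_, $params{$_} } sort keys %params;
}

my $query = urlapi_query(
    CMD         => 'Get',
    FORMAT_TYPE => 'Text',
    RID         => 'RDEFEA5012',   # example RID quoted in the announcement
    MASK_CHAR   => 0,              # ask for the old N/X masking (item 2)
);
print "$query\n";
```

Sorting the keys only makes the output reproducible; the order of parameters should not matter to the server.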

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From steletch at jouy.inra.fr  Tue Apr  3 08:28:39 2007
From: steletch at jouy.inra.fr (Stéphane Téletchéa)
Date: Tue, 03 Apr 2007 14:28:39 +0200
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
Message-ID: <46124877.4020605@jouy.inra.fr>

Alex Lancaster wrote:
> Hello bioperl,
> 
> I'm new to the bioperl world, having just started a research position
> in which I need to manage a large bioperl-based codebase.  To this
> end, I'm working on packaging bioperl as an official Fedora Package
> (formerly "Fedora Extras") and I'm currently wading through and
> packaging the long laundry list of Perl dependencies (then I'm going
> to try and do the same for biopython).  You can see some of my
> progress (including links to the reviews) here:
> 
> http://fedoraproject.org/wiki/AlexLancaster
> 
> Several issues have arisen during the packaging that I hope the
>

Nice, I was on my way to do it :-)
I'm a Mandriva packager and have been kindly "spushed" for maintaining 
the bioperl package for Mandriva.

You can have a look at the work already done by Mandriva at the addresses:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/

(Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).

Feel free to contact me if you need more input for dependencies, since 
there are quite a lot of them.

Cheers,
Stéphane

-- 
Stéphane Téletchéa, PhD.                  http://www.steletch.org
Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901

From cjfields at uiuc.edu  Tue Apr  3 10:58:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 09:58:44 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <46124877.4020605@jouy.inra.fr>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
Message-ID: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>

Once these are set up we should add a page to the bioperl wiki to  
describe them in more detail (along with Allen's Biopackages).

chris

On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote:

> Alex Lancaster wrote:
>> Hello bioperl,
>>
>> I'm new to the bioperl world, having just started a research position
>> in which I need to manage a large bioperl-based codebase.  To this
>> end, I'm working on packaging bioperl as an official Fedora Package
>> (formerly "Fedora Extras") and I'm currently wading through and
>> packaging the long laundry list of Perl dependencies (then I'm going
>> to try and do the same for biopython).  You can see some of my
>> progress (including links to the reviews) here:
>>
>> http://fedoraproject.org/wiki/AlexLancaster
>>
>> Several issues have arisen during the packaging that I hope the
>>
>
> Nice, I was on my way to do it :-)
> I'm a Mandriva packager and have been kindly "spushed" for maintaining
> the bioperl package for Mandriva.
>
> You can have a look at the work already done by Mandriva at the
> addresses:
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/
>
> (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).
>
> Feel free to contact me if you need more input for dependencies, since
> there are quite a lot of them.
>
> Cheers,
> Stéphane
>
> -- 
> Stéphane Téletchéa, PhD.                  http://www.steletch.org
> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From allenday at gmail.com  Tue Apr  3 13:54:51 2007
From: allenday at gmail.com (Allen Day)
Date: Tue, 3 Apr 2007 10:54:51 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
Message-ID: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>

You can link Biopackages now, it's been done for nearly 2 years.

-Allen

On 4/3/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Once these are set up we should add a page to the bioperl wiki to
> describe them in more detail (along with Allen's Biopackages).
>
> chris


From cjfields at uiuc.edu  Tue Apr  3 14:11:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 13:11:19 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
	<5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
Message-ID: <0802E2EB-5E94-42D2-9CE1-B82DC103A5D1@uiuc.edu>

I added a small piece on Biopackages to the wiki installation page:

http://www.bioperl.org/wiki/Installing_BioPerl

We can move links to RPM (or similar) installations to their own page  
or section in the INSTALL docs when we have time.

chris

On Apr 3, 2007, at 12:54 PM, Allen Day wrote:

> You can link Biopackages now, it's been done for nearly 2 years.
>
> -Allen

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Tue Apr  3 18:18:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:18:56 +0100
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>	<1175258897.2668.21.camel@localhost.localdomain>	<6d648ierkz.fsf@delpy.biol.berkeley.edu>	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
Message-ID: <4612D2D0.7030202@sendu.me.uk>

Chris Fields wrote:
> On Mar 30, 2007, at 11:02 PM, Allen Day wrote:
> 
>> The majority of the Bioperl classes are file parsers, or manipulate
>> data that comes from the file parsers.  Yes there are exceptions like
>> the Eutils and Ensembl-interfacing classes, but they are the minority.
>> The types of files that are worked with are generally either A)
>> primary data sets such as genome data, or B) derivative data, such as
>> sequence alignments that are derived from primary data using an
>> algorithm.
>>
>> If we're in agreement that the primary data sets and
>> libraries/applications for producing derivative data should not be
>> present in Fedora Extras, then it follows that the Bioperl classes for
>> manipulating these primary and derivative data  should also not be
>> present in Fedora Extras as they are of little use without data to
>> manipulate.
>
> I respectfully disagree.

Likewise, but in a slightly different way: for myself and surely many 
others the primary data used either isn't publicly released or isn't in 
some major database and therefore won't be in any kind of repository. 
That doesn't mean I wouldn't want the parser for my files to be 
somewhere convenient.

From bix at sendu.me.uk  Tue Apr  3 18:09:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:09:27 +0100
Subject: [Bioperl-l] installation bioperl
In-Reply-To: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
References: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
Message-ID: <4612D097.9060400@sendu.me.uk>

> On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:
> 
>> Thank you for your answer. I will give you as much information as I
>> can so that you can diagnose the problem:
>>
>> I use Linux Mandriva 2006.
>> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained
>> from the URL:
>> http://www.bioperl.org/wiki/Release_1.5.2
>> After that I ran these commands, which I found at the URL
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph
>> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL')
[snip]
>> I'm trying to install Bioperl without an internet connection
>> because I don't know why Linux didn't detect my ethernet card.
>> Please tell me what I should do.
>> Thank you for your help.

David's suggestion was a good one, but quite a lot (and possibly all you 
need) of BioPerl is usable just with the bioperl-1.5.2_102.tar.gz file 
you already have.

Just follow the 'hard way' instructions:
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_MODULES_THE_HARD_WAY

Actually, it's not that hard. Just extract the files from the .tar.gz and 
  have your perl lib point at the directory that contains the resulting 
  Bio directory.
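
A minimal sketch of what "point your perl lib" can look like at the top of a script. The path is a guess at where the tarball was unpacked (an assumption, adjust as needed); it should be the directory that holds the Bio/ subdirectory.

```perl
use strict;
use warnings;

# Assumed location of the unpacked tarball; this is the directory that
# contains the Bio/ subdirectory.
use lib '/home/fahmi/bioperl-1.5.2_102';

# Confirm the directory is now on Perl's module search path.
print "BioPerl directory is on \@INC\n"
    if grep { $_ eq '/home/fahmi/bioperl-1.5.2_102' } @INC;
```

Setting the PERL5LIB environment variable to the same directory has the same effect without editing each script.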

From t.r-a_ckright1 at tiscali.co.uk  Wed Apr  4 08:00:12 2007
From: t.r-a_ckright1 at tiscali.co.uk (Michael Pain)
Date: Wed, 4 Apr 2007 13:00:12 +0100
Subject: [Bioperl-l]  Re: read it immediately
Message-ID: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>

I have received three discs but I cannot access the files, as no ID or password was included in the package. I have paid for all this! Can you sort it out?

Regards Michael Pain

From thiago.venancio at gmail.com  Wed Apr  4 14:14:04 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Wed, 4 Apr 2007 15:14:04 -0300
Subject: [Bioperl-l] read it immediately
In-Reply-To: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
References: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
Message-ID: <44255ea80704041114pc284522tef2d3a3944763b90@mail.gmail.com>

I think you emailed the wrong list...

On 4/4/07, Michael Pain <t.r-a_ckright1 at tiscali.co.uk> wrote:
>
> I have received three discs but I cannot access the files, as no ID or
> password was included in the package. I have paid for all this! Can you
> sort it out?
>
> Regards Michael Pain
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From gdorjee at hotmail.com  Wed Apr  4 14:17:57 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 11:17:57 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
Message-ID: <9842643.post@talk.nabble.com>


Hi all,
Can anyone please help me with a problem that I've been dealing with for
quite a while now? The following is the part of my script that is not
working for some reason. It is supposed to get the sequence from
'result/fasta.faa' and run the blast.

###my script ###########
......
my $Seq_in = Bio::SeqIO->new(-file => 'result/fasta.faa', -format => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new(
    'program'   => 'blastp',
    'database'  => '/export/home/database/nr',
    _READMETHOD => 'Blast',
);
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

When I paste the protein sequence into the textarea of my HTML page and save
it as 'result/fasta.faa', so that the above script can run the blast,
I get the following error:

Software error:
------------- EXCEPTION  -------------
MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastpgp
/usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
--------------------------------------
I would appreciate your help.
I would also like to add that 'result/fasta.faa' does have the sequence
saved in it.



From gowthaman.ramasamy at sbri.org  Wed Apr  4 14:57:09 2007
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Wed, 4 Apr 2007 11:57:09 -0700
Subject: [Bioperl-l] How to patch something in installed bioperl module
Message-ID: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>


Hi List,
I have been advised to patch (comment out some lines and add some) the GFF.pm BioPerl module.
How do I go about it?
I have the latest BioPerl version, 1.5.2, installed via CPAN.

I find GFF.pm in the following location:
/root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm


Do I have to recompile it after editing?
I am completely clueless; I have not done this before.
Can anyone help me to do this?

Many thanks in advance.

gowthaman


From dmessina at wustl.edu  Wed Apr  4 15:42:43 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Apr 2007 14:42:43 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>

The code snippet worked fine for me. I believe the problem is that  
'result/fasta.faa' is not getting passed to your code properly. You  
might try specifying a complete path to your input and output file --  
relative paths, especially through a web app, can be tricky.

> when i paste the protein sequence into the textarea of my html page  
> and save
> the same as 'result/fasta.faa', so that the above script would do  
> the blast,

I'm not sure from what you wrote -- did you try running your script  
on the command line (having created 'result/fasta.faa' manually  
first)? If that is working for you, then the problem is with getting  
the data from the webpage into the script, not with the blasting part.
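
One quick way to separate the two cases is to look at the query file itself before any BioPerl code touches it. The sketch below uses only core Perl; check_fasta_file is a made-up helper, and the '>' header test is my assumption about what a usable FASTA query should look like, not something BioPerl documents as a requirement.

```perl
use strict;
use warnings;

# Hypothetical helper: return a short diagnosis for a would-be FASTA
# query file, or the empty string when nothing looks wrong.
sub check_fasta_file {
    my ($path) = @_;
    return "no such file: $path"  unless -e $path;
    return "not readable: $path"  unless -r $path;
    return "file is empty: $path" unless -s $path;

    open my $fh, '<', $path or return "cannot open $path: $!";
    while (my $line = <$fh>) {
        next unless $line =~ /\S/;      # skip blank lines
        close $fh;
        return $line =~ /^>/ ? '' : "first record lacks a '>' header line";
    }
    close $fh;
    return "file contains only whitespace: $path";
}

my $problem = check_fasta_file('result/fasta.faa');
print $problem ? "Problem: $problem\n" : "Query file looks usable\n";
```

It is also worth checking that next_seq() returned something, since it gives back undef when it finds no sequence to read.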

Dave

This is what I did:

  % ls test.pl testp*
test.pl       testp.fa

% formatdb -i testp.fa

% ls test.pl testp*
test.pl       testp.fa      testp.fa.phr  testp.fa.pin  testp.fa.psq

% perl test.pl testp.fa
%  head -10 out.blast
BLASTP 2.2.10 [Oct-19-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|64654269|gb|AAH96193.1| HOXB1 protein [Homo sapiens]
          (235 letters)


Your code: I changed only the input filename and the input database  
name, and saved the script as test.pl
-----------------------
#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

my $Seq_in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new(
    'program'   => 'blastp',
    'database'  => 'testp.fa',
    _READMETHOD => 'Blast',
);
$factory->outfile("out.blast");
my $blastreport = $factory->blastall($queryin);
-----------------------

From gdorjee at hotmail.com  Wed Apr  4 17:44:27 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 14:44:27 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
References: <9842643.post@talk.nabble.com>
	<35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
Message-ID: <9846257.post@talk.nabble.com>


Thanks for your reply, Dave. I don't think that there's anything wrong with
the open(OUTPUT,">result/fasta.faa"); line, as I do get the 'fasta.faa'
file with the sequence in it; I can see it. It looks like blast is not
able to read from result/fasta.faa.
^ ^* 





From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 20:17:10 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:17:10 +1000
Subject: [Bioperl-l] How to patch something in installed bioperl module
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>
Message-ID: <a79f6a4b0704041717q160be28eu472d32d3cd704eba@mail.gmail.com>

> I am advised to patch (comment out some lines and add some) GFF.pm bioperl module.
> How do i go about it?.

First, make a backup of the original file.
Then just edit the original (add/remove lines).

> I have the latest Bioperl 1.5.2 version installed....via CPAN
> I find GFF.pm in the following location...
> /root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm

This is not where it is installed. That is where the CPAN program
uncompressed it to before installing. It is more likely in a directory
like this:
/usr/lib/perl5/site_perl/5.8.5/Bio/Tools/GFF.pm
But it depends on how your Perl setup arranges things!
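
One way to see which copy of a module Perl actually loads is to ask %INC
after loading it. A minimal sketch (the helper name module_path is made up
for illustration, and File::Spec is used only because it ships with Perl;
substitute Bio::Tools::GFF on your own system):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: load a module by name and return the file
# it was loaded from, as recorded in %INC.
sub module_path {
    my ($module) = @_;
    (my $file = $module) =~ s{::}{/}g;   # Bio::Tools::GFF -> Bio/Tools/GFF
    $file .= '.pm';
    require $file;                       # dies if the module is not installed
    return $INC{$file};
}

# File::Spec ships with Perl; replace it with 'Bio::Tools::GFF' to find
# the copy that needs patching.
print module_path('File::Spec'), "\n";
```

The path printed is the file to back up and edit, not the copy under
.cpan/build.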

> Do i have to recompile it after editing........

No.

--Torsten

From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 20:22:37 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:22:37 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>

> Software error:
> ------------- EXCEPTION  -------------
> MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
> /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => 'Fasta');

Does this still happen if you give the full path to the FASTA file?
e.g. -file => '/usr/local/apache2/htdocs/result/fasta.faa'
(I'm guessing what the full path is here)

--Torsten

From gilbertd at cricket.bio.indiana.edu  Wed Apr  4 20:59:23 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Wed, 4 Apr 2007 19:59:23 -0500 (EST)
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
Message-ID: <200704050059.l350xNF07452@cricket.bio.indiana.edu>


Dear Bioperl list,

There is a small bug in what I think is the current Bio::Tools::GFF.pm,
that blocks output of Target attributes (in gff3 at least).  See a patch
here

http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 21:34:17 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 11:34:17 +1000
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
Message-ID: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>

Dear all,

I have been migrating all our BLAST infrastructure to use the XML
output mode, the "blastpgp -m 7" option, referred to as the 'blastxml'
format in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML
report before, and encountered some issues I hope you can help me with:

1. When loading with Bio::SearchIO(-format=>'blastxml') I get back a
Bio::Search::Result::GenericResult object. This means I cannot use
the PSI-BLAST functions like iterations() and psiblast() provided by
Bio::Search::Result::BlastResult. I'm guessing this is because the
XML output reports itself as plain BLASTP output:
<BlastOutput_program>blastp</BlastOutput_program>

How do I determine if it is a PSI-BLAST report?

2. Usually a PSI-BLAST report has multiple Iterations. The XML output
has <Iteration> tags but it took me a while to figure out that these
get mapped to Bio::Search::Result objects accessible via
Bio::SearchIO->next_result().

Is this the proper way to process the iterations?

3. I also notice that only the first result (iteration) has the
query_name set. Subsequent ones are empty:
RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP,
query=MyProtein , db=uniprot_sprot
RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=
, db=uniprot_sprot

Is this a bug or expected?

I'm guessing a lot of these problems are simply due to limitations of
the NCBI BLAST XML DTD?

--Torsten

From gdorjee at hotmail.com  Wed Apr  4 20:59:08 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 17:59:08 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>
Message-ID: <9848412.post@talk.nabble.com>


hi Torsten,
Yes, it still gives me the same error even if I give the full path to the
fasta file. Following is how I did: 

####### part of my script #######
my $Seq_in = Bio::SeqIO->new (-file =>
'/export/home/local/apache2/htdocs/result/fasta.faa', -format => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                 'database' =>
'/export/home/dorjee/database/nrpart',
                                                 _READMETHOD => 'Blast'
                                                   );
$factory->outfile("/export/home/local/apache2/htdocs/result/out.blast");
my $blastreport = $factory->blastall($queryin);
....

thanks man.


Torsten Seemann wrote:
> 
>> Software error:
>> ------------- EXCEPTION  -------------
>> MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
>> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
>> /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
>> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
> 
>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
>> 'Fasta');
> 
> Does this still happen if you give the full path to the FASTA file?
> eg. -file => /usr/local/apache2/htdocs/result/fasta.faa
> (I'm guessing what the full path is here)
> 
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9848412
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 22:57:09 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 12:57:09 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>

DeeGee,

Please add the following lines to help deduce the problem:

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
> 'Fasta');

die "could not open fasta" if not defined $Seq_in;

> my $queryin = $Seq_in->next_seq();

die "could not get seq" if not defined $queryin;

Does anything happen now?

...

Some other comments:

> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp

I'm not sure why it is in the blastpgp() method when you chose
$factory->blastall() ?

>                                                  _READMETHOD => 'Blast'

I don't think this is required anymore in modern Bioperl. Are you
using 1.5.x or bioperl-live ?

> when i paste the protein sequence into the textarea of my html page and
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

So this is a CGI script?
Does the script run as user 'apache' or 'httpd', or as yourself via SuEXEC?
Does 'apache' have permissions to READ/WRITE the result/ directory?
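
A quick way to check this from inside the CGI script itself is to test the
file with Perl's file test operators before handing it to SeqIO. A rough
sketch (check_input is a made-up helper name; the path is the relative one
from your script):

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of problems found with an input file,
# as seen by the user the script is currently running as.
sub check_input {
    my ($path) = @_;
    my @problems;
    if    (!-e $path) { push @problems, "does not exist" }
    elsif (!-r $path) { push @problems, "not readable by this user" }
    elsif (-z $path)  { push @problems, "is empty (zero bytes)" }
    return @problems;
}

my $fasta    = 'result/fasta.faa';   # relative to the script's working directory
my @problems = check_input($fasta);
print "$fasta: ", (@problems ? join(', ', @problems) : 'looks OK'), "\n";
print "running as uid ", $>, "\n";   # effective user id of this process
```

If this reports a problem only when run through the web server, the cause
is more likely the working directory or permissions than Bioperl.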

--Torsten

From cjfields at uiuc.edu  Thu Apr  5 00:14:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Apr 2007 23:14:46 -0500
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
In-Reply-To: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>
References: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>
Message-ID: <8EA4D933-9B99-485E-9CEA-AB39297F90B4@uiuc.edu>

On Apr 4, 2007, at 8:34 PM, Torsten Seemann wrote:

> Dear all,
>
> I have been migrating all our BLAST infrastructure to use the XML
> output mode, the "blastpgp -m 7" option, referred to 'blastxml' format
> in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML report
> before, and encountered some issues I hope you can help me with:
>
> 1. When loading with Bio::SearchIO(-format=>'blastxml') I get back a
> Bio::Search::Result::GenericResult object. This means I can not use
> the PSI-BLAST functions like iterations() and psiblast() provided by
> Bio::Search::Result::BlastResult. I'm guessing this is because the the
> XML output reports itself as a plain BLASTP output:
> <BlastOutput_program>blastp</BlastOutput_program>
>
> How do I determine if it is a PSI-BLAST report?

I don't know if you can very easily, though I haven't tried myself.   
If I remember correctly there wasn't a substantial difference in the  
XML output between regular BLAST XML and PSI-BLAST XML.  We could add  
a parameter to the parser to treat the report as PSI-BLAST.

> 2. Usually a PSI-BLAST report has multiple Iterations. The XML output
> has <Iteration> tags but it took me a while to figure out that these
> get mapped to Bio::SearchIO::Result objects accessible via
> Bio::SearchIO->next_result().
>
> Is this the proper way to process the iterations?

The problem is in the way that NCBI now outputs multiple-query BLAST  
XML reports, which apparently changed sometime in the last year w/o  
notice.  This was also a problem with other Bio* parsers (I remember  
seeing something about it on the BioPython list).  Previously  
multiquery BLAST requests were output like single XML reports  
concatenated together, each with their own XML declaration, etc.  Now  
they are treated like iterations (query 1 = iteration 1, query 2 =  
iteration 2, etc) all in one long BLAST report.  There's an example  
of one in the SearchIO tests which I added to CVS in Jan-Feb,  
post-1.5.2.  The current parser handles both old and new cases.

The current behavior of the parser is to parse everything up front,  
building up the ResultI's and then returning them one-by-one upon  
next_result(), which is horrible on memory if you have tons of XML to  
wade through.  I will probably change that to carve the data up into  
report-sized chunks of XML and parse them piecemeal, but I haven't  
had time to work on it yet.
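
To make the chunking idea concrete, here is a rough sketch in plain Perl
that splits a stream of concatenated XML documents at each XML declaration
and hands back one report at a time. This is not what the blastxml parser
does today; xml_chunker is a made-up name, and the sketch only covers the
old-style output where every query has its own XML declaration.

```perl
use strict;
use warnings;

# Hypothetical iterator: returns a closure that yields one XML document
# per call from a filehandle holding several concatenated documents.
sub xml_chunker {
    my ($fh) = @_;
    my $pending;    # a declaration line already read that starts the next chunk
    return sub {
        my $chunk = defined $pending ? $pending : '';
        undef $pending;
        while (my $line = <$fh>) {
            if ($line =~ /^\s*<\?xml\b/ && length $chunk) {
                $pending = $line;    # keep it for the next call
                return $chunk;
            }
            $chunk .= $line;
        }
        return length $chunk ? $chunk : undef;    # last chunk, or end of input
    };
}

# Demo on an in-memory filehandle with two tiny stand-in "reports".
my $data = qq{<?xml version="1.0"?>\n<BlastOutput>one</BlastOutput>\n}
         . qq{<?xml version="1.0"?>\n<BlastOutput>two</BlastOutput>\n};
open(my $in, '<', \$data) or die "cannot open in-memory handle: $!";
my $next = xml_chunker($in);
my $n = 0;
while (defined(my $report = $next->())) {
    $n++;
    print "report $n: ", length($report), " bytes\n";
}
```

Only one report is held in memory at a time, which is the point of the
change; the new-style single-document multiquery output would need to be
split on <Iteration> elements instead.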

> 3. I also notice that only the first result (iteration) has the
> query_name set. Subsequent ones are empty:
> RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP,
> query=MyProtein , db=uniprot_sprot
> RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=
> , db=uniprot_sprot
>
> Is this a bug or expected?

If you are using 1.5.2 then there is a bug related to that which was  
fixed in CVS a few months back (related to the multiquery issue  
above).  If it isn't let me know.

> I'm guessing a lot of these problems are simply due to limitations of
> the NCBI BLAST XML DTD?
>
> --Torsten

To tell the truth I'm not sure.  One would think they could add some  
designation to the report for PSI-BLAST!

chris

From cjfields at uiuc.edu  Thu Apr  5 13:40:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 12:40:41 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
Message-ID: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>

Roy Chaudhuri has raised an interesting question in a bug report  
filed regarding 'bless'-ing objects into another (similar) class.   
The bug report on this is here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2262

The following code (from the bug report) illustrates the problem.   
Note some of this is taken from the Bio::Seq::Meta::Array POD, though  
the example sequence object is a LocatableSeq (PrimarySeqI) and not a  
SeqI:

use Bio::SeqIO;
use Bio::Seq::Meta::Array;
# $seq isa Bio::SeqI
my $seq=Bio::SeqIO->new(-fh=>\*ARGV, -format=>'genbank')->next_seq;
# $seq is still a Bio::SeqI
bless $seq, 'Bio::Seq::Meta::Array';
Bio::SeqIO->new(-format=>'genbank')->write_seq($seq);

This produces sequence output missing sequence data, a definition,  
and other odds and ends.  $seq is first a Bio::Seq::RichSeq and is  
blessed into a Bio::Seq::Meta::Array; both times $seq remains  
Bio::SeqI.  However, Bio::Seq::Meta::Array has an odd inheritance  
tree which also makes it a Bio::PrimarySeqI and a Bio::Seq::MetaI (ick):

use base qw(Bio::LocatableSeq Bio::Seq Bio::Seq::MetaI);

Bio::LocatableSeq has a seq() method inherited from Bio::PrimarySeq,  
for instance, so using $seq->seq() invokes Bio::PrimarySeq::seq()  
instead of Bio::Seq::seq().  No problem in most cases as long as  
PrimarySeqI is blessed into another PrimarySeqI, but if one blesses a  
Bio::SeqI into a Bio::Seq::Meta::Array (as in the example) then  
PrimarySeq::seq() expects a raw sequence and gets none (since the  
data is stored internally as a PrimarySeq in a different location)  
and no sequence is output.  This happens similarly for other stored  
object data.
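
The effect can be reproduced without BioPerl. In the toy sketch below
(Toy::PrimarySeq and Toy::Seq are made-up stand-ins, not the real classes),
the container class keeps its sequence inside a nested object, so
re-blessing it into the class that expects a raw string under a hash key
leaves seq() with nothing to return:

```perl
use strict;
use warnings;

# Stand-in for a PrimarySeq-like class: raw sequence stored directly.
package Toy::PrimarySeq;
sub new { my ($class, %args) = @_; return bless { seq => $args{seq} }, $class }
sub seq { my ($self) = @_; return $self->{seq} }

# Stand-in for a Seq-like class: delegates to a nested PrimarySeq object.
package Toy::Seq;
sub new {
    my ($class, %args) = @_;
    my $primary = Toy::PrimarySeq->new(seq => $args{seq});
    return bless { primary_seq => $primary }, $class;
}
sub seq { my ($self) = @_; return $self->{primary_seq}->seq }

package main;

my $seq = Toy::Seq->new(seq => 'ACGT');
print "before: ", $seq->seq, "\n";    # delegated lookup finds the sequence

bless $seq, 'Toy::PrimarySeq';        # same hash, different class
print "after:  ", (defined $seq->seq ? $seq->seq : '(undef)'), "\n";
```

The data is still in the hash after the bless; it is just stored where the
new class's accessor never looks.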

I'm not sure why Bio::Seq::Meta::Array is set up this way.  Do we  
want to support using 'bless $obj, Class' with Bio::SeqI/PrimarySeqI,  
or should Bio::Seq::Meta::Array be changed so that it follows one  
interface or the other?

chris

From hlapp at gmx.net  Thu Apr  5 14:27:39 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Apr 2007 14:27:39 -0400
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
In-Reply-To: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
Message-ID: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>


On Apr 5, 2007, at 1:40 PM, Chris Fields wrote:

> Do we want to support using 'bless $obj, Class'

This smacks of over-clever programming and looks like a sure way to  
obfuscate what you're doing. I'm not sure why we need to support this  
construct.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Apr  5 14:44:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 13:44:38 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
In-Reply-To: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
	<421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
Message-ID: <F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>

I tend to agree on that front as it seems too prone to subtle issues  
with inheritance (as the bug demonstrates).

Related to that, do we want to have Bio::Seq::Meta::Array implement  
either PrimarySeqI or SeqI?  Having it implement both is definitely  
not working as expected.

chris

On Apr 5, 2007, at 1:27 PM, Hilmar Lapp wrote:

>
> On Apr 5, 2007, at 1:40 PM, Chris Fields wrote:
>
>> Do we want to support using 'bless $obj, Class'
>
> This smacks of over-clever programming and looks like a sure way to  
> obfuscate what you're doing. I'm not sure why we need to support  
> this construct.
>
> 	-hilmar
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mkiwala at watson.wustl.edu  Thu Apr  5 15:11:22 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 05 Apr 2007 14:11:22 -0500
Subject: [Bioperl-l] Mixed bless-ings with
	Bio::Seq/Bio::PrimarySeq	(Bio::Seq::Meta::Array)
In-Reply-To: <F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>	<421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
	<F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>
Message-ID: <461549DA.90709@watson.wustl.edu>

My vote is for SeqI.

I was using the SeqWithQuality class and more recently switched over to 
Bio::Seq::Quality as we are upgrading from 1.4 to 1.5.2. The sequences 
I'm working with are destined for GenBank and have features and quality 
values. I've written a module (that I call GenBank::Tbl2Asn) that 
accepts a Bio::Seq::Quality with features and runs tbl2asn on it to 
produce a file that we send to GenBank. I don't know of any other class 
that suits my needs better than Bio::Seq::Quality inheriting from 
Bio::SeqI.


Chris Fields wrote:
> I tend to agree on that front as it seems too prone to subtle issues  
> with inheritance (as the bug demonstrates).
>
> Related to that, do we want to have Bio::Seq::Meta::Array implement  
> either PrimarySeqI or SeqI?  Having it implement both is definitely  
> not working as expected.
>
> chris
>
> On Apr 5, 2007, at 1:27 PM, Hilmar Lapp wrote:
>
>   
>> On Apr 5, 2007, at 1:40 PM, Chris Fields wrote:
>>
>>     
>>> Do we want to support using 'bless $obj, Class'
>>>       
>> This smacks of over-clever programming and looks like a sure way to  
>> obfuscate what you're doing. I'm not sure why we need to support  
>> this construct.
>>
>> 	-hilmar
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>   

From gdorjee at hotmail.com  Thu Apr  5 17:09:14 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 14:09:14 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
Message-ID: <9864004.post@talk.nabble.com>


Thanks again, Torsten. I tried (die "could not get seq" if not defined
$queryin;) as you suggested, and now I get the following error message:

Software error:
could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.

Does this mean that next_seq() method in 'my $queryin =
$Seq_in->next_seq();' has some problem? How can I fix it? I would appreciate
your help.
Cheers!


Torsten Seemann wrote:
> 
> DeeGee,
> 
> Please add the following lines to help deduce the problem:
> 
>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
>> 'Fasta');
> 
> die "could not open fasta" if not defined $Seq_in;
> 
>> my $queryin = $Seq_in->next_seq();
> 
> die "could not get seq" if not defined $queryin;
> 
> Does anything happen now?
> 
> ...
> 
> Some other comments:
> 
>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>
>> 'blastp',
>> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
> 
> I'm not sure why it is in the blastpgp() method when you chose
> $factory->blastall() ?
> 
>>                                                  _READMETHOD => 'Blast'
> 
> I don't think this is required anymore in modern Bioperl. Are you
> using 1.5.x or bioperl-live ?
> 
>> when i paste the protein sequence into the textarea of my html page and
>> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
> 
> So this is a CGI script?
> Does the script run as user 'apache' or 'httpd', or as yourself via
> SuEXEC?
> Does 'apache' have permissions to READ/WRITE the result/ directory?
> 
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9864004
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Thu Apr  5 19:32:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 18:32:55 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
Message-ID: <3ED7F1E9-FE21-4796-99AC-0CD0EA418563@uiuc.edu>


On Apr 5, 2007, at 4:09 PM, DeeGee wrote:

>
> Thanks again, Torsten. I tried (die "could not get seq" if not defined
> $queryin;) as you suggested, and now I get the following error  
> message:
>
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
>
> Does this mean that next_seq() method in 'my $queryin =
> $Seq_in->next_seq();' has some problem? How can I fix it? I would  
> appreciate
> your help.
> Cheers!

This indicates there is likely some problem with your sequence file  
(either it isn't FASTA or something else is wrong), but w/o actually  
seeing it we can't be sure.  I don't think it is a next_seq() issue.  
Also, if there were problems accessing the file the stream object  
should throw an error, so I don't think it is that either...

chris

>
> Torsten Seemann wrote:
>>
>> DeeGee,
>>
>> Please add the following lines to help deduce the problem:
>>
>>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '- 
>>> format' =>
>>> 'Fasta');
>>
>> die "could not open fasta" if not defined $Seq_in;
>>
>>> my $queryin = $Seq_in->next_seq();
>>
>> die "could not get seq" if not defined $queryin;
>>
>> Does anything happen now?
>>
>> ...
>>
>> Some other comments:
>>
>>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>
>>> 'blastp',
>>> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
>>
>> I'm not sure why it is in the blastpgp() method when you chose
>> $factory->blastall() ?
>>
>>>                                                  _READMETHOD =>  
>>> 'Blast'
>>
>> I don't think this is required anymore in modern Bioperl. Are you
>> using 1.5.x or bioperl-live ?
>>
>>> when i paste the protein sequence into the textarea of my html  
>>> page and
>>> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
>>
>> So this is a CGI script?
>> Does the script run as user 'apache' or 'httpd', or as yourself via
>> SuEXEC?
>> Does 'apache' have permissions to READ/WRITE the result/ directory?
>>
>> --Torsten
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/blastall- 
> problem-tf3527412.html#a9864004
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From torsten.seemann at infotech.monash.edu.au  Thu Apr  5 20:40:32 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 6 Apr 2007 10:40:32 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
Message-ID: <a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>

Dorjee,

> thanks alot for your reply again. as per your suggestion (using 'die "could
> not get seq" if not defined $queryin;'), i now get the following error
> message:
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
> i've attached the script. could you plz have a look at it and see where am i
> going wrong.
> cheers mate!

This strongly suggests that your FASTA file is not actually in FASTA format.
http://en.wikipedia.org/wiki/Fasta_format

Does it work if you pass it to blastall on the command line?
eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr

> Saier Lab.
> 858-534-2457

Are you working at UCSD?

--Torsten

From gdorjee at hotmail.com  Thu Apr  5 23:26:16 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 20:26:16 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
Message-ID: <9867402.post@talk.nabble.com>


hi Torsten,  
blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works
perfectly fine on the command line, and the 'fasta.faa' is in fasta format:

>gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
AQGAVAPGPDGGGPFPPWPLG

it seems like i'm just one bloody step away from success. ^ ^* can't figure
out the prob. 
thanks for your help.


Torsten Seemann wrote:
> 
> Dorjee,
> 
>> thanks alot for your reply again. as per your suggestion (using 'die
>> "could
>> not get seq" if not defined $queryin;'), i now get the following error
>> message:
>> Software error:
>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
>> i've attached the script. could you plz have a look at it and see where
>> am i
>> going wrong.
>> cheers mate!
> 
> This strongly suggests that your FASTA file is not actually in FASTA
> format.
> http://en.wikipedia.org/wiki/Fasta_format
> 
> Does it work if you pass it to blastall on the command line?
> eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
> 
>> Saier Lab.
>> 858-534-2457
> 
> Are you working at UCSD?
> 
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9867402
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From tuco at pasteur.fr  Fri Apr  6 09:33:08 2007
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Fri, 06 Apr 2007 15:33:08 +0200
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
Message-ID: <46164C14.8040701@pasteur.fr>

Hi folks,

I see some strange behavior from Bio::SeqIO::embl.
When I read an EMBL file as input and write it out to another one, the tags
in the output file (EMBL format) are not in the same order as in the original
file.
Is this normal and expected?

If anyone wants to test it as a Perl one-liner, here is the code:

perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "file.embl", -format 
=> "EMBL"); $o = Bio::SeqIO->new(-file => ">new.embl", -format => 
"EMBL"); while($e = $i->next_seq()){ $o->write_seq($e);  }'

I checked the embl.pm code but was unable to find where this behavior 
comes from.

Does anyone have a solution or any clue?

Thanks

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Software and data banks
Pasteur Institute
tuco at_ pasteur dot fr	
-------------------------


From dmessina at wustl.edu  Fri Apr  6 11:09:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 6 Apr 2007 10:09:51 -0500
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
In-Reply-To: <46164C14.8040701@pasteur.fr>
References: <46164C14.8040701@pasteur.fr>
Message-ID: <7C67D287-DE2A-488A-8636-01EFF468368D@wustl.edu>

> Is it a normal and expecting result ?

Yes, unfortunately. Due to the complexity of the parsing, it is  
surprisingly difficult to "round-trip" some sequence file formats.

http://www.bioperl.org/wiki/HOWTO:SeqIO#Caveats


Dave

From jason at bioperl.org  Fri Apr  6 11:42:41 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 08:42:41 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9867402.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
Message-ID: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>

When/how are you writing your sequences to this file result.faa?  
Are you using SeqIO or BioPerl to write the sequence to a file?
I'm wondering if this is an I/O buffering problem.

On Apr 5, 2007, at 8:26 PM, DeeGee wrote:

>
> hi Torsten,
> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr  
> works
> perfectly fine on the command line, and the 'fasta.faa' is in fasta  
> format:
>
>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASV 
> SPSMTVASSQ
> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLA 
> GTAPGAEGPA
> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAF 
> RRKEHLRRHR
> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRH 
> QRIHGRAAAS
> AQGAVAPGPDGGGPFPPWPLG
>
> it seems like i'm just one bloody step away from success. ^ ^*  
> can't figure
> out the prob.
> thanks for your help.
>
>
> Torsten Seemann wrote:
>>
>> Dorjee,
>>
>>> thanks alot for your reply again. as per your suggestion (using 'die
>>> "could
>>> not get seq" if not defined $queryin;'), i now get the following  
>>> error
>>> message:
>>> Software error:
>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl  
>>> line 50.
>>> i've attached the script. could you plz have a look at it and see  
>>> where
>>> am i
>>> going wrong.
>>> cheers mate!
>>
>> This strongly suggests that your FASTA file is not actually in FASTA
>> format.
>> http://en.wikipedia.org/wiki/Fasta_format
>>
>> Does it work if you pass it to blastall on the command line?
>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/ 
>> database/nr
>>
>>> Saier Lab.
>>> 858-534-2457
>>
>> Are you working at UCSD?
>>
>> --Torsten
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/blastall- 
> problem-tf3527412.html#a9867402
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From bernd.web at gmail.com  Fri Apr  6 14:00:18 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 6 Apr 2007 20:00:18 +0200
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <716af09c0704061100n1555915bw18050639d25cbf89@mail.gmail.com>

Hi Dorjee,

Do you now use complete file paths everywhere (instead of the
relative paths that were in your script)?  Did you check all read and
execute permissions (turn r, x on for group and others)? And regarding
the fasta file, I'd suggest closing the filehandle after you have printed
the fasta sequence to the file.

open(OUTPUT,">result/fasta.faa"); # don't use this relative path, and
                                  # use "die" as was suggested earlier.
.... your other code lines
print OUTPUT "$desc\n$seqo\n";
close(OUTPUT); # close the file.

Also check whether your complete script runs from the command line, to be
sure your problems are not related to the webserver environment.


BTW, I don't think you want to parse your fasta file the way you do:
if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
$fasta_file=~s/[\n\r]//g;
if ($fasta_file =~ /([A-Z]{10}.+)/){$seqo=$1;}

$seqo will contain the description as well, so your sequence starts
with the description.
BioPerl provides code for fasta file parsing too ;-) If you really
want to stick to your code you can catch the $desc and $seqo in one
RegExp, or replace this line:
if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
with
if ($fasta_file =~ s/^(\>.+)\s+//){$desc=$1;}
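
As a rough sketch of the one-RegExp route (plain Perl, not tested against
your CGI input; the helper name split_fasta_record is made up for
illustration, and it assumes the textarea holds exactly one record):

```perl
use strict;
use warnings;

# Hypothetical helper: split one FASTA record into description and sequence.
# Assumes the text holds a single record whose first line starts with '>'.
sub split_fasta_record {
    my ($text) = @_;
    # Capture the header line, then everything after it as the sequence body.
    my ($desc, $body) = $text =~ /\A(>[^\r\n]*)\r?\n(.*)\z/s
        or return;
    $body =~ s/\s+//g;    # drop newlines and any other whitespace
    return ($desc, $body);
}

my ($desc, $seqo) = split_fasta_record(">seq1 test protein\nHLSAQ\nKASVG\n");
print "$desc\n$seqo\n";
```

Because the header is captured separately from the body, $seqo can never
start with the description, and input without a '>' line gives an empty
list that you can test with "die".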


I hope you will get your script working now.

Regards,
Bernd

On 4/6/07, Jason Stajich <jason at bioperl.org> wrote:
> When/How are you writing your sequences to this file result.faa?  Are
> you using SeqIO or bioperl to write the sequence to a file?
> I'm wondering if this is an I/O buffering problem.
>
>
>
> On Apr 5, 2007, at 8:26 PM, DeeGee wrote:
>
>
> hi Torsten,
> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works
> perfectly fine on the command line, and the 'fasta.faa' is in fasta format:
>
>
> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
> AQGAVAPGPDGGGPFPPWPLG
>
> it seems like i'm just one bloody step away from success. ^ ^* can't figure
> out the prob.
> thanks for your help.
>
>
> Torsten Seemann wrote:
>
> Dorjee,
>
>
> thanks alot for your reply again. as per your suggestion (using 'die
> "could
> not get seq" if not defined $queryin;'), i now get the following error
> message:
> Software error:
> could not get seq at
> /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
> i've attached the script. could you plz have a look at it and see where
> am i
> going wrong.
> cheers mate!
>
> This strongly suggests that your FASTA file is not actually in FASTA
> format.
> http://en.wikipedia.org/wiki/Fasta_format
>
> Does it work if you pass it to blastall on the command line?
> eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
>
>
> Saier Lab.
> 858-534-2457
>
> Are you working at UCSD?
>
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>
> --
> View this message in context:
> http://www.nabble.com/blastall-problem-tf3527412.html#a9867402
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.htmlhttp://fungalgenomes.org/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>

From gdorjee at hotmail.com  Fri Apr  6 13:39:38 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 10:39:38 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <9875685.post@talk.nabble.com>


Following is the part of my script, which is in the 'htdocs' directory:

####### part of my script #############
#generate a new CGI object from the input to the CGI script
my $query=new CGI;

open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");

print STDOUT $query->header();
print STDOUT $query->start_html(-title=>"Response from blast",
-BGCOLOR=>"#FFFFFF");
print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";

#gets the sequence from the html textarea with 'post' method
my $fasta_file=$query->param('sequence');
print OUTPUT $fasta_file;

#Local blast of the input sequence against nr database
my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
'Fasta');
die "could not open fasta" if not defined $Seq_in;
my $queryin = $Seq_in->next_seq();
die "could not get seq" if not defined $queryin;
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                 'database' =>
'/export/home/dorjee/database/nr',
                                                 _READMETHOD => 'Blast'
                                                   );
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

Thank you.


Jason Stajich-3 wrote:
> 
> When/How are are you writing your sequences to this file result.faa?
> are you using seqIO or bioperl to write the sequence  to a file?
> I'm wondering if this is I/O buffering problem.
> 
> 
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
> 
> 
>  
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9875685
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Fri Apr  6 14:40:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 11:40:42 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9875685.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
	<9875685.post@talk.nabble.com>
Message-ID: <A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>

Looks like you need to deal with buffering:

http://perl.plover.com/FAQs/Buffering.html

So you need to add this:
close(OUTPUT);

Alternatively you can build a sequence object and pass that in to the  
BLAST factory, then you don't have to mess around with creating  
temporary files or run into this sort of problem.
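
To illustrate the buffering point, here is a minimal sketch using core
Perl only. The helper name size_after_write is made up, and a plain file
size check stands in for Bio::SeqIO reading the file back, so treat it as
an illustration rather than your actual script:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Return the size on disk of a file after printing to it, either
# before or after the handle is closed.
sub size_after_write {
    my ($close_first) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    print {$fh} ">seq1\nHLSAQKASVG\n";   # short write: stays in Perl's buffer
    close($fh) or die "close failed: $!" if $close_first;
    return -s $path;                     # what another reader would see
}

printf "without close: %d bytes\n", size_after_write(0) || 0;
printf "with close:    %d bytes\n", size_after_write(1) || 0;
```

Until the handle is closed (or flushed), a second reader of the same path
can find an empty file, which would explain next_seq() returning nothing.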

-jason
On Apr 6, 2007, at 10:39 AM, DeeGee wrote:

>
> Following is the part of my script, which is in the 'htdocs'  
> directory:
>
> ####### part of my script #############
> #generate a new CGI object from the input to the CGI script
> my $query=new CGI;
>
> open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");
>
> print STDOUT $query->header();
> print STDOUT $query->start_html(-title=>"Response from blast",
> -BGCOLOR=>"#FFFFFF");
> print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";
>
> #gets the sequence from the html textarea with ?post? method
> my $fasta_file=$query->param('sequence');
> print OUTPUT $fasta_file;
>
close(OUTPUT);
> #Local blast of the input sequence against nr database
> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
> 'Fasta');
> die "could not open fasta" if not defined $Seq_in;
> my $queryin = $Seq_in->next_seq();
> die "could not get seq" if not defined $queryin;
> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>
> 'blastp',
>                                                  'database' =>
> '/export/home/dorjee/database/nr',
>                                                  _READMETHOD =>
> 'Blast'
>                                                    );
> $factory->outfile("result/out.blast");
> my $blastreport = $factory->blastall($queryin);
> .....
>
> Thank you.
>
>
>
>
> -- 
> View this message in context: http://www.nabble.com/blastall- 
> problem-tf3527412.html#a9875685
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From MEC at stowers-institute.org  Fri Apr  6 16:34:37 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 6 Apr 2007 15:34:37 -0500
Subject: [Bioperl-l] Bio/DB/SeqFeature/Store/DBI/mysql.pm patched
Message-ID: <CED81D34E37D5043A1211565277A51E507E22BAF@exchkc02.stowers-institute.org>

Lincoln,

I just committed a patch to Bio/DB/SeqFeature/Store/DBI/mysql.pm which
avoids a potential problem that, unless fixed, can generate warnings
that look like this:

prepare_cached(SELECT f.id,f.object
  FROM feature as f, typelist AS tl
  WHERE (   tl.id=f.typeid
   AND   (tl.tag LIKE ?)
)
  
) statement handle DBI::st=HASH(0x16f61c0) still Active at
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line
1427
DBD::mysql::st fetchrow_array failed: fetch() without execute() at
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line
1416.

... as well as other aberrant program behaviour downstream.

I encountered what the DBI manpage suggests might happen: "The results
will certainly not be what you expect"

This can happen, for example, when you open an iterator using
Bio::DB::SeqFeature::Store->get_seq_stream, and then while iterating,
perform other queries against the store.  My understanding of the DBI
doc is that this should only occur if the 2nd iterator is for the same
sql statement identically parameterized as the 1st, but I have not
proven beyond a doubt that this is what Bio::DB::SeqFeature::Store is
doing the way I am using it.  Nonetheless, the patch fixes my pipeline.

Cheers,

Malcolm


From gdorjee at hotmail.com  Fri Apr  6 18:27:54 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 15:27:54 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
	<9875685.post@talk.nabble.com>
	<A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>
Message-ID: <9879110.post@talk.nabble.com>


I added the line:
close(OUTPUT);
and now the following error comes up. 'out.blast' is supposed to be the
blast result file, but it is not being created.

Software error:
------------- EXCEPTION  -------------
MSG: Could not open /export/home/dorjee/result/out.blast: No such file or
directory
STACK Bio::Root::IO::_initialize_io /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:167
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:53

--------------------------------------


Jason Stajich-3 wrote:
> 
> Looks like you need to deal with buffering:
> 
> http://perl.plover.com/FAQs/Buffering.html
> 
> So you need to add this:
> close(OUTPUT);
> 
> Alternatively you can build a sequence object and pass that in to the
> BLAST factory, then you don't have to mess around with creating
> temporary files or run into this sort of problem.
> 
> -jason
> 
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
> 
> 
>  
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9879110
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From gilbertd at cricket.bio.indiana.edu  Fri Apr  6 23:31:29 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Fri, 6 Apr 2007 22:31:29 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704070331.l373VTI22000@cricket.bio.indiana.edu>


Dear Bioperlers,

There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
files have fixed line widths, but that isn't a requirement of Fasta
format. The documentation notes this package requirement, but I was
bitten by this, and I'd guess not many people check their data (esp.
if from someone else) to see it meets this requirement.

Simple tools can easily produce fasta with ragged line formatting:
e.g. genome assemblers that paste together contig fasta with spacers
to make assemblies.

It would be nice if B:D:Fasta would check and die when it can't handle
this ragged input.  Here is a suggested addition:

  package Bio::DB::Fasta;

=head1 DESCRIPTION
  
  Entries may have any line length up to 65,536 characters, and
  different line lengths are allowed in the same file.  However, within
  a sequence entry, all lines must be the same length except for the
  last.  
+ An error will be thrown if this is not the case.

=cut

  use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want 
  
  sub calculate_offsets {
  
     my ($offset,$id,$linelength,$type,$firstline,$count,$termination_length,%offsets);
  +  my ($l3_len,$l2_len,$l_len)=(0,0,0);
  
         $self->_check_linelength($linelength);
  +      ($l3_len,$l2_len,$l_len)=(0,0,0);
       } else {
  +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
  +      if(DIE_ON_MISSMATCHED_LINES &&
  +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
  +         my $fap= substr($_,0,20)."..";
  +         $self->throw("Each line of the fasta entry must be the same length except the last.
  +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
  +         }
  
         $linelength ||= length($_);
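
For checking a file before indexing, the same rule can be sketched as a
standalone routine. This is plain Perl with no BioPerl; find_ragged_line
is a hypothetical name, not part of Bio::DB::Fasta, and it treats blank
lines inside an entry as ragged:

```perl
use strict;
use warnings;

# Return a message describing the first ragged line found, or undef if every
# entry has equal-length sequence lines (the last line of an entry may differ).
sub find_ragged_line {
    my ($fh) = @_;
    my ($id, $width, $odd_line);
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;
        if ($line =~ /^>(\S*)/) {          # new entry: reset per-entry state
            ($id, $width, $odd_line) = ($1, undef, undef);
            next;
        }
        next unless defined $id;           # ignore text before the first header
        # An earlier line had a different length but was not the last one.
        return "entry '$id': line $odd_line differs from width $width"
            if defined $odd_line;
        $width = length $line unless defined $width;
        $odd_line = $. if length($line) != $width;
    }
    return;
}

# Entry 'a' is fine (only its last line is short); entry 'b' is ragged.
open my $fh, '<', \">a\nACGT\nACGT\nAC\n>b\nACGT\nAC\nACGT\n" or die $!;
my $problem = find_ragged_line($fh);
print defined $problem ? "$problem\n" : "ok\n";
```

Unlike the patch above, this remembers only the first line width of each
entry plus the first deviating line, so it still has to read every line.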
  
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

From hlapp at gmx.net  Sat Apr  7 12:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 12:42:13 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <05D43C56-8B30-41C9-8C35-2CD77419DE7F@gmx.net>

Wouldn't it be easier (and more robust) to just reformat the file to  
meet the constant line width requirement? The code required to do  
that should be fewer lines than your addition below, I think.

For example, one could do a fast first-pass through the file simply  
checking that all sequence lines not followed by a description line  
or eof have the same length, stopping at the first line that fails  
the test. If unequal lengths, use Bio::SeqIO to read and write back  
out the fasta file, then continue as for well-formatted files.
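
A minimal sketch of the rewrite step, in plain Perl rather than
Bio::SeqIO so that it is self-contained. The name rewrap_fasta and the
width argument are made up for illustration, and it buffers one whole
entry in memory at a time:

```perl
use strict;
use warnings;

# Rewrap every entry read from $in to a constant line width and write it to $out.
sub rewrap_fasta {
    my ($in, $out, $width) = @_;
    my $seq = '';
    my $flush = sub {
        # Emit the buffered sequence in $width-sized lines (last may be shorter).
        print {$out} "$1\n" while $seq =~ /\G(.{1,$width})/g;
        $seq = '';
    };
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;
        if ($line =~ /^>/) {
            $flush->();                 # finish the previous entry first
            print {$out} "$line\n";
        } else {
            $seq .= $line;
        }
    }
    $flush->();                         # finish the final entry
}

my $ragged = ">contig1\nACGTAC\nGT\nACGTACGT\n";
open my $in,  '<', \$ragged   or die $!;
open my $out, '>', \my $fixed or die $!;
rewrap_fasta($in, $out, 4);
close $out;
print $fixed;
```

The in-memory filehandles are only there to keep the example runnable;
with real data you would open the input and output files by path.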

	-hilmar

On Apr 6, 2007, at 11:31 PM, Don Gilbert wrote:

>
> Dear Bioperlers,
>
> There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
> files have fixed line widths, but that isn't a requirement of Fasta
> format. The documentation notes this package requirement, but I was
> bitten by this, and I'd guess not many people check their data (esp.
> if from someone else) to see it meets this requirement.
>
> Simple tools can easily produce fasta with ragged line formatting:
> e.g. genome assemblers that paste together contig fasta with spacers
> to make assemblies.
>
> It would be nice if B:D:Fasta would check and die when it can't handle
> this ragged input.  Here is a suggested addition:
>
>   package Bio::DB::Fasta;
>
> =head1 DESCRIPTION
>
>   Entries may have any line length up to 65,536 characters, and
>   different line lengths are allowed in the same file.  However,  
> within
>   a sequence entry, all lines must be the same length except for the
>   last.
> + An error will be thrown if this is not the case.
>
> =cut
>
>   use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want
>
>   sub calculate_offsets {
>
>      my ($offset,$id,$linelength,$type,$firstline,$count, 
> $termination_length,%offsets);
>   +  my ($l3_len,$l2_len,$l_len)=(0,0,0);
>
>          $self->_check_linelength($linelength);
>   +      ($l3_len,$l2_len,$l_len)=(0,0,0);
>        } else {
>   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); #  
> need to check every line :(
>   +      if(DIE_ON_MISSMATCHED_LINES &&
>   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
>   +         my $fap= substr($_,0,20)."..";
>   +         $self->throw("Each line of the fasta entry must be the  
> same length except the last.
>   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
>   +         }
>
>          $linelength ||= length($_);
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr  7 17:13:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 17:13:24 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
References: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
Message-ID: <8177CF47-558F-4891-97B5-69F327EF8A4A@gmx.net>

What I was suggesting was that the indexer automatically does the
reformatting, i.e., have it touch/change the input data if necessary
(and obviously one would be able to turn this feature off when the
correctness of the input formatting is known).

Are you suggesting that this automatic reformatting isn't possible?

	-hilmar

On Apr 7, 2007, at 1:11 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> I have added reformatting where appropriate (in code that installs the
> files for indexing by Bio::DB::Fasta).  What I'm suggesting is a patch
> to Bio::DB::Fasta to warn and die when the documented fixed width
> that Bio::DB::Fasta requires isn't met.  I.e., keep other folks from
> being bitten by this hard to identify requirement.  Then when they
> see that this indexer is failing on inappropriate inputs, they also  
> can reformat
> their Fasta to meet this requirement, and not continue to use the  
> software with
> bad results.  The operation of Bio::DB::Fasta is reading a sequence  
> stream
> and it doesn't touch/change the input data, so it would be hard to  
> patch it
> to re-format the input data.
>
> - Don
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr  7 21:00:51 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 21:00:51 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
References: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
Message-ID: <B8009E72-30C5-479B-B7B9-456E859B80CB@gmx.net>

Since you'd have to reformat it though, how would you do it then  
(presumably offline)?

	-hilmar

On Apr 7, 2007, at 8:06 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> Yes, basically automatic reformatting isn't possible. If you are
> indexing a large genome of fasta data, I'd not want a bioperl script
> to rewrite that data, or create a new version, automatically.
>
> - Don

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From gilbertd at cricket.bio.indiana.edu  Sat Apr  7 13:11:11 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 12:11:11 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704071711.l37HBB823983@cricket.bio.indiana.edu>


Hilmar,

I have added reformatting where appropriate (in code that installs the 
files for indexing by Bio::DB::Fasta).  What I'm suggesting is a patch
to Bio::DB::Fasta to warn and die when the documented fixed width
that Bio::DB::Fasta requires isn't met.  I.e., keep other folks from
being bitten by this hard to identify requirement.  Then when they
see that this indexer is failing on inappropriate inputs, they also can reformat 
their Fasta to meet this requirement, and not continue to use the software with
bad results.  The operation of Bio::DB::Fasta is reading a sequence stream
and it doesn't touch/change the input data, so it would be hard to patch it
to re-format the input data.
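
To illustrate the kind of check meant here, a minimal standalone sketch follows. It is not the actual patch: the sub name find_ragged_line and the in-memory list of lines are assumptions made for the example. It only reports the first line that breaks the "same width except the last line of an entry" rule.

```perl
use strict;
use warnings;

# Hypothetical helper: given the lines of a FASTA file (newlines already
# removed), return the 1-based number of the first sequence line whose
# width differs from the first line of its entry and which is not the
# last line of that entry. Returns 0 if every entry is evenly wrapped.
sub find_ragged_line {
    my @lines = @_;
    my ($width, $pending);    # entry width; line number of an odd-width line
    my $n = 0;
    for my $line (@lines) {
        $n++;
        if ($line =~ /^>/) {                 # new entry: reset state
            ($width, $pending) = (undef, undef);
            next;
        }
        # an earlier odd-width line turned out not to be the last line
        return $pending if defined $pending;
        my $len = length $line;
        if    (!defined $width) { $width   = $len }
        elsif ($len != $width)  { $pending = $n }   # fine only if it is the last line
    }
    return 0;
}

my @ok  = ('>a', 'ACGT', 'ACGT', 'AC', '>b', 'GG');
my @bad = ('>a', 'ACGT', 'AC', 'ACGT');
print find_ragged_line(@ok),  "\n";   # 0
print find_ragged_line(@bad), "\n";   # 3
```

A caller could die with the returned line number instead of indexing, which leaves the input file untouched.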

- Don

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

From gilbertd at cricket.bio.indiana.edu  Sat Apr  7 20:06:34 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 19:06:34 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>


Hilmar,

Yes, basically automatic reformatting isn't possible. If you are
indexing a large genome of fasta data, I'd not want a bioperl script
to rewrite that data, or create a new version, automatically.

- Don

From gdorjee at hotmail.com  Mon Apr  9 00:18:39 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sun, 8 Apr 2007 21:18:39 -0700 (PDT)
Subject: [Bioperl-l] parse blast report for the best evalue
Message-ID: <9898358.post@talk.nabble.com>


hi all, 
i'm trying to parse a blast report using Bio::SearchIO as follows, but since
this blast report is generated with many against many (database) fasta
sequences, there're many individual blast reports (one for each of the
sequences from the query file). i was wondering if there is a way to get only
the best hit (with best evalue) from each one of them.

##### part of my script ######
my $in = new Bio::SearchIO(-format => 'blast',  -file   => $blast_report);
while( my $result = $in->next_result ) {
        while( my $hit = $result->next_hit ) {
              ...........

thanks.


-- 
View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9898358
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From staffa at niehs.nih.gov  Mon Apr  9 11:43:19 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Mon, 09 Apr 2007 11:43:19 -0400
Subject: [Bioperl-l] Retrieve mRNA from Genome
Message-ID: <C23FD757.3FAB%staffa@niehs.nih.gov>

I have been retrieving sub-sequence from Genbank genomic records by use of
Bio::SeqIO
and ->get_SeqFeatures, ->start ->end ,
but now I'm looking for a quick way to extract CDS or mRNA from
a multi-segmented annotation, e.g.
     mRNA          
join(72458..72791,84573..84613,93279..94419,94481..94656,
                     94719..94992,95056..95350,95438..95553,95614..96056)

Is there such a method?
Please point me to appropriate documentation.


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: John D. Grovenstein (grovens1 at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From Kevin.M.Brown at asu.edu  Mon Apr  9 12:19:19 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 9 Apr 2007 09:19:19 -0700
Subject: [Bioperl-l] Retrieve mRNA from Genome
In-Reply-To: <C23FD757.3FAB%staffa@niehs.nih.gov>
References: <C23FD757.3FAB%staffa@niehs.nih.gov>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCAED7@EX02.asurite.ad.asu.edu>

I believe that is what the spliced_seq method is for

$feat->spliced_seq    # the "joined" sequence, when there are
                      # multiple sub-locations

http://www.bioperl.org/wiki/Bptutorial.pl 
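
To make the "joined" idea concrete: a join(...) location lists 1-based, inclusive segments that are concatenated in the order given. The sketch below mimics that on a plain string, for a forward-strand feature only. join_segments is a made-up helper for illustration, not a BioPerl method, and spliced_seq takes care of more than this (strand, for one).

```perl
use strict;
use warnings;

# Hypothetical helper: concatenate 1-based, inclusive [start, end]
# segments of a forward-strand sequence, in the order given.
sub join_segments {
    my ($seq, @segments) = @_;
    my $spliced = '';
    for my $seg (@segments) {
        my ($start, $end) = @$seg;
        # substr is 0-based and takes a length, hence the conversions
        $spliced .= substr($seq, $start - 1, $end - $start + 1);
    }
    return $spliced;
}

my $genomic = 'AAACCCGGGTTTAAACCC';
print join_segments($genomic, [1, 3], [7, 9], [13, 15]), "\n";   # AAAGGGAAA
```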

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Staffa, Nick (NIH/NIEHS)
> Sent: Monday, April 09, 2007 8:43 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Retrieve mRNA from Genome
> 
> I have been retrieving sub-sequence from Genbank genomic 
> records by use of Bio::SeqIO and ->get_SeqFeatures, ->start 
> ->end , but now I'm looking for a quick way to extract CDS or 
> mRNA from a multi-segmented annotation, e.g.
>      mRNA          
> join(72458..72791,84573..84613,93279..94419,94481..94656,
>                      
> 94719..94992,95056..95350,95438..95553,95614..96056)
> 
> Is there such a method?
> Please point me to appropriate documentation.


From cjfields at uiuc.edu  Mon Apr  9 12:50:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Apr 2007 11:50:05 -0500
Subject: [Bioperl-l] parse blast report for the best evalue
In-Reply-To: <9898358.post@talk.nabble.com>
References: <9898358.post@talk.nabble.com>
Message-ID: <C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>

You should probably use sort_hits() with a coderef that sorts by  
evalue to ensure that you retrieve the best evalue (significance()  
for hits) (see POD for Bio::Search::Result::ResultI).  You could then  
do something like:

my $hit;

unless ($result->no_hits_found) {
    # pass coderef to sort by evalue
    $result->sort_hits(\&sort_by_evalue);
    # retrieve first (best) hit
    $hit = $result->next_hit;
}

# do whatever you want with the best Hit

If you plan on retaining data from hits over a ton of different  
reports it may be best (memory-wise) to only retain the data you want  
for each hit instead of retaining the actual object.  For instance,  
if you only care about the description and evalue set up a simple  
data structure to house what you want by the query data instead of  
retaining all the extra stuff in the Hit object you don't need (all  
the HSP data, etc).
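
As a rough sketch of that last point, with plain hashes standing in for whatever you pull out of each Hit (the query names, hit names and e-values below are invented for the example):

```perl
use strict;
use warnings;

# Hypothetical, already-extracted hit records for two queries; in a real
# script these values would be copied out of the Hit objects while parsing.
my %hits_by_query = (
    query1 => [
        { name => 'hitA', evalue => 1e-5  },
        { name => 'hitB', evalue => 3e-40 },
    ],
    query2 => [
        { name => 'hitC', evalue => 0.002 },
    ],
);

# Keep only the record with the smallest e-value for each query.
my %best;
for my $query (sort keys %hits_by_query) {
    my ($top) = sort { $a->{evalue} <=> $b->{evalue} } @{ $hits_by_query{$query} };
    $best{$query} = $top;
}

print "$_\t$best{$_}{name}\t$best{$_}{evalue}\n" for sort keys %best;
```

Only a name and a number are kept per query, so memory use stays small however many reports are parsed.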

chris

On Apr 8, 2007, at 11:18 PM, DeeGee wrote:

>
> hi all,
> i'm trying to parse a blast report using Bio::SearchIO as follows,  
> but since
> this blast report is generated with many against many (database) fasta
> sequences, there're many individual blast reports (one for each of the
> sequence from the query file). i was wondering if there is a way to  
> get only
> the best hit (with best evalue) from each one of them.
>
> ##### part of my script ######
> my $in = new Bio::SearchIO(-format => 'blast',  -file   =>  
> $blast_report);
> while( my $result = $in->next_result ) {
>         while( my $hit = $result->next_hit ) {
>               ...........
>
> thanks.
>
>
> -- 
> View this message in context: http://www.nabble.com/parse-blast- 
> report-for-the-best-evalue-tf3545784.html#a9898358
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr  9 15:40:02 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 9 Apr 2007 12:40:02 -0700 (PDT)
Subject: [Bioperl-l] parse blast report for the best evalue
In-Reply-To: <C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>
References: <9898358.post@talk.nabble.com>
	<C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>
Message-ID: <9907757.post@talk.nabble.com>


thank you, Chris.
^ ^*

Chris Fields wrote:
> 
> You should probably use sort_hits() with a coderef that sorts by  
> evalue to ensure that you retrieve the best evalue (significance()  
> for hits) (see POD for Bio::Search::Result::ResultI).  You could then  
> do something like:
> 
> my $hit;
> 
> unless ($result->no_hits_found) {
>     # pass coderef to sort by evalue
>     $result->sort_hits(\&sort_by_evalue);
>     # retrieve first (best) hit
>     $hit = $result->next_hit;
> }
> 
> # do whatever you want with the best Hit
> 
> If you plan on retaining data from hits over a ton of different  
> reports it may be best (memory-wise) to only retain the data you want  
> for each hit instead of retaining the actual object.  For instance,  
> if you only care about the description and evalue set up a simple  
> data structure to house what you want by the query data instead of  
> retaining all the extra stuff in the Hit object you don't need (all  
> the HSP data, etc).
> 
> chris
> 
> On Apr 8, 2007, at 11:18 PM, DeeGee wrote:
> 
>>
>> hi all,
>> i'm trying to parse a blast report using Bio::SearchIO as follows,  
>> but since
>> this blast report is generated with many against many (database) fasta
>> sequences, there're many individual blast reports (one for each of the
>> sequence from the query file). i was wondering if there is a way to  
>> get only
>> the best hit (with best evalue) from each one of them.
>>
>> ##### part of my script ######
>> my $in = new Bio::SearchIO(-format => 'blast',  -file   =>  
>> $blast_report);
>> while( my $result = $in->next_result ) {
>>         while( my $hit = $result->next_hit ) {
>>               ...........
>>
>> thanks.
>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/parse-blast- 
>> report-for-the-best-evalue-tf3545784.html#a9898358
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9907757
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bosborne11 at verizon.net  Tue Apr 10 09:55:37 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 10 Apr 2007 09:55:37 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <C2410F99.DA34%bosborne11@verizon.net>

OK, applied.


On 4/6/07 11:31 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu> wrote:

>   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
>   +      if(DIE_ON_MISSMATCHED_LINES &&
>   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
>   +         my $fap= substr($_,0,20)."..";
>   +         $self->throw("Each line of the fasta entry must be the same length except the last.
>   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
>   +         }


From MEC at stowers-institute.org  Tue Apr 10 12:21:45 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 10 Apr 2007 11:21:45 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option
Message-ID: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>

Lincoln,

In `perldoc Bio::DB::SeqFeature::Store` I read:

"Caching requires the Tie::Cacher module to be installed. If the module
is not installed, then caching will silently be disabled."

I am wondering about the design motivation for silently disabling
caching when Tie::Cacher is not installed.  Perhaps at least emitting a
warning when -cache is requested and Tie::Cacher is not present is a
good idea?
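
Something along these lines is what I have in mind. This is a minimal sketch only: module_available and cache_enabled are hypothetical names, and none of this is taken from the Bio::DB::SeqFeature::Store source.

```perl
use strict;
use warnings;

# Return true if the named module can be loaded, false otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Tie::Cacher -> Tie/Cacher.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical policy: honour a cache request only if the backing module
# loads, and warn (rather than stay silent) when it does not.
sub cache_enabled {
    my ($requested, $module) = @_;
    return 0 unless $requested;
    return 1 if module_available($module);
    warn "caching requested but $module is not installed; caching disabled\n";
    return 0;
}

print cache_enabled(1, 'Tie::Cacher') ? "cache on\n" : "cache off\n";
```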

I am writing code that depends upon caching (i.e. upon the equality of
in-memory objects).

Do you advise that I don't depend upon Tie::Cacher working?  I
understand that it will NOT work as hoped if the cache is too small for
my application.

Thanks,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
 

From cjfields at uiuc.edu  Tue Apr 10 12:31:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 11:31:43 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
Message-ID: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>

At the moment we do not have a comprehensive list up on the wiki.  I  
have been slowly working (alphabetically!) to switch them over, so  
any help would be appreciated.

I have CC'd this to the main mail list for anyone else interested.

chris

On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:

> Hey guys,
>
> I noticed there's an open task regarding moving testing code to use
> Test::More etc and that Chris and Nathan are already on to it. Is
> there any kind of wiki page that you keep track of which modules you
> are already working on? I am new to this and want to contribute,
> having a fair amount of unit testing from work, but don't want to step
> over other people's work and avoid duplication as well.
> Any pointers where i could get started would be much appreciated :-)
>
> Thanks,
> Spiros
>
> ps. apologies if this is not the correct list to post this, just
> seemed the most intuitive choice.
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From spiros at lokku.com  Tue Apr 10 12:34:49 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Tue, 10 Apr 2007 17:34:49 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
Message-ID: <bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>

Okay, awesome, thank you for the info. I'll get started and see how it goes!

Spiros

On 4/10/07, Chris Fields <cjfields at uiuc.edu> wrote:
> At the moment we do not have a comprehensive list up on the wiki.  I
> have been slowly working (alphabetically!) to switch them over, so
> any help would be appreciated.
>
> I have CC'd this to the main mail list for anyone else interested.
>
> chris
>
> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>
> > Hey guys,
> >
> > I noticed there's an open task regarding moving testing code to use
> > Test::More etc and that Chris and Nathan are already on to it. Is
> > there any kind of wiki page that you keep track of which modules you
> > are already working on? I am new to this and want to contribute,
> > having a fair amount of unit testing from work, but don't want to step
> > over other people's work and avoid duplication as well.
> > Any pointers where i could get started would be much appreciated :-)
> >
> > Thanks,
> > Spiros
> >
> > ps. apologies if this is not the correct list to post this, just
> > seemed the most intuitive choice.
> > _______________________________________________
> > Bioperl-guts-l mailing list
> > Bioperl-guts-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

From cjfields at uiuc.edu  Tue Apr 10 12:34:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 11:34:12 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>
Message-ID: <0D396A53-9911-4304-88FE-CCD6884A2699@uiuc.edu>


On Apr 10, 2007, at 11:21 AM, Cook, Malcolm wrote:

> Lincoln,
>
> In `perldoc Bio::DB::SeqFeature::Store` I read:
>
> "Caching requires the Tie::Cacher module to be installed. If the  
> module
> is not installed, then caching will silently be disabled."
>
> I am wondering about the design motivation for silently disabling
> caching when Tie::Cacher is not installed.  Perhaps at least  
> emitting a
> warning when -cache is requested and Tie::Cacher is not present is a
> good idea?

...

Maybe this should be added to the optional BioPerl dependencies?   
It's not listed in Build.PL in CVS...

chris

From cjfields at uiuc.edu  Tue Apr 10 13:22:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 12:22:33 -0500
Subject: [Bioperl-l] ] moving tests to use Test::More
In-Reply-To: <bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>
Message-ID: <DFAA7C75-BC52-4027-9816-5970404D1558@uiuc.edu>

When moving tests over be particularly careful of 'ok' tests which  
should be 'is'; a few older tests have display messages which make  
things tricky.  Use 'isa_ok', 'use_ok', 'require_ok', 'like', etc.  
where appropriate.

Also, we are not supporting TODO blocks at this time due to the  
upgrade needed for Test::Harness (which isn't necessary for BioPerl  
functionality).  Just use a skip block with a message if you run into  
something, like this (from RNA_SearchIO.t):

SKIP: {
     skip('Working on meta string building; TODO', 3);
     is($hsp->meta, 'blahblahblah', "HSP meta");
     # two more tests...
}
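
To show the 'ok' versus 'is' point: assuming the older tests were written against Test.pm, where ok() with two arguments compares them, the second argument becomes just a test name under Test::More, so a comparison can silently turn into a truth test. A small self-contained sketch (the values are made up):

```perl
use strict;
use warnings;
use Test::More tests => 4;

my $length = 42;
my $name   = 'seq_1';

# ok() only checks truth of its first argument; the rest is the test name
ok($length == 42, 'length via ok');
# is() compares got and expected, and prints both on failure
is($length, 42, 'length via is');

like($name, qr/^seq_\d+$/, 'name looks like seq_N');
isa_ok([], 'ARRAY');
```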

Thanks for helping out!

chris

On Apr 10, 2007, at 11:34 AM, Spiros Denaxas wrote:

> Okay, awesome, thank you for the info. I'll get started and see how  
> it goes!
>
> Spiros
...


From gopu_36 at yahoo.com  Tue Apr 10 03:42:26 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Tue, 10 Apr 2007 00:42:26 -0700 (PDT)
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole genome
Message-ID: <9915265.post@talk.nabble.com>


Hi,
I am a newbie venturing into bioperl for my research purposes. I have
a whole genome sequence of a pathogen. I am trying to break it into
non-overlapping 1000 bp subsequences. For example, if my whole genome
sequence is 400000 bp long, then I should break it into 400
subsequences of 1000 bp each, and they should be non-overlapping but at the
same time contiguous. To be precise, my first substring would be from 1 to
1000 bp, the second substring would be from 1001 to 2000, etc. Could anyone
help me?
I tried the following code, but it gives me only the first substring and
not the rest! I would appreciate it very much if someone could help me!
.........
.
.
my $start =1;
my $finish =100;
my $inseq  = Bio::SeqIO->new(-file => "$in_file");
while( my $seq = $inseq->next_seq ) {
	
	my $cleseq = $seq->seq();
	
	$seqlength = $seq->length();
	if ($finish<$seqlength){	
	print "The length of the sequence is $seqlength\n";	
	my $ordseq = $cleseq->subseq($start,$finish);
          push(@seq_array,$ordseq);
          $start=+100;
          $finish=+100;
          $counter++;
          next;          	             
       } 
}
-- 
View this message in context: http://www.nabble.com/extract-nonoverlapping-subsequences-from-a-whole-genome-tf3551560.html#a9915265
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.



From bix at sendu.me.uk  Tue Apr 10 16:10:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 10 Apr 2007 21:10:35 +0100
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
 genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <461BEF3B.3080708@sendu.me.uk>

gopu_36 wrote:
> Hi,
> I am a newbie venturing into bioperl for my research purposes. I have
> a whole genome sequence of a pathogen. I am trying to break it into
> non-overlapping 1000 bp subsequences.
[snip]
> I tried the following code, but it gives me only the first substring and
> not the rest! I would appreciate it very much if someone could help me!
[snip]
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	             
>        } 
> }

Unless I've misunderstood, there are a few problems here.

I'm guessing $in_file is a file containing the entire genome sequence as 
a single sequence. This means your while loop will only loop once. To do 
what you want you then need another loop that acts on the single $seq 
object you're going to get. You don't need $cleseq, and in fact your 
script ought to crash on the $cleseq->subseq line because $cleseq is a 
string which has no subseq() method. $seq->subseq is what you want.
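
As a rough sketch of that inner loop, here is just the coordinate arithmetic in plain Perl; chunk_ranges is a hypothetical helper, and each pair it returns could be handed to $seq->subseq. Note that it advances with += (the quoted code has =+, which assigns +100 instead of adding 100).

```perl
use strict;
use warnings;

# Hypothetical helper: 1-based, inclusive, non-overlapping windows
# of at most $size positions covering 1..$length.
sub chunk_ranges {
    my ($length, $size) = @_;
    my @ranges;
    for (my $start = 1; $start <= $length; $start += $size) {
        my $end = $start + $size - 1;
        $end = $length if $end > $length;   # last window may be shorter
        push @ranges, [$start, $end];
    }
    return @ranges;
}

print join(' ', map { "$_->[0]-$_->[1]" } chunk_ranges(250, 100)), "\n";   # 1-100 101-200 201-250
```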

I didn't look at the remaining code.


Hope that helps,
Sendu.

From cjfields at uiuc.edu  Tue Apr 10 16:22:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 15:22:15 -0500
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
	genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <88E9CC63-48FD-444B-877D-12BB1D944214@uiuc.edu>

There is a script in the BioPerl scripts directory which does this,  
with optional overlaps (split_seq.PLS).  It's in /scripts/seq.

chris

On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:

>
> Hi,
> I am a newbie venturing into bioperl for my research purposes. I have
> a whole genome sequence of a pathogen. I am trying to break it into
> non-overlapping 1000 bp subsequences. For example, if my whole genome
> sequence is 400000 bp long, then I should break it into 400
> subsequences of 1000 bp each, and they should be non-overlapping but at the
> same time contiguous. To be precise, my first substring would be from 1 to
> 1000 bp, the second substring would be from 1001 to 2000, etc. Could anyone
> help me?
> I tried the following code, but it gives me only the first substring and
> not the rest! I would appreciate it very much if someone could help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	
>        }
> }
> -- 
> View this message in context: http://www.nabble.com/extract- 
> nonoverlapping-subsequences-from-a-whole-genome- 
> tf3551560.html#a9915265
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Apr 10 16:57:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 15:57:20 -0500
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
	genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <18529D36-C772-474A-9CE6-A29FA0C59ABA@uiuc.edu>

Okay, I was bored!  This is a little shorter than that script:

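use strict;
use warnings;
use Bio::SeqIO;

my $chunk = 100;    # chunk size in bp; set to 1000 for 1 kb pieces

my $seqin = Bio::SeqIO->new(-format => 'fasta',
                             -file => shift);

my $seqout = Bio::SeqIO->new(-format => 'fasta',
                             -file => '>split.fas');

while( my $seq = $seqin->next_seq ) {
     my $seqlength = $seq->length();
     print STDERR "Length is $seqlength\n";
     next unless $seqlength;    # nothing to split
     my $start = 1;
     # clamp the first chunk so sequences shorter than $chunk are not skipped
     my $end = ($chunk > $seqlength) ? $seqlength : $chunk;
     my $desc = $seq->description || '';
     CHUNK:
     while ($end <= $seqlength){
         my $ordseq = $seq->trunc($start,$end);
         $ordseq->description("$start-$end $desc");
         $seqout->write_seq($ordseq);
         last CHUNK if $end >= $seqlength;
         $start += $chunk;
         # the last chunk may be shorter than $chunk
         $end = ($end + $chunk > $seqlength) ? $seqlength : $end + $chunk;
     }
}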

chris

On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:

>
> Hi,
> I am a newbie venturing into bioperl for my research purposes. I have
> a whole genome sequence of a pathogen. I am trying to break it into
> non-overlapping 1000 bp subsequences. For example, if my whole genome
> sequence is 400000 bp long, then I should break it into 400
> subsequences of 1000 bp each, and they should be non-overlapping but at the
> same time contiguous. To be precise, my first substring would be from 1 to
> 1000 bp, the second substring would be from 1001 to 2000, etc. Could anyone
> help me?
> I tried the following code, but it gives me only the first substring and
> not the rest! I would appreciate it very much if someone could help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	
>        }
> }
> -- 
> View this message in context: http://www.nabble.com/extract- 
> nonoverlapping-subsequences-from-a-whole-genome- 
> tf3551560.html#a9915265
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Tue Apr 10 18:01:37 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 10 Apr 2007 18:01:37 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <C2410F99.DA34%bosborne11@verizon.net>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
	<C2410F99.DA34%bosborne11@verizon.net>
Message-ID: <6dce9a0b0704101501y15b96e20w89c4b9ef4abc1b48@mail.gmail.com>

I'm happy I didn't catch this thread until just now, but my preferred course
of action was to do exactly what Brian did and accept the patch.

Lincoln

On 4/10/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
> OK, applied.
>
>
> On 4/6/07 11:31 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu>
> wrote:
>
> >   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
> >   +      if(DIE_ON_MISSMATCHED_LINES &&
> >   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
> >   +         my $fap= substr($_,0,20)."..";
> >   +         $self->throw("Each line of the fasta entry must be the same length except the last.
> >   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
> >   +         }
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From heikki at sanbi.ac.za  Wed Apr 11 05:14:27 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 11 Apr 2007 11:14:27 +0200
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
Message-ID: <200704111114.27839.heikki@sanbi.ac.za>

What is going on here? Can anyone remember doing this?

	-Heikki 

Please can I ask what is the purpose of the line @pos = sort @pos; in
the select_noncont subroutine of SimpleAlign.pm.

 
In previous versions this line was not present and I could use the
function to reorder the alignment e.g in an alignment with 5 sequences I
could reorder it to put the second sequence last using
$aln->select_noncont(1,3,4,5,2). The sort function stops this, but even
if the idea is to sort numerically this does not work since the sort
function as is will put 10 before 2, so that
->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .

 
Many thanks

 
Anthony

From cjfields at uiuc.edu  Wed Apr 11 08:33:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 07:33:42 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za>
References: <200704111114.27839.heikki@sanbi.ac.za>
Message-ID: <F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>

Don't know when this was added.  Maybe we should make the sorting  
optional?  In other words, pass an optional 'nosort' string as the  
first arg, defaulting to numerical sort.

Either way the sort needs to be changed by the looks of it.  I'll  
verify the bug and commit today.
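
For the record, the difference is easy to see in plain Perl, independent of SimpleAlign: a bare sort compares as strings, so a numeric comparison block is needed.

```perl
use strict;
use warnings;

my @pos = (1 .. 10);

my @as_strings = sort @pos;                 # default: string comparison
my @as_numbers = sort { $a <=> $b } @pos;   # numeric comparison

print "string:  @as_strings\n";   # string:  1 10 2 3 4 5 6 7 8 9
print "numeric: @as_numbers\n";   # numeric: 1 2 3 4 5 6 7 8 9 10
```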

chris

On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote:

> What is going on here? Can anyone remember doing this?
>
> 	-Heikki
>
> Please can I ask what is the purpose of the line @pos = sort @pos; in
> the select_noncont subroutine of SimpleAlign.pm.
>
>
>
> In previous versions this line was not present and I could use the
> function to reorder the alignment e.g in an alignment with 5  
> sequences I
> could reorder it to put the second sequence last using
> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but  
> even
> if the idea is to sort numerically this dos not work since the sort
> function as is will put 10 before 2, so that
> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>
>
>
> Many thanks
>
>
>
> Anthony

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lzlgboy at gmail.com  Wed Apr 11 08:48:30 2007
From: lzlgboy at gmail.com (kenzy ken)
Date: Wed, 11 Apr 2007 20:48:30 +0800
Subject: [Bioperl-l] How to Remove root node from a tree, ???
Message-ID: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>

Hi all:
    I wrote a script that uses the Bio::Tree modules. I want to remove some
nodes from the tree, so I used the $tree->remove_Node($node_object) method.
It works fine, but a problem occurs when I try to remove the root node. It
seems that this method cannot remove the root node, so if you have any idea
how to remove the root, it would be much appreciated.

-- 
Chen,Kenian
===========================
School of Life Science, Sun Yat-Sen University
===========================
Xingang Xilu 135
Guangzhou, Guangdong 510275
P. R. China
===========================
Phone: (86) 20-84113677; (86) 20-34474683;
Fax: (86) 20-34022356
===========================
Email:lzlgboy at gmail.com;
chenkn at mail2.sysu.edu.cn


From cjfields at uiuc.edu  Wed Apr 11 09:13:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 08:13:40 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>
Message-ID: <9DE1A554-4F33-45D1-9043-732FEB86ECD5@uiuc.edu>

I confirmed this; it is now fixed in CVS.  I have also added the  
option to prevent sorting if needed:

$aln2 = $aln->select_noncont(6,7,8,9,10,1,2,3,4,5);

sorts numerically by default.

$aln2 = $aln->select_noncont('nosort',6,7,8,9,10,1,2,3,4,5);

prevents sorting.  I have added a few tests to SimpleAlign.t for  
these.  It doesn't change the default behavior so shouldn't break  
anything.
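
For anyone curious how an optional leading flag like this can be handled, here is a minimal standalone sketch. It is not the code committed to SimpleAlign.pm: the subroutine name order_positions is made up for illustration, and it only deals with the list of positions, not with an alignment object.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::SimpleAlign): returns the positions
# sorted numerically unless the first argument is the string 'nosort'.
sub order_positions {
    my @pos  = @_;
    my $sort = 1;

    # An optional leading 'nosort' flag switches sorting off
    if (@pos && $pos[0] eq 'nosort') {
        shift @pos;
        $sort = 0;
    }

    # Numeric comparison, so that 10 sorts after 9 rather than before 2
    @pos = sort { $a <=> $b } @pos if $sort;
    return @pos;
}

print join(',', order_positions(6,7,8,9,10,1,2,3,4,5)), "\n";
print join(',', order_positions('nosort',6,7,8,9,10,1,2,3,4,5)), "\n";
```

The first call prints the positions in ascending numeric order; the second prints them in the order given.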

chris

On Apr 11, 2007, at 7:33 AM, Chris Fields wrote:

> Don't know when this was added.  Maybe we should make the sorting
> optional?  In other words, pass an optional 'nosort' string as the
> first arg, defaulting to numerical sort.
>
> Either way the sort needs to be changed by the looks of it.  I'll
> verify the bug and commit today.
>
> chris
>
> On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote:
>
>> What is going on here? Can anyone remember doing this?
>>
>> 	-Heikki
>>
>> Please can I ask what is the purpose of the line @pos = sort @pos; in
>> the select_noncont subroutine of SimpleAlign.pm.
>>
>>
>>
>> In previous versions this line was not present and I could use the
>> function to reorder the alignment, e.g. in an alignment with 5
>> sequences I could reorder it to put the second sequence last using
>> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
>> even if the idea is to sort numerically this does not work, since the
>> sort function as is will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
>>
>>
>>
>> Many thanks
>>
>>
>>
>> Anthony
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 11 09:21:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 11 Apr 2007 14:21:25 +0100
Subject: [Bioperl-l] How to Remove root node from a tree, ???
In-Reply-To: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>
References: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>
Message-ID: <461CE0D5.9040001@sendu.me.uk>

kenzy ken wrote:
> Hi all:
>    I wrote a script that uses the Bio::Tree modules. I want to remove some
> nodes from the tree, so I used the $tree->remove_Node($node_object) method.
> It works fine, but a problem occurs when I try to remove the root node. It
> seems that this method cannot remove the root node, so if you have any idea
> how to remove the root, it would be much appreciated.

You'll have to re-root the tree to some other node in the tree. See the 
reroot() method.

(I don't think Bio::Tree::Tree objects can be unrooted.)

From emeric.sevin at univ-rennes1.fr  Wed Apr 11 09:32:38 2007
From: emeric.sevin at univ-rennes1.fr (Emeric Sevin)
Date: Wed, 11 Apr 2007 15:32:38 +0200
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
Message-ID: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>

Hi everybody,

I'm sorry to bug you, but either I missed something so obvious that nobody 
bothered to answer, or I'm being boycotted a little here...
A little help would be very much appreciated.

On 22 March 2007, at 16:07, Emeric Sevin wrote:

> Hello,
>
> I am new to this community, and apologize if this subject has been 
> posted before.
>
> I want to print out only selected results from a multiple 
> blast-alignments results file. Problem is, the algorithm used is 
> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the actual 
> writing task yields "unclean" warnings. Although an output is actually 
> written, the writer (Bio::SearchIO::Writer::TextResultWriter) seems to 
> be disturbed by the fact that rpsblast DBs are not labeled with 
> "protein"/"nucleic"/"translated".
> Does anybody know of an easy fix for that bug, or of another way to 
> work around it?
>
> Thank you very much
>
> Emeric SEVIN
> Université de Rennes 1

From cjfields at uiuc.edu  Wed Apr 11 10:44:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 09:44:27 -0500
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
Message-ID: <D0E54B3C-A345-4A90-9571-25144622265D@uiuc.edu>

We could ignore this post... oh the irony!  ;>

It has nothing to do with ignoring you.  Read this:

http://en.wikipedia.org/wiki/Warnock's_Dilemma

Basically your question probably fell on deaf ears b/c no one has
time to look into it and post a fix.  Realize that BioPerl is, for
the most part, a volunteer effort and we all have $jobs to worry
about.  If you want, you are more than welcome to file a bug on this
(if it isn't already filed), which is the best way to make sure
something is done:

http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

chris


On Apr 11, 2007, at 8:32 AM, Emeric Sevin wrote:

> Hi everybody,
>
> I'm sorry to bug you, but either I missed something so obvious that nobody
> bothered to answer, or I'm being boycotted a little here...
> A little help would be very much appreciated.
>
> On 22 March 2007, at 16:07, Emeric Sevin wrote:
>
>> Hello,
>>
>> I am new to this community, and apologize if this subject has been  
>> posted before.
>>
>> I want to print out only selected results from a multiple
>> blast-alignments results file. Problem is, the algorithm used is
>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the
>> actual writing task yields "unclean" warnings. Although an output
>> is actually written, the writer
>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by
>> the fact that rpsblast DBs are not labeled with
>> "protein"/"nucleic"/"translated".
>> Does anybody know of an easy fix for that bug, or of another way to
>> work around it?
>>
>> Thank you very much
>>
>> Emeric SEVIN
>> Université de Rennes 1

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Wed Apr 11 10:30:11 2007
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Wed, 11 Apr 2007 15:30:11 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
Message-ID: <461CF0F3.1010708@sheffield.ac.uk>

It should be easy enough to find those t/*.t files that have "use Test;" 
or "require Test;". This should provide a list of files still needing to 
be converted over to Test::More. As discussed previously, it may also be 
useful to use Test::Exception to test for situations where 
exceptions/warnings are thrown. If you add additional tests using this 
module, you should add the Test::Exception module to t/lib/.
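
As a rough illustration of the search described above, a small script along these lines could list the candidates. It is only a sketch: the subroutine name files_using_test is invented here, and it assumes the script is run from the top of a checkout so that t/*.t resolves.

```perl
use strict;
use warnings;

# Hypothetical helper: return the files (from the given list) that load the
# old Test module via "use Test" or "require Test". Test::More and other
# Test::* modules are deliberately not matched.
sub files_using_test {
    my @files = @_;
    my @found;
    for my $file (@files) {
        open my $fh, '<', $file or die "Cannot open $file: $!";
        while (my $line = <$fh>) {
            if ($line =~ /^\s*(?:use|require)\s+Test\b(?!::)/) {
                push @found, $file;
                last;    # one hit per file is enough
            }
        }
        close $fh;
    }
    return @found;
}

# Assumes the current directory contains the t/ directory
print "$_\n" for files_using_test(glob 't/*.t');
```

The output is one file name per line, which could be pasted into a wiki page as a to-do list.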

Good luck, and feel free to mail the list with questions/comments etc.

Nath


Chris Fields wrote:
> At the moment we do not have a comprehensive list up on the wiki.  I  
> have been slowly working (alphabetically!) to switch them over, so  
> any help would be appreciated.
>
> I have CC'd this to the main mail list for anyone else interested.
>
> chris
>
> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>
>   
>> Hey guys,
>>
>> I noticed there's an open task regarding moving testing code to use
>> Test::More etc and that Chris and Nathan are already on to it. Is
>> there any kind of wiki page that you keep track of which modules you
>> are already working on? I am new to this and want to contribute,
>> having a fair amount of unit testing from work, but don't want to step
>> over other people's work and avoid duplication as well.
>> Any pointers where i could get started would be much appreciated :-)
>>
>> Thanks,
>> Spiros
>>
>> ps. apologies if this is not the correct list to post this, just
>> seemed the most intuitive choice.
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>   


From spiros at lokku.com  Wed Apr 11 10:56:22 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Wed, 11 Apr 2007 15:56:22 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <461CF0F3.1010708@sheffield.ac.uk>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
Message-ID: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>

Yep! I have some rough stats at home; I will post them later on
tonight. Roughly, if I remember correctly, 50% of the tests were still
using Test, and all the others were using Test::More.

More to follow later on,
Spiros

On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> It should be easy enough to find those t/*.t files that have "use Test;"
> or "require Test;" This should provide a list of files still needing to
> be converted over to Test::More. As discussed previously, it may also be
> useful to use Test::Exception to test for situations where
> exceptions/warnings are thrown. If you add additional tests using this
> module, you should add the Test::Exception module to t/lib/
>
> Good luck, and feel free to mail the list with questions/comments etc.
>
> Nath
>
>
> Chris Fields wrote:
> > At the moment we do not have a comprehensive list up on the wiki.  I
> > have been slowly working (alphabetically!) to switch them over, so
> > any help would be appreciated.
> >
> > I have CC'd this to the main mail list for anyone else interested.
> >
> > chris
> >
> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> >
> >
> >> Hey guys,
> >>
> >> I noticed there's an open task regarding moving testing code to use
> >> Test::More etc and that Chris and Nathan are already on to it. Is
> >> there any kind of wiki page that you keep track of which modules you
> >> are already working on? I am new to this and want to contribute,
> >> having a fair amount of unit testing from work, but don't want to step
> >> over other people's work and avoid duplication as well.
> >> Any pointers where i could get started would be much appreciated :-)
> >>
> >> Thanks,
> >> Spiros
> >>
> >> ps. apologies if this is not the correct list to post this, just
> >> seemed the most intuitive choice.
> >>
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> >
>
>

From Kevin.M.Brown at asu.edu  Wed Apr 11 11:14:07 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 11 Apr 2007 08:14:07 -0700
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za>
References: <200704111114.27839.heikki@sanbi.ac.za>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>

> What is going on here? Can anyone remember doing this?
> 
> 	-Heikki 
> 
> Please can I ask what is the purpose of the line @pos = sort 
> @pos; in the select_noncont subroutine of SimpleAlign.pm.
> 
>  
> 
> In previous versions this line was not present and I could use the
> function to reorder the alignment, e.g. in an alignment with 5
> sequences I could reorder it to put the second sequence last using
> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
> even if the idea is to sort numerically this does not work, since the
> sort function as is will put 10 before 2, so that
> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.

Not sure why 10 would come before 2 since perl would interpret that list
as a series of integers even if they were entered as strings and do the
sort.


From spiros at lokku.com  Wed Apr 11 11:51:27 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Wed, 11 Apr 2007 16:51:27 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <bba689ec0704110851qb1aa272m5db4e01356f28e92@mail.gmail.com>

This looks like a case of cmp vs <=>, I think!

my @array = (1,10,2,3,4,5,6,7,8,9) ;
print join(",", @array), "\n";
my @sorted1 = sort(@array) ;
print join(",", @sorted1), "\n";
my @sorted2 = (sort { $a <=> $b } @array);
print join(",", @sorted2), "\n";

idaru:/tmp spiros$ perl koko.pl
1,10,2,3,4,5,6,7,8,9 # normal array
1,10,2,3,4,5,6,7,8,9 # sorted with sort
1,2,3,4,5,6,7,8,9,10 # sorted with <=>

Spiros


On 4/11/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> > What is going on here? Can anyone remember doing this?
> >
> >       -Heikki
> >
> > Please can I ask what is the purpose of the line @pos = sort
> > @pos; in the select_noncont subroutine of SimpleAlign.pm.
> >
> >
> >
> > In previous versions this line was not present and I could use the
> > function to reorder the alignment, e.g. in an alignment with 5
> > sequences I could reorder it to put the second sequence last using
> > $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
> > even if the idea is to sort numerically this does not work, since the
> > sort function as is will put 10 before 2, so that
> > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> > the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
>
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.
>
>

From ak at ebi.ac.uk  Wed Apr 11 11:58:52 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Wed, 11 Apr 2007 16:58:52 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <20070411155852.GC24537@ebi.ac.uk>

On Wed, Apr 11, 2007 at 08:14:07AM -0700, Kevin Brown wrote:
> > What is going on here? Can anyone remember doing this?
> > 
> > 	-Heikki 
> > 
> > Please can I ask what is the purpose of the line @pos = sort 
> > @pos; in the select_noncont subroutine of SimpleAlign.pm.
> > 
> >  
> > 
> > In previous versions this line was not present and I could use the
> > function to reorder the alignment, e.g. in an alignment with 5
> > sequences I could reorder it to put the second sequence last using
> > $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
> > even if the idea is to sort numerically this does not work, since the
> > sort function as is will put 10 before 2, so that
> > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> > the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
> 
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.

Really?

$ perl -e 'print join(" ", sort(1..20)), "\n"';
1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9


-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
-------------------*=<>=*-------------------

From mkiwala at watson.wustl.edu  Wed Apr 11 11:51:35 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Wed, 11 Apr 2007 10:51:35 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <461D0407.8050105@watson.wustl.edu>

Kevin Brown wrote:
>> What is going on here? Can anyone remember doing this?
>>
>> 	-Heikki 
>>
>> Please can I ask what is the purpose of the line @pos = sort 
>> @pos; in the select_noncont subroutine of SimpleAlign.pm.
>>
>>  
>>
>> In previous versions this line was not present and I could use the
>> function to reorder the alignment, e.g. in an alignment with 5
>> sequences I could reorder it to put the second sequence last using
>> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
>> even if the idea is to sort numerically this does not work, since the
>> sort function as is will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
>>     
>
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.
>
>   
Because, according to the documentation for Perl's sort function, 
sorting occurs "in standard string comparison order" unless the user 
specifies another comparison function to use.


From cjfields at uiuc.edu  Wed Apr 11 12:45:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 11:45:11 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
Message-ID: <FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>

We should probably place something on the wiki to prevent overlaps  
(i.e. make sure no two devs are working on the same tests).  I  
planned on working on the G's last night but got bogged down.

Spiros, if you haven't already, go ahead and create a list on a wiki  
page for tracking.  We can lay claim to them by tagging with our sigs  
and cross them off once complete.

chris

On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:

> Yep! I have some rough stats I have at home, I will post them later on
> tonight. Roughly, if i remember correctly, 50% of the tests were still
> using Test, all the others were using Test::More.
>
> More to follow later on,
> Spiros
>
> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
>> It should be easy enough to find those t/*.t files that have "use  
>> Test;"
>> or "require Test;" This should provide a list of files still  
>> needing to
>> be converted over to Test::More. As discussed previously, it may  
>> also be
>> useful to use Test::Exception to test for situations where
>> exceptions/warnings are thrown. If you add additional tests using  
>> this
>> module, you should add the Test::Exception module to t/lib/
>>
>> Good luck, and feel free to mail the list with questions/comments  
>> etc.
>>
>> Nath
>>
>>
>> Chris Fields wrote:
>> > At the moment we do not have a comprehensive list up on the  
>> wiki.  I
>> > have been slowly working (alphabetically!) to switch them over, so
>> > any help would be appreciated.
>> >
>> > I have CC'd this to the main mail list for anyone else interested.
>> >
>> > chris
>> >
>> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>> >
>> >
>> >> Hey guys,
>> >>
>> >> I noticed there's an open task regarding moving testing code to  
>> use
>> >> Test::More etc and that Chris and Nathan are already on to it. Is
>> >> there any kind of wiki page that you keep track of which  
>> modules you
>> >> are already working on? I am new to this and want to contribute,
>> >> having a fair amount of unit testing from work, but don't want  
>> to step
>> >> over other people's work and avoid duplication as well.
>> >> Any pointers where i could get started would be much  
>> appreciated :-)
>> >>
>> >> Thanks,
>> >> Spiros
>> >>
>> >> ps. apologies if this is not the correct list to post this, just
>> >> seemed the most intuitive choice.
>> >>
>> >
>> > Christopher Fields
>> > Postdoctoral Researcher
>> > Lab of Dr. Robert Switzer
>> > Dept of Biochemistry
>> > University of Illinois Urbana-Champaign
>> >
>> >
>> >
>> >
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 11 12:09:54 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 11 Apr 2007 17:09:54 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <461D0852.9070802@sendu.me.uk>

Kevin Brown wrote:
>>  but even if the idea is to sort
>> numerically this does not work since the sort function as is 
>> will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
> 
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.

The default sort for sort() is { $a cmp $b } (standard string comparison 
order): 10 comes before 2.

The fix was to explicitly say sort { $a <=> $b } for a numeric sort.
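
To see the difference directly, the two comparison operators can be applied to the same pair of values. This is just an illustrative sketch, not part of the SimpleAlign.pm change, and the variable names are made up:

```perl
use strict;
use warnings;

# cmp compares its operands as strings: "10" sorts before "2" because
# the character "1" comes before "2"
my $string_order = 10 cmp 2;

# <=> compares its operands as numbers: 10 is greater than 2
my $numeric_order = 10 <=> 2;

printf "10 cmp 2 = %d\n", $string_order;     # prints -1
printf "10 <=> 2 = %d\n", $numeric_order;    # prints 1
```

A negative result means the left operand sorts first, which is why the plain sort() put 10 ahead of 2.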

From cjfields at uiuc.edu  Wed Apr 11 12:46:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 11:46:46 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <461D0407.8050105@watson.wustl.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
	<461D0407.8050105@watson.wustl.edu>
Message-ID: <7001A1A4-5CF4-4C70-8EFA-94AF0D16864C@uiuc.edu>

I have confirmed the bug and fixed this in CVS.  Michael's right; sort
defaults to string comparison if no subroutine or sort block is
specified.

perldoc -f sort:

sort SUBNAME LIST
sort BLOCK LIST
sort LIST
...
If SUBNAME or BLOCK is omitted, "sort"s in standard string
comparison order.
...

chris

On Apr 11, 2007, at 10:51 AM, Michael Kiwala wrote:

> Kevin Brown wrote:
>>> What is going on here? Can anyone remember doing this?
>>>
>>> 	-Heikki
>>>
>>> Please can I ask what is the purpose of the line @pos = sort
>>> @pos; in the select_noncont subroutine of SimpleAlign.pm.
>>>
>>>
>>>
>>> In previous versions this line was not present and I could
>>> use the function to reorder the alignment e.g in an alignment
>>> with 5 sequences I could reorder it to put the second
>>> sequence last using $aln->select_noncont(1,3,4,5,2). The sort
>>> function stops this, but even if the idea is to sort
>>> numerically this dos not work since the sort function as is
>>> will put 10 before 2, so that
>>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the  
>>> sequences in
>>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>>>
>>
>> Not sure why 10 would come before 2 since perl would interpret  
>> that list
>> as a series of integers even if they were entered as strings and  
>> do the
>> sort.
>>
>>
> Because, according to the documentation for Perl's sort function,
> sorting occurs "in standard string comparison order" unless the user
> specifies another comparison function to use.


From heikki at sanbi.ac.za  Wed Apr 11 12:39:57 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 11 Apr 2007 18:39:57 +0200
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
Message-ID: <200704111839.58940.heikki@sanbi.ac.za>

A bit more than half are still using Test:

~/src/bioperl/core/t>  perl -lne 'print $1 if /use +(Test[^\sO;]*);/' *t | 
sort | uniq -c | sort -nr
    147 Test
     97 Test::More


Feel free to add scripts and functionality to the core/maintenance directory of 
bioperl-live if you want to keep track of things in modules and tests.

	-Heikki


On Wednesday 11 April 2007 16:56:22 Spiros Denaxas wrote:
> Yep! I have some rough stats I have at home, I will post them later on
> tonight. Roughly, if i remember correctly, 50% of the tests were still
> using Test, all the others were using Test::More.
>
> More to follow later on,
> Spiros
>
> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> > It should be easy enough to find those t/*.t files that have "use Test;"
> > or "require Test;" This should provide a list of files still needing to
> > be converted over to Test::More. As discussed previously, it may also be
> > useful to use Test::Exception to test for situations where
> > exceptions/warnings are thrown. If you add additional tests using this
> > module, you should add the Test::Exception module to t/lib/
> >
> > Good luck, and feel free to mail the list with questions/comments etc.
> >
> > Nath
> >
> > Chris Fields wrote:
> > > At the moment we do not have a comprehensive list up on the wiki.  I
> > > have been slowly working (alphabetically!) to switch them over, so
> > > any help would be appreciated.
> > >
> > > I have CC'd this to the main mail list for anyone else interested.
> > >
> > > chris
> > >
> > > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> > >> Hey guys,
> > >>
> > >> I noticed there's an open task regarding moving testing code to use
> > >> Test::More etc and that Chris and Nathan are already on to it. Is
> > >> there any kind of wiki page that you keep track of which modules you
> > >> are already working on? I am new to this and want to contribute,
> > >> having a fair amount of unit testing from work, but don't want to step
> > >> over other people's work and avoid duplication as well.
> > >> Any pointers where i could get started would be much appreciated :-)
> > >>
> > >> Thanks,
> > >> Spiros
> > >>
> > >> ps. apologies if this is not the correct list to post this, just
> > >> seemed the most intuitive choice.
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > >
>


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From stewarta at nmrc.navy.mil  Wed Apr 11 14:40:18 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Wed, 11 Apr 2007 14:40:18 -0400
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
Message-ID: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>

First of all, mucho kudos to those who revamped this module.  It
works really well.  I have a couple of thoughts:

* The .predict file from Glimmer provides frame and score information
which could be parsed and included in the generated feature prediction.

* It'd be nice to include the orfID somewhere on the feature
prediction, maybe as the seqID? (These could be post-processed into
locus_tags by those using Glimmer as a preliminary annotation tool.)

* Options to set the source and primary tags to something other than
the defaults (i.e. Glimmer3.X and 'transcript').  This could always be
done after Bio::Tools::Glimmer has run, though, of course.

* This section..

         elsif (
                # Glimmer 2.X prediction
                (/^\s+(\d+)\s+      # gene num
                 (\d+)\s+(\d+)\s+   # start, end
                 \[([\+\-])\d{1}\s+ # strand
                 /ox ) ||
                # Glimmer 3.X prediction
                (/\w+(\d+)\s+       # orf (numeric portion)
                 (\d+)\s+(\d+)\s+   # start, end
                 ([\+\-])\d{1}\s+   # strand
                /ox)) {
	    my ($genenum,$start,$end,$strand) =
		( $1,$2,$3,$4 );

...isn't picking up more than the last digit in the orf-number.  Not  
sure if that's intentional.  A sample of the feature output using  
->gff_string shows up as ...

test-pseudocontig  Glimmer_3.X  transcript  1018   8     .  -  .  Group GenePrediction_1
test-pseudocontig  Glimmer_3.X  transcript  1134   1736  .  +  .  Group GenePrediction_2
test-pseudocontig  Glimmer_3.X  transcript  1832   2596  .  +  .  Group GenePrediction_4
test-pseudocontig  Glimmer_3.X  transcript  2710   3225  .  +  .  Group GenePrediction_5
test-pseudocontig  Glimmer_3.X  transcript  3246   4016  .  +  .  Group GenePrediction_6
test-pseudocontig  Glimmer_3.X  transcript  4177   5064  .  +  .  Group GenePrediction_7
test-pseudocontig  Glimmer_3.X  transcript  5083   5673  .  +  .  Group GenePrediction_8
test-pseudocontig  Glimmer_3.X  transcript  6001   7275  .  +  .  Group GenePrediction_9
test-pseudocontig  Glimmer_3.X  transcript  7530   8081  .  +  .  Group GenePrediction_0
test-pseudocontig  Glimmer_3.X  transcript  8785   8117  .  -  .  Group GenePrediction_1
test-pseudocontig  Glimmer_3.X  transcript  9423   8788  .  -  .  Group GenePrediction_2
test-pseudocontig  Glimmer_3.X  transcript  10088  9549  .  -  .  Group GenePrediction_3

...which was parsed originally from...

orf00001     1018        8  -2     2.95
orf00002     1134     1736  +3     2.91
orf00004     1832     2596  +2     2.93
orf00005     2710     3225  +1     2.90
orf00006     3246     4016  +3     2.93
orf00007     4177     5064  +1     2.94
orf00008     5083     5673  +1     2.91
orf00009     6001     7275  +1     2.96
orf00010     7530     8081  +3     2.58
orf00011     8785     8117  -2     2.92
orf00012     9423     8788  -1     2.81
orf00013    10088     9549  -3     2.90

* It'd also be nice if you could somehow set the string that is  
placed in front of the orf-number in the line...

                  '-tag'         => { 'Group' => "GenePrediction_$genenum"},

...seeing as how these tag/values can't seem to be changed manually  
anymore without getting into AnnotationCollection stuff, which is no  
longer a simple matter of changing a tag/value string.  (By the way,  
where can I find a list of AnnotationCollectionI compliant objects?)


Any thoughts on the suggestions?  (I don't mind taking a stab at  
incorporating them into the code.. I've never submitted anything to  
BioPerl before)


-Andrew


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From cjfields at uiuc.edu  Wed Apr 11 15:53:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 14:53:54 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
Message-ID: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>

I'm posting this to the mail list in case anyone has any ideas on  
what is going on...

I have noticed an odd (read: annoying) rash of spam on the wiki.   
Jason also ran some spam reversions, so maybe he can chime in.   
Essentially it looks like the responsible spambots 'correct' the wiki  
text and links, so that '+' is being removed and URI-encoded symbols  
in links are reverted to symbols.  Unfortunately the removal occurs  
in all text, so places where '+' is intended (for instance, raw text  
for showing example record formats) are also changed.  My guess is  
we'll need to block the IP address or add to the blacklist if possible.

Between Jason and me we have blocked ~9 spambots and counting.   
Couldn't find anything via Google yet...

chris

From torsten.seemann at infotech.monash.edu.au  Wed Apr 11 20:33:02 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 12 Apr 2007 10:33:02 +1000
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
Message-ID: <a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>

Andrew,

>                 # Glimmer 3.X prediction
>                 (/\w+(\d+)\s+       # orf (numeric portion)
> ...isn't picking up more than the last digit in the orf-number.  Not
> sure if that's intentional.  A sample of the feature output using -
>  >gff_string shows up as ...

I think that regexp should be \w+?(\d+)

ie. the \w+ should be non-greedy, otherwise it will swallow up all but
one of the following \d+ (as \d is a subset of \w)
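
To make the difference concrete, here is a minimal standalone sketch
(not the module's actual code; the variable names are made up for
illustration) that runs both patterns against one of the .predict lines
Andrew posted:

```perl
use strict;
use warnings;

# One line in the Glimmer 3 .predict layout quoted earlier in this thread.
my $line = 'orf00010     7530     8081  +3     2.58';

# Greedy \w+ takes "orf0001" and leaves a single digit for (\d+).
my ($greedy) = $line =~ /\w+(\d+)\s+/;

# Non-greedy \w+? stops as early as it can, so (\d+) gets the whole number.
my ($lazy) = $line =~ /\w+?(\d+)\s+/;

print "greedy: $greedy\n";   # 0
print "lazy:   $lazy\n";     # 00010
```

Note the non-greedy capture keeps the leading zeros; adding 0 to it
would turn "00010" into 10 if a plain number is wanted.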

I've CC:ed this to Mark Johnson who made the recent changes to this module.

Thanks for your feedback,

--Torsten Seemann

From spiros at lokku.com  Wed Apr 11 21:08:47 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 02:08:47 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
Message-ID: <bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>

Good idea Chris. Just got back home so will probably do it tomorrow
morning or so.

Spiros

On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
> We should probably place something on the wiki to prevent overlaps
> (i.e. make sure no two devs are working on the same tests).  I
> planned on working on the G's last night but got bogged down.
>
> Spiros, if you haven't already go ahead and create a list on a wiki
> page for tracking.  We can lay claim to them by tagging with our sigs
> and cross them off once complete.
>
> chris
>
> On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
>
> > Yep! I have some rough stats I have at home, I will post them later on
> > tonight. Roughly, if i remember correctly, 50% of the tests were still
> > using Test, all the others were using Test::More.
> >
> > More to follow later on,
> > Spiros
> >
> > On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> >> It should be easy enough to find those t/*.t files that have
> >> "use Test;" or "require Test;". This should provide a list of files
> >> still needing to be converted over to Test::More. As discussed
> >> previously, it may also be useful to use Test::Exception to test for
> >> situations where exceptions/warnings are thrown. If you add
> >> additional tests using this module, you should add the
> >> Test::Exception module to t/lib/
> >>
> >> Good luck, and feel free to mail the list with questions/comments etc.
> >>
> >> Nath
> >>
> >>
> >> Chris Fields wrote:
> >> > At the moment we do not have a comprehensive list up on the wiki.
> >> > I have been slowly working (alphabetically!) to switch them over,
> >> > so any help would be appreciated.
> >> >
> >> > I have CC'd this to the main mail list for anyone else interested.
> >> >
> >> > chris
> >> >
> >> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> >> >
> >> >
> >> >> Hey guys,
> >> >>
> >> >> I noticed there's an open task regarding moving testing code to
> >> >> use Test::More etc and that Chris and Nathan are already on to it.
> >> >> Is there any kind of wiki page that you keep track of which modules
> >> >> you are already working on? I am new to this and want to
> >> >> contribute, having a fair amount of unit testing from work, but
> >> >> don't want to step over other people's work and avoid duplication
> >> >> as well.
> >> >> Any pointers where i could get started would be much appreciated :-)
> >> >>
> >> >> Thanks,
> >> >> Spiros
> >> >>
> >> >> ps. apologies if this is not the correct list to post this, just
> >> >> seemed the most intuitive choice.

From Kevin.M.Brown at asu.edu  Thu Apr 12 11:24:15 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 12 Apr 2007 08:24:15 -0700
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <461D0407.8050105@watson.wustl.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
	<461D0407.8050105@watson.wustl.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>

> >> What is going on here? Can anyone remember doing this?

> >> Please can I ask what is the purpose of the line @pos = sort @pos;
> >> in the select_noncont subroutine of SimpleAlign.pm.
> >>
> >> In previous versions this line was not present and I could use the
> >> function to reorder the alignment, e.g. in an alignment with 5
> >> sequences I could reorder it to put the second sequence last using
> >> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
> >> even if the idea is to sort numerically this does not work, since the
> >> sort function as is will put 10 before 2, so that
> >> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> >> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
> >
> > Not sure why 10 would come before 2 since perl would interpret that
> > list as a series of integers even if they were entered as strings and
> > do the sort.
> >
> Because, according to the documentation for Perl's sort
> function, sorting occurs "in standard string comparison
> order" unless the user specifies another comparison function to use.

OK, guess I never realized that since I've used just "sort @array" and
gotten things back how I expected them to be.


From bix at sendu.me.uk  Thu Apr 12 11:58:53 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 12 Apr 2007 16:58:53 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>	<461D0407.8050105@watson.wustl.edu>
	<1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>
Message-ID: <461E573D.1060906@sendu.me.uk>

Kevin Brown wrote:
>> Because, according to the documentation for Perl's sort 
>> function, sorting occurs "in standard string comparison 
>> order" unless the user specifies another comparison function to use.
> 
> OK, guess I never realized that since I've used just "sort @array" and
> gotten things back how I expected them to be.

If you were sorting numbers, getting the order wrong either didn't 
matter or you didn't notice the problem. Not realizing sort won't do 
what you expect in this case is a common source of bugs.

It might be worth it for you (and anyone else) to go through your old 
code to make sure you haven't been bitten.
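
For anyone who wants to see it for themselves, a minimal sketch (the
array names are just for illustration, this is not the SimpleAlign
code) comparing the default sort with an explicit numeric comparison:

```perl
use strict;
use warnings;

my @pos = (1 .. 10);

# Default sort compares as strings, so "10" sorts before "2".
my @as_strings = sort @pos;

# An explicit numeric comparison gives the order most people expect.
my @as_numbers = sort { $a <=> $b } @pos;

print "string:  @as_strings\n";   # 1 10 2 3 4 5 6 7 8 9
print "numeric: @as_numbers\n";   # 1 2 3 4 5 6 7 8 9 10
```

With fewer than ten single-digit values both forms give the same
answer, which is probably why the problem often goes unnoticed.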

From johnsonm at gmail.com  Thu Apr 12 13:26:33 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 12:26:33 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
Message-ID: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>

    I'd call that a buggy regexp.  Sounds like a good (but minimal)
fix.  Torsten, I don't have cvs write access, I think you do, can you
fix that up?  Andrew, can you file that as a bug:

http://bugzilla.bioperl.org/

    Everything else sounds like enhancements.  I'm not necessarily
opposed, but a little discussion is probably in order before putting
any tickets in for any of that.  Also, I'm not sure when I'll be able
to spare some time to work on the module.  It was easy to justify
spending time from my day job getting the module up to where it is now,
as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
quite well for my purposes.  Again, I'm not opposed to further
enhancements, but if I'm going to work on any of them, they'll have to
fit into everything else I'm doing and it could be a while.  However,
there's no reason somebody else can't do what I did.  Discuss the
changes here, work out a plan, implement it, send along the diff(s)
attached to a bug in bugzilla.  Next thing you know, your changes are
in cvs.  8)


From cjfields at uiuc.edu  Thu Apr 12 14:11:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 13:11:33 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
Message-ID: <7314C1CD-8AD5-4400-A495-6C8124833D0D@uiuc.edu>

Agreed; anyone can suggest code enhancements and bug fixes and submit  
patches for these:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

You'll see a long list of unimplemented enhancement requests in  
Bugzilla.  These are the ones where no patch is given; you'll find  
that very few are willing to go through the effort to work on them  
unless there is something in it for them!  Enhancement requests that  
come with patches and tests tend to get committed fairly rapidly  
(sometimes within hours).

chris



From stewarta at nmrc.navy.mil  Thu Apr 12 14:35:00 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 12 Apr 2007 14:35:00 -0400
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
Message-ID: <DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>

I'm willing to do the coding and testing, I'm just not familiar with  
the submission process yet (I see there's a HOWTO now, nice).   Let's  
discuss first.

So to reiterate, I'm suggesting that the module also parse out the  
frame and score information from Glimmer output.  I take back my  
suggestion of overriding the source / primary tags through the module  
as this can easily be done post-parser.  Other annotations can also  
be edited post-parser easily enough.

Reasons for:  Parsing everything out of the output and letting the  
user determine what's useful or not.

Reasons against:  Extra information may not be relevant to the format  
of the generated feature type?


-Andrew




--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From johnsonm at gmail.com  Thu Apr 12 15:11:18 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 14:11:18 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
	<DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>
Message-ID: <ebf5eb170704121211s19062ac8hb9b510d440fcfe44@mail.gmail.com>

> So to reiterate, I'm suggesting that the module also parse out the frame and
> score information from Glimmer output.  I take back my suggestion of
> overriding the source / primary tags through the module as this can easily
> be done post-parser.  Other annotations can also be edited post-parser
> easily enough.

The reason the source tags are what they are for my addition(s) is
that the output from glimmer2/glimmer3 does not include a version
string.  You can figure out the major version from the output
formatting, but that's about it.  Also, being my first significant
contribution, I wasn't out to break new ground.  I did what some of
the other gene predictors seem to do, and what the existing code
already did.  Maybe there should be a method to pass in the exact
version if you know it.  Further than that, I think the Glimmer module
should stay consistent with what the other gene predictors do.  No
reason, though, that they couldn't *all* be enhanced similarly, if you
want to be able to further control the source tag.  8)

Part of the reason I didn't parse out the frame / score info for
either glimmer2 or glimmer3 was that I didn't need it.  The other part
being that my regexp kung-fu is nothing special.  This sounds like a
no-brainer to me.  Extend the regexps to capture it and tag it (and
the tests).
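
As a starting point for discussion, here is a rough sketch of what an
extended glimmer3 pattern might look like.  It is only an assumption
about how this could be done, not what Bio::Tools::Glimmer does:
parse_glimmer3_line() and the hash keys are hypothetical names, and
the line layout is taken from the sample .predict lines posted earlier
in this thread.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::Glimmer.
# Returns a hash ref of fields, or nothing if the line does not match.
sub parse_glimmer3_line {
    my ($line) = @_;
    return unless $line =~ /^(\w+?)(\d+)\s+   # orf prefix, numeric portion
                             (\d+)\s+(\d+)\s+ # start, end
                             ([+-])(\d)\s+    # strand, frame
                             (-?[\d.]+)       # score
                            /x;
    return {
        orf_id  => "$1$2",   # full identifier, e.g. orf00013
        genenum => $2 + 0,   # numeric portion without leading zeros
        start   => $3,
        end     => $4,
        strand  => $5,
        frame   => $6,
        score   => $7,
    };
}

my $pred = parse_glimmer3_line('orf00013    10088     9549  -3     2.90');
print join(' ', map { "$_=$pred->{$_}" } sort keys %$pred), "\n";
```

The frame and score could then be attached to the feature as tags
alongside the existing Group tag; how exactly is up for discussion.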

As far as the ORFs go, I guess you could just use
Bio::SeqFeature::Generic to represent them.  I haven't been keeping
track of the relevant feature/annotation interfaces, but maybe there
should be some kind of relation between the ORFs and predictions?

The glimmer3 detail file is a little trickier.  The least disruptive
thing to do, interface-wise, might be to specify that as a separate
input via an argument to the constructor.  Then you've got *two* input
files, and are going to have to override the automagic stuff that
expects one input file and takes care of it all.

As far as process, I just got on the list and started pestering
people, and they haven't thrown me off yet.  8)  I'm afraid that
you're going to find that while people are happy to discuss
implementation details, when it comes time to fire up the editor,
you're usually on your own, if it's an enhancement.

I'd love to work on Bioperl more, but so far, it's only been for what
I need for my job.

From spiros at lokku.com  Thu Apr 12 15:16:39 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 20:16:39 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
	<bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
Message-ID: <bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>

Hey guys,

I have added a link as per Chris's nice suggestion for keeping track
of what's going on regarding the migration:
http://www.bioperl.org/wiki/TestMoreProgress
There's also a link to this page from the project priority list.
However, adding our signature for each module etc., in my humble
opinion, seems tedious. May I suggest we just split up the list into
'starting letter sections' and each one does his part.
I volunteer to work on all tests starting with the letter R down to
the bottom of the list.

Let me know if this makes sense or not. I will work on
removing/flagging all the files that have already been migrated on
that list as well.
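
For building that list, something along the lines of Nathan's
suggestion (looking for "use Test;" or "require Test;") could be
sketched like this.  It is only a rough sketch: uses_old_test() is a
made-up helper name, and it assumes the script is run from the top of
a checkout with the tests in t/*.t.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the source text loads the old Test module
# (but not Test::More, Test::Exception, and so on).
sub uses_old_test {
    my ($source) = @_;
    return $source =~ /^\s*(?:use|require)\s+Test\b(?!::)/m ? 1 : 0;
}

# Scan t/*.t relative to the current directory (an assumed layout).
for my $file (glob 't/*.t') {
    open my $fh, '<', $file or next;   # skip unreadable files
    local $/;                          # slurp the whole file
    my $source = <$fh>;
    close $fh;
    print "$file\n" if defined $source && uses_old_test($source);
}
```

Each file name printed is a test that still needs converting; files
that already use Test::More are skipped.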

-spiros


From marian.thieme at lycos.de  Wed Apr 11 12:02:14 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Apr 2007 16:02:14 +0000
Subject: [Bioperl-l] Affys ReseqChip
Message-ID: <188661178017404@lycos-europe.com>

An HTML attachment was scrubbed...
URL: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070411/bc2eb3aa/attachment-0001.html 

From johnsonm at gmail.com  Thu Apr 12 15:35:35 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 14:35:35 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
Message-ID: <ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>

Looks like MediaWiki has some built in functionality:

    http://meta.wikimedia.org/wiki/Anti-spam_Features
    http://www.mediawiki.org/wiki/Extension:ConfirmEdit

I'm not sure I'd call what they're doing spam, more like vandalism,
but either way, I don't see the point (though I only looked at a
couple examples via Recent Changes).

If they're indeed bots, maybe it's time to enable Captchas? Depending
on who they are and what their goals are, that may get rid of them
completely or just slow them down.


From cjfields at uiuc.edu  Thu Apr 12 15:44:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 14:44:28 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
Message-ID: <BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>


On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:

> Looks like MediaWiki has some built in functionality:
>
>    http://meta.wikimedia.org/wiki/Anti-spam_Features
>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
>
> I'm not sure I'd call what they're doing spam, more like vandalism,
> but either way, I don't see the point (though I only looked at a
> couple examples via Recent Changes).
>
> If they're indeed bots, maybe it's time to enable Captchas? Depending
> on who they are and what their goals are, that may get rid of them
> completely or just slow them down.

Already done; Mauricio installed ConfirmEdit yesterday after a bit of  
off-list discussion (thanks again Mauricio!).

If you create a new account you'll encounter a simple captcha (it  
isn't configured for each edit yet).  We may implement confirmations  
per edit or install picture captchas at a later point, depending on  
how well the current system works.

We may start granting anyone interested in maintaining the wiki sysop  
privs which makes handling spam easier.  If so we'll probably  
announce something along those lines here first.

chris


From cjfields at uiuc.edu  Thu Apr 12 15:48:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 14:48:41 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
	<bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
	<bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>
Message-ID: <3B4500DD-CAB6-4FD6-ABF9-A0160981F7E3@uiuc.edu>

Sounds good!  I'll finish up the P's (halfway through now...) and  
move on to other things; got plenty to do, believe me!

Appreciate all the help, Spiros!

chris

On Apr 12, 2007, at 2:16 PM, Spiros Denaxas wrote:

> Hey guys,
>
> I have added a link as per Chris's nice suggestion for keeping track
> of what's going on regarding the migration:
> http://www.bioperl.org/wiki/TestMoreProgress
> There's also a link to this page from the project priority list.
> However, adding our signature for each module etc., in my humble
> opinion, seems tedious. May I suggest we just split up the list into
> 'starting letter sections' and each of us does their part?
> I volunteer to work on all tests starting with the letter R down to
> the bottom of the list.
>
> Let me know if this makes sense or not. I will work on
> removing/flagging all the files that have already been migrated on
> that list as well.
>
> -spiros
>
> On 4/12/07, Spiros Denaxas <spiros at lokku.com> wrote:
>> Good idea Chris. Just got back home so will probably do it tomorrow
>> morning or so.
>>
>> Spiros
>>
>> On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>> We should probably place something on the wiki to prevent overlaps
>>> (i.e. make sure no two devs are working on the same tests).  I
>>> planned on working on the G's last night but got bogged down.
>>>
>>> Spiros, if you haven't already go ahead and create a list on a wiki
>>> page for tracking.  We can lay claim to them by tagging with our sigs
>>> and cross them off once complete.
>>>
>>> chris
>>>
>>> On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
>>>
>>>> Yep! I have some rough stats at home, I will post them later on
>>>> tonight. Roughly, if I remember correctly, 50% of the tests were still
>>>> using Test, all the others were using Test::More.
>>>>
>>>> More to follow later on,
>>>> Spiros
>>>>
>>>> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
>>>>> It should be easy enough to find those t/*.t files that have "use Test;"
>>>>> or "require Test;" This should provide a list of files still needing to
>>>>> be converted over to Test::More. As discussed previously, it may also be
>>>>> useful to use Test::Exception to test for situations where
>>>>> exceptions/warnings are thrown. If you add additional tests using this
>>>>> module, you should add the Test::Exception module to t/lib/
>>>>>
>>>>> Good luck, and feel free to mail the list with questions/comments etc.
>>>>>
>>>>> Nath
>>>>>
>>>>>
>>>>> Chris Fields wrote:
>>>>>> At the moment we do not have a comprehensive list up on the wiki.  I
>>>>>> have been slowly working (alphabetically!) to switch them over, so
>>>>>> any help would be appreciated.
>>>>>>
>>>>>> I have CC'd this to the main mail list for anyone else interested.
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>>>>>>
>>>>>>
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> I noticed there's an open task regarding moving testing code to use
>>>>>>> Test::More etc and that Chris and Nathan are already on to it. Is
>>>>>>> there any kind of wiki page where you keep track of which modules you
>>>>>>> are already working on? I am new to this and want to contribute,
>>>>>>> having a fair amount of unit testing experience from work, but I don't
>>>>>>> want to step on other people's work and I want to avoid duplication.
>>>>>>> Any pointers where I could get started would be much appreciated :-)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Spiros
>>>>>>>
>>>>>>> ps. apologies if this is not the correct list to post this, just
>>>>>>> seemed the most intuitive choice.
>>>>>>> _______________________________________________
>>>>>>> Bioperl-guts-l mailing list
>>>>>>> Bioperl-guts-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>>>>>>>
>>>>>>
>>>>>> Christopher Fields
>>>>>> Postdoctoral Researcher
>>>>>> Lab of Dr. Robert Switzer
>>>>>> Dept of Biochemistry
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>
>>>>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From spiros at lokku.com  Thu Apr 12 16:19:18 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 21:19:18 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
Message-ID: <bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>

Nice idea, I saw it a bit earlier. However, is there any chance of
implementing white lists so that regular and/or trusted users can skip
it each time we add something to the wiki?

Spiros


From Jonathan_Epstein at nih.gov  Thu Apr 12 16:22:40 2007
From: Jonathan_Epstein at nih.gov (Jonathan Epstein)
Date: Thu, 12 Apr 2007 16:22:40 -0400
Subject: [Bioperl-l] Affys ReseqChip
In-Reply-To: <188661178017404@lycos-europe.com>
References: <188661178017404@lycos-europe.com>
Message-ID: <6.2.3.4.2.20070412161407.04a38b60@mail.nih.gov>

This sounds great to me.

Resequencing in general (whether by Affy or by other technology such as Solexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handling this.  But I suggest that you proceed full-speed-ahead, and we can sort this out in the future.

Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.

Jonathan


At 12:02 PM 4/11/2007, Marian Thieme wrote:
>Hi,
>
>I am working on a piece of software that aims to analyse the output of Affymetrix DNA Resequencing Arrays (in particular the MitoChip v2). The main goal of the software is to take the redundant fragments into account. The software is able to align the redundant fragments to the entire sequence, and in particular to call bases that aren't called by the entire sequence and to detect insertions/deletions, depending on the design of the redundant fragments.
>
>I would be glad to contribute the software to the bioperl package or otherwise; if anybody is interested I can provide the code and/or further develop some features.
>
>Marian

Jonathan Epstein                                Jonathan_Epstein at nih.gov
Head, Unit on Biologic Computation              (301)402-4563
Office of the Scientific Director               Bldg 31, Room 2A47
Nat. Inst. of Child Health & Human Development  31 Center Drive
National Institutes of Health                   Bethesda, MD 20892  

From spiros at lokku.com  Thu Apr 12 17:35:43 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 22:35:43 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EA4FA.8010504@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
	<461EA4FA.8010504@campus.iztacala.unam.mx>
Message-ID: <bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>

Mauricio, thanks for your response. I actually edited a page several
times today and I got the captcha. More specifically, it was displayed
because "the page I edited contained external links", which is true
since I included a {{CPAN}} link.

Spiros

On 4/12/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> wrote:
> The chance of having white lists exists but as far as I tested last
> night, the captcha is working only at the Create Account pages, not at
> the time of applying changes to wiki content (I tested as a regular user
> and not as a wiki admin).
>
> The idea at this moment is only to block automated methods for account
> creation (bots). Registered users who haven't been blocked and/or have
> confirmed their email wouldn't be bothered while adding/changing wiki
> content.
>
> Regards,
> Mauricio.
>


From arareko at campus.iztacala.unam.mx  Thu Apr 12 17:30:34 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 12 Apr 2007 16:30:34 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
Message-ID: <461EA4FA.8010504@campus.iztacala.unam.mx>

The chance of having white lists exists but as far as I tested last 
night, the captcha is working only at the Create Account pages, not at 
the time of applying changes to wiki content (I tested as a regular user 
and not as a wiki admin).

The idea at this moment is only to block automated methods for account 
creation (bots). Registered users who haven't been blocked and/or have 
confirmed their email wouldn't be bothered while adding/changing wiki 
content.

Regards,
Mauricio.

Spiros Denaxas wrote:
> Nice idea, i saw it a bit before. However, any chance of implementing
> white lists with regular and/or trusted users to skip it each time we
> add something to the wiki ?
> 
> Spiros

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From arareko at campus.iztacala.unam.mx  Thu Apr 12 17:53:51 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 12 Apr 2007 16:53:51 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>	
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>	
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
Message-ID: <461EAA6F.1090805@campus.iztacala.unam.mx>

I've reconfigured the extension to display captchas exclusively for
account creation and disabled it when adding URLs in pages. Don't know
why this didn't happen to me while testing last night...

Please try it again to see if the change works. Thanks for pointing
this out Spiros :)

Mauricio.

Spiros Denaxas wrote:
> Mauricio, thanks for your response. I actually edited a page several
> times today and I got the captcha. More specifically, it was displayed
> because "the page I edited contained external links", which is true
> since I included a {{CPAN}} link.
> 
> Spiros

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From spiros at lokku.com  Thu Apr 12 18:11:46 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 23:11:46 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
	<461EAA6F.1090805@campus.iztacala.unam.mx>
Message-ID: <bba689ec0704121511y135f0da0j26d520a11dd3ffa1@mail.gmail.com>

You're welcome Mauricio. It's all cool now, works without the captcha
for internal edits. Thanks for changing it over :-)

-spiros

On 4/12/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> wrote:
> I've reconfigured the extension to display captchas exclusively for
> account creation and disabled it when adding URLs in pages. Don't know
> why this didn't happen to me while testing last night...
>
> Please try it again to see if the change works. Thanks for pointing
> this out Spiros :)
>
> Mauricio.


From cjfields at uiuc.edu  Thu Apr 12 18:02:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 17:02:51 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>	
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>	
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
	<461EAA6F.1090805@campus.iztacala.unam.mx>
Message-ID: <E1139262-84C3-4282-8E9D-643BF91A3656@uiuc.edu>

You disabled yourself as sysop last night, IIRC.  Don't know; it could
be what Spiros suggested, e.g. adding external links trips it.

chris

On Apr 12, 2007, at 4:53 PM, Mauricio Herrera Cuadra wrote:

> I've reconfigured the extension to display captchas exclusively for
> account creation and disabled it when adding URLs in pages. Don't
> know why this didn't happen to me while testing last night...
>
> Please try it again to see if the change works. Thanks for
> pointing this out Spiros :)
>
> Mauricio.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Apr 13 04:30:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 13 Apr 2007 09:30:50 +0100
Subject: [Bioperl-l] GenericHit->start/end needs tiled hsps?
Message-ID: <461F3FBA.2010101@sendu.me.uk>

Hi all,

I want to double-check my thinking regarding 
Bio::Search::Hit::GenericHit->start() and end(). Right now the docs 
claim that hsps of the hit object must be tiled before the answer can be 
produced. The code is implemented in that way 
(Bio::Search::SearchUtils::tile_hsps($self)).

Yet as far as I can see, all you need to do is loop through all hsps and 
pick out the smallest start and largest end respectively in terms of 
subject and query.
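
The simple loop described above could look roughly like the following. This is only a sketch of the logic, written in Python for illustration rather than as a patch to GenericHit; the `Hsp` record and its field names are hypothetical stand-ins and are not the Bio::Search API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Hsp:
    """Hypothetical stand-in for one HSP, holding 1-based coordinates."""
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int


def hit_extent(hsps: Iterable[Hsp], seq_type: str = "query") -> Tuple[int, int]:
    """Return (smallest start, largest end) over all HSPs in one pass, no tiling."""
    if seq_type not in ("query", "hit"):
        raise ValueError("seq_type must be 'query' or 'hit'")
    start = end = None
    for hsp in hsps:
        s = getattr(hsp, f"{seq_type}_start")
        e = getattr(hsp, f"{seq_type}_end")
        # Tolerate records stored with start > end (e.g. reverse strand).
        lo, hi = (s, e) if s <= e else (e, s)
        start = lo if start is None else min(start, lo)
        end = hi if end is None else max(end, hi)
    if start is None:
        raise ValueError("hit has no HSPs")
    return start, end


hsps = [Hsp(10, 50, 200, 240), Hsp(5, 30, 400, 425), Hsp(60, 90, 100, 130)]
print(hit_extent(hsps, "query"))  # (5, 90)
print(hit_extent(hsps, "hit"))    # (100, 425)
```

The loop is a single pass over the HSPs, so its cost grows linearly with their number. Unlike tiling, it says nothing about how much of the span between start and end is actually covered by HSPs.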

This comes up because I have a blast report where a single hit contains 
over 80000 hsps and the tiling takes over an hour (I gave up on it, 
don't know how long it really takes). The simple loop through hsps takes 
seconds or less.

Now in this situation the answer isn't especially useful (to me). An 
alternative way of fixing the problem would be to re-write the tiling 
algorithm (again) to somehow make it hundreds of times faster, then 
provide some way in start() and end() for the user to request the start 
and end of the best contig, or other contig of choice. Easier said than 
done though!


What do people think?

From marian.thieme at lycos.de  Fri Apr 13 06:12:51 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Fri, 13 Apr 2007 10:12:51 +0000
Subject: [Bioperl-l] Affys ReseqChip
Message-ID: <18866117804894@lycos-europe.com>

Hi,

To provide a better understanding of the matter and to allow the approach to be assessed, I will briefly present
1.) the problem and 2.) my approach.


1.)
Given: fragments (strings of a certain length), each with a description of its location within some reference sequence. For instance:

- redundant fragment: acgtnna--gcta (deletion: pos12, pos13)
- start position: 5
- end position: 17
- and a suitable reference sequence

Fragments are assumed to be mappable 1:1 to the reference sequence and can contain gaps and n's; the latter indicate that the base wasn't determined, maybe because of failed hybridization or something like this.
Thus, coping with insertions/deletions only requires parsing an array design file (a description of all insertions and deletions in each redundant fragment) and, according to that description, inserting gaps into the reference sequence and into the fragments where required.
So from my point of view, in the case of the Affy MitoChip v2, we only need to process the description file rather than calculate an alignment via a dynamic programming matrix.


2.)
My current approach consists of the following 5 steps:

1.) read the reference sequence and the redundant fragments via a SeqIO object.

2.) calculate a hash of all insertions, defined by length and position, and
3.) insert the longest insertion at each position into the appropriate fragments and into the reference sequence. Hence we insert as many gaps as given by

length(max_insertion(position_x)) - length(insertion(fragment_y, position_x))

into each fragment/reference sequence.
(This is done by iterating over each sequence in the SeqIO and inserting gaps according to the insertion hash) and

4.) Create SimpleAlign object with LocatableSeq objects

5.) Afterwards we can do some statistical analysis and calc some consensus base for each column in the SimpleAlignment. (I use a Statistics module from cpan).
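
The padding rule from steps 2 and 3 can be sketched as follows. This is only an illustration under assumptions: gap_padding is a made-up name, the insertion hash layout (position => { sequence id => inserted bases }) is one possible choice, and all ids and bases are invented.

```perl
use strict;
use warnings;

# For every position that carries an insertion somewhere, work out how many
# gap characters each sequence needs so that all of them line up:
#   length(longest insertion at pos) - length(own insertion at pos)
sub gap_padding {
    my ($insertion, @ids) = @_;
    my %pad;
    for my $pos (keys %$insertion) {
        my $max = 0;
        for my $ins (values %{ $insertion->{$pos} }) {
            $max = length($ins) if length($ins) > $max;
        }
        for my $id (@ids) {
            my $own = $insertion->{$pos}{$id};
            $own = '' unless defined $own;    # no insertion in this sequence
            $pad{$pos}{$id} = $max - length($own);
        }
    }
    return \%pad;
}

# Invented example: two positions with insertions, three sequences.
my %insertion = (
    12 => { frag_a => 'ac', frag_b => 'a' },
    40 => { frag_b => 'gtt' },
);
my $pad = gap_padding(\%insertion, qw(reference frag_a frag_b));
for my $pos (sort { $a <=> $b } keys %$pad) {
    for my $id (sort keys %{ $pad->{$pos} }) {
        printf "pos %d  %-9s  gaps to add: %d\n", $pos, $id, $pad->{$pos}{$id};
    }
}
```

The reference sequence has no insertion of its own, so it always receives the full length of the longest insertion as gaps.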

Unfortunately I didn't manage to find a method that gives me the set of bases (column) for a given position in the alignment (did I overlook something? is SimpleAlign not appropriate?), so I iterate over each position (base) of the reference sequence and over each fragment which covers that particular position.
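
The per-column iteration can be sketched on plain strings as below. The helper names column_bases and consensus_base are made up, the fragments are invented apart from the example fragment given above, and the majority vote is only a stand-in for whatever statistics are really used. Bio::SimpleAlign also has a slice() method that may cover the column lookup; that is worth checking in its documentation.

```perl
use strict;
use warnings;

# Each fragment: an aligned string (gaps as '-') plus its 1-based start column.
# Returns the bases of all fragments that cover column $col.
sub column_bases {
    my ($col, @frags) = @_;
    my @bases;
    for my $f (@frags) {
        my $offset = $col - $f->{start};
        next if $offset < 0 || $offset >= length($f->{aligned});
        push @bases, substr($f->{aligned}, $offset, 1);
    }
    return @bases;
}

# Simple majority vote that ignores gaps and undetermined bases ('n').
# Ties are broken alphabetically; 'n' is returned if nothing usable remains.
sub consensus_base {
    my %count;
    $count{ lc($_) }++ for grep { !/^[-n]$/i } @_;
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return defined $best ? $best : 'n';
}

my @frags = (
    { id => 'frag_1', start => 5, aligned => 'acgtnna--gcta' },
    { id => 'frag_2', start => 7, aligned => 'gtcga' },
    { id => 'frag_3', start => 8, aligned => 'ttcc' },
);
for my $col (8, 9, 12) {
    my @bases = column_bases($col, @frags);
    printf "column %2d: bases [%s] consensus %s\n",
        $col, join('', @bases), consensus_base(@bases);
}
```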


Marian


Jonathan Epstein wrote:

> This sounds great to me.
>
> Resequencing in general (whether by Affy or by other technology such as Solexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handling this.  But I suggest that you proceed full-speed-ahead, and we can sort this out in the future.
>
> Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.
>
> Jonathan


From thiago.venancio at gmail.com  Fri Apr 13 15:05:12 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 13 Apr 2007 16:05:12 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
Message-ID: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>

Hi all.

What is the best way to extract the coding region from a nucleotide sequence
based on BLASTX or TBLASTX comparisons?

Thanks in advance.

Thiago
-- 
"The way to get started is to quit talking and begin doing."
      Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================

From jason at bioperl.org  Fri Apr 13 16:05:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:05:42 -0700
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
Message-ID: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>

Depends on how far away the query protein is, but I don't trust BLAST
for the actual alignment.  Find the boundaries, add a little slop,
and refine the alignment of protein to genome with a good alignment
program designed for that, like genewise or exonerate or even FASTX/Y.
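
The "boundaries plus slop" step might be sketched like this. region_with_slop is a made-up helper, the coordinates and the toy sequence are invented, and the amount of slop is a judgment call rather than a recommended value.

```perl
use strict;
use warnings;

# Widen a 1-based region by $slop bases on each side, clamped to the sequence.
sub region_with_slop {
    my ($start, $end, $slop, $seq_len) = @_;
    ($start, $end) = ($end, $start) if $start > $end;    # minus-strand hits
    my $from = $start - $slop;
    $from = 1 if $from < 1;
    my $to = $end + $slop;
    $to = $seq_len if $to > $seq_len;
    return ($from, $to);
}

# Toy 118 bp sequence with an 18 bp "coding" stretch at 51..68.
my $contig = 'n' x 50 . 'atggcgtgcaaaggctaa' . 'n' x 50;
my ($from, $to) = region_with_slop(51, 68, 10, length($contig));
my $region = substr($contig, $from - 1, $to - $from + 1);
print "extract $from..$to (", length($region), " bp) for the second aligner\n";
# prints: extract 41..78 (38 bp) for the second aligner
```

The extracted region, not the BLAST alignment itself, is then what gets handed to the more careful aligner.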

-jason
On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:

> Hi all.
>
> What is the best way to extract coding region from a nucleotide  
> sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
> -- 
> "The way to get started is to quit talking and begin doing."
>       Walt Disney
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From jason at bioperl.org  Fri Apr 13 16:13:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:13:07 -0700
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
Message-ID: <7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>

I think it just needs an edit to the code in to_string which checks
for the type of algorithm.  You'd need to add to the if/elsif cascade
a branch for the RPSBLAST type that sets the query and target dbs and
the query and target sequence types properly.  This should be very
trivial to code in; have you tried adding this to see if it works?

If you submit a bug with an example report we'd be able to make
appropriate changes faster.
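
To make the idea concrete, here is a hypothetical stand-alone sketch of such a cascade. It is NOT the actual TextResultWriter code: sequence_types is a made-up function, the labels are the three named in the original report, and the assumption that an rpsblast search pairs a protein query with a protein (profile) database should be checked against a real report.

```perl
use strict;
use warnings;

# Hypothetical lookup: algorithm name => (query type, database type).
sub sequence_types {
    my ($algorithm) = @_;
    my $alg = uc $algorithm;
    if    ($alg eq 'BLASTN')  { return ('nucleic',    'nucleic');    }
    elsif ($alg eq 'BLASTP')  { return ('protein',    'protein');    }
    elsif ($alg eq 'BLASTX')  { return ('translated', 'protein');    }
    elsif ($alg eq 'TBLASTN') { return ('protein',    'translated'); }
    elsif ($alg eq 'TBLASTX') { return ('translated', 'translated'); }
    # The extra branch discussed in this thread (types are an assumption).
    elsif ($alg =~ /^RPS-?BLAST$/) { return ('protein', 'protein'); }
    else {
        warn "unrecognized algorithm '$algorithm'\n";
        return;
    }
}

for my $alg (qw(BLASTX RPSBLAST)) {
    my ($q, $db) = sequence_types($alg);
    print "$alg: query=$q database=$db\n";
}
```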

-jason
On Apr 11, 2007, at 6:32 AM, Emeric Sevin wrote:

> Hi everybody,
>
> I'm sorry to bug you, but either I missed something so obvious that nobody
> bothered to answer, or I'm being a little boycotted here...
> A little help would be very much appreciated
>
> On 22 March 07, at 16:07, Emeric Sevin wrote:
>
>> Hello,
>>
>> I am new to this community, and apologize if this subject has been
>> posted before.
>>
>> I want to print out only selected results from a multiple
>> blast-alignments results file. Problem is, the algorithm used is
>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the
>> actual writing task yields "unclean" warnings. Although an output
>> is actually written, the writer
>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by
>> the fact rpsblast DBs are not labeled with
>> "protein"/"nucleic"/"translated".
>> Does anybody know of an easy fix to that bug, or of another way to
>> get around it?
>>
>> Thank you very much
>>
>> Emeric SEVIN
>> Université de Rennes 1

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From thiago.venancio at gmail.com  Fri Apr 13 16:20:32 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 13 Apr 2007 17:20:32 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
Message-ID: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>

Thanks Jason.

I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
comparisons and want to extract some translated coding regions for further
multiple alignment and phylogenetic analysis.

Best.

Thiago

On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
> Depends on how far away the query protein is, but I don't trust BLAST for
> the actual alignment.  Find the boundaries, add a little slop, and refine
> the alignment of protein to genome with a good alignment program designed to
> like genewise or exonerate or even FASTX/Y.
> -jason
> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>
> Hi all.
>
> What is the best way to extract coding region from a nucleotide sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>

From jason at bioperl.org  Fri Apr 13 16:47:50 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:47:50 -0700
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
	<44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
Message-ID: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>

Hi -

There are some tools that do this for you -- I've listed a few from a
google search or from what I remember reading.  It would be great if
you (and others!) were willing to contribute a little info on what
works for you to the wiki.  A little HOWTO would be cool - here or on
openwetware.org.

Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
EST-PAC:  doi: http://dx.doi.org/10.1186/1751-0473-1-2

Ewan Birney's estwise as part of wise package also can help if you  
have a likely protein from BLAST you want to align to the est -  
estwise can handle frameshifts, but can be too slow for some people.   
Exonerate's protein2dna model may also work here, but I haven't tried  
it.

-jason
On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote:

> Thanks Jason.
>
> I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
> comparisons and want to extract some translated coding regions for
> further
> multiple alignment and phylogenetic analysis.
>
> Best.
>
> Thiago
>
> On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>>
>> Depends on how far away the query protein is, but I don't trust  
>> BLAST for
>> the actual alignment.  Find the boundaries, add a little slop, and  
>> refine
>> the alignment of protein to genome with a good alignment program  
>> designed to
>> like genewise or exonerate or even FASTX/Y.
>> -jason
>> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>>
>> Hi all.
>>
>> What is the best way to extract coding region from a nucleotide  
>> sequence
>> based on a BLASTX or TBLASTX comparisons ?
>>
>> Thanks in advance.
>>
>> Thiago
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>> http://jason.open-bio.org/
>>
>>
>>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From gopu_36 at yahoo.com  Fri Apr 13 12:48:48 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Fri, 13 Apr 2007 09:48:48 -0700 (PDT)
Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score
Message-ID: <9982570.post@talk.nabble.com>


Can anyone help me collect the value of the second best hit score
(i.e. raw_score) from blast results which contain multiple queries? I have
used a SearchIO object to parse my blast report. I am only interested in the
second best hit/raw_score and not the first hit!

Thanks in advance!




From jason at bioperl.org  Sat Apr 14 13:53:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 14 Apr 2007 10:53:42 -0700
Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score
In-Reply-To: <9982570.post@talk.nabble.com>
References: <9982570.post@talk.nabble.com>
Message-ID: <67974DCD-B1F9-4286-86A4-5E4C4DBA3914@bioperl.org>

Try reading the HOWTO.

http://bioperl.org/wiki/HOWTO:SearchIO
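
As a rough sketch of what the HOWTO leads to: sort the hits of each result by raw score and take the second one. second_best_hit is a made-up helper and ToyHit is only a stand-in so the example runs without BioPerl; with a real report one would presumably pass in $result->hits for each $result returned by next_result.

```perl
use strict;
use warnings;

{
    package ToyHit;    # stand-in offering the two accessors used below
    sub new       { my ($class, %args) = @_; return bless {%args}, $class }
    sub name      { return $_[0]{name} }
    sub raw_score { return $_[0]{raw_score} }
}

# Return the hit with the second-highest raw score, or undef when a query
# has fewer than two hits.
sub second_best_hit {
    my @hits = sort { $b->raw_score() <=> $a->raw_score() } @_;
    return $hits[1];
}

# Invented data: one list of hits per query.
my %hits_for = (
    query_1 => [ ToyHit->new(name => 'hit_a', raw_score => 310),
                 ToyHit->new(name => 'hit_b', raw_score => 480),
                 ToyHit->new(name => 'hit_c', raw_score => 95) ],
    query_2 => [ ToyHit->new(name => 'hit_d', raw_score => 200) ],
);
for my $query (sort keys %hits_for) {
    my $second = second_best_hit(@{ $hits_for{$query} });
    if (defined $second) {
        printf "%s: second best is %s (raw score %d)\n",
            $query, $second->name, $second->raw_score;
    }
    else {
        print "$query: fewer than two hits\n";
    }
}
```

BLAST reports normally list hits best first, so simply calling next_hit twice may be enough; the explicit sort just avoids relying on that order.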

On Apr 13, 2007, at 9:48 AM, gopu_36 wrote:

>
> Can anyone help me to collect the value of the second best hit score
> (ie)raw_score from the blast results which contains multiple  
> queries? I have
> used searchIO object to parse my blast report. I am only interested  
> in the
> second best hit/raw_score and not the first hit!
>
> Thanks in advance!
>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-parse- 
> blast-result-to-get-2nd-best-hit-score-tf3572717.html#a9982570
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From gdorjee at hotmail.com  Sat Apr 14 17:39:50 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sat, 14 Apr 2007 14:39:50 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
Message-ID: <9997343.post@talk.nabble.com>


hi all,
can anyone please tell me why the following script gives me the error below, and how i can fix it:
waiting... 5 units of time
Can't call method "database_name" on an undefined value at
test1_remote_swissblast.pl line 41, <GEN4> line 31.
cheers!

use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;

my $Seq_in = Bio::SeqIO->new(-file   => $ARGV[0],
                             -format => 'fasta');
my $query = $Seq_in->next_seq();

my $factory = Bio::Tools::Run::RemoteBlast->new(
    '-prog'     => 'blastp',
    '-data'     => 'swissprot',
    _READMETHOD => "Blast"
);
my $blast_report = $factory->submit_blast($query);
my $max_number = 100;
my $trial = 0;

while ( my @rids = $factory->each_rid ) {

    print STDERR "\nSorry, maximum number of retries $max_number exceeded\n"
        if $trial >= $max_number;
    last if $trial >= $max_number;
    $trial++;

    print STDERR "waiting... ".(5*$trial)." units of time\n";

    # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                # retrieve_blast returns -1 on error
                $factory->remove_rid($rid);
            }
            # retrieve_blast returns 0 on 'job not finished'
            sleep 5*$trial;
        } else {

            #---- Blast done ----
            $factory->remove_rid($rid);
            my $result = $rc->next_result;
            print "database: ", $result->database_name(), "\n";
            while( my $hit = $result->next_hit ) {
                print "hit name is: ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "score is: ", $hsp->score, "\n";
                }
            }
        }
    }
}


From dmessina at wustl.edu  Sun Apr 15 12:02:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 15 Apr 2007 11:02:51 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <9997343.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
Message-ID: <ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>

Hi DeeGee,

Your script worked fine for me. Perhaps the problem is in your input  
fasta file?

Dave

% perl test.pl AAC12660.fa
waiting... 5 units of time
waiting... 10 units of time
waiting... 15 units of time
database: Non-redundant SwissProt sequences
hit name is: sp|Q15750|TAB1_HUMAN
score is: 2413
hit name is: sp|Q8CF89|TAB1_MOUSE
score is: 2352
hit name is: sp|P49444|PP2C_PARTE
score is: 159
hit name is: sp|Q6ING9|PP2CK_XENLA
[...etc...]

From spiros at lokku.com  Sun Apr 15 12:12:05 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Sun, 15 Apr 2007 17:12:05 +0100
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
Message-ID: <bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>

Yep, it must be in the input file. The

$result->database_name()

method gets called on $result, the result object.

The error you get,

Can't call method "database_name" on an undefined value at
test1_remote_swissblast.pl line 41, <GEN4> line 31.

means the result object is not defined, so the method call fails since
there is no data to operate on.
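
A small guard makes this failure easier to read. The sketch below is only an illustration: describe_first_result is a made-up helper and EmptyReport is a stand-in for a parser object whose next_result finds nothing, so that the example runs without BioPerl or a network connection.

```perl
use strict;
use warnings;

{
    package EmptyReport;    # stand-in: a report from which nothing is parsed
    sub new         { return bless {}, shift }
    sub next_result { return }
}

# Check that next_result gave us an object before calling methods on it.
sub describe_first_result {
    my ($report) = @_;
    my $result = $report->next_result;
    return "no result could be parsed from the returned report"
        unless defined $result;
    return "database: " . $result->database_name;
}

my $message = describe_first_result(EmptyReport->new);
print "$message\n";
```

This does not explain why the result is missing, but it turns a fatal "undefined value" error into a message that points at the parsing step.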

Spiros

On 4/15/07, David Messina <dmessina at wustl.edu> wrote:
> Hi DeeGee,
>
> Your script worked fine for me. Perhaps the problem is in your input
> fasta file?
>
> Dave
>
> % perl test.pl AAC12660.fa
> waiting... 5 units of time
> waiting... 10 units of time
> waiting... 15 units of time
> database: Non-redundant SwissProt sequences
> hit name is: sp|Q15750|TAB1_HUMAN
> score is: 2413
> hit name is: sp|Q8CF89|TAB1_MOUSE
> score is: 2352
> hit name is: sp|P49444|PP2C_PARTE
> score is: 159
> hit name is: sp|Q6ING9|PP2CK_XENLA
> [...etc...]
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

From dr.hogart at gmail.com  Sun Apr 15 12:13:29 2007
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Sun, 15 Apr 2007 20:13:29 +0400
Subject: [Bioperl-l] error with blast parsing by searchIO
Message-ID: <op.tqt10r17avnppr@hogart.img.ras.ru>

Hello all,

a script (parsing a blastn report) that had previously worked today "tells" me
that:

------------- EXCEPTION  -------------
MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
: No such file or directory
STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
STACK toplevel parse-te-lib2.pl:3

--------------------------------------

What does it mean??

ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8


From cjfields at uiuc.edu  Sun Apr 15 13:40:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 15 Apr 2007 12:40:24 -0500
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqt10r17avnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
Message-ID: <460926E6-0EEA-45D9-838E-70706062857C@uiuc.edu>

You have to update to bioperl 1.5.2 or CVS.  BLAST parsing is broken  
for recent BLAST versions (> v.2.2, I believe).

chris

On Apr 15, 2007, at 11:13 AM, sergei ryazansky wrote:

> Hello all,
>
> script (parsing blastn report) that previously had worked today  
> "tell" me
> that:
>
> ------------- EXCEPTION  -------------
> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
> : No such file or directory
> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm: 
> 273
> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
> STACK toplevel parse-te-lib2.pl:3
>
> --------------------------------------
>
> What does it mean??
>
> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Sun Apr 15 14:24:56 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 15 Apr 2007 11:24:56 -0700
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqt10r17avnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
Message-ID: <C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>

It looks like something is broken in your script as to how you are  
passing it a filename - it is trying to open a file called "BLASTN  
2.2.13 [Nov-27-2005]".
Did you already open the file, and are you perhaps passing data from the first
line of the file to SearchIO?
Sending the relevant part of your script to the list will help us  
diagnose the problem better.
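
The suspected mistake can be reproduced without BioPerl, as in the sketch below. It assumes the script read the first line of the report and then used that line where a path belongs; the line break before the colon in the error message would fit a trailing newline, but that is only a guess. The toy report content is invented.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a toy "report" whose first line mimics the one in the error message.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "BLASTN 2.2.13 [Nov-27-2005]\n\nQuery= toy\n";
close $tmp_fh or die "close failed: $!";

# Read the first line, as a script might do before handing over to a parser.
open my $in, '<', $path or die "cannot open $path: $!";
my $first_line = <$in>;
close $in;

# Mistake: the line's content is used as if it were the file name.
my ($line_ok, $line_err) = (0, '');
if (open my $bad, '<', $first_line) { $line_ok = 1; close $bad }
else                                 { $line_err = "$!" }

# Intended: pass the path itself (or an already opened filehandle).
my $path_ok = 0;
if (open my $good, '<', $path) { $path_ok = 1; close $good }

print "open(first line): ", ($line_ok ? "ok" : "failed ($line_err)"), "\n";
print "open(path):       ", ($path_ok ? "ok" : "failed"), "\n";
```

With SearchIO the same distinction would presumably be between giving -file the path and giving -fh an open filehandle, never the text read from the file.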

-jason
On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote:

> Hello all,
>
> script (parsing blastn report) that previously had worked today  
> "tell" me
> that:
>
> ------------- EXCEPTION  -------------
> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
> : No such file or directory
> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm: 
> 273
> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
> STACK toplevel parse-te-lib2.pl:3
>
> --------------------------------------
>
> What does it mean??
>
> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From gdorjee at hotmail.com  Sun Apr 15 20:40:22 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sun, 15 Apr 2007 17:40:22 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
Message-ID: <10008507.post@talk.nabble.com>


hi guys,
thanks for your replies, but i still don't understand why it doesn't work.
my input fasta sequence looks fine. here, take a look,

>gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
AQGAVAPGPDGGGPFPPWPLG

is it possible that the script is not being able to read the
RemoteBlast.pm? but the thing is, i can run the standalone blast on the
command line, although i've never been able to run the same with the cgi module
(by getting the input from an html textarea). i don't understand. i've been
trying to get the standalone running for a while now, and i also mentioned
it in my previous postings....but all in vain. i haven't got over it yet.
any help or an example would be much appreciated.


Spiros Denaxas wrote:
> 
> Yep, it must be in the input file. The
> 
> $result->database_name()
> 
> function gets called on $result the result object.
> 
> The error you get,
> 
> Can't call method "database_name" on an undefined value at
> test1_remote_swissblast.pl line 41, <GEN4> line 31.
> 
> means the result object is not defined thus the function fails since
> there are no data to operate on.
> 
> Spiros
> 
> On 4/15/07, David Messina <dmessina at wustl.edu> wrote:
>> Hi DeeGee,
>>
>> Your script worked fine for me. Perhaps the problem is in your input
>> fasta file?
>>
>> Dave
>>
>> % perl test.pl AAC12660.fa
>> waiting... 5 units of time
>> waiting... 10 units of time
>> waiting... 15 units of time
>> database: Non-redundant SwissProt sequences
>> hit name is: sp|Q15750|TAB1_HUMAN
>> score is: 2413
>> hit name is: sp|Q8CF89|TAB1_MOUSE
>> score is: 2352
>> hit name is: sp|P49444|PP2C_PARTE
>> score is: 159
>> hit name is: sp|Q6ING9|PP2CK_XENLA
>> [...etc...]
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 



From dmessina at wustl.edu  Sun Apr 15 22:43:06 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 15 Apr 2007 21:43:06 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10008507.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
Message-ID: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>

You're right, it's not the input sequence. I just tried it with your  
script and it worked.


> is it possible that the script is not being about to read the
> RemoteBlast.pm?

I think the program wouldn't compile if that were the case, and your  
error message would be about not finding RemoteBlast.pm rather than  
the one you got.
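
One way to check this directly is to try loading the module and look at what perl reports. The sketch below is an illustration only: module_status is a made-up helper, and the two module names are just examples (one core module, one that should not exist).

```perl
use strict;
use warnings;

# Try to load a module at run time and report perl's complaint if it fails.
sub module_status {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Some::Module -> Some/Module.pm
    return "loaded" if eval { require $file; 1 };
    my ($first_line) = split /\n/, $@;
    return "not loadable: $first_line";
}

print "$_: ", module_status($_), "\n"
    for qw(File::Spec No::Such::Module::Here);
```

A missing module produces a "Can't locate ..." message at load time, which is quite different from the "Can't call method ... on an undefined value" error in this thread.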


> but the thing is, i can run the standalone blast on the
> command line, although i've never been able the run the same with  
> cgi module
> (by gettting the input from an html textarea). i don't understand.

This result really suggests that perl and Bioperl are not the issue.  
I'm not saying the following to give you the brushoff, but given the  
numerous ways in which web-based apps can fail and in which  
webservers can be installed, it might be best for you to find someone  
at your institution who can sit down with you and work through it.

Dave


From cjfields at uiuc.edu  Sun Apr 15 23:51:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 15 Apr 2007 22:51:05 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
Message-ID: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>

This sounds similar to an issue that popped up a few weeks ago,
related to URLAPI changes for remote BLAST access.  That was fixed on
NCBI's end, but I also added a fix to RemoteBlast in CVS that works as
well.

That said, my guess is the same as Dave's: that there are
connectivity issues.  What happens when you set the RemoteBlast
factory to a verbosity of 1?  This will print debugging output
from the repeated queries to the NCBI server (so if there are
problems they'll show up there).

...
my $factory = Bio::Tools::Run::RemoteBlast->new(
                                 '-prog'  => 'blastp',
                                 '-data' => 'swissprot',
                                  _READMETHOD => "Blast",
                                  -verbose => 1    # debugging output
                          );
...

If you see the BLAST report but get the same error try using the  
RemoteBlast in CVS to see if it fixes the problem.

chris


On Apr 15, 2007, at 9:43 PM, David Messina wrote:

> You're right, it's not the input sequence. I just tried it with your
> script and it worked.
>
>
>> is it possible that the script is not being able to read the
>> RemoteBlast.pm?
>
> I think the program wouldn't compile if that were the case, and your
> error message would be about not finding RemoteBlast.pm rather than
> the one you got.
>
>
>> but the thing is, i can run the standalone blast on the
>> command line, although i've never been able to run the same with
>> cgi module
>> (by getting the input from an html textarea). i don't understand.
>
> This result really suggests that perl and Bioperl are not the issue.
> I'm not saying the following to give you the brushoff, but given the
> numerous ways in which web-based apps can fail and in which
> webservers can be installed, it might be best for you to find someone
> at your institution who can sit down with you and work through it.
>
> Dave
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dr.hogart at gmail.com  Mon Apr 16 03:03:46 2007
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Mon, 16 Apr 2007 11:03:46 +0400
Subject: [Bioperl-l] error with blast parsing by searchIO
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
	<C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>
Message-ID: <op.tqu68kvzavnppr@hogart.img.ras.ru>

The problem was resolved by giving the direct path
(-file=>'d\...\input.txt') to the input file in my script.
I think that Chris is right and I should update my BioPerl to version 1.5.
By the way, bioperl-1.5 is not available via ppm. Where can I download it
for WinXP?

On Sun, 15 Apr 2007 22:24:56 +0400, Jason Stajich <jason at bioperl.org>  
wrote:

> It looks like something is broken in your script as to how you are
> passing it a filename - it is trying to open a file called "BLASTN
> 2.2.13 [Nov-27-2005]".
> did you already open the file and are you passing data from the first
> line of the file to SearchIO perhaps?
> Sending the relevant part of your script to the list will help us
> diagnose the problem better.
>
> -jason
> On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote:
>
>> Hello all,
>>
>> script (parsing blastn report) that previously had worked today
>> "tell" me
>> that:
>>
>> ------------- EXCEPTION  -------------
>> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
>> : No such file or directory
>> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:
>> 273
>> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
>> STACK toplevel parse-te-lib2.pl:3
>>
>> --------------------------------------
>>
>> What does it mean??
>>
>> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>




From bix at sendu.me.uk  Mon Apr 16 04:34:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 16 Apr 2007 09:34:56 +0100
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqu68kvzavnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>	<C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>
	<op.tqu68kvzavnppr@hogart.img.ras.ru>
Message-ID: <46233530.1010109@sendu.me.uk>

sergei ryazansky wrote:
> The problem was resolved by giving the direct path
> (-file=>'d\...\input.txt') to the input file in my script.
> I think that Chris is right and I should update my BioPerl to version 1.5.
> By the way, bioperl-1.5 is not available via ppm. Where can I download it
> for WinXP?

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 10:36:33 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 22:36:33 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>


Dear all,
 
Given a GO ID, is there a way to extract all
the related gene names for that ID with Perl?
 
Does anybody have experience with that?
I've looked through the GO modules on CPAN, but can't seem
to find any tool that facilitates that search.
 
I look forward very much to your advice.
 
--
Edward WIJAYA
SINGAPORE

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------

From spiros at lokku.com  Mon Apr 16 11:10:49 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:10:49 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>

Hi Edward,

What organism are you interested in? I have some code from my PhD
based on the Saccharomyces cerevisiae genome. It basically uses the SGD
flat files and a local MySQL instance of GO. It might be worth turning
into modules if people are interested in it, although it is pretty
organism-oriented and the lack of abstraction might introduce a number
of problems.

Spiros


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 11:14:09 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:14:09 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>


Hi Spiros,
 
Thanks for your reply. I am interested in applying it to
all the organisms related to that particular GO ID.
 
Do you have a CPAN module for that?
--
Edward WIJAYA
SINGAPORE


From dmessina at wustl.edu  Mon Apr 16 11:21:01 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:21:01 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>

I use BioMART for this kind of thing. If you need to do this for more  
than a couple of GO terms, BioMART has a Perl API you can use to  
connect to their data.

http://www.biomart.org/

http://www.biomart.org/install-overview.html

Dave

From spiros at lokku.com  Mon Apr 16 11:21:40 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:21:40 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID: <bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>

Nope, I don't have a CPAN module for it, and to be honest, I don't
think I will release one until I actually finish my PhD. The
code is really scruffy in parts, lacks documentation and might
not work under all setups. My plan is to take some time afterwards to
clean it up and release a proper version of it to the public.

What you are talking about, however, if I understand correctly, is a
much, much bigger project. Different genome databases have different
formats, and a potential module must take them all into consideration.
Then there is the issue of the different evidence codes GO annotators
use across different genomes, and which of them you consider to be of
higher or lower quality.

If you happen to stumble upon such a module, please share it; it would
be very interesting!

spiros


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 11:33:27 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:33:27 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>


Hi David, 
 
There seems to be no biomart-perl module on CPAN.
 
I tried their CVS:
cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login
 
But it requires a password. Can you suggest another way to get this module?
 
--
Edward WIJAYA


From Kevin.M.Brown at asu.edu  Mon Apr 16 11:44:28 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 16 Apr 2007 08:44:28 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net><BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
	<3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <1A4207F8295607498283FE9E93B775B4030A4914@EX02.asurite.ad.asu.edu>

Did you follow the directions listed at the following page?

http://www.biomart.org/install-overview.html 


> There seems to be no biomart-perl module in CPAN.
>  
> I tried their cvs:
> cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login
>  
> But require password. Can suggest if there is another way to 
> get this module?


From dmessina at wustl.edu  Mon Apr 16 11:44:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:44:26 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
	<3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <2D698B2E-49B9-411E-B1FA-C12F4A235EB2@wustl.edu>

The password you need to enter when asked is CVSUSER.

Dave

From sdavis2 at mail.nih.gov  Mon Apr 16 11:55:14 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 16 Apr 2007 11:55:14 -0400
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
	<bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>
Message-ID: <200704161155.14567.sdavis2@mail.nih.gov>


> > On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
> > > Dear all,
> > >
> > > Given a GO id, is there a way to extract all
> > > the related gene names from that id with Perl?

This is a pretty simple problem if you have the data in a useable format.  The 
data that you need are available here:

ftp://ftp.ncbi.nih.gov/gene/DATA

The README file gives details, but the files in this directory are all 
tab-delimited text.  Download the gene2go.gz file, which contains a mapping 
from Entrez Gene ID to GO accession.  Then, download the gene_info.gz file, 
which contains the information about the Entrez Gene ID, including 
description, gene symbol, etc.  If you need to link to other data, you can of 
course download the respective files from NCBI.  You can either load the data 
into a SQL database of some type for general queries, or you can simply read 
them into Perl directly (with appropriate data structures) to do your mapping.
Since they are tab-delimited text, I would choose the database route and then 
use SQL and DBI to do the queries you like.
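
As a rough illustration of the "read them into Perl directly" route, here
is a minimal sketch using only core Perl. It assumes the first three
tab-delimited columns of gene2go are tax_id, GeneID and GO_ID, and the
first three of gene_info are tax_id, GeneID and Symbol (check the README
before relying on this). The sample rows, gene symbols and sub names are
made up for illustration; real use would read the decompressed files
instead of the in-memory strings.

```perl
use strict;
use warnings;

# Collect the Entrez Gene IDs annotated to one GO accession.
# Assumed gene2go columns: tax_id, GeneID, GO_ID, Evidence, ...
sub gene_ids_for_go {
    my ($fh, $go_id) = @_;
    my %ids;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;          # skip the header line
        chomp $line;
        my (undef, $gene_id, $acc) = split /\t/, $line;
        next unless defined $acc;
        $ids{$gene_id} = 1 if $acc eq $go_id;
    }
    return \%ids;
}

# Look up the gene symbol for each wanted Gene ID.
# Assumed gene_info columns: tax_id, GeneID, Symbol, ...
sub symbols_for_ids {
    my ($fh, $wanted) = @_;
    my %symbol;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my (undef, $gene_id, $sym) = split /\t/, $line;
        next unless defined $sym;
        $symbol{$gene_id} = $sym if $wanted->{$gene_id};
    }
    return \%symbol;
}

# Tiny made-up sample rows stand in for the real (decompressed) files.
my $gene2go   = "#tax_id\tGeneID\tGO_ID\tEvidence\n"
              . "9606\t101\tGO:0009220\tIEA\n"
              . "9606\t102\tGO:0005524\tIDA\n"
              . "10090\t201\tGO:0009220\tISS\n";
my $gene_info = "#tax_id\tGeneID\tSymbol\n"
              . "9606\t101\tGENEA\n"
              . "9606\t102\tGENEB\n"
              . "10090\t201\tGenec\n";

open my $g2g, '<', \$gene2go   or die "cannot open sample gene2go: $!";
open my $gi,  '<', \$gene_info or die "cannot open sample gene_info: $!";

my $ids     = gene_ids_for_go($g2g, 'GO:0009220');
my $symbols = symbols_for_ids($gi, $ids);
print "$_\t$symbols->{$_}\n" for sort { $a <=> $b } keys %$symbols;
```

Filtering gene2go first keeps the in-memory hash small, which matters
because gene_info is by far the larger file.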

Sean

From cjfields at uiuc.edu  Mon Apr 16 12:25:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 11:25:42 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>

You can limit EntrezGene searches by Gene Ontology ID using the [Gene  
Ontology] field in queries.  The following query:

'9220[Gene Ontology]'

will give 120 gene IDs.  You can get the same list using the still- 
under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm  
still working on this):

my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                        -db => 'gene',
                                        -term => '9220[Gene Ontology]',
                                        -retmax => 300);
$esearch->get_response;
my @ids = $esearch->get_ids;
print join "\n", @ids;

In my opinion, Sean's idea of using SQL is probably better if you  
have tons of searches to do.
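
For anyone without the development version of Bio::DB::EUtilities, the
query above should correspond to a plain E-utilities esearch request. The
sketch below only builds the URL (it does not contact NCBI) and uses core
Perl. The base URL and the db/term/retmax parameter names are my
understanding of the public esearch interface, and the two sub names are
made up for illustration.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build an esearch URL from name => value pairs (sorted for stable output).
sub esearch_url {
    my (%arg) = @_;
    my $base  = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
    my @pairs = map { "$_=" . uri_escape_simple($arg{$_}) } sort keys %arg;
    return $base . '?' . join('&', @pairs);
}

my $url = esearch_url(
    db     => 'gene',
    term   => '9220[Gene Ontology]',
    retmax => 300,
);
print "$url\n";
```

Fetching that URL with any HTTP client should return XML containing the
matching Gene IDs, which can then be parsed separately.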

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 16 14:34:25 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 13:34:25 -0500
Subject: [Bioperl-l] Bio::Matrix::PSM::ProtPsm
Message-ID: <CA820306-7480-478D-BD3E-A0F094943065@uiuc.edu>

I was going through tests converting to Test::More and found this  
module is largely unimplemented (relevant tests are in t/ProtPsm.t in  
CVS).  It was written by James Thompson a few years ago and the  
module docs seem to indicate some uncertainty on what this class is  
meant to accomplish.  Does anyone know the status of this code?

chris


From cjm at fruitfly.org  Mon Apr 16 14:49:23 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:49:23 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID: <AAF82F3A-3C75-4D51-AFD4-FDE358391A03@fruitfly.org>


Download:
http://search.cpan.org/~cmungall/go-db-perl

or do:

cpan GO::AppHandle

The API call you want is here:
http://search.cpan.org/~cmungall/go-db-perl/GO/AppHandle.pm#get_deep_products

Here is an example snippet:

   use GO::AppHandle;
   my $apph=GO::AppHandle->connect(@ARGV);
   my $go_acc = shift @ARGV;
   my $gps = $apph->get_deep_products({term=>{acc=>$go_acc}});
   foreach my $gp (@$gps) {
     printf "%s %s\n", $gp->xref->acc, $gp->symbol;
   }

You will need to download the GO Database.

Cheers
Chris



From gdorjee at hotmail.com  Mon Apr 16 15:10:01 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:10:01 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022463.post@talk.nabble.com>


Hi Chris,
Thanks for your reply. I set the RemoteBlast factory to a verbosity of 1,
and I get the same error message. I'm new to all this, so could you please
tell me how I can use the RemoteBlast in CVS that you suggested?

cheers!!!
 




From gdorjee at hotmail.com  Mon Apr 16 15:11:18 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:11:18 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
Message-ID: <10022464.post@talk.nabble.com>


Thank you, David.





From cjm at fruitfly.org  Mon Apr 16 14:41:59 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:41:59 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
Message-ID: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>


Unless the Entrez interface has changed since I last looked, the
query below for "pyrimidine ribonucleotide biosynthetic process" will
NOT perform the transitive closure over the graph; this means genes
and gene products annotated to GO:0009174 "pyrimidine ribonucleoside
monophosphate biosynthetic process", for example, will not be returned.

On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:

> You can limit EntrezGene searches by Gene Ontology ID using the [Gene
> Ontology] field in queries.  The following query:
>
> '9220[Gene Ontology]'
>
> will give 120 gene IDs.  You can get the same list using the still-
> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
> still working on this):
>
> my $esearch = Bio::DB::EUtilities->new(-eutil  => 'esearch',
>                                         -db     => 'gene',
>                                         -term   => '9220[Gene Ontology]',
>                                         -retmax => 300);
> $esearch->get_response;
> my @ids = $esearch->get_ids;
> print join "\n", @ids;
>
> In my opinion, Sean's idea of using SQL is probably better if you
> have tons of searches to do.
>
> chris
>
> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Anybody has experience with that?
>> I've looked through GO module in CPAN, but can't seem
>> to find any tool that facilitated that searc
>>
>> Look forward very much for your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>> ------------ Institute For Infocomm Research - Disclaimer
>> -------------
>> This email is confidential and may be privileged.  If you are not
>> the intended recipient, please delete it and notify us immediately.
>> Please do not copy or use it for any purpose, or disclose its
>> contents to any other person. Thank you.
>> --------------------------------------------------------
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Mon Apr 16 15:25:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:25:14 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
	<50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
Message-ID: <3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>

You are correct; it explains why the list is only 120 genes.  The  
only way (currently) to get the full list would be to perform the  
closure locally somehow (maybe via go-perl or similar).

chris

On Apr 16, 2007, at 1:41 PM, Chris Mungall wrote:

>
> Unless the Entrez interface has changed since I last looked, the  
> query below for "pyrimidine ribonucleotide biosynthetic process"  
> will NOT perform the transitive closure over the graph; this means  
> genes and gene products annotated to GO:0009174 "pyrimidine  
> ribonucleoside monophosphate biosynthetic process", for example
>
> On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:
>
>> You can limit EntrezGene searches by Gene Ontology ID using the [Gene
>> Ontology] field in queries.  The following query:
>>
>> '9220[Gene Ontology]'
>>
>> will give 120 gene IDs.  You can get the same list using the still-
>> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
>> still working on this):
>>
>> my $esearch = Bio::DB::EUtilities->new(-eutil  => 'esearch',
>>                                         -db     => 'gene',
>>                                         -term   => '9220[Gene Ontology]',
>>                                         -retmax => 300);
>> $esearch->get_response;
>> my @ids = $esearch->get_ids;
>> print join "\n", @ids;
>>
>> In my opinion, Sean's idea of using SQL is probably better if you
>> have tons of searches to do.
>>
>> chris
>>
>> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>>
>>>
>>> Dear all,
>>>
>>> Given a GO id, is there a way to extract all
>>> the related gene names from that id with Perl?
>>>
>>> Anybody has experience with that?
>>> I've looked through GO module in CPAN, but can't seem
>>> to find any tool that facilitated that searc
>>>
>>> Look forward very much for your advice.
>>>
>>> --
>>> Edward WIJAYA
>>> SINGAPORE
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr 16 15:27:32 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:27:32 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022661.post@talk.nabble.com>


hi Chris,
sorry to bother you again, but could you please check the following script to
see what's wrong? I've been getting errors like:

Content-type: text/html
Software error:
------------- EXCEPTION  -------------
MSG:   (0) not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastall
/usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:532
STACK toplevel /usr/local/apache2/htdocs/rmtest.pl:46
--------------------------------------

#### the script ######
#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::SeqIO;
use Bio::SearchIO;
use Bio::DB::GenPept; 
use Bio::Tools::Run::StandAloneBlast;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

my $cgi = new CGI;

print $cgi->header,
$cgi->start_html(-title=>'A StandAloneBlast Test'),
$cgi->h1('Blast Result'),
$cgi->start_form,
"Enter or paste an amino-acid sequence? ",
$cgi->p,
$cgi->textarea(-name=>'name', rows=>10, -columns=>60),
$cgi->p,
$cgi->submit,
$cgi->end_form,
$cgi->hr;

open(OUTPUT,">result/query.faa");

if ($cgi->param()) {
        my $seq = $cgi->param('name');
        print OUTPUT $seq;

my @params = ('program'=>'blastp', 'database' =>
'/export/home/dorjee/database/nrpart', 'outfile' => 'result/blast.out',
_READMETHOD => 'Blast');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

# Blast a sequence against a database:
my $str = Bio::SeqIO->new(-file => "result/query.faa", '-format' => 'Fasta'
);
my $input = $str->next_seq();
my $blast_report = $factory->blastall($input);
}
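
One thing worth checking in the script above (a guess from reading it, not a confirmed diagnosis): OUTPUT is never closed before Bio::SeqIO opens result/query.faa, so the write may still be sitting in Perl's buffer and the file may be empty when it is read. The textarea text is also written without a FASTA header line. Either could leave next_seq() with nothing to return, which would fit the exception shown. A minimal core-Perl sketch of the buffering effect only; the temp directory, the ">query" header and the sequence string are placeholders, not values from the original script:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Placeholder location and sequence, for illustration only.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/query.faa";
my $seq  = "MKTAYIAKQR";

open(my $out, '>', $path) or die "Cannot write $path: $!";

# Write a FASTA record: a header line, then the sequence.
print {$out} ">query\n$seq\n";

# The small write is still buffered, so the file on disk is empty here.
my $size_before = (-s $path) || 0;

# Closing the handle flushes the buffer to disk.
close($out) or die "Cannot close $path: $!";
my $size_after = (-s $path) || 0;

print "before close: $size_before bytes, after close: $size_after bytes\n";
```

If this is the cause, closing the handle (and checking the return value of open) before constructing the Bio::SeqIO object would be the first thing to try.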


Chris Fields wrote:
> 
> This sounds like a similar issue that popped up a few weeks ago  
> related to URLAPI changes for remote BLAST access.  That was fixed on  
> NCBI's end but I also added a fix to RemoteBlast in CVS that works as  
> well.
> 
> Saying that, my guess is the same as Dave's, that there are  
> connectivity issues.  What happens when you set the RemoteBlast  
> factory to a verbosity of 1?  This will spill out debugging output  
> from the repeated queries to the NCBI server (so if there are  
> problems they'll show up there).
> 
> ...
> my $factory = Bio::Tools::Run::RemoteBlast->new(
>                                  '-prog'  => 'blastp',
>                                  '-data' => 'swissprot',
>                                   _READMETHOD => "Blast",
>                                   -verbose => 1    # debugging output
>                           );
> ...
> 
> If you see the BLAST report but get the same error try using the  
> RemoteBlast in CVS to see if it fixes the problem.
> 
> chris
> 
> 
> On Apr 15, 2007, at 9:43 PM, David Messina wrote:
> 
>> You're right, it's not the input sequence. I just tried it with your
>> script and it worked.
>>
>>
>>> is it possible that the script is not being about to read the
>>> RemoteBlast.pm?
>>
>> I think the program wouldn't compile if that were the case, and your
>> error message would be about not finding RemoteBlast.pm rather than
>> the one you got.
>>
>>
>>> but the thing is, i can run the standalone blast on the
>>> command line, although i've never been able the run the same with
>>> cgi module
>>> (by gettting the input from an html textarea). i don't understand.
>>
>> This result really suggests that perl and Bioperl are not the issue.
>> I'm not saying the following to give you the brushoff, but given the
>> numerous ways in which web-based apps can fail and in which
>> webservers can be installed, it might be best for you to find someone
>> at your institution who can sit down with you and work through it.
>>
>> Dave
>>
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 



From cjfields at uiuc.edu  Mon Apr 16 15:37:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:37:58 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10022463.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
Message-ID: <5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>

The 'verbose' setting doesn't change the way the BLAST query is sent,  
it just sends the raw output from the repeated attempts to retrieve  
the report (using the RID) to STDERR.  The error you saw won't be  
fixed by doing so.

What I was interested in was the raw HTML output dumped to the  
screen.  If it is querying the NCBI server it should dump stuff that  
includes something like this:

...
<HTML>
<p></p>
<!--
QBlastInfoBegin
         Status=WAITING
QBlastInfoEnd
--><p></p>
<SCRIPT LANGUAGE="JavaScript"><!--
...

which indicates you have a request in the BLAST queue.  If you aren't  
seeing anything then the problem is likely network-related on your  
end, so getting the latest RemoteBlast won't help.  Do any other  
BioPerl modules requiring network access work (Bio::DB::GenBank, for  
instance)?  If not it could be a proxy issue...
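
To make that check less of an eyeball exercise, here is a small sketch that pulls the Status value out of a QBlastInfo block. It assumes you have captured the verbose output into a string yourself; qblast_status() is a hypothetical helper, not part of RemoteBlast, and the only status value taken from the output above is WAITING:

```perl
use strict;
use warnings;

# Hypothetical helper: return the Status value found in a QBlastInfo
# comment block, or undef if the text contains no such block.
sub qblast_status {
    my ($html) = @_;
    return $html =~ /QBlastInfoBegin\s+Status=(\w+)\s+QBlastInfoEnd/
        ? $1
        : undef;
}

# Sample text copied from the fragment shown above.
my $sample = <<'HTML';
<HTML>
<p></p>
<!--
QBlastInfoBegin
         Status=WAITING
QBlastInfoEnd
--><p></p>
HTML

my $status = qblast_status($sample);
print defined $status
    ? "status: $status\n"
    : "no QBlastInfo block found\n";
```

If nothing matches in your captured output, that would point towards the request never reaching the server.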

Just in case, here's the browsable CVS location for RemoteBlast:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl

Click on the download link and save over your local version.

chris

On Apr 16, 2007, at 2:10 PM, DeeGee wrote:

>
> hi Chris,
> thanks for your reply. i set the RemoteBlast factory to a verbosity  
> of 1,
> and i get the same error message. i'm new to all these. so, could  
> you plz
> tell me how can i do the RemoteBlast in CVS that you've suggested.
>
> cheers!!!


From gdorjee at hotmail.com  Mon Apr 16 16:42:37 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 13:42:37 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
Message-ID: <10024333.post@talk.nabble.com>


hi,
I tried the following code just to check the network, and it worked fine
except for the SwissProt part, for which I got this error message instead of
the sequence:

------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq
/usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
STACK toplevel bbbbb.pl:21
--------------------------------------

#### check #####
#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::DB::SwissProt;
use Bio::DB::GenPept;
use Bio::SeqIO;

my $genpeptdb = new Bio::DB::GenPept();
my $genbankdb = new Bio::DB::GenBank();
my $swissdb = new Bio::DB::SwissProt();

my $seqio = new Bio::SeqIO(-format => 'fasta',
                           -fh     => \*STDOUT);

my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
$seqio->write_seq($protseq);

my $seq = $genbankdb->get_Seq_by_acc('AF303112');
$seqio->write_seq($seq);

$protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
$seqio->write_seq($protseq);

thanks a lot.


Chris Fields wrote:
> 
> The 'verbose' setting doesn't change the way the BLAST query is sent,  
> it just sends the raw output from the repeated attempts to retrieve  
> the report (using the RID) to STDERR.  The error you saw won't be  
> fixed by doing so.
> 
> What I was interested in was the raw HTML output dumped to the  
> screen.  If it is querying the NCBI server it should dump stuff that  
> includes something like this:
> 
> ...
> <HTML>
> <p></p>
> <!--
> QBlastInfoBegin
>          Status=WAITING
> QBlastInfoEnd
> --><p></p>
> <SCRIPT LANGUAGE="JavaScript"><!--
> ...
> 
> which indicates you have a request in the BLAST queue.  If you aren't  
> seeing anything then the problem is likely network-related on your  
> end, so getting the latest RemoteBlast won't help.  Do any other  
> BioPerl modules requiring network access work (Bio::DB::GenBank, for  
> instance)?  If not it could be a proxy issue...
> 
> Just in case, here's the browsable CVS location for RemoteBlast:
> 
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl
> 
> Click on the download link and save over your local version.
> 
> chris
> 
> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
> 
>>
>> hi Chris,
>> thanks for your reply. i set the RemoteBlast factory to a verbosity  
>> of 1,
>> and i get the same error message. i'm new to all these. so, could  
>> you plz
>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>
>> cheers!!!
> 
> 
> 
> 



From cjfields at uiuc.edu  Mon Apr 16 18:24:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 17:24:11 -0500
Subject: [Bioperl-l] HOWTO:Writing BioPerl Tests
Message-ID: <547A30CD-6BAA-4C08-A935-9975634691B2@uiuc.edu>

I have posted a quickie HOWTO on writing up BioPerl tests using  
Test::More.  If anyone wants to add to it feel free (make sure to  
credit yourself in the authors section).

http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

There is space in there if we decide to add more modules for  
enhancing tests (I think Nathan suggested Test::Exception or similar).

chris

From cjfields at uiuc.edu  Mon Apr 16 19:24:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 18:24:32 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10024333.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
	<10024333.post@talk.nabble.com>
Message-ID: <54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>

What version of bioperl are you using?  I get an error but it is b/c  
the ID doesn't exist.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc KPYK_ECOLI does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /Users/cjfields/src/bioperl-live/Bio/DB/WebDBSeqI.pm:181
STACK: genpept.pl:21
-----------------------------------------------------------

The actual accession is 'KPYK1_ECOLI'.

chris

On Apr 16, 2007, at 3:42 PM, DeeGee wrote:

>
> hi
> i tried the following code just to check the network, and it worked  
> fine
> except for the SwissProt part, for which i got the error message  
> instead of
> the sequence:
>
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq
> /usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
> STACK toplevel bbbbb.pl:21
> --------------------------------------
>
> #### check #####
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenBank;
> use Bio::DB::SwissProt;
> use Bio::DB::GenPept;
> use Bio::SeqIO;
>
> my $genpeptdb = new Bio::DB::GenPept();
> my $genbankdb = new Bio::DB::GenBank();
> my $swissdb = new Bio::DB::SwissProt();
>
> my $seqio = new Bio::SeqIO(-format => 'fasta',
>                            -fh     => \*STDOUT);
>
> my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
> $seqio->write_seq($protseq);
>
> my $seq = $genbankdb->get_Seq_by_acc('AF303112');
> $seqio->write_seq($seq);
>
> $protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
> $seqio->write_seq($protseq);
>
> thanks a lot.
>
>
> Chris Fields wrote:
>>
>> The 'verbose' setting doesn't change the way the BLAST query is sent,
>> it just sends the raw output from the repeated attempts to retrieve
>> the report (using the RID) to STDERR.  The error you saw won't be
>> fixed by doing so.
>>
>> What I was interested in was the raw HTML output dumped to the
>> screen.  If it is querying the NCBI server it should dump stuff that
>> includes something like this:
>>
>> ...
>> <HTML>
>> <p></p>
>> <!--
>> QBlastInfoBegin
>>          Status=WAITING
>> QBlastInfoEnd
>> --><p></p>
>> <SCRIPT LANGUAGE="JavaScript"><!--
>> ...
>>
>> which indicates you have a request in the BLAST queue.  If you aren't
>> seeing anything then the problem is likely network-related on your
>> end, so getting the latest RemoteBlast won't help.  Do any other
>> BioPerl modules requiring network access work (Bio::DB::GenBank, for
>> instance)?  If not it could be a proxy issue...
>>
>> Just in case, here's the browsable CVS location for RemoteBlast:
>>
>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl
>>
>> Click on the download link and save over your local version.
>>
>> chris
>>
>> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
>>
>>>
>>> hi Chris,
>>> thanks for your reply. i set the RemoteBlast factory to a verbosity
>>> of 1,
>>> and i get the same error message. i'm new to all these. so, could
>>> you plz
>>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>>
>>> cheers!!!
>>
>>
>>
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjm at fruitfly.org  Mon Apr 16 20:59:46 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 17:59:46 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
	<50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
	<3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>
Message-ID: <9612F3E7-F239-49C1-A5BE-E10FF2FC2063@fruitfly.org>


You could perform the closure locally and then iterate over the  
individual IDs or construct a big disjunctive query to Entrez -  
either way it's not so efficient, especially for less specific nodes  
(distributed queries with ontologies is an interesting challenge).

Soon you'll be able to do the same query over the GO Database / AmiGO  
using a REST API
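
A rough sketch of the "closure locally, then one disjunctive query" idea, in plain Perl. The graph here is a toy: only the GO:0009220 to GO:0009174 relationship is implied by this thread, the GO:9999998 and GO:9999999 IDs are made-up placeholders, and closure() and entrez_term() are hypothetical helper names. In practice the child lookup would come from a real ontology source (go-perl or similar), and the OR-joined term simply follows the '9220[Gene Ontology]' form quoted below:

```perl
use strict;
use warnings;

# Toy graph: parent GO ID => direct child GO IDs.
# GO:9999998 and GO:9999999 are placeholders, not real GO terms.
my %children = (
    'GO:0009220' => [ 'GO:0009174', 'GO:9999998' ],
    'GO:0009174' => [ 'GO:9999999' ],
    'GO:9999998' => [ 'GO:9999999' ],   # shared descendant: the graph is a DAG
);

# Return the root term plus all of its descendants, each listed once
# (breadth-first traversal with a seen-set).
sub closure {
    my ($root, $graph) = @_;
    my %seen  = ($root => 1);
    my @queue = ($root);
    my @order;
    while (defined(my $id = shift @queue)) {
        push @order, $id;
        for my $kid (@{ $graph->{$id} || [] }) {
            next if $seen{$kid}++;
            push @queue, $kid;
        }
    }
    return @order;
}

# Build one OR-joined term in the '9220[Gene Ontology]' style.
sub entrez_term {
    my @ids = @_;
    return join ' OR ', map {
        (my $n = $_) =~ s/^GO:0*//;     # strip the prefix and leading zeros
        $n . '[Gene Ontology]';
    } @ids;
}

my @ids = closure('GO:0009220', \%children);
print entrez_term(@ids), "\n";
```

The resulting string could be passed as the -term value; for less specific nodes it gets long quickly, which is the inefficiency mentioned above.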

On Apr 16, 2007, at 12:25 PM, Chris Fields wrote:

> You are correct; it explains why the list is only 120 genes.  The
> only way (currently) to do so would be to perform the closure locally
> somehow (maybe via go-perl or similar).
>
> chris
>
> On Apr 16, 2007, at 1:41 PM, Chris Mungall wrote:
>
>>
>> Unless the Entrez interface has changed since I last looked, the
>> query below for "pyrimidine ribonucleotide biosynthetic process"
>> will NOT perform the transitive closure over the graph; this means
>> genes and gene products annotated to GO:0009174 "pyrimidine
>> ribonucleoside monophosphate biosynthetic process", for example
>>
>> On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:
>>
>>> You can limit EntrezGene searches by Gene Ontology ID using the  
>>> [Gene
>>> Ontology] field in queries.  The following query:
>>>
>>> '9220[Gene Ontology]'
>>>
>>> will give 120 gene IDs.  You can get the same list using the still-
>>> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
>>> still working on this):
>>>
>>> my $esearch = Bio::DB::EUtilities->new(-eutil  => 'esearch',
>>>                                         -db     => 'gene',
>>>                                         -term   => '9220[Gene Ontology]',
>>>                                         -retmax => 300);
>>> $esearch->get_response;
>>> my @ids = $esearch->get_ids;
>>> print join "\n", @ids;
>>>
>>> In my opinion, Sean's idea of using SQL is probably better if you
>>> have tons of searches to do.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>>>
>>>>
>>>> Dear all,
>>>>
>>>> Given a GO id, is there a way to extract all
>>>> the related gene names from that id with Perl?
>>>>
>>>> Anybody has experience with that?
>>>> I've looked through GO module in CPAN, but can't seem
>>>> to find any tool that facilitated that searc
>>>>
>>>> Look forward very much for your advice.
>>>>
>>>> --
>>>> Edward WIJAYA
>>>> SINGAPORE
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 23:51:18 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 17 Apr 2007 11:51:18 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu><50A1CCF2-4650-4F87-8386-DB0
	E87292023@fruitfly.org><3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu><9612
	F3E7-F239-49C1-A5BE-E10FF2FC2063@fruitfly.org>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061686@mailbe01.teak.local.net>


Thanks so much for all the suggestion.
It was really helpful to me. 
 
--
Edward WIJAYA

________________________________

From: Chris Mungall [mailto:cjm at fruitfly.org]
Sent: Tue 4/17/2007 8:59 AM
To: Chris Fields
Cc: bioperl-l at lists.open-bio.org; Wijaya Edward
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


You could perform the closure locally and then iterate over the 
individual IDs or construct a big disjunctive query to Entrez - 
either way it's not so efficient, especially for less specific nodes 
(distributed queries with ontologies is an interesting challenge).

Soon you'll be able to do the same query over the GO Database / AmiGO 
using a REST API

On Apr 16, 2007, at 12:25 PM, Chris Fields wrote:

> You are correct; it explains why the list is only 120 genes.  The
> only way (currently) to do so would be to perform the closure locally
> somehow (maybe via go-perl or similar).
>
> chris
>
> On Apr 16, 2007, at 1:41 PM, Chris Mungall wrote:
>
>>
>> Unless the Entrez interface has changed since I last looked, the
>> query below for "pyrimidine ribonucleotide biosynthetic process"
>> will NOT perform the transitive closure over the graph; this means
>> genes and gene products annotated to GO:0009174 "pyrimidine
>> ribonucleoside monophosphate biosynthetic process", for example
>>
>> On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:
>>
>>> You can limit EntrezGene searches by Gene Ontology ID using the 
>>> [Gene
>>> Ontology] field in queries.  The following query:
>>>
>>> '9220[Gene Ontology]'
>>>
>>> will give 120 gene IDs.  You can get the same list using the still-
>>> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
>>> still working on this):
>>>
>>> my $esearch = Bio::DB::EUtilities->new(-eutil  => 'esearch',
>>>                                         -db     => 'gene',
>>>                                         -term   => '9220[Gene Ontology]',
>>>                                         -retmax => 300);
>>> $esearch->get_response;
>>> my @ids = $esearch->get_ids;
>>> print join "\n", @ids;
>>>
>>> In my opinion, Sean's idea of using SQL is probably better if you
>>> have tons of searches to do.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>>>
>>>>
>>>> Dear all,
>>>>
>>>> Given a GO id, is there a way to extract all
>>>> the related gene names from that id with Perl?
>>>>
>>>> Anybody has experience with that?
>>>> I've looked through GO module in CPAN, but can't seem
>>>> to find any tool that facilitated that searc
>>>>
>>>> Look forward very much for your advice.
>>>>
>>>> --
>>>> Edward WIJAYA
>>>> SINGAPORE
>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------

From hlapp at gmx.net  Tue Apr 17 00:00:55 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 17 Apr 2007 00:00:55 -0400
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>

Hi Leighton, please see below:

On Apr 16, 2007, at 11:55 AM, Leighton Pritchard wrote:

> Hi,
>
> I've been trying to upload the GO into a clean BioSQL (MySQL, 1.4.1)
> schema using the BioPerl bp_load_ontology.pl script, with the OBOv1.0,
> OBOv1.2, and the most recent flatfiles from
> http://www.geneontology.org/GO.downloads.ontology.shtml - none of my
> attempts have been successful.  The errors below are from a Linux
> installation, but the same errors are thrown on OS X, too.  I am using
> the most recent versions of BioPerl and bioperl-db, installed via  
> CPAN:
>
> [lpritc@lplinuxdev sequence_data]$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.005002102
>
> and bioperl-db 1.5.2.
>
> I have attached the traceback below (running with --safe throws a  
> number
> of equivalent errors),

Using --safe will throw the same errors, but will continue loading.  
I.e., you'd lose the one term, but keep everything else.

I do realize that especially for a graph losing an internal node can  
be quite detrimental.

> [...]
> ########
>
> [lpritc@lplinuxdev sequence_data]$ bp_load_ontology.pl --host localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass
> ******** --format obo ~/Downloads/gene_ontology_edit.obo
> Loading ontology gene_ontology:
>         ... terms
>         ... relationships
>         Done with gene_ontology.
> Loading ontology biological_process:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------

This would point to a problem in the BioPerl obo parser. According to
the message, both the database name and the accession of the db_xref
for the term are - surely erroneously - empty. Apparently the parser
fails to parse out database and accession for this db_xref of term
GO:0018901.

If you can edit the obo file, you can try deleting the db_xref(s) for  
that term that look odd (or delete all if you don't need them).

I'd have to debug the obo parser to see exactly where it's going  
wrong in parsing.

> Could not store term GO:0018901, name '2,4-dichlorophenoxyacetic acid
> metabolic process':
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> [...]
> [lpritc at lplinuxdev sequence_data]$ bp_load_ontology.pl --host  
> localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass
> ******** --format goflat --fmtargs ~/Downloads/GO.defs

Note that the argument for --fmtargs here should read
"-defs_file,/path/to/Downloads/GO.defs". (Note that within the quotes  
there is no tilde expansion.)
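
To illustrate the tilde point, here is a minimal sketch that builds the
--fmtargs value with the home directory already filled in. The helper
names expand_tilde and defs_fmtargs are made up for this example and are
not part of bp_load_ontology.pl; only the "-defs_file,<path>" form comes
from the note above.

```perl
use strict;
use warnings;

# Hypothetical helper: replace a leading "~" with the home directory.
# The shell only expands "~" in unquoted words, so a "~" inside a quoted
# --fmtargs value would reach the script unchanged.
sub expand_tilde {
    my ($path, $home) = @_;
    $home = $ENV{HOME} unless defined $home;
    return $path unless defined $home;    # nothing to expand with
    $path =~ s{^~(?=/|\z)}{$home};
    return $path;
}

# Hypothetical helper: build "-defs_file,<absolute path to GO.defs>".
sub defs_fmtargs {
    my ($defs_path, $home) = @_;
    return '-defs_file,' . expand_tilde($defs_path, $home);
}

print defs_fmtargs('~/Downloads/GO.defs'), "\n";
```

The printed string can then be passed, quoted, as the value of --fmtargs.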

> ~/Downloads/function.ontology
> Loading ontology Gene Ontology:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("MetaCyc","2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN","0","") FKs ()
> Duplicate entry '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0'
> for key 2
> ---------------------------------------------------

This is one of the reasons you've got to love MySQL (am I correct in
inferring that you're using MySQL?). The width of the
dbxref.accession column (which the second value in parentheses is
for) is 40 chars. The apparently pre-existing value
("2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0") is 50 chars, which
when loaded should have resulted in an exception. Instead, MySQL
simply and silently truncates it to 40 chars, which makes it
identical to the first 40 chars of
"2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN" (which is 41 chars in length).

It may be necessary to widen the dbxref.accession column here, for
example to 80 chars. Let me know if you need help with the DDL
command to do this.
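
The truncation effect can be reproduced without a database. The
following is only a sketch: it models a 40-character column with substr
and assumes, as described above, that longer values are cut silently.
The function names are invented for the illustration.

```perl
use strict;
use warnings;

# Assumption: the accession column holds at most 40 characters.
my $COLUMN_WIDTH = 40;

# Model of what ends up stored for a given accession.
sub stored_value {
    my ($accession, $width) = @_;
    return substr($accession, 0, $width);
}

# Group accessions by their stored (truncated) form; any group with more
# than one distinct member would collide on a unique key.
sub truncation_collisions {
    my ($width, @accessions) = @_;
    my %by_stored;
    for my $acc (@accessions) {
        $by_stored{ stored_value($acc, $width) }{$acc} = 1;
    }
    my %collisions;
    for my $stored (keys %by_stored) {
        my @members = sort keys %{ $by_stored{$stored} };
        $collisions{$stored} = \@members if @members > 1;
    }
    return \%collisions;
}

my $full = '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN';
printf "%d chars, stored as %s\n",
    length($full), stored_value($full, $COLUMN_WIDTH);
```

Running truncation_collisions over all accessions in a file with widths
40 and 80 gives a rough idea of whether widening the column would be
enough.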

Let me know how far this gets you.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lubapardo at gmail.com  Tue Apr 17 05:16:04 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 17 Apr 2007 10:16:04 +0100
Subject: [Bioperl-l] CVS AND PAML
Message-ID: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>

Dear all,
I have two questions.
1.) I am trying to download some modules from Bioperl-run via CVS, but I
cannot log in.

$ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl login

The error I get is a timeout: failed to connect to the server. I have
no trouble downloading other files, and I installed the bioperl modules
via CPAN without problems.

2) The second question I have is that I am using the PAML:CODEML
module to do phylogenetic analysis.

I have used the example provided in the HOWTO:PAML (also given as
example: pairwise_ka_ks.PL). The program does not crash but it returns
an empty object. I think the problem is in the last part of the
script, because I manage to get the sequences and also the alignment,
but I cannot get any Ka or Ks values. I am not sure whether there is a
bug in the last part of the script.

Does anyone have an idea?

Thank you very much

Luba Pardo

$kaks_factory->alignment($dna_aln);

# run the KaKs analysis
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();
# this gives us a mapping from the PAML order of sequences back to
# the input order (since names get truncated)
my @pos = map <http://www.perldoc.com/perl5.6/pod/func/map.html> {
    my $c= 1;
    foreach my $s ( @each ) {
      last if( $s->display_id eq $_->display_id );
      $c++;
    }
    $c;
   } @otus;

print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT join
<http://www.perldoc.com/perl5.6/pod/func/join.html>("\t", qw
<http://www.perldoc.com/perl5.6/pod/func/qw.html>(SEQ1 SEQ2 Ka Ks
Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
for( my $i = 0; $i < (scalar
<http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus -1) ;
$i++) {
  for( my $j = $i+1; $j < (scalar
<http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus); $j++ ) {
    my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
    my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
    print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT
join <http://www.perldoc.com/perl5.6/pod/func/join.html>("\t",
$otus[$i]->display_id,
                         $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
                         $MLmatrix->[$i]->[$j]->{'dS'},
                         $MLmatrix->[$i]->[$j]->{'omega'},
                         sprintf
<http://www.perldoc.com/perl5.6/pod/func/sprintf.html>("%.2f",$sub_aa_aln->percentage_identity),
                         sprintf
<http://www.perldoc.com/perl5.6/pod/func/sprintf.html>("%.2f",$sub_dna_aln->percentage_identity),
                         ), "\n";
  }
}

From avilella at gmail.com  Tue Apr 17 05:25:40 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 17 Apr 2007 10:25:40 +0100
Subject: [Bioperl-l] CVS AND PAML
In-Reply-To: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
References: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
Message-ID: <358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>

hmmm, there are some perldoc links around your code snippet. can you post
the code again? what is the input data you are trying this with?


From IoannisKirmitzoglou at gmail.com  Tue Apr 17 09:05:37 2007
From: IoannisKirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Tue, 17 Apr 2007 06:05:37 -0700 (PDT)
Subject: [Bioperl-l] Parsing FASTA m10 output
Message-ID: <10034698.post@talk.nabble.com>


I apologize if this question has already been answered, but my search came up
with no relevant results.
I am new to the FASTA program and, after reading fasta3x.doc, I decided to
run it using the m10 output. The reason for this choice was:

Quote from fasta3x.doc:  
     -m 10 is a new, parseable format for use with other
     programs.... 


I ran FASTA in batch mode and waited about 3-4 days for the results.
My problem is that today, when I started writing a Perl script to parse the
output, I realized that SearchIO doesn't support the m10 format.
It seems I should have been more careful...
Before I start coding a module that can parse the output (or
re-run FASTA with the -m9 -d0 switches, which will take 4 more days), I would
be really thankful if any of you could tell me of another way to parse those
files.
Thanks in advance...

Ioannis Kirmitzoglou, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus

-- 
View this message in context: http://www.nabble.com/Parsing-FASTA-m10-output-tf3590568.html#a10034698
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.

From ewijaya at i2r.a-star.edu.sg  Tue Apr 17 09:10:00 2007
From: ewijaya at i2r.a-star.edu.sg (Edward WIJAYA)
Date: Tue, 17 Apr 2007 21:10:00 +0800
Subject: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
Message-ID: <462473B7.4070905@i2r.a-star.edu.sg>


Dear all,

How do you usually construct a graph of TFBS (binding site) positions
within their sequences? I was thinking of building something like this kind of
visualization tool:

http://research.i2r.a-star.edu.sg/Dragon/Motif_Search/cgi-bin/tmp/29740M1.html

or

http://wingless.cs.washington.edu:8080/assessment/servlet?filenameID=submission/SPACE.D9F26D506DE90E9A0A0010BB6BCCAEF3&pageType=visualizationForm&action=Visualize+It

Is there a BioPerl module to do that?

--
Edward



From lubapardo at gmail.com  Tue Apr 17 10:01:57 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 17 Apr 2007 15:01:57 +0100
Subject: [Bioperl-l] CVS AND PAML
In-Reply-To: <358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>
References: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
	<358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>
Message-ID: <58ff33550704170701p1207ad51r271b0aff235bfd05@mail.gmail.com>

Hi,
Sorry. Below is the code. The part that does not work is the one
using the codeml module.
Thanks
Luba

use strict;
use warnings;

# wrappers for the external programs (clustalw and PAML's codeml)
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::Tools::Run::Phylo::PAML::Codeml;
# for projecting alignments from protein to R/DNA space
use Bio::Align::Utilities qw(aa_to_dna_aln);
# for input of the sequence data
use Bio::SeqIO;
use Bio::AlignIO;

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new;
my $seqdata = shift || 'cds.fa';

my $seqio = Bio::SeqIO->new(-file   => $seqdata,
                            -format => 'fasta');
my %seqs;
my @prots;
# process each sequence
while ( my $seq = $seqio->next_seq ) {
    $seqs{$seq->display_id} = $seq;
    # translate them into protein
    my $protein = $seq->translate();
    my $pseq = $protein->seq();
    # a '*' anywhere but at the very end is an internal stop codon
    if( $pseq =~ /\*/ &&
        $pseq !~ /\*$/ ) {
          warn("provided a CDS sequence with a stop codon, PAML will choke!");
          exit(0);
    }
    # Tcoffee can't handle '*' even if it is trailing
    $pseq =~ s/\*//g;
    $protein->seq($pseq);
    push @prots, $protein;
}

if( @prots < 2 ) {
    warn("Need at least 2 CDS sequences to proceed");
    exit(0);
}

open(OUT, ">align_output.txt") ||
    die("cannot open output align_output for writing");
# Align the sequences with clustalw
my $aa_aln = $aln_factory->align(\@prots);
# project the protein alignment back to CDS coordinates
my $dna_aln = aa_to_dna_aln($aa_aln, \%seqs);

my @each = $dna_aln->each_seq();

my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
                   ( -params => { 'runmode' => -2,
                                  'seqtype' => 1,
                                } );

# set the alignment object
$kaks_factory->alignment($dna_aln);

# run the KaKs analysis
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();
# this gives us a mapping from the PAML order of sequences back to
# the input order (since names get truncated)
my @pos = map {
    my $c= 1;
    foreach my $s ( @each ) {
      last if( $s->display_id eq $_->display_id );
      $c++;
    }
    $c;
   } @otus;

print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
                        CDNA_PERCENTID)), "\n";
# one output row per pair of sequences (upper triangle of the matrix)
for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
  for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
    my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
    my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
    print OUT join("\t", $otus[$i]->display_id,
                         $otus[$j]->display_id,
                         $MLmatrix->[$i]->[$j]->{'dN'},
                         $MLmatrix->[$i]->[$j]->{'dS'},
                         $MLmatrix->[$i]->[$j]->{'omega'},
                         sprintf("%.2f",$sub_aa_aln->percentage_identity),
                         sprintf("%.2f",$sub_dna_aln->percentage_identity),
                         ), "\n";
  }
}
close(OUT);
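
One way to narrow down where the values get lost is to check each step
of the codeml section before using its output. This is only a sketch:
the function name first_codeml_result is invented here, and it assumes
that run() returns a true return code on success together with the
parser, as the two-value call in the script suggests.

```perl
use strict;
use warnings;

# Run a codeml factory and return the first result plus its ML matrix,
# dying with a descriptive message at the first step that yields nothing.
# Assumption: a true return code from run() means success.
sub first_codeml_result {
    my ($factory) = @_;
    my ($rc, $parser) = $factory->run();
    die "codeml run failed (return code "
        . (defined $rc ? $rc : 'undef') . ")\n"
        unless $rc && defined $parser;
    my $result = $parser->next_result
        or die "codeml ran, but the parser returned no result\n";
    my $matrix = $result->get_MLmatrix()
        or die "result has no ML matrix\n";
    return ($result, $matrix);
}
```

In the script this would replace the three lines after "# run the KaKs
analysis", for example as
my ($result, $MLmatrix) = first_codeml_result($kaks_factory);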

On 17/04/07, Albert Vilella <avilella at gmail.com> wrote:
>
> hmmm, there are some perldoc links around your code snippet. can you post
> the code again? what is the input data you are trying this with?
>

From alexl at users.sourceforge.net  Tue Apr 17 09:54:13 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Tue, 17 Apr 2007 06:54:13 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu> (Chris Fields's
	message of "Fri\, 30 Mar 2007 23\:39\:15 -0500")
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
Message-ID: <5h4pnff6nu.fsf@delpy.biol.berkeley.edu>

On Mar 30, 2007, at 11:02 PM, Allen Day wrote:

[...]

>> If we're in agreement that the primary data sets and
>> libraries/applications for producing derivative data should not be
>> present in Fedora Extras, then it follows that the Bioperl classes
>> for manipulating these primary and derivative data should also not
>> be present in Fedora Extras as they are of little use without data
>> to manipulate.

Chris Fields wrote:

CF> I respectfully disagree.  BioPerl, to me, is a toolkit which helps
CF> accomplish certain tasks.  As with any toolkit, not all parts are
CF> required to do what one needs.  A good number of end-users use
CF> BioPerl for remote database queries
CF> (Bio::DB::GenBank/Taxonomy/etc), remote BLAST, seq analysis,
CF> alignment analysis, phylogenetic tree manipulation, etc, none of
CF> which require outside apps be installed.  For many a remote db is
CF> their primary source of data; not everybody sets up BioPerl for
CF> accessing local db records, running programs, etc (just the smart
CF> ones!).  As for outside apps, the docs are pretty explicit where
CF> certain outside resources (libxml2, expat, libgd) are needed for
CF> functionality.

CF> When we package up a new release we generally have ActiveState PPM
CF> archives available for Win32 users who want an easy way to install
CF> BioPerl.  I wouldn't have a problem if ActiveState wanted to post
CF> these to their repository.  Why would allowing someone to do the
CF> same for fedora extras be any different?

Hi all,

Given that there seems to be a reasonable consensus (including list
discussion here as well as in private e-mail) from bioperl folks that
including bioperl in Fedora is OK, I'm going ahead and building
bioperl for Fedora >= 6 (it's currently in the development branch).  I
thought about the issue carefully and this seems to make sense for
several reasons:

1. Biopackages.net isn't currently building packages for Fedora Core 6
   and later (as Allen said, that may happen later when more build
   resources come online).  I won't build perl-bioperl for FC-5 or
   earlier to make sure that the Fedora package doesn't disrupt
   installations with the biopackages.net version.

2. Currently I've only run the base bioperl (live) package through
   the reviewing gauntlet, but I plan to add the bioperl-run package
   as well.  Even though the bioperl-run package is intended to use
   third party packages (e.g. Clustal etc.) which may not be
   distributed with Fedora, it appears that the bioperl-run package
   contains code that can download those packages directly (albeit
   outside the RPM package system).  And some of the external tools
   could be packaged in Fedora because they have open-source licenses
   (e.g. Wise2, EMBOSS, NCBI toolkit etc.)

   Furthermore, it appears that the biopackages.net version of that
   package doesn't actually have "Requires:" entries that would
   automatically install the third-party tools that are run via bioperl
   (e.g. Clustal) in any case, so when biopackages starts building for
   >FC-6 the Fedora perl-bioperl* packages can function as a drop-in
   replacement without disturbing other biopackages dependencies such
   as genome databases.

3. Third-party packages that can't be included directly in Fedora
   (such as Clustal) that can be used by bioperl-run could still be
   added via third-party repos like biopackages.net, in the same way
   that the multimedia packages gstreamer and gstreamer-plugins-good
   live in Fedora, but gstreamer-plugins-bad, containing
   patent-encumbered MP3 codecs, will live in Livna.

Cheers,
Alex

From cjfields at uiuc.edu  Tue Apr 17 10:35:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 09:35:10 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
Message-ID: <0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>

On Apr 17, 2007, at 8:54 AM, Alex Lancaster wrote:

> Hi all,
>
> Given that there seems to be a reasonable consensus (including list
> discussion here as well as in private e-mail) from bioperl folks that
> including bioperl in Fedora is OK, I'm going ahead and building
> bioperl for Fedora >= 6 (it's currently in the development branch).  I
> thought about the issue carefully and this seems to make sense for
> several reasons:
>
> ...
> 2. Currently I've only run the base bioperl (live) package through
>    the reviewing gauntlet, but I plan to add the bioperl-run package
>    as well.  Even though the bioperl-run package is intended to use
>    third party packages (e.g. Clustal etc.) which may not be
>    distributed with Fedora, it appears that the bioperl-run package
>    contains code that can download those packages directly (albeit
>    outside the RPM package system).  And some of the external tools
>    could be packaged in Fedora because they have open-source licenses
>    (e.g. Wise2, EMBOSS, NCBI toolkit etc.)
...

Do you mean the bioperl core modules instead of "bioperl-live"?  We  
use the term "bioperl-live" to designate code updated regularly via  
CVS, which can be buggy depending on when it's retrieved.

I'm not sure how others feel about this, but it's probably best to  
stick with either the latest official releases (v 1.5.2 at this time)  
or even GBrowse-sponsored interim releases (which fix GBrowse-related  
bugs and normally pass tests).

chris


From hlapp at gmx.net  Tue Apr 17 11:09:45 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 17 Apr 2007 11:09:45 -0400
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>


On Apr 17, 2007, at 9:35 AM, Leighton Pritchard wrote:

> Hi Hilmar,
>
> Thanks for the very quick response.  Apologies for the long reply,  
> but I
> thought it might be useful if anyone else happens across the same
> problems that I did.

Thanks for reporting all these.

> [...]
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------
> Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate
> lactonase activity':
> [...]
> I tracked this down to an apparently poor formatting of the GO.defs  
> file
> (note that the first and third definition_lines appear to be two  
> halves
> of the same entry):
>
> term: 2-pyrone-4,6-dicarboxylate lactonase activity
> goid: GO:0047554
> definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate +  
> H2O
> = 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
> definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN

I wonder whether this is the line that throws the parser off. It  
looks like the database part of the reference is missing - bad.

> definition_reference: EC:3.1.1.57
> definition_reference: MetaCyc:2-PYRONE-4
>
> I found 43 similar errors for other GOIDs, and it appears to result  
> from
> the occurrence of the string "\," in a dbxref - mostly MetaCyc  
> entries,
> but also some UM-BBD_pathwayID entries.

I'm not sure - although the string "\," might indeed trip up the  
parser, would have to investigate to confirm. Could it be a  
coincidence with definition_references that lack the database part  
before the colon?

>
> These errors appear to have followed through into the generation of  
> the
> OBO format files in each case, e.g.:
>
> def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O =
> 4-carboxy-2-hydroxyhexa-2,4-dienedioate."
> [:6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]

Again, the first db_xref lacks the database in front of the colon. I  
can also see why "\," will trip up the parser in this format.

>
> and so is something for the GO guys to fix, I guess.

The lack of a database for certain xrefs surely is. If the escaped  
comma does throw off the BioPerl parser then that part is for BioPerl  
to fix. It does seem to extract the parts correctly, if the error  
message is any indication, though you may argue that it should remove  
the escaping backslashes (and I'd certainly agree with that).
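
For illustration, here is a sketch of how a dbxref list could be split
on unescaped commas only, with the escaping backslashes removed
afterwards. It is not the code of the BioPerl obo parser; the function
name and the sample input (a guess at what the intact MetaCyc entry
looked like) are assumptions.

```perl
use strict;
use warnings;

# Split a bracketed dbxref list on commas that are NOT preceded by a
# backslash, then strip the escaping backslashes from each entry and
# separate database and accession at the first colon.
# Limitation: an escaped backslash directly before a comma is not handled.
sub split_dbxrefs {
    my ($list) = @_;
    $list =~ s/^\s*\[//;
    $list =~ s/\]\s*$//;
    my @xrefs;
    for my $entry (split /(?<!\\),/, $list) {
        $entry =~ s/^\s+|\s+$//g;
        next unless length $entry;
        $entry =~ s/\\,/,/g;
        my ($db, $acc) = split /:/, $entry, 2;
        $acc = '' unless defined $acc;
        push @xrefs, { db => $db, accession => $acc };
    }
    return @xrefs;
}

# Assumed original form of the list quoted above.
my $sample = '[MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57]';
print "$_->{db}\t$_->{accession}\n" for split_dbxrefs($sample);
```

With this input the MetaCyc accession stays in one piece, comma
included, instead of being cut at "2-PYRONE-4".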

>
>
> Another error is thrown after fixing the above, though (with the same
> command as before):
>
> Loading ontology Gene Ontology:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values  
> were
> ("GO:0006905","vesicle transport","OBSOLETE (was not defined before
> being made obsolete).","X","") FKs (1)
> Duplicate entry 'vesicle transport-1-X' for key 3
> ---------------------------------------------------
> Could not store term GO:0006905, name 'vesicle transport':
> [...]
> There are duplicate terms, identical in the term table except for  
> GOID:
> GO:0006905 and GO:0005480.  They are both "vesicle transport", and
> obsoleted:

That violates the uniqueness constraint, and this sounds more like a  
bug in the GO file. I'm also not sure what motivated them to create  
the same term multiple times only to obsolete it immediately.
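
Such duplicates can be spotted before loading. The sketch below assumes
the GO.defs layout quoted in this thread ("term:" followed by "goid:")
and compares names only; it does not look at the obsolete flag or the
ontology, which are also part of the key in the error message. The
function name is invented for the example.

```perl
use strict;
use warnings;

# Collect the GO ids seen for each term name in GO.defs-style lines
# ("term: <name>" followed by "goid: <id>") and return only the names
# that occur with more than one id.
sub duplicate_term_names {
    my (@lines) = @_;
    my (%goids_for, $name);
    for my $line (@lines) {
        if ($line =~ /^term:\s*(.*?)\s*$/) {
            $name = $1;
        }
        elsif (defined $name && $line =~ /^goid:\s*(\S+)/) {
            push @{ $goids_for{$name} }, $1;
            undef $name;
        }
    }
    my %dups = map  { ($_ => $goids_for{$_}) }
               grep { @{ $goids_for{$_} } > 1 }
               keys %goids_for;
    return \%dups;
}

# Report duplicates for any files named on the command line.
if (@ARGV) {
    my $dups = duplicate_term_names(<>);
    print "$_: @{ $dups->{$_} }\n" for sort keys %$dups;
}
```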

> [...]
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("PMID","","0","") FKs ()
> Column 'accession' cannot be null
> ---------------------------------------------------
> Could not store term GO:0032933, name 'SREBP-mediated signaling
> pathway':
> [...]
> with the offending entry being
>
> term: SREBP-mediated signaling pathway
> goid: GO:0032933
> definition: A series of molecular signals from the endoplasmic  
> reticulum
> to the nucleus generated as a consequence of altered levels of one or
> more lipids, and resulting in the activation of transcription by  
> SREBP.
> definition_reference: GOC:mah
> definition_reference: PMID:0
>
> I commented out the definition_reference for PMID:0, which seemed  
> to fix
> matters.

Right, it seems to be a bogus reference.
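
References like these can be flagged before running the loader. The
following is a sketch under the assumption that every reference line has
the form "definition_reference: DB:ACCESSION", as in the excerpts above;
the function name is made up, and a truncated accession such as
"MetaCyc:2-PYRONE-4" cannot be detected this way.

```perl
use strict;
use warnings;

# Return a short description of what is wrong with a
# "definition_reference: DB:ACCESSION" line, or nothing if the line is
# not a reference line or looks fine.
sub reference_problem {
    my ($line) = @_;
    return unless $line =~ /^definition_reference:\s*(.*?)\s*$/;
    my ($db, $acc) = split /:/, $1, 2;
    return 'missing database'      if !defined $db  || $db  eq '';
    return 'missing accession'     if !defined $acc || $acc eq '';
    return 'placeholder PubMed ID' if $db eq 'PMID' && $acc =~ /^0+$/;
    return;
}

# Check any files named on the command line, e.g. GO.defs.
if (@ARGV) {
    while (my $line = <>) {
        my $problem = reference_problem($line);
        print "$ARGV:$.: $problem: $line" if defined $problem;
    }
}
```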

>
> The process.ontology and component.ontology files then went into the
> database without a hitch.  Thanks again for your help,

Fantastic you got it all loaded!

Note that you also have the --computetc switch which will compute the  
transitive closure for you automatically.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From alexl at users.sourceforge.net  Tue Apr 17 11:13:30 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Tue, 17 Apr 2007 08:13:30 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu> (Chris Fields's
	message of "Tue\, 17 Apr 2007 09\:35\:10 -0500")
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
	<0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>
Message-ID: <nwy7krdof9.fsf@delpy.biol.berkeley.edu>

>>>>> "CF" == Chris Fields  writes:

[...]

CF> Do you mean the bioperl core modules instead of "bioperl-live"?
CF> We use the term "bioperl-live" to designate code updated regularly
CF> via CVS, which can be buggy depending on when it's retrieved.

Yes, I am referring to the core package.  Called perl-bioperl in the
Fedora naming scheme.

CF> I'm not sure how others feel about this, but it's probably best to
CF> stick with either the latest official releases (v 1.5.2 at this
CF> time) or even GBrowse-sponsored interim releases (which fix
CF> GBrowse-related bugs and normally pass tests).

Yes I am sticking to the latest official release 1.5.2_102.  The
package is here:

http://download.fedora.redhat.com/pub/fedora/linux/extras/development/i386/repoview/perl-bioperl.html

and installable via yum (on the development branch) using:

$ yum install perl-bioperl

The FC-6 package will be available soon.

Alex

From cjfields at uiuc.edu  Tue Apr 17 12:18:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:18:19 -0500
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
	<1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>

On Apr 17, 2007, at 11:05 AM, Leighton Pritchard wrote:
...
>
>>> and so is something for the GO guys to fix, I guess.
>>
>> The lack of a database for certain xrefs surely is. If the escaped
>> comma does throw off the BioPerl parser then that part is for BioPerl
>> to fix.
>
> I think the problems are now all in the data I downloaded from
> http://www.geneontology.org/GO.downloads.shtml - I believe the BioPerl
> parser to be innocent of these charges ;)  I've submitted the issue at
> the GO site, and with any luck they'll handle it quite soon (if it is in
> fact their problem).
>
>> Note that you also have the --computetc switch which will compute the
>> transitive closure for you automatically.
>
> :D Excellent!  Thanks for the pointer, and again for your efforts,
>
> L.
...

If you do find anything that is BioSQL- or Bioperl-related then file  
a bug report so we can track it.  I agree with Hilmar that it's  
likely the parser is partly to blame.

http://bugzilla.open-bio.org/

We really appreciate the work you're putting into this!

chris

From cjfields at uiuc.edu  Tue Apr 17 12:19:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:19:02 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <nwy7krdof9.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
	<0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>
	<nwy7krdof9.fsf@delpy.biol.berkeley.edu>
Message-ID: <3963AFE3-68B6-43F0-8A20-82A575CA8806@uiuc.edu>


On Apr 17, 2007, at 10:13 AM, Alex Lancaster wrote:

>
> [...]
>
> CF> Do you mean the bioperl core modules instead of "bioperl-live"?
> CF> We use the term "bioperl-live" to designate code updated regularly
> CF> via CVS, which can be buggy depending on when it's retrieved.
>
> Yes, I am referring to the core package.  Called perl-bioperl in the
> Fedora naming scheme.
>
> CF> I'm not sure how others feel about this, but it's probably best to
> CF> stick with either the latest official releases (v 1.5.2 at this
> CF> time) or even GBrowse-sponsored interim releases (which fix
> CF> GBrowse-related bugs and normally pass tests).
>
> Yes I am sticking to the latest official release 1.5.2_102.  The
> package is here:
>
> http://download.fedora.redhat.com/pub/fedora/linux/extras/development/i386/repoview/perl-bioperl.html
>
> and installable via yum (on the development branch) using:
>
> $ yum install perl-bioperl
>
> The FC-6 package will be available soon.
>
> Alex

Sounds good.  Thanks Alex!

chris

From ioanniskirmitzoglou at gmail.com  Tue Apr 17 12:21:36 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Tue, 17 Apr 2007 19:21:36 +0300
Subject: [Bioperl-l]  Parsing FASTA m10 output
In-Reply-To: <b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
Message-ID: <b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>

Thanks for the prompt reply...
Seems like I will have to "quit talking and begin doing"
I will post the code here in case someone else finds himself in the same
situation...
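
For anyone who lands here with the same problem before that code is posted, here is a rough Python sketch of the kind of small parser being discussed. It is not the code referred to above. It assumes the -m 10 layout as I understand it from fasta3x.doc: ">>>" starts a query, ">>" starts a hit, "; key: value" lines carry the scores, a single ">" starts a per-sequence alignment block, and ">>><<<" ends a query. The sample data is invented and heavily abbreviated, so check the assumptions against your own output.

```python
def parse_m10(lines):
    """Collect per-hit '; key: value' fields from FASTA -m 10 style output.

    Only the hit header fields (those directly after a '>>' line) are kept;
    alignment blocks introduced by a single '>' are skipped.
    """
    hits = []
    query = None
    hit = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">>>"):
            hit = None
            if not line.startswith(">>><<<"):      # ">>><<<" closes a query
                query = line[3:].split(",")[0].strip()
        elif line.startswith(">>"):
            name = (line[2:].split() or [""])[0]
            hit = {"query": query, "hit": name, "scores": {}}
            hits.append(hit)
        elif line.startswith(">"):
            hit = None                              # alignment block: ignored here
        elif line.startswith(";") and hit is not None:
            key, _, value = line[1:].partition(":")
            hit["scores"][key.strip()] = value.strip()
    return hits


# Invented, abbreviated sample in the assumed layout.
SAMPLE = """\
>>>query1, 217 aa vs library
; pg_name: fasta
>>HIT_A some description
; fa_opt: 64
; fa_expect: 0.00062
>query1 ..
; sq_len: 217
ACDEFG
>HIT_A ..
; sq_len: 218
ACDQFG
>>HIT_B another description
; fa_opt: 51
>>><<<
"""
hits = parse_m10(SAMPLE.splitlines())
for h in hits:
    print(h["query"], h["hit"], h["scores"])
```

Values are kept as strings on purpose; convert the fields you care about (for example the expect value) once you know which ones your FASTA version emits.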

-- 
Ioannis Kirmitzoglou, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


On 17/04/07, Thiago Venancio < thiago.venancio at gmail.com> wrote:
> I am parsing FASTA outputs these days.
>
> The -m 10 format is a recent implementation, not so popular yet. So, I have
> first tested the Bio::SearchIO against a default output and everything is
> fine.
>
> I think future releases of Bio::SearchIO will deal with the m10 output. For
> now, you can run all again or code a little bit to parse what you want (not
> a hard task).
>
> T.
>
>
> On 4/17/07, Ioannis Kirmitzoglou < IoannisKirmitzoglou at gmail.com> wrote:
> >
> > I apologize if this question has already been answered but my search came
> > up with no relevant results.
> > I am new to the FASTA program and after reading the fasta3x.doc I decided
> > to run it using the m10 output. The reason for this choice was
> >
> > Quote from fasta3x.doc:
> >      -m 10 is a new, parseable format for use with other
> >      programs....
> >
> >
> > I ran FASTA in batch mode and waited about 3-4 days for the results.
> > My problem is that today, when I started writing a perl script to parse
> > the output I realized that SearchIO doesn't support m10 format.
> > Seems like I should have been more careful...
> > Before starting to code a module that will be able to parse the output (or
> > re-running FASTA with -m9 -d0 switches which will take 4 more days) I
> > would be really thankful if any of you knows of any other way to parse
> > those files.
> > Thanks in advance...
> >
> > Ioannis Kirmitzoglou, MSc
> > PhD. Student,
> > Bioinformatics Research Laboratory
> > Department of Biological Sciences
> > University of Cyprus
> >
> > --
> > View this message in context:
> > http://www.nabble.com/Parsing-FASTA-m10-output-tf3590568.html#a10034698
> > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> "The way to get started is to quit talking and begin doing."
>       Walt Disney
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================

From cjfields at uiuc.edu  Tue Apr 17 12:49:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:49:53 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
Message-ID: <639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>

You can post here or add it to Bugzilla as an enhancement request if  
the code is particularly long.

chris

On Apr 17, 2007, at 11:21 AM, Ioannis Kirmitzoglou wrote:

> Thanks for the prompt reply...
> Seems like I will have to "quit talking and begin doing"
> I will post the code here in case someone else finds himself in the same
> situation...
>
> -- 
> Ioannis Kirmitzoglou, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
>
>
> On 17/04/07, Thiago Venancio < thiago.venancio at gmail.com> wrote:
>> I am parsing FASTA outputs these days.
>>
>> The -m 10 format is a recent implementation, not so popular yet. So, I have
>> first tested the Bio::SearchIO against a default output and everything is
>> fine.
>>
>> I think future releases of Bio::SearchIO will deal with the m10 output. For
>> now, you can run all again or code a little bit to parse what you want (not
>> a hard task).
>>
>> T.
>>
>>
>> On 4/17/07, Ioannis Kirmitzoglou < IoannisKirmitzoglou at gmail.com>  
>> wrote:
>>>
>>> I apologize if this question has already been answered but my search came
>>> up with no relevant results.
>>> I am new to the FASTA program and after reading the fasta3x.doc I decided
>>> to run it using the m10 output. The reason for this choice was
>>>
>>> Quote from fasta3x.doc:
>>>      -m 10 is a new, parseable format for use with other
>>>      programs....
>>>
>>>
>>> I ran FASTA in batch mode and waited about 3-4 days for the results.
>>> My problem is that today, when I started writing a perl script to parse
>>> the output I realized that SearchIO doesn't support m10 format.
>>> Seems like I should have been more careful...
>>> Before starting to code a module that will be able to parse the output (or
>>> re-running FASTA with -m9 -d0 switches which will take 4 more days) I
>>> would be really thankful if any of you knows of any other way to parse
>>> those files.
>>> Thanks in advance...
>>>
>>> Ioannis Kirmitzoglou, MSc
>>> PhD. Student,
>>> Bioinformatics Research Laboratory
>>> Department of Biological Sciences
>>> University of Cyprus
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Parsing-FASTA-m10-output-tf3590568.html#a10034698
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> "The way to get started is to quit talking and begin doing."
>>       Walt Disney
>>
>> ========================
>> Thiago Motta Venancio, MSc
>> PhD student in Bioinformatics
>> University of Sao Paulo
>> ========================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lpritc at scri.ac.uk  Tue Apr 17 09:35:44 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 14:35:44 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
Message-ID: <1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>

Hi Hilmar, 

Thanks for the very quick response.  Apologies for the long reply, but I
thought it might be useful if anyone else happens across the same
problems that I did.

On Tue, 2007-04-17 at 00:00 -0400, Hilmar Lapp wrote:
> Apparently the parser fails to parse out database and accession for this
> db_xref of term GO:0018901.
> 
> If you can edit the obo file, you can try deleting the db_xref(s) for  
> that term that look odd (or delete all if you don't need them).

You're spot on - see further down for details...

> Note that the argument for --fmtargs here should read
> "-defs_file,/path/to/Downloads/GO.defs". (Note that within the quotes  
> there is no tilde expansion.)

D'oh!  Thanks for the note - my bad, there.
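
For anyone scripting the load, one way to sidestep the tilde problem is to expand the path before it goes into the --fmtargs value. The Python sketch below only builds the argument list; it does not run anything. The flags are the ones from the command in this message, the paths are examples, and the helper name is hypothetical. Credentials are deliberately left out.

```python
import os


def fmtargs_for(defs_path):
    """Build the --fmtargs value with the home directory already expanded.

    The shell does not expand "~" inside quotes (or after the comma), so the
    expansion is done here instead.
    """
    return "-defs_file," + os.path.expanduser(defs_path)


argv = [
    "bp_load_ontology.pl",
    "--host", "localhost",
    "--dbname", "biosql",
    "--namespace", "Gene Ontology",
    "--format", "goflat",
    "--fmtargs", fmtargs_for("~/Downloads/GO.defs"),
    os.path.expanduser("~/Downloads/function.ontology"),
]
print(" ".join(argv))
```

Passing `argv` as a list to a process launcher (rather than pasting one quoted string into a shell) also avoids quoting trouble with the space in "Gene Ontology".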

> This is one of the reasons you've got to love MySQL (and am I correct
> in inferring that you're using MySQL?).

The 'choice' was forced upon me ;)

> It may be necessary to widen the length of dbname.accession here, for  
> example to 80 chars? Let me know if you need help with the DDL  
> command to do this.

I've fixed that now (and added it to my local biosqldb-mysql.sql
schema), but with a clean BioSQL schema and using:

[lpritc at lplinuxdev sql]$ bp_load_ontology.pl --host localhost --dbname
biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass ********
--format goflat --fmtargs
"-defs_file,/home/lpritc/Downloads/GO.defs" /home/lpritc/Downloads/function.ontology 

I was still getting errors with the GO flatfile:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
were ("","","0","") FKs ()
Column 'dbname' cannot be null
---------------------------------------------------
Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate
lactonase activity':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be
found by unique key
STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK:
Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK:
Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0x88497a4)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x897f074)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x8d64ad8)', '-throw',
'CODE(0x851abc8)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

I tracked this down to apparently malformed entries in the GO.defs file
(note that the first and third definition_reference lines appear to be two
halves of the same entry):

term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554
definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O
= 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
definition_reference: EC:3.1.1.57
definition_reference: MetaCyc:2-PYRONE-4

I found 43 similar errors for other GOIDs, and it appears to result from
the occurrence of the string "\," in a dbxref - mostly MetaCyc entries,
but also some UM-BBD_pathwayID entries.

These errors appear to have followed through into the generation of the
OBO format files in each case, e.g.:

def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O =
4-carboxy-2-hydroxyhexa-2,4-dienedioate." [:6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]

and so is something for the GO guys to fix, I guess.
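
To illustrate the general hazard (not to claim how the GO files were actually produced, which I don't know): if a comma-separated dbxref list is split on every comma, an escaped comma inside an accession breaks the entry in two. The Python sketch below contrasts a naive split with one that leaves "\," alone. The input string is my guess at what the intended list looked like, and the function name is made up.

```python
import re


def split_dbxrefs(text):
    """Split a comma-separated dbxref list, leaving escaped commas ("\\,") intact.

    Limitation: an escaped backslash directly before a comma is not handled.
    """
    parts = re.split(r"(?<!\\),\s*", text)   # comma NOT preceded by a backslash
    return [p.strip() for p in parts if p.strip()]


# Assumed intended list for GO:0047554 (a reconstruction, not copied from a file).
xrefs = r"EC:3.1.1.57, MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN"

naive = [p.strip() for p in xrefs.split(",")]
aware = split_dbxrefs(xrefs)

print(len(naive), naive)
print(len(aware), aware)
```

The naive version yields three pieces, one of which has no database prefix at all, which is the same shape as the broken definition_reference lines above.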


Another error is thrown after fixing the above, though (with the same
command as before):

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were
("GO:0006905","vesicle transport","OBSOLETE (was not defined before
being made obsolete).","X","") FKs (1)
Duplicate entry 'vesicle transport-1-X' for key 3
---------------------------------------------------
Could not store term GO:0006905, name 'vesicle transport':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be
found by unique key
STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK:
Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0xbcac418)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x957805c)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x995db20)', '-throw',
'CODE(0x9113bd0)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

There are duplicate terms, identical in the term table except for GOID:
GO:0006905 and GO:0005480.  They are both "vesicle transport", and
obsoleted:

term: vesicle transport
goid: GO:0005480
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because it represents a biological
process and not a molecular function. To update annotations, use the
biological process term 'vesicle-mediated transport ; GO:0016192'.

term: vesicle transport
goid: GO:0006905
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because the meaning of the term is
ambiguous. To update annotations, consider the biological process term
'vesicle-mediated transport ; GO:0016192'.
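
A quick way to find other collisions of this kind before loading is to look for term names that occur under more than one goid. The Python sketch below assumes the "key: value" record layout shown in the excerpts here, with blank lines between records; the function name is made up, and whether a shared name actually collides in the database depends on the unique key, which I have not checked beyond the error message above.

```python
from collections import defaultdict


def duplicate_term_names(lines):
    """Map each term name used by more than one goid to its sorted goids."""
    by_name = defaultdict(set)
    name = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("term:"):
            name = line[len("term:"):].strip()
        elif line.startswith("goid:") and name is not None:
            by_name[name].add(line[len("goid:"):].strip())
        elif not line:
            name = None                      # blank line ends the record
    return {n: sorted(ids) for n, ids in by_name.items() if len(ids) > 1}


# Abbreviated records taken from this message.
sample = """\
term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554

term: vesicle transport
goid: GO:0005480
definition: OBSOLETE (was not defined before being made obsolete).

term: vesicle transport
goid: GO:0006905
definition: OBSOLETE (was not defined before being made obsolete).
""".splitlines()

dupes = duplicate_term_names(sample)
print(dupes)
```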

I used the --noobsolete flag to avoid this error - reasoning that since
I'm populating the database for the first time, ignoring the obsolete
terms won't hurt - but finally this error was thrown:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
were ("PMID","","0","") FKs ()
Column 'accession' cannot be null
---------------------------------------------------
Could not store term GO:0032933, name 'SREBP-mediated signaling
pathway':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be
found by unique key
STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK:
Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK:
Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0xbe18f14)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x99bbf2c)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x9da0ad8)', '-throw',
'CODE(0x9556bb4)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

with the offending entry being 

term: SREBP-mediated signaling pathway
goid: GO:0032933
definition: A series of molecular signals from the endoplasmic reticulum
to the nucleus generated as a consequence of altered levels of one or
more lipids, and resulting in the activation of transcription by SREBP.
definition_reference: GOC:mah
definition_reference: PMID:0

I commented out the definition_reference for PMID:0, which seemed to fix
matters.
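
Since all three failures came down to odd definition_reference lines, a small pre-flight check over GO.defs would have found them in one pass. The Python sketch below flags references with an empty database part, an empty accession, or an accession of "0". Treating "0" as a placeholder is an assumption based only on the PMID:0 case, and the function name is made up.

```python
def suspicious_references(lines):
    """Yield (goid, reference) for definition_reference values that lack a
    database part or a usable accession."""
    goid = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("goid:"):
            goid = line[len("goid:"):].strip()
        elif line.startswith("definition_reference:"):
            ref = line[len("definition_reference:"):].strip()
            db, _, acc = ref.partition(":")
            # Assumption: an accession of "0" is a placeholder, as in PMID:0.
            if not db or not acc or acc == "0":
                yield goid, ref


# Abbreviated records taken from this message.
sample = """\
term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554
definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
definition_reference: EC:3.1.1.57
definition_reference: MetaCyc:2-PYRONE-4

term: SREBP-mediated signaling pathway
goid: GO:0032933
definition_reference: GOC:mah
definition_reference: PMID:0
""".splitlines()

flagged = list(suspicious_references(sample))
for goid, ref in flagged:
    print(goid, ref)
```

Note that the truncated "MetaCyc:2-PYRONE-4" passes this check, since it still has both parts; only its orphaned second half is caught.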

The process.ontology and component.ontology files then went into the
database without a hitch.  Thanks again for your help,

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views 
expressed by the sender are not necessarily the views of SCRI and its 
subsidiaries.  This email and any files transmitted with it are confidential 
to the intended recipient at the e-mail address to which it has been 
addressed.  It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this 
confidentiality and you must not use, disclose, copy, print or rely on this 
e-mail in any way. Please notify postmaster at scri.ac.uk quoting the 
name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are 
present in this email, neither the Institute nor the sender accepts any 
responsibility for any viruses, and it is your responsibility to scan the email 
and the attachments (if any).


From lpritc at scri.ac.uk  Tue Apr 17 12:05:16 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 17:05:16 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
Message-ID: <1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>

Hello again,

On Tue, 2007-04-17 at 11:09 -0400, Hilmar Lapp wrote:
> Thanks for reporting all these.

No problem at all.

> On Apr 17, 2007, at 9:35 AM, Leighton Pritchard wrote:
> > term: 2-pyrone-4,6-dicarboxylate lactonase activity
[...]
> > definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
> 
> I wonder whether this is the line that throws the parser off. It  
> looks like the database part of the reference is missing - bad.

> > definition_reference: MetaCyc:2-PYRONE-4

I don't think the parser is to blame, here.  Note that if you join the
definition_reference strings from the GO.defs file, you get:

MetaCyc:2-PYRONE-4:6-DICARBOXYLATE-LACTONASE-RXN

Then if you replace the second colon with "\," you get what should (I think)
actually be the MetaCyc entry:

MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN

> > I found 43 similar errors for other GOIDs, and it appears to result from
> > the occurrence of the string "\," in a dbxref - mostly MetaCyc entries,
> > but also some UM-BBD_pathwayID entries.
> 
> I'm not sure - although the string "\," might indeed trip up the  
> parser, would have to investigate to confirm. Could it be a  
> coincidence with definition_references that lack the database part  
> before the colon?

Inspecting the troublesome entries by eye seems to turn up the same
problem as above consistently: a GO term in the GO.defs file is
malformed.  The term should have a definition_reference field describing
a MetaCyc entry that matches the term field.  In that MetaCyc string there
should be an escaped comma, but the string ends at the point where we
expect it.  The text that should follow the escaped comma is present
instead as the first definition_reference, with an empty database part.

This observation also extends to cases where there should be two
occurrences of "\," in the MetaCyc field, e.g.:

term: 2,3-dihydroxyindole 2,3-dioxygenase activity
goid: GO:0047528
definition: Catalysis of the reaction: 2,3-dihydroxyindole + O2 =
anthranilate + CO2.
definition_reference: :3-DIHYDROXYINDOLE-2
definition_reference: :3-DIOXYGENASE-RXN
definition_reference: EC:1.13.11.2
definition_reference: MetaCyc:2

It then appears as though the GO flatfiles were used automatically to
generate the OBO format files, and propagated the same error into the
square brackets in each case.
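
As a stopgap until the files are fixed upstream, the pattern above can be reversed mechanically. The Python sketch below takes the definition_reference values of one term and glues any fragments with an empty database part back onto the truncated entry, using "\," as the separator. It assumes the fragments appear in the file in the right order, which holds for the two examples in this thread but is not guaranteed, and it gives up if there is not exactly one candidate entry for the given database. The function name is made up.

```python
def rejoin_split_reference(refs, db="MetaCyc"):
    """Reattach ':fragment' references to the single truncated `db` reference.

    Returns a new list; the input is left unchanged when there is nothing to
    repair or the situation is too ambiguous to guess.
    """
    fragments = [r[1:] for r in refs if r.startswith(":")]
    heads = [r for r in refs if r.startswith(db + ":")]
    if not fragments or len(heads) != 1:
        return list(refs)
    repaired = "\\,".join([heads[0]] + fragments)   # "\," between the pieces
    others = [r for r in refs if not r.startswith(":") and r != heads[0]]
    return others + [repaired]


# definition_reference values for GO:0047528, as quoted above.
refs = [
    ":3-DIHYDROXYINDOLE-2",
    ":3-DIOXYGENASE-RXN",
    "EC:1.13.11.2",
    "MetaCyc:2",
]
fixed = rejoin_split_reference(refs)
print(fixed)
```

For the UM-BBD_pathwayID cases the same function would be called with `db="UM-BBD_pathwayID"`; the result should still be checked by eye against the term name.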

> > and so is something for the GO guys to fix, I guess.
> 
> The lack of a database for certain xrefs surely is. If the escaped  
> comma does throw off the BioPerl parser then that part is for BioPerl  
> to fix. 

I think the problems are now all in the data I downloaded from
http://www.geneontology.org/GO.downloads.shtml - I believe the BioPerl
parser to be innocent of these charges ;)  I've submitted the issue at
the GO site, and with any luck they'll handle it quite soon (if it is in
fact their problem).

> Note that you also have the --computetc switch which will compute the  
> transitive closure for you automatically.

:D Excellent!  Thanks for the pointer, and again for your efforts,

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C


From stefan.kirov at bms.com  Tue Apr 17 11:09:30 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 17 Apr 2007 11:09:30 -0400
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph with
	Perl]
Message-ID: <4624E32A.6010704@bms.com>

I forgot to send this to the list....
Stefan
-------------- next part --------------
An embedded message was scrubbed...
From: Stefan Kirov <stefan.kirov at bms.com>
Subject: Re: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
Date: Tue, 17 Apr 2007 10:30:11 -0400
Size: 2262
Url: http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070417/cc49d62a/attachment.mht 

From lpritc at scri.ac.uk  Tue Apr 17 12:55:38 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 17:55:38 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
	<1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
	<146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>
Message-ID: <1176828938.988.133.camel@lplinuxdev.scri.sari.ac.uk>

Hi Chris,

On Tue, 2007-04-17 at 11:18 -0500, Chris Fields wrote:
> If you do find anything that is BioSQL- or Bioperl-related then file  
> a bug report so we can track it.  I agree with Hilmar that it's  
> likely the parser is partly to blame.
> 
> http://bugzilla.open-bio.org/

I've submitted a bug report, mostly replicating my first post in this
thread.  I added links to the appropriate point in the list archives so
that the rest of the discussion can be considered, too.

> We really appreciate the work you're putting into this!

Thanks - I'm just grateful that the Bio* repertoire is there at all so
that my problems are relatively minor (as opposed to attempting to
replicate the functionality independently).

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C


From lstein at cshl.edu  Tue Apr 17 13:47:25 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 17 Apr 2007 13:47:25 -0400
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <C2340DDA.D83F%bosborne11@verizon.net>
References: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<C2340DDA.D83F%bosborne11@verizon.net>
Message-ID: <6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>

Hi,

I've been updating the WIKI in anticipation of a new GBrowse release and
have added a "stub" for the biopackages.net install. Since I don't use yum
(I've been running Slackware for ages and have recently started working with
Ubuntu) I'm not sure I got the details right. Could someone check?


        http://www.gmod.org/wiki/index.php/GBrowse_RPM_HOWTO

Also, I think some verbiage on how to use yum to install MySQL and Apache
would be great, since it will be consistent with the Ubuntu install page.

Thanks,

Lincoln

On 3/31/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
> Allen et al.,
>
> What happened to the "GMOD" package or packages? I've had some
> conversations in the past few months with you-all suggesting that a GMOD
> package, or packages, would be useful.
>
> Brian O.
>
>
>
>
> On 3/30/07 8:30 PM, "Allen Day" <allenday at gmail.com> wrote:
>
> > Hi Alex,
> >
> > You've aptly noted that there are several classes of packages being
> > discussed here, and that they should not be treated equally.  From my
> > point of view and of specific relevance to the Bioperl community we
> > have at least:
> >
> > 1) "regular" CPAN dependencies and their occasional C/C++/Fortran
> > dependencies.  These should all be in Fedora Extras, as they are of
> > general utility.  Biopackages.net currently hosts about 200 packages
> > (.spec files, specifically) that are like this.  Maybe 80 of these are
> > needed for Bioperl.
> >
> > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan,
> > etc.  From what I've seen, these typically have strange/custom
> > licenses that may not be valid for some users.  BLAT has a dual
> > licensing scheme for academic and non-academic licensees, for
> > instance.  These packages are not of general utility.  For these two
> > reasons, my stance is that they should not be included in Fedora
> > Extras.
> >
> > 3) Bioperl packages.  Several subsets here.  The Bioperl-run libraries
> > depend directly on type (2) packages, so aren't appropriate to include
> > in Fedora Extras.  Bioperl-live is not really that useful without type
> > (2) packages.  It is also sensible to keep all of the Bioperl-*
> > packages in the same repository.  For these reasons, my stance is that
> > they should not be included in Fedora Extras.
> >
> > 4) Bioinformatics / Comp. Bio. data sets.  These don't have licensing
> > problems, but they tend to be large.  Usually in the 10E7 - 10E10 byte
> > range.  RPM cannot even generate correct metadata for some of them
> > if the files are too large (overflow problems).  Probably
> > not appropriate to put in Fedora Extras because they are too large and
> > not generally useful.
> >
> > 5) Bioinformatics-specific System databases / daemons.  These
> > high-level packages depend on types (2), (3), and (4), and so are not
> > appropriate to put into Fedora Extras.  An example is a BLAT daemon,
> > which relies on the BLAT server, as well as NIB-formatted genome
> > sequence files.
> >
> > That said, there are a lot of type (1) packages in the Biopackages.net
> > repository.  If you're interested in migrating the spec files from our
> > repository to the Fedora project it would save us (the Biopackages.net
> > maintainers) a ton of build and maintenance time, so please feel free
> > to take them, just let us know.  If we can reach some agreement on
> > where the bioinformatics-specific packages should be maintained/built
> > we may be able to work together on these as well.
> >
> > -Allen
> >
> >
> > On 3/30/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> >>>>>>> "AD" == Allen Day  writes:
> >>
> >> AD> Hi Alex, The Biopackages.net project is still active, we are
> >> AD> regularly adding packages to it, mostly R packages lately.  Most
> >> AD> of the systems we use are running CentOS at this point, which is
> >> AD> why you have not seen support for FC6 yet.  There is nothing
> >> AD> preventing building FC6 packages aside from lack of time to set up
> >> AD> the FC6 build farm nodes.
> >>
> >> Hi Allen and others,
> >>
> >> Great news to hear that Biopackages.net is still active!  I would like
> >> to help out if possible.  I don't believe in "FUD" either... ;)
> >>
> >> AD> If you're interested in packaging BioPerl or other
> >> AD> bioinformatics-related software, please join the Biopackages
> >> AD> project on SourceForge.  We object to the Fedora Extras FUD
> >> AD> tactics used to discourage people from using 3rd party
> >> AD> repositories, and suspect they may not want to host some of our
> >> AD> data packages, such as the >2GB genome packages.  Biopackages
> >> AD> project is likely to partially merge with RPMForge.  We are
> >> AD> already discussing with them how best to do it.
> >>
> >> The packages that I created which are currently available in Fedora
> >> Packages are Perl dependencies which, as I said are useful for
> >> packages outside the bioinformatics purview.  I do have a (base)
> >> bioperl package in review, but it is not yet released.
> >>
> >> As for third-party repos, I don't object to them at all, and for some
> >> kinds of projects they are indeed appropriate. (e.g. for non-free
> >> stuff like Livna or Freshrpms).  However, I do have practical concerns
> >> about repository mixing; I think that it needs to be handled
> >> carefully, but that co-operation between Fedora and third-party repos
> >> can make it work.
> >>
> >> For example, one practical concern is that as of the
> >> soon-to-be-released Fedora 7, Core+Extras will be merged, so there
> >> will be no distinction at the repository-level between formerly Extras
> >> packages and formerly Core packages (as of now there are only "Fedora
> >> Packages"), which means that it will not be possible for third-party
> >> repos to limit their dependencies to just those in a former base set
> >> (i.e. excluding Extras).
> >>
> >> I agree that a few years ago (circa 2003-2004) there was concern about
> >> the way some third party repositories were treated somewhat badly by
> >> the (then) Fedora Extras (with some people going so far as to say that
> >> third-party repos were bad in principle and should always be ignored
> >> which I disagree with too).  But it seems to me that culture has
> >> shifted since, with some notable packagers such as Matthias Saou (of
> >> Freshrpms) and Axel Thimm (of Atrpms) now contributing packages to
> >> Fedora itself.  The process of contributing has also become much
> >> simpler and reviews are conducted speedily and efficiently; I had
> >> packages in the repository in a matter of a few days from initial
> >> submission.  Freshrpms itself now enables and depends on the (old)
> >> Extras.
> >>
> >> The real question for me, then is what packages it makes sense to go
> >> in Fedora, and what packages go in third party repositories.  It seems
> >> to me that in the case of Perl packages which could be dependencies
> >> for other packages not specific to the third-party repo in question,
> >> it makes sense for them to go into Fedora itself, so I think I will
> >> continue to package them.  This lessens the load on the third-party
> >> repo, while making them available for all other third-party repos.
> >> (This is the approach that Freshrpms seems to be taking; Matthias has
> >> contributed most packages back to Fedora now other than the non-free
> >> ones).
> >>
> >> At the other end of the spectrum are packages like you mention, genome
> >> packages, which may be of concern because of their size and/or highly
> >> specialised nature, and, as you say, may make sense to go in a
> >> third-party repo like Biopackages.net.  Also packages which can't be
> >> packaged by Fedora for legal reasons like Clustal could/should go in
> >> Biopackages.net.
> >>
> >> In the middle are packages like bioperl itself which are potentially
> >> useful to perhaps a wider group of people than the genome packages but
> >> may not necessarily be dependencies for other packages.  I lean
> >> towards making them part of Fedora so that they will be available out
> >> of the box on the planned "Everything" DVD ISO, but I welcome a
> >> discussion on this.
> >>
> >> As I said, I'm glad to hear that Biopackages.net is alive and well and
> >> I welcome a discussion on how upstream Fedora can usefully interact
> >> with Biopackages.net (I guess perhaps on the Biopackages.net list).
> >>
> >> Regards,
> >> Alex
> >>
> >> PS.  As the upstream author, if you could clarify the license on
> >> perl-SVG-Graph on CPAN (or on the mailing list), that would be great.
> >> --
> >> Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From alexl at users.sourceforge.net  Wed Apr 18 04:50:51 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 18 Apr 2007 01:50:51 -0700
Subject: [Bioperl-l] bioperl-run and Bio::Root::AccessorMaker
Message-ID: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>

In packaging bioperl-run for Fedora, I think I stumbled across a bug
in the bioperl-run package.  It appears from this edit:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl

that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
bioperl-run 1.5.2_100 still contains modules that use this module:

$ cd bioperl-run-1.5.2_100
$ grep -r AccessorMaker  *
Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar class min_version)]);
Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(input_file output_file)]);

This causes the automatic Perl dependency generator for RPM to add
Bio::Root::AccessorMaker as a requirement, which means RPM will refuse to
install perl-bioperl-run because it's looking for the module that has
been removed from core bioperl:

$ sudo rpm -Uvh --test /home/alex/rpmbuild/RPMS/noarch/perl-bioperl-run-1.5.2_100-1.noarch.rpm
error: Failed dependencies:
        perl(Bio::Root::AccessorMaker) is needed by perl-bioperl-run-1.5.2_100-1.noarch

Are the SDI and JavaRunner modules being actively developed?  What's
the best course of action for these modules?  Should I just exclude
them from the package for now, since they won't work even if you
tell RPM to ignore the dependency warning?
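
For anyone who wants to repeat the check without grep, the search above
could be sketched in Perl roughly as follows. This is only an
illustration: files_using_module is a made-up helper name, and it only
catches plain "use"/"require" lines, which is an assumption about how
the modules load their dependencies (something like "use base" would be
missed).

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return the files under $dir that load $module
# with a plain "use" or "require" statement.
sub files_using_module {
    my ($dir, $module) = @_;
    my @hits;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path of each file
            wanted   => sub {
                return unless -f $_ && /\.(?:pm|pl|t)\z/;
                open my $fh, '<', $_ or return;
                while (my $line = <$fh>) {
                    if ($line =~ /^\s*(?:use|require)\s+\Q$module\E\b/) {
                        push @hits, $File::Find::name;
                        last;    # one hit per file is enough
                    }
                }
                close $fh;
            },
        },
        $dir
    );
    @hits = sort @hits;
    return @hits;
}

# Usage, assuming the current directory is an unpacked bioperl-run tree:
#   print "$_\n" for files_using_module('.', 'Bio::Root::AccessorMaker');
```

Any file this reports would contribute the same "perl(...)" requirement
to the package, so it gives a quick list of what would have to be fixed
or excluded.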

Alex

From shameer at ncbs.res.in  Wed Apr 18 06:16:07 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 18 Apr 2007 15:46:07 +0530 (IST)
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph
 with Perl]
In-Reply-To: <4624E32A.6010704@bms.com>
References: <4624E32A.6010704@bms.com>
Message-ID: <36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>

Hi,

I am also interested in using the Bio::Graphics modules for dynamic image
display. I have a question: I tried all the sample programs explained on
this page: http://stein.cshl.org/genome_informatics/BioGraphics/index.html
Is it possible to generate a png/jpg/gif image from this module by
altering the same program? Currently it's using the display option. I know
this can be done using GD/Image::Magick in Perl, but is there any quick way
to accomplish it in BioPerl?

Thanks,


> Missed to send this to the list....
> Stefan


-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From sdavis2 at mail.nih.gov  Wed Apr 18 07:18:48 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 18 Apr 2007 07:18:48 -0400
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph
	with Perl]
In-Reply-To: <36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
References: <4624E32A.6010704@bms.com>
	<36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
Message-ID: <200704180718.48811.sdavis2@mail.nih.gov>

On Wednesday 18 April 2007 06:16, Shameer Khadar wrote:
> Hi,
>
> I am also interested in using the Bio::Graphics modules for dynamic image
> display. I have a question: I tried all the sample programs explained on
> this page: http://stein.cshl.org/genome_informatics/BioGraphics/index.html
> Is it possible to generate a png/jpg/gif image from this module by
> altering the same program? Currently it's using the display option.

You just need to print $panel->png to a file.
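
A minimal sketch of what that could look like, assuming $panel is the
Bio::Graphics::Panel object built in the tutorial scripts. The helper
name write_panel_png and the output file name are made up for this
example; the only thing it relies on is that the object has a png()
method that returns the image data.

```perl
use strict;
use warnings;

# Hypothetical helper: write the PNG data from any object that has a
# png() method (for example a Bio::Graphics::Panel) to a file.
sub write_panel_png {
    my ($panel, $path) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    binmode $fh;                  # PNG is binary data
    print {$fh} $panel->png;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Usage, assuming $panel was built as in the tutorial:
#   write_panel_png($panel, 'features.png');
```

The binmode call matters mostly on Windows, where text-mode output
would otherwise corrupt the image.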

Sean

From bix at sendu.me.uk  Wed Apr 18 07:48:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 12:48:27 +0100
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
Message-ID: <4626058B.8090801@sendu.me.uk>

Alex Lancaster wrote:
> In packaging bioperl-run for Fedora, I think I stumbled across a bug
> in the bioperl-run package.  It appears from this edit:
> 
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
> 
> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
> bioperl-run 1.5.2_100 still contains modules that use this module:
> 
> $ cd bioperl-run-1.5.2_100
> $ grep -r AccessorMaker  *
> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar
> class min_version)]);
> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
> ('$'=>[qw(input_file output_file)]);

It looks like I've implemented a similar idea to AccessorMaker and 
AbstractRunner in Bio::Root::Root->_set_from_args() and 
Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses 
AbstractRunner I propose deprecating it immediately.

Forester::SDI and JavaRunner have no tests which is why we didn't notice 
the problem. Since they've been out of use for a number of years now I 
also propose their immediate deprecation. Alternatively, it may not be 
too difficult to just update them to use _set_from_args and _setparams, 
but I've nothing to test against (and JavaRunner is self-described as 
"probably incomplete").


I can remove the modules from cvs and create bioperl-run-1.5.2_101, 
resolving the packaging issue. I plan on doing precisely this within the 
next seven days unless someone puts a hand up to stop me.


[BCC: author, Juguang Xiao]

From cjfields at uiuc.edu  Wed Apr 18 08:43:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 07:43:45 -0500
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <4626058B.8090801@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>


On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:

> It looks like I've implemented a similar idea to AccessorMaker and
> AbstractRunner in Bio::Root::Root->_set_from_args() and
> Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
> AbstractRunner I propose deprecating it immediately.

JavaRunner is-a AbstractRunner, but what you propose below takes care  
of that.

> Forester::SDI and JavaRunner have no tests which is why we didn't  
> notice
> the problem. Since they've been out of use for a number of years now I
> also propose their immediate deprecation. Alternatively, it may not be
> too difficult to just update them to use _set_from_args and  
> _setparams,
> but I've nothing to test against (and JavaRunner is self-described as
> "probably incomplete").
>
>
> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
> resolving the packaging issue. I plan on doing precisely this  
> within the
> next seven days unless someone puts a hand up to stop me.
>
>
> [BCC: author, Juguang Xiao]

I suppose you could just remove the modules from the branch for now,  
but (as you point out) the code appears largely incomplete, so might  
as well deprecate the entire lot.  The code will be in the 'attic'  
once removed if anyone's really interested in it.

You've forwarded the author and the mail list so let's see what the  
response is (if any)...

chris

From cjfields at uiuc.edu  Wed Apr 18 11:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 10:30:45 -0500
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <462634DB.2040701@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
Message-ID: <143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>


On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>>> resolving the packaging issue. I plan on doing precisely this  
>>> within the
>>> next seven days unless someone puts a hand up to stop me.
>>>
>>> [BCC: author, Juguang Xiao]
> [snip]
>> You've forwarded the author and the mail list so let's see what  
>> the response is (if any)...
>
> Unfortunately the mail was undeliverable, and I have no other  
> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few  
> more days for other responses on the list.
>
> I never made a branch for bioperl-run 1.5.2, so they'd be removed  
> from HEAD.

It might be a good idea to repost this using the module names  
affected in the subject, just in case, though the last post he made  
on the mail list was ~3 years ago using the same email:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/match=xiao

He may be MIA.

chris


From bix at sendu.me.uk  Wed Apr 18 11:10:19 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 16:10:19 +0100
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
Message-ID: <462634DB.2040701@sendu.me.uk>

Chris Fields wrote:
> 
> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
> 
>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>> resolving the packaging issue. I plan on doing precisely this within the
>> next seven days unless someone puts a hand up to stop me.
>>
>> [BCC: author, Juguang Xiao]
[snip]
> You've forwarded the author and the mail list so let's see what the 
> response is (if any)...

Unfortunately the mail was undeliverable, and I have no other address 
for Juguang (I tried juguang at tll.org.sg). I'll wait a few more days for 
other responses on the list.

I never made a branch for bioperl-run 1.5.2, so they'd be removed from HEAD.

From hlapp at gmx.net  Wed Apr 18 11:59:52 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Apr 2007 11:59:52 -0400
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
	<143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
Message-ID: <EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>

There is a Juguang Xiao at juguang.swf at gmail.com. Not sure he's  
the same, but sounds like it's a geek at least. (google and you'll  
see; has anyone here ever heard about neko??)

	-hilmar

On Apr 18, 2007, at 11:30 AM, Chris Fields wrote:

>
> On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>>>> resolving the packaging issue. I plan on doing precisely this
>>>> within the
>>>> next seven days unless someone puts a hand up to stop me.
>>>>
>>>> [BCC: author, Juguang Xiao]
>> [snip]
>>> You've forwarded the author and the mail list so let's see what
>>> the response is (if any)...
>>
>> Unfortunately the mail was undeliverable, and I have no other
>> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few
>> more days for other responses on the list.
>>
>> I never made a branch for bioperl-run 1.5.2, so they'd be removed
>> from HEAD.
>
> It might be a good idea to repost this using the module names
> affected in the subject, just in case, though the last post he made
> on the mail list was ~3 years ago using the same email:
>
> http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/
> match=xiao
>
> He may be MIA.
>
> chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Apr 18 12:00:49 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Apr 2007 12:00:49 -0400
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <4626058B.8090801@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <9159C9DF-41BC-46AA-8511-763AD9B7A3D0@gmx.net>

sounds good to me - the less cruft the better. -hilmar
On Apr 18, 2007, at 7:48 AM, Sendu Bala wrote:

> Alex Lancaster wrote:
>> In packaging bioperl-run for Fedora, I think I stumbled across a bug
>> in the bioperl-run package.  It appears from this edit:
>>
>> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/ 
>> Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
>>
>> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
>> bioperl-run 1.5.2_100 still contains modules that use this module:
>>
>> $ cd bioperl-run-1.5.2_100
>> $ grep -r AccessorMaker  *
>> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
>> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw 
>> (jar
>> class min_version)]);
>> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
>> ('$'=>[qw(input_file output_file)]);
>
> It looks like I've implemented a similar idea to AccessorMaker and
> AbstractRunner in Bio::Root::Root->_set_from_args() and
> Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
> AbstractRunner I propose deprecating it immediately.
>
> Forester::SDI and JavaRunner have no tests which is why we didn't  
> notice
> the problem. Since they've been out of use for a number of years now I
> also propose their immediate deprecation. Alternatively, it may not be
> too difficult to just update them to use _set_from_args and  
> _setparams,
> but I've nothing to test against (and JavaRunner is self-described as
> "probably incomplete").
>
>
> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
> resolving the packaging issue. I plan on doing precisely this  
> within the
> next seven days unless someone puts a hand up to stop me.
>
>
> [BCC: author, Juguang Xiao]

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Apr 18 12:25:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 11:25:54 -0500
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
	<143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
	<EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>
Message-ID: <E0195EBD-731D-4915-91AD-7FFE1FA9F608@uiuc.edu>

My guess is that Hilmar's is the most current, as posts were made this
year.  I found another email: juguang at fugu-sg.org.  Looks like he
added some stuff to Ensembl a while back (sorry about the long URL).

http://www.ensembl.org/info/software/Pdoc/ensembl/modules/Bio/EnsEMBL/Utils/Converter/ens_bio_featurePair_raw.html

chris

On Apr 18, 2007, at 10:59 AM, Hilmar Lapp wrote:

> There is a Juguang Xiao at juguang.swf at gmail.com. Not sure he's
> the same, but sounds like it's a geek at least. (google and you'll
> see; has anyone here ever heard about neko??)
>
> 	-hilmar
>
> On Apr 18, 2007, at 11:30 AM, Chris Fields wrote:
>
>>
>> On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:
>>
>>> Chris Fields wrote:
>>>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>>>> I can remove the modules from cvs and create bioperl- 
>>>>> run-1.5.2_101,
>>>>> resolving the packaging issue. I plan on doing precisely this
>>>>> within the
>>>>> next seven days unless someone puts a hand up to stop me.
>>>>>
>>>>> [BCC: author, Juguang Xiao]
>>> [snip]
>>>> You've forwarded the author and the mail list so let's see what
>>>> the response is (if any)...
>>>
>>> Unfortunately the mail was undeliverable, and I have no other
>>> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few
>>> more days for other responses on the list.
>>>
>>> I never made a branch for bioperl-run 1.5.2, so they'd be removed
>>> from HEAD.
>>
>> It might be a good idea to repost this using the module names
>> affected in the subject, just in case, though the last post he made
>> on the mail list was ~3 years ago using the same email:
>>
>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/
>> match=xiao
>>
>> He may be MIA.
>>
>> chris
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 18 12:37:55 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 17:37:55 +0100
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
Message-ID: <46264963.9020306@sendu.me.uk>

Hi all,

t/DB.t is currently failing tests 40 and 41:

ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
                                          '-ids' => [qw(J00522 AF303112 2981014)],
                                          -verbose => 1);

cmp_ok $query->count, '>', 0;

You can see that
http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100
gives no results, where presumably it used to give 3. Querying on the 3
ids individually works fine. So... what changed and how do we get around it?

From cjfields at uiuc.edu  Wed Apr 18 13:05:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 12:05:12 -0500
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <46264963.9020306@sendu.me.uk>
References: <46264963.9020306@sendu.me.uk>
Message-ID: <6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>

I can verify on this end.  Not sure why, but the same accessions are  
used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)  
with success.

chris

On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:

> Hi all,
>
> t/DB.t is currently failing tests 40 and 41:
>
> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>                                           '-ids' => [qw(J00522  
> AF303112
> 2981014)],
>                                           -verbose => 1);
>
> cmp_ok $query->count, '>', 0;
>
> You can see that
> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi? 
> db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522% 
> 2CAF303112%2C2981014&retmax=100
> gives no results, where presumably it used to give 3. querying on  
> the 3
> ids individually works fine. So... what changed and how do we get  
> around it?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Apr 18 14:07:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 13:07:22 -0500
Subject: [Bioperl-l] Skipping/Failing tests
Message-ID: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>

To the BioPerl community at large,

I have noticed a problem with some BioPerl tests when converting to  
Test::More.  When using the following:

     while ($seq = $seqin->next_seq) {
         my $acc = $seq->accession;
         ok exists $result{ $acc };
         is $seq->length, $result{ $acc };
         delete $result{$acc};
     }

if $seq is undef (next_seq returns nothing) the loop body is skipped and
the test plan is off by 2 tests for every missed iteration.  Two serious
problems:

1) No specific failures are seen until the end of the test suite, when
the test plan doesn't match the number of tests run (which could be
several hundred lines away from the actual failure).
2) Worse, if one were lazy enough not to track the actual number of
tests (heh, not that that would happen) they could inadvertently change
the test plan to match the final number of tests.

There are several ways to work around this, such as using a counter  
to track the number of iterations and check to make sure they pass:

     $ct = 0;
     while ($seq = $seqin->next_seq) {
         $ct++;
         my $acc = $seq->accession;
         ok exists $result{ $acc };
         is $seq->length, $result{ $acc };
         delete $result{$acc};
     }
     is($ct, 3);

Here, if $ct is 0 you'll get an error.  However, the test count will  
still be off at the end (the test plan will be off by 6 tests).

My opinion is that we should try to match the plan, as a single fail  
doesn't reflect the severity of the bug (i.e. it should fail each  
test per iteration, as expected).  Skipping to match is an option as  
well (one I've used) but again doesn't reflect the severity of the  
problem in my opinion.  The flip side is that some consider any  
failed test significant, so there is no reason to try matching the  
tests up.
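
To make the "match the plan" option concrete, here is a rough sketch
using plain Test::More. It is only one way to do it: the accessions and
lengths are placeholder data, the array stands in for $seqin->next_seq,
and missing_tests is a made-up helper name. The idea is to emit one
explicit failure for every planned test that never ran, so the count
still adds up and the failures show up at the loop rather than at the
end of the file.

```perl
use strict;
use warnings;
use Test::More tests => 7;    # 3 records x 2 tests each, plus the count check

# Placeholder data; a real test would read these from $seqin->next_seq.
my %result  = (ACC1 => 10, ACC2 => 20, ACC3 => 30);
my @records = map { +{ acc => $_, len => $result{$_} } } sort keys %result;

my $expected_iterations = 3;
my $tests_per_iteration = 2;

# Hypothetical helper: number of planned tests that never ran because
# the loop ended early.
sub missing_tests {
    my ($seen, $expected, $per_iteration) = @_;
    my $short = $expected - $seen;
    return $short > 0 ? $short * $per_iteration : 0;
}

my $ct = 0;
while (my $rec = shift @records) {
    $ct++;
    ok(exists $result{ $rec->{acc} }, "known accession $rec->{acc}");
    is($rec->{len}, $result{ $rec->{acc} }, "length for $rec->{acc}");
}

# One explicit failure per test that did not run, so the plan still matches.
fail('test not run: stream ended early')
    for 1 .. missing_tests($ct, $expected_iterations, $tests_per_iteration);

is($ct, $expected_iterations, 'all records were seen');
```

Swapping the fail() line for a SKIP block with skip() would give the
"skip to match" behaviour instead, at the cost of hiding how many tests
were actually affected.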

What I would like to do is hammer out something we can add to the  
Writing Tests HOWTO which addresses some ways to deal with the above  
for those who want to contribute code and tests to BioPerl.  I'm  
looking for some (any) additional opinions on the matter (or, if you  
have the initiative, adding some ideas to the HOWTO itself).

http://www.bioperl.org/wiki/Special:Recentchanges

Thanks!

chris


From ki.baik at roche.com  Wed Apr 18 14:32:35 2007
From: ki.baik at roche.com (Baik, Ki)
Date: Wed, 18 Apr 2007 11:32:35 -0700
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
References: <46264963.9020306@sendu.me.uk>
	<6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
Message-ID: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>

I have had similar problems in which a couple of accession numbers out
of a series were not retrieved, yet they do exist in NCBI.

Ki Baik

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Wednesday, April 18, 2007 10:05 AM
To: Sendu Bala
Cc: bioperl-l
Subject: Re: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures

I can verify on this end.  Not sure why, but the same accessions are  
used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)  
with success.

chris

On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:

> Hi all,
>
> t/DB.t is currently failing tests 40 and 41:
>
> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>                                           '-ids' => [qw(J00522 AF303112 2981014)],
>                                           -verbose => 1);
>
> cmp_ok $query->count, '>', 0;
>
> You can see that
> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100
> gives no results, where presumably it used to give 3. querying on the
> 3 ids individually works fine. So... what changed and how do we get
> around it?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




From arareko at campus.iztacala.unam.mx  Wed Apr 18 15:12:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 18 Apr 2007 14:12:29 -0500
Subject: [Bioperl-l] Skipping/Failing tests
In-Reply-To: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>
References: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>
Message-ID: <46266D9D.1050703@campus.iztacala.unam.mx>

Hey Chris,

I don't know if this helps those working on the test suite, but there's 
a recently-cooked recipe for keeping track of the number of tests (thus 
helping to update the test plan accordingly):

http://www.perl.com/pub/a/2007/04/12/lightning-four.html?page=3

My quick .2 cents :)

Cheers,
Mauricio.

Chris Fields wrote:
> To the BioPerl community at large,
> 
> I have noticed a problem with some BioPerl tests when converting to  
> Test::More.  When using the following:
> 
>      while ($seq = $seqin->next_seq) {
>          my $acc = $seq->accession;
>          ok exists $result{ $acc };
>          is $seq->length, $result{ $acc };
>          delete $result{$acc};
>      }
> 
> if $seq is undef then the test plan is off by a factor of 2 for every  
> iteration of the loop.  Two serious problems:
> 
> 1) No specific failures are seen until the end of the test suite when  
> the test plan doesn't match the number of tests (which could be  
> several hundred lines away from the actual failure).
> 2) Worse, if one were lazy enough to not track the actual number of  
> tests (heh, not that would happen) they could inadvertently change  
> the test plan to match the final number of tests.
> 
> There are several ways to work around this, such as using a counter  
> to track the number of iterations and check to make sure they pass:
> 
>      $ct = 0;
>      while ($seq = $seqin->next_seq) {
>          $ct++;
>          my $acc = $seq->accession;
>          ok exists $result{ $acc };
>          is $seq->length, $result{ $acc };
>          delete $result{$acc};
>      }
>      is($ct, 3);
> 
> Here, if $ct is 0 you'll get an error.  However, the test count will  
> still be off at the end (the test plan will be off by 6 tests).
> 
> My opinion is that we should try to match the plan, as a single fail  
> doesn't reflect the severity of the bug (i.e. it should fail each  
> test per iteration, as expected).  Skipping to match is an option as  
> well (one I've used) but again doesn't reflect the severity of the  
> problem in my opinion.  The flip side is that some consider any  
> failed test significant, so there is no reason to try matching the  
> tests up.
> 
> What I would like to do is hammer out something we can add to the  
> Writing Tests HOWTO which addresses some ways to deal with the above  
> for those who want to contribute code and tests to BioPerl.  I'm  
> looking for some (any) additional opinions on the matter (or, if you  
> have the initiative, adding some ideas to the HOWTO itself).
> 
> http://www.bioperl.org/wiki/Special:Recentchanges
> 
> Thanks!
> 
> chris
> 
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Wed Apr 18 15:41:56 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 14:41:56 -0500
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
References: <46264963.9020306@sendu.me.uk>
	<6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
	<6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <208DCD0F-6A0B-4054-A1C7-D599D32AC344@uiuc.edu>

The problem appears to be with eutils.  Using bare accession numbers  
no longer works with esearch (which Bio::DB::Query::GenBank uses).   
Using them via efetch still works, which explains why  
Bio::DB::GenBank passes tests using the same accession/GI mix.

NCBI has added an extra field descriptor specifically for accessions  
in esearch, which means any queries with accessions must look like  
the following (the last is a GI):

'J00522[accession] OR AF303112[accession] OR 2981014'

'J00522[accession] | AF303112[accession] | 2981014' also works.

We could separate them into two groups based on presence of letters  
and set up the query that way, or we can define exactly what kind of  
ID is acceptable for passing to ids() (GI or accession), or have  
ids() take GIs only and add a new method for accessions (or vice  
versa).  Thoughts?
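
A minimal sketch of the first option, assuming that any ID containing a letter is an accession and that purely numeric IDs are GIs. build_esearch_term is a hypothetical helper for illustration, not an existing Bio::DB::Query::GenBank method:

```perl
use strict;
use warnings;

# Hypothetical helper: tag anything containing a letter as an accession
# and pass purely numeric IDs (assumed to be GIs) through unchanged.
sub build_esearch_term {
    my @ids = @_;
    return join ' OR ',
        map { /[A-Za-z]/ ? $_ . '[accession]' : $_ } @ids;
}

print build_esearch_term(qw(J00522 AF303112 2981014)), "\n";
# prints: J00522[accession] OR AF303112[accession] OR 2981014
```

The letter test is only a heuristic; defining exactly which ID types ids() accepts would avoid guessing.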

chris

On Apr 18, 2007, at 1:32 PM, Baik, Ki wrote:

> I have had similar problems in which a couple of accession numbers out
> of a series were not retrieved, yet they do exist in ncbi.
>
> Ki Baik
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris  
> Fields
> Sent: Wednesday, April 18, 2007 10:05 AM
> To: Sendu Bala
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
>
> I can verify on this end.  Not sure why, but the same accessions are
> used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)
> with success.
>
> chris
>
> On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:
>
>> Hi all,
>>
>> t/DB.t is currently failing tests 40 and 41:
>>
>> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>>                                           '-ids' => [qw(J00522 AF303112 2981014)],
>>                                           -verbose => 1);
>>
>> cmp_ok $query->count, '>', 0;
>>
>> You can see that
>> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100
>> gives no results, where presumably it used to give 3. querying on the
>> 3 ids individually works fine. So... what changed and how do we get
>> around it?
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From boconnor at ucla.edu  Wed Apr 18 15:00:32 2007
From: boconnor at ucla.edu (Brian O'Connor)
Date: Wed, 18 Apr 2007 12:00:32 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>
References: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>	
	<C2340DDA.D83F%bosborne11@verizon.net>
	<6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>
Message-ID: <46266AD0.3070209@ucla.edu>

Hey Lincoln,

This looks good but the configuration step is about to change for 
Biopackages.  I'm writing config RPMs today so the end user can just 
install the config RPM for their distro and doesn't have to manually 
change the yum.conf file.  The config RPM will also install the 
Biopackages GPG key so we can support signed packages.  I'll update 
the wiki when these config RPMs are available.

--Brian

Lincoln Stein wrote:

> Hi,
>
> I've been updating the WIKI in anticipation of a new GBrowse release 
> and have added a "stub" for the biopackages.net install. Since I don't 
> use yum (I've been running Slackware for ages and have recently 
> started working with Ubuntu) I'm not sure I got the details right. 
> Could someone check?
>
>
>         http://www.gmod.org/wiki/index.php/GBrowse_RPM_HOWTO
>
> Also, I think some verbiage on how to use yum to install MySQL and 
> Apache would be great, since it will be consistent with the Ubuntu 
> install page.
>
> Thanks,
>
> Lincoln
>
> On 3/31/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
>     Allen et al.,
>
>     What happened to the "GMOD" package or packages? I've had some
>     conversations in the past few months with you-all suggesting that
>     a GMOD package, or packages, would be useful.
>
>     Brian O.
>
>
>
>
>     On 3/30/07 8:30 PM, "Allen Day" <allenday at gmail.com> wrote:
>
>     > Hi Alex,
>     >
>     > You've aptly noted that there are several classes of packages
>     > being discussed here, and that they should not be treated
>     > equally.  From my point of view, and of specific relevance to the
>     > Bioperl community, we have at least:
>     >
>     > 1) "regular" CPAN dependencies and their occasional C/C++/Fortran
>     > dependencies.  These should all be in Fedora Extras, as they are
>     > of general utility.  Biopackages.net currently hosts about 200
>     > packages (.spec files, specifically) that are like this.  Maybe
>     > 80 of these are needed for Bioperl.
>     >
>     > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL,
>     > genscan, etc.  From what I've seen, these typically have
>     > strange/custom licenses that may not be valid for some users.
>     > BLAT has a dual licensing scheme for academic and non-academic
>     > licensees, for instance.  These packages are not of general
>     > utility.  For these two reasons, my stance is that they should
>     > not be included in Fedora Extras.
>     >
>     > 3) Bioperl packages.  Several subsets here.  The Bioperl-run
>     > libraries depend directly on type (2) packages, so aren't
>     > appropriate to include in Fedora Extras.  Bioperl-live is not
>     > really that useful without type (2) packages.  It is also
>     > sensible to keep all of the Bioperl-* packages in the same
>     > repository.  For these reasons, my stance is that they should not
>     > be included in Fedora Extras.
>     >
>     > 4) Bioinformatics / Comp. Bio. data sets.  These don't have
>     > licensing problems, but they tend to be large.  Usually in the
>     > 10E7 - 10E10 byte range.  RPM cannot even generate correct
>     > metadata for some of them if the files are too large (overflow
>     > problems).  Probably not appropriate to put in Fedora Extras
>     > because they are too large and not generally useful.
>     >
>     > 5) Bioinformatics-specific System databases / daemons.  These
>     > high-level packages depend on types (2), (3), and (4), and so are
>     > not appropriate to put into Fedora Extras.  An example is a BLAT
>     > daemon, which relies on the BLAT server, as well as NIB-formatted
>     > genome sequence files.
>     >
>     > That said, there are a lot of type (1) packages in the
>     > Biopackages.net repository.  If you're interested in migrating
>     > the spec files from our repository to the Fedora project it would
>     > save us (the Biopackages.net maintainers) a ton of build and
>     > maintenance time, so please feel free to take them, just let us
>     > know.  If we can reach some agreement on where the
>     > bioinformatics-specific packages should be maintained/built we
>     > may be able to work together on these as well.
>     >
>     > -Allen
>     >
>     >
>     > On 3/30/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
>     >>>>>>> "AD" == Allen Day  writes:
>     >>
>     >> AD> Hi Alex, The Biopackages.net project is still active, we
>     >> AD> are regularly adding packages to it, mostly R packages
>     >> AD> lately.  Most of the systems we use are running CentOS at
>     >> AD> this point, which is why you have not seen support for FC6
>     >> AD> yet.  There is nothing preventing building FC6 packages
>     >> AD> aside from lack of time to set up the FC6 build farm nodes.
>     >>
>     >> Hi Allen and others,
>     >>
>     >> Great news to hear that Biopackages.net is still active!  I
>     >> would like to help out if possible.  I don't believe in "FUD"
>     >> either... ;)
>     >>
>     >> AD> If you're interested in packaging BioPerl or other
>     >> AD> bioinformatics-related software, please join the Biopackages
>     >> AD> project on SourceForge.  We object to the Fedora Extras FUD
>     >> AD> tactics used to discourage people from using 3rd party
>     >> AD> repositories, and suspect they may not want to host some of our
>     >> AD> data packages, such as the >2GB genome packages.  Biopackages
>     >> AD> project is likely to partially merge with RPMForge.  We are
>     >> AD> already discussing with them how best to do it.
>     >>
>     >> The packages that I created which are currently available in
>     >> Fedora Packages are Perl dependencies which, as I said, are
>     >> useful for packages outside the bioinformatics purview.  I do
>     >> have a (base) bioperl package in review, but it is not yet
>     >> released.
>     >>
>     >> As for third-party repos, I don't object to them at all, and for
>     >> some kinds of projects they are indeed appropriate (e.g. for
>     >> non-free stuff like Livna or Freshrpms).  However, I do have
>     >> practical concerns about repository mixing; I think that it does
>     >> need to be handled carefully, but that co-operation between
>     >> Fedora and third-party repos can make it work.
>     >>
>     >> For example, one practical concern is that as of the
>     >> soon-to-be-released Fedora 7, Core+Extras will be merged, so
>     >> there will be no distinction at the repository-level between
>     >> formerly Extras packages and formerly Core packages (as of now
>     >> there are only "Fedora Packages"), which means that it will not
>     >> be possible for third-party repos to limit their dependencies to
>     >> just those in a former base set (i.e. excluding Extras).
>     >>
>     >> I agree that a few years ago (circa 2003-2004) there was concern
>     >> about the way some third party repositories were treated
>     >> somewhat badly by the (then) Fedora Extras (with some people
>     >> going so far as to say that third-party repos were bad in
>     >> principle and should always be ignored, which I disagree with
>     >> too).  But it seems to me that culture has shifted since, with
>     >> some notable packagers such as Matthias Saou (of Freshrpms) and
>     >> Axel Thimm (of Atrpms) now contributing packages to Fedora
>     >> itself.  The process of contributing has also become much
>     >> simpler and reviews are conducted speedily and efficiently; I
>     >> had packages in the repository in a matter of a few days from
>     >> initial submission.  Freshrpms itself now enables and depends on
>     >> the (old) Extras.
>     >>
>     >> The real question for me, then, is what packages it makes sense
>     >> to go in Fedora, and what packages go in third party
>     >> repositories.  It seems to me that in the case of Perl packages
>     >> which could be dependencies for other packages not specific to
>     >> the third-party repo in question, it makes sense for them to go
>     >> into Fedora itself, so I think I will continue to package them.
>     >> This lessens the load on the third-party repo, while making them
>     >> available for all other third-party repos.  (This is the
>     >> approach that Freshrpms seems to be taking; Matthias has
>     >> contributed most packages back to Fedora now other than the
>     >> non-free ones).
>     >>
>     >> At the other end of the spectrum are packages like you mention,
>     >> genome packages, which may be of concern because of their size
>     >> and/or highly specialised nature, and, as you say, may make
>     >> sense to go in a third-party repo like Biopackages.net.  Also
>     >> packages which can't be packaged by Fedora for legal reasons
>     >> like Clustal could/should go in Biopackages.net.
>     >>
>     >> In the middle are packages like bioperl itself which are
>     >> potentially useful to perhaps a wider group of people than the
>     >> genome packages but may not necessarily be dependencies for
>     >> other packages.  I lean towards making them part of Fedora so
>     >> that they will be available out of the box on the planned
>     >> "Everything" DVD ISO, but I welcome a discussion on this.
>     >>
>     >> As I said, I'm glad to hear that Biopackages.net is alive and
>     >> well and I welcome a discussion on how upstream Fedora can
>     >> usefully interact with Biopackages.net (I guess perhaps on the
>     >> Biopackages.net list).
>     >>
>     >> Regards,
>     >> Alex
>     >>
>     >> PS.  As the upstream author, if you could clarify the license on
>     >> perl-SVG-Graph, on CPAN (or on the mailing list), that would be
>     >> great.
>     >> --
>     >> Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology,
>     >> University of Arizona
>     >>
>     >>
>
>
>
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu


From alexl at users.sourceforge.net  Wed Apr 18 21:17:34 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 18 Apr 2007 18:17:34 -0700
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <4626058B.8090801@sendu.me.uk> (Sendu Bala's message of "Wed\,
	18 Apr 2007 12\:48\:27 +0100")
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <e43b2x6u35.fsf@delpy.biol.berkeley.edu>

>>>>> "SB" == Sendu Bala  writes:

[...]

SB> I can remove the modules from cvs and create
SB> bioperl-run-1.5.2_101, resolving the packaging issue. I plan on
SB> doing precisely this within the next seven days unless someone
SB> puts a hand up to stop me.

In the meantime, until bioperl-run-1.5.2_101 is available, is it safe
just to remove these four .pm files during the packaging so they
don't get installed?  It looks like these four files are
self-contained and are only required/used by each other:

$ grep -r AccessorMaker *
Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar class min_version)]);
Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(input_file output_file)]);

$ grep -r AbstractRunner *
Tools/Run/JavaRunner.pm:use Bio::Tools::Run::AbstractRunner;
Tools/Run/JavaRunner.pm:our @ISA=qw(Bio::Tools::Run::AbstractRunner);
Tools/Run/AbstractRunner.pm:package Bio::Tools::Run::AbstractRunner;
Tools/Run/AbstractRunner.pm:Bio::Tools::Run::AbstractRunner

$ grep -r JavaRunner *
Tools/Run/Phylo/Forester/SDI.pm:use Bio::Tools::Run::JavaRunner;
Tools/Run/Phylo/Forester/SDI.pm:our @ISA=qw(Bio::Tools::Run::JavaRunner);
Tools/Run/JavaRunner.pm:package Bio::Tools::Run::JavaRunner;
Tools/Run/JavaRunner.pm: Usage   : $runner = Bio::Tools::Run::JavaRunner->new(-jar => $jar)
Tools/Run/JavaRunner.pm: Function: Builds a new Bio::Tools::Run::JavaRunner object
Tools/Run/JavaRunner.pm: Returns : Bio::Tools::Run::JavaRunner
Tools/Run/JavaRunner.pm:Bio::Tools::Run::JavaRunner - run java programs
Tools/Run/JavaRunner.pm:   my $runner = Bio::Tools::Run::JavaRunner->new(-jar => $jar);

$ grep -r Forester *
Tools/Run/Phylo/Forester/SDI.pm:package Bio::Tools::Run::Phylo::Forester::SDI;
Tools/Run/Phylo/Forester/SDI.pm:Bio::Tools::Run::Phylo::Forester::SDI
Tools/Run/Phylo/Forester/SDI.pm:    my $runner = Bio::Tools::Run::Phylo::Forester::SDI->new();
Tools/Run/Phylo/Forester/SDI.pm:This wrapper is for SDI in Forester package. 
Tools/Run/Phylo/Forester/SDI.pm:For more details on Forester, please see 

Alex


From sac at bioperl.org  Thu Apr 19 01:14:02 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 18 Apr 2007 22:14:02 -0700
Subject: [Bioperl-l] GenericHit->start/end needs tiled hsps?
In-Reply-To: <461F3FBA.2010101@sendu.me.uk>
References: <461F3FBA.2010101@sendu.me.uk>
Message-ID: <8f200b4c0704182214j77a4accy72f71b2061764d5b@mail.gmail.com>

Sendu,

Your thinking here seems correct and in fact agrees with the documentation
for those methods:

start():  If there is more than one HSP, the lowest start
           value of all HSPs is returned.

end():  If there is more than one HSP, the largest end
          value of all HSPs is returned.

It would be fine with me to change the implementation in GenericHit as you
suggest and to not tile the HSPs. Tiling is only necessary for data that is
summed across the region covered by all HSPs, as is done by these methods:
matches(), gaps(), frac_* and percent_*.
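
A minimal sketch of the untiled approach, using plain hashes as hypothetical stand-ins for HSP objects (this is not the GenericHit implementation, and the coordinates are made up). The overall extent only needs the smallest start and largest end:

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical stand-ins for HSP objects: plain hashes with start/end.
my @hsps = (
    { start => 120, end => 180 },
    { start => 15,  end => 60 },
    { start => 90,  end => 240 },
);

# One pass over the HSPs; no tiling is needed for the overall extent.
sub hit_range {
    my @h = @_;
    return unless @h;    # empty list when there are no HSPs
    return ( min( map { $_->{start} } @h ), max( map { $_->{end} } @h ) );
}

my ( $start, $end ) = hit_range(@hsps);
print "start=$start end=$end\n";    # prints: start=15 end=240
```

This is linear in the number of HSPs, which is why a hit with tens of thousands of HSPs should still be handled quickly.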

Steve

On 4/13/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Hi all,
>
> I want to double-check my thinking regarding
> Bio::Search::Hit::GenericHit->start() and end(). Right now the docs
> claim that hsps of the hit object must be tiled before the answer can be
> produced. The code is implemented in that way
> (Bio::Search::SearchUtils::tile_hsps($self)).
>
> Yet as far as I can see, all you need to do is loop through all hsps and
> pick out the smallest start and largest end respectively in terms of
> subject and query.
>
> This comes up because I have a blast report where a single hit contains
> over 80000 hsps and the tiling takes over an hour (I gave up on it,
> don't know how long it really takes). The simple loop through hsps takes
> seconds or less.
>
> Now in this situation the answer isn't especially useful (to me). An
> alternative way of fixing the problem would be to re-write the tiling
> algorithm (again) to somehow make it hundreds of times faster, then
> provide some way in start() and end() for the user to request the start
> and end of the best contig, or other contig of choice. Easier said than
> done though!
>
>
> What do people think?
>

From bix at sendu.me.uk  Thu Apr 19 06:52:45 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 11:52:45 +0100
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <e43b2x6u35.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>	<4626058B.8090801@sendu.me.uk>
	<e43b2x6u35.fsf@delpy.biol.berkeley.edu>
Message-ID: <462749FD.3080603@sendu.me.uk>

Alex Lancaster wrote:
>>>>>> "SB" == Sendu Bala  writes:
> 
> [...]
> 
> SB> I can remove the modules from cvs and create
> SB> bioperl-run-1.5.2_101, resolving the packaging issue. I plan on
> SB> doing precisely this within the next seven days unless someone
> SB> puts a hand up to stop me.
> 
> In the meantime, until bioperl-run-1.5.2_101 is available, is it safe
> just to remove these four .pm files during the packaging so they
> don't get installed?

Sure, go ahead with that.

From bix at sendu.me.uk  Thu Apr 19 06:51:53 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 11:51:53 +0100
Subject: [Bioperl-l] To be deprecated: Bio::Tools::Run::AbstractRunner,
 Bio::Tools::Run::Phylo::Forester::SDI and
 Bio::Tools::Run::JavaRunner
In-Reply-To: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
Message-ID: <462749C9.1040503@sendu.me.uk>

[repost under new subject to make sure it is seen by those it may concern]

[BCC: Juguang Xiao at a variety of possible email addresses]


Alex Lancaster wrote:
> In packaging bioperl-run for Fedora, I think I stumbled across a bug
> in the bioperl-run package.  It appears from this edit:
> 
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
> 
> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
> bioperl-run 1.5.2_100 still contains modules that use this module:
> 
> $ cd bioperl-run-1.5.2_100
> $ grep -r AccessorMaker  *
> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar
> class min_version)]);
> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
> ('$'=>[qw(input_file output_file)]);

It looks like I've implemented a similar idea to AccessorMaker and
AbstractRunner in Bio::Root::Root->_set_from_args() and
Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
AbstractRunner I propose deprecating it immediately.

Forester::SDI and JavaRunner have no tests, which is why we didn't notice
the problem. Since they've been out of use for a number of years now, I
also propose their immediate deprecation. Alternatively, it may not be
too difficult to just update them to use _set_from_args and _setparams,
but I've nothing to test against (and JavaRunner is self-described as
"probably incomplete").


I can remove the modules from cvs and create bioperl-run-1.5.2_101,
resolving the packaging issue. I plan on doing precisely this within the
next seven days unless someone puts a hand up to stop me.


From bix at sendu.me.uk  Thu Apr 19 08:17:19 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 13:17:19 +0100
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
In-Reply-To: <200704050059.l350xNF07452@cricket.bio.indiana.edu>
References: <200704050059.l350xNF07452@cricket.bio.indiana.edu>
Message-ID: <46275DCF.6030103@sendu.me.uk>

Don Gilbert wrote:
> Dear Bioperl list,
> 
> There is a small bug in what I think is the current Bio::Tools::GFF.pm,
> that blocks output of Target attributes (in gff3 at least).  See a patch
> here
> 
> http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF

The patch was applied by Brian but is currently generating this warning:

./Build test --test_files t/GbrowseGFF.t --verbose
t/GbrowseGFF....1..5
ok 1 - use Bio::SearchIO;
ok 2 - use Bio::SearchIO::Writer::GbrowseGFF;
ok 3 - use Bio::Root::IO;
ok 4
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
[... the same warning is printed 13 more times ...]
ok 5
ok
All tests successful.

Can this patch be looked at again and rolled-back if the problem can't 
be fixed?


Cheers,
Sendu.

From sm8 at sanger.ac.uk  Thu Apr 19 07:49:30 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 12:49:30 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>

Is there an existing method for copying a Bio::Tree::Tree object by
value?

All the best,
Stephen


From bix at sendu.me.uk  Thu Apr 19 08:43:44 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 13:43:44 +0100
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
Message-ID: <46276400.2020207@sendu.me.uk>

Stephen Montgomery wrote:
>  is there an existing method for copying a Bio::Tree::Tree object by
> value?

What do you mean? Describe in a little more detail what you're trying to do.


From sm8 at sanger.ac.uk  Thu Apr 19 09:13:44 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 14:13:44 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>

my $tree_copy = $tree;  # copies a Bio::Tree::Tree object by reference

As an example, I'm looking for a method like

my $tree_copy = $tree->clone;  # copies by value (this method doesn't exist)

or

my $tree_copy = Storable::dclone($tree);
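
To make the distinction concrete, here is a small sketch on a plain nested data structure (a hypothetical stand-in, not a Bio::Tree::Tree). Whether Storable::dclone is safe on a real tree object is a separate question that this does not answer:

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical nested structure standing in for a tree.
my $tree = { id => 'root', children => [ { id => 'A' }, { id => 'B' } ] };

my $alias = $tree;            # copies the reference only
my $copy  = dclone($tree);    # deep copy of the whole structure

$tree->{children}[0]{id} = 'changed';

print "alias sees: $alias->{children}[0]{id}\n";    # alias sees: changed
print "copy keeps: $copy->{children}[0]{id}\n";     # copy keeps: A
```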

Cheers,
Stephen



From jason at bioperl.org  Thu Apr 19 09:19:05 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 19 Apr 2007 06:19:05 -0700
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
Message-ID: <35813ADC-6597-46FC-8FB8-C70AA3541BEC@bioperl.org>

I don't think so. Worst case, you serialize to and from TreeIO and get a  
new one, but the _internal_id of the nodes will necessarily be  
different (and new).

-jason
On Apr 19, 2007, at 4:49 AM, Stephen Montgomery wrote:

>  is there an existing method for copying a Bio::Tree::Tree object by
> value?
>
> All the best,
> Stephen
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From bix at sendu.me.uk  Thu Apr 19 09:24:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 14:24:41 +0100
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>
Message-ID: <46276D99.2060108@sendu.me.uk>

Stephen Montgomery wrote:
> my $tree_copy = $tree;  #copies by reference a Bio::Tree::Tree object
> 
> as an example, a method like
> my $tree_copy = $tree->clone; #copies by value (this method doesn't
> exist) or
> my $tree_copy = Storable::dclone($tree); 

Right, sorry for being a little slow on the uptake. As a matter of fact 
I recently added _clone() to Bio::Tree::TreeFunctionsI, which does a 
"safe tree clone that doesn't seg fault". It's undocumented, and I thought 
it would only be needed by simplify_to_leaves_string(), but I guess I can 
document it and make it public (i.e. remove the underscore from the name) 
if this might be popular.

Oh, it's also not that well tested, so proceed with caution and provide 
feedback if you can.
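
For anyone unsure about the by-reference versus by-value distinction in this thread, here is a minimal sketch using only core Perl. It uses a plain nested hash as a stand-in for a tree, not a real Bio::Tree::Tree object, and it does not show _clone() itself. Whether Storable::dclone is safe on real tree objects is not established here.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# A toy nested structure standing in for a tree (NOT a Bio::Tree::Tree).
my $tree = { id => 'root', children => [ { id => 'A' }, { id => 'B' } ] };

my $alias = $tree;            # copy by reference: both names share one structure
my $copy  = dclone($tree);    # copy by value: an independent deep copy

# Change the original after both "copies" were taken.
$tree->{children}[0]{id} = 'A-renamed';

print "alias sees: $alias->{children}[0]{id}\n";    # follows the change
print "copy sees:  $copy->{children}[0]{id}\n";     # keeps the old value
```

The alias follows every later change to the original, while the deep copy does not. That independence is what a by-value tree copy would need to provide.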


Cheers,
Sendu.

From sm8 at sanger.ac.uk  Thu Apr 19 09:33:18 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 14:33:18 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B038675FB@exchsrv2.internal.sanger.ac.uk>

Thanks Sendu!  That is perfect.
Cheers
Stephen



From ewijaya at i2r.a-star.edu.sg  Thu Apr 19 09:59:05 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Thu, 19 Apr 2007 21:59:05 +0800
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
	Enable Connector
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>


Dear expert,

My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
is created with the script (down below).

How can I modify the script such that:

1. The arrow track is shown with negative coordinates,
   i.e. -300 to 0 instead of 1 to 300.

I tried this, but it doesn't work:

my $flen = Bio::SeqFeature::Generic->new(
       -start => -300,
       -end => 0, );

And how can I make these numbers appear
at every grid point (not just the two I have now)?


2. How can I enable the connector with the grid, just like
  I had in the first panel? (As you can see, my script
  sets a connector, but it still doesn't show.)

All in all, I am trying to mimic this figure:
http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg

And here is my script:

__BEGIN__
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
use List::Compare;
use List::Util qw(max);

my %nofseq = ( 0 => 300, 1 => 300, 2 => 300, 3 => 300, 4 => 300, 5 => 300 );
my @seqid = keys %nofseq;
my @lenlist = values %nofseq;
my $maxlen = max (@lenlist);
#print Dumper \@seqid ;

my $panel = Bio::Graphics::Panel->new(
   -length    => 300,
   -width     => 500,
   -pad_left  => 70,
   -pad_right => 70,
   -key_style => 'left',
   -connector => 'solid',
);

my $flen = Bio::SeqFeature::Generic->new(
       -start => 1,   # tried -300
       -end => 300, # and 0, but failed.
);

   my $track1 = $panel->add_track(
       $flen,
       -glyph   => 'arrow',
       -tick    => 2,
       -fgcolor => 'black',
       -double  => 1,
   );


my %nlist;

while ( <DATA> ) {
   chomp;
   next if /^\#/;
   my ($sqi,$pos,$str,$progname) = split /\,/;
   my $start = $pos + $nofseq{$sqi};
   my $end = $start + length($str) + 1;
   push @{$nlist{$sqi}}, $start." ".$end." ".$progname;
}

# Check which sequence has no motifs;
my @bssi = keys %nlist;

my $lc = List::Compare->new(\@seqid, \@bssi);
my @comp = $lc->get_unique;


foreach my $comp ( @comp  ) {
   push @{$nlist{$comp}}, '0'." ".'0'." "."NONE";

}

my %prog_color = ( "WEEDER" => 3000, "MEME" => 200, "NONE" => 0 );

foreach my $seqid ( sort keys %nlist ) {


   my $track = $panel->add_track(
       -glyph     => 'graded_segments',
       -key       => "SEQ ". $seqid,
       -connector => "dashed"
       -label     => 1,
       -bgcolor   => 'blue',
               -bump      =>  +1,
               -height    =>  8,
       -min_score => 0,
       -max_score => 5000
   );


   foreach my $range ( @{$nlist{$seqid}} ) {

       my ($st,$en,$progname) = split(" ", $range);
       my $dname = " ";
       if ( $st != 0 and $en !=0  ) {
          $dname = "Seq ". $seqid;
       }

       my $score;
       if ( $progname eq "WEEDER" ) {
           $score = $prog_color{$progname};

       }
       elsif ($progname eq "MEME" ) {
           $score = $prog_color{$progname};
       }

       my $feature = Bio::SeqFeature::Generic->new(
           -display_name => $dname,
           -start        => $st,
           -end          => $en,
           -score        => $score
       );

       $track->add_feature($feature);

   }

}

print $panel->png;

# The DATA section is simply a list of strings and their locations in
# their respective sequences.
# The figure is just a plot of them.
__DATA__
# sequence number,pos,binding sites,program
4,-63,AGCTTTCTCT,MEME
0,-22,AACTTTGTAC,WEEDER
1,-13,AAGTTTCTCT,WEEDER
5,-228,ACCTTTGCCA,MEME
5,-121,AAGTTTGTCT,WEEDER
5,-88,AAGTTTTTCC,SPACE
3,-148,AACTTAGTCA,MEME
0,-184,AACTTTGTCT,MEME
__END__


Thanks and hope to hear from you again.

--
Regards,

Edward WIJAYA

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------

From ioanniskirmitzoglou at gmail.com  Thu Apr 19 10:06:06 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Thu, 19 Apr 2007 17:06:06 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
Message-ID: <b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>

I reported this as a bug in Bugzilla, but because of Bugzilla problems I
was not able to attach my code or sample m10 files.
Nevertheless, here is the code that converts FASTA -m 10 output to BLAST -m 8
tabular output, which most software can parse.

<----------- CODE BEGINS HERE ------------------->

#!/usr/bin/perl -w

=head1 NAME

fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular output

=head1 SYNOPSIS

 fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...

=head1 DESCRIPTION

Command line options:
  --header                -- boolean flag to print column header
  -o/--out                -- optional outputfile to write data,
                             otherwise will write to STDOUT
  -h/--help               -- show this documentation

Not technically a SearchIO script, as it doesn't use any Bioperl
components, but it is useful and fast.  The output is tabular,
with the standard NCBI -m8 columns.

 queryname
 hit name
 percent identity
 alignment length
 number mismatches
 number gaps
 query start  (if on rev-strand start > end)
 query end
 hit start (if on rev-strand start > end)
 hit end
 evalue
 bit score

Additionally 5 more columns are provided
 percent similar
 query length
 hit length
 query gaps
 hit gaps

=head1 AUTHOR - Ioannis Kirmitzoglou

Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org

=head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou

Headers as well as portions of code were taken
from fastam9_to_table.pl by Jason Stajich

=head1 DISCLAIMER

Copyright (c) <2007> <Ioannis Kirmitzoglou>

Permission to use, copy, modify, merge, publish and distribute
this software and its documentation, with or without modification,
for any purpose, and without fee or royalty to the copyright holder(s)
is hereby granted with no restrictions and/or prerequisites.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

=cut

use strict;
use Getopt::Long;

my %data=();

my $outfile=''; my $header='';
GetOptions(
    'header'              => \$header,
    'o|out|outfile:s'     => \$outfile,
    'h|help'              => sub { exec('perldoc',$0); exit; }
       );

my $outfh;
if( $outfile ) {
    open($outfh, '>', $outfile) || die("$outfile: $!");
} else {
    $outfh = \*STDOUT;
}


$/="\n>>>";

my @fields = qw(qname hname percid alen mmcount gapcount
        qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);

print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)), "\n" if
$header;

while (<>) {

        chomp;
        if ($_=~/^>/ || $_=~/^\#/) {next;}
        my @hits = split(/\d+>>/, $_);
        @hits= split("\n>>", $hits[0]);

        my $hit = shift @hits;

        ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));

        foreach my $align (@hits) {

            my @details= split ("\n>", $align);
           my $detail = shift @details;
            ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
            $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
            $data{'bits'}=$1;
            $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
            $data{'evalue'}=$1;

            my $term = quotemeta("; sw_score");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'score'}=$1;

            $term = quotemeta("; sw_ident:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'percid'}=$1;

            $term = quotemeta("; sw_sim:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'percsim'}=$1;

            $term = quotemeta("; sw_overlap:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'alen'}=$1;

            $detail = shift @details;

            $term = quotemeta("; al_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'qstart'}=$1;

            $term = quotemeta("; al_stop:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'qend'}=$1;

            $term = quotemeta("; al_display_start:");
            $term =~ s/\\ /\\s/;
            my $lakis ='';
            $detail=~ /$term.+\s\-*([\-\w\s]+)/g;

            $data{'qgap'}=($1 =~ tr/\-//);

            $detail = shift @details;

            $term = quotemeta("; sq_len:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hlen'}=$1;

            $term = quotemeta("; al_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hstart'}=$1;

            $term = quotemeta("; al_stop:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hend'}=$1;

            $term = quotemeta("; al_display_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
            $data{'hgap'}=($1 =~ tr/-//);
            $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
            $data{'mmcount'} = $data{'alen'}
                - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );

            for ( $data{'percid'}, $data{'percsim'} ) {
                $_ = sprintf("%.2f", $_*100);
            }

            print $outfh join( "\t",map { $data{$_} } @fields),"\n"
        }

}

<----------------- CODE ENDS HERE ---------------------->

-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus
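
The script above reads most columns directly from "; key: value" lines, but the mismatch count is derived: alignment length minus (identical positions plus gaps). Here is a minimal sketch of that arithmetic in core Perl. The three input lines are a made-up fragment in the style the script matches, not real FASTA output, and the gap count is simply assumed rather than counted from the alignment strings.

```perl
use strict;
use warnings;

# Made-up fragment in the "; key: value" style the script matches (not real output).
my $detail = join "\n",
    '; sw_ident: 0.900',
    '; sw_sim: 0.950',
    '; sw_overlap: 20',
    '';

# Collect every "; key: value" pair into a hash in one pass.
my %field = $detail =~ /^;\s+(\w+):\s+(\S+)/mg;

my $gapcount = 1;    # assumed here; the script counts '-' characters instead
my $alen     = $field{sw_overlap};
my $matches  = int( $field{sw_ident} * $alen );    # identical positions
my $mmcount  = $alen - ( $matches + $gapcount );   # what is left over

printf "alen=%d matches=%d gaps=%d mismatches=%d\n",
    $alen, $matches, $gapcount, $mmcount;
```

Because int() truncates, a fractional identity times the length that lands just below a whole number would be rounded down, so the derived mismatch count is best treated as an estimate.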

From gilbertd at cricket.bio.indiana.edu  Thu Apr 19 13:38:05 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Thu, 19 Apr 2007 12:38:05 -0500 (EST)
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
Message-ID: <200704191738.l3JHc5s10658@cricket.bio.indiana.edu>


I'm not sure what kind of test data would have bad Target strings,
but this should clear up those warnings -- insert the '+' line:

  sub _gff3_string:
    for my $tag ( @all_tags ) {
       ##dgg.patch.was# next if $tag eq 'Target';
      if ($tag eq 'Target'
         and ! $origfeat->isa('Bio::SeqFeature::FeaturePair'))
       {  
       my($target_id, $b,$e,$strand)= $feat->get_tag_values($tag); 
+       next unless(defined($e) && defined($b) && $target_id);
       ($b,$e)= ($e,$b) if(defined $strand && $strand<0);
       $target_id =~ s/([\t\n\r%&\=;,])/sprintf("%%%X",ord($1))/ge;    
       push @groups, sprintf("Target=%s %d %d", $target_id,$b,$e);
       next;
       }
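
To make the effect of the added guard easy to try outside Bio::Tools::GFF, here is a standalone sketch in core Perl. The helper name target_attribute is hypothetical and is not part of BioPerl. It also pads the escape to two hex digits with %02X, which is my assumption rather than what the snippet above does, so that a tab would come out as %09.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::GFF): build a GFF3 Target
# attribute from (id, start, end, strand), or return nothing if incomplete.
sub target_attribute {
    my ( $target_id, $start, $end, $strand ) = @_;

    # The guard from the patch: skip incomplete Target values quietly.
    return unless defined $start && defined $end && $target_id;

    # Swap the coordinates for reverse-strand targets, as the original code does.
    ( $start, $end ) = ( $end, $start ) if defined $strand && $strand < 0;

    # Percent-encode reserved characters; %02X pads single-digit codes (assumption).
    $target_id =~ s/([\t\n\r%&=;,])/sprintf("%%%02X", ord($1))/ge;

    return sprintf( "Target=%s %d %d", $target_id, $start, $end );
}

my $ok  = target_attribute( 'EST;1', 10, 50, 1 );
my $rev = target_attribute( 'EST3',  10, 50, -1 );
my $bad = target_attribute('EST2');    # no coordinates: skipped, no sprintf warning

print defined $_ ? "$_\n" : "skipped\n" for $ok, $rev, $bad;
```

Without the guard, the third call would reach sprintf with undefined coordinates, which is the kind of "uninitialized value in sprintf" warning reported in the test output.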

-- Don

From stefan.kirov at bms.com  Thu Apr 19 14:01:28 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 19 Apr 2007 14:01:28 -0400
Subject: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
In-Reply-To: <4626E1A3.4070405@i2r.a-star.edu.sg>
References: <462473B7.4070905@i2r.a-star.edu.sg> <4624D9F3.5050805@bms.com>
	<4626E1A3.4070405@i2r.a-star.edu.sg>
Message-ID: <4627AE78.200@bms.com>

I will see if I can post it or perhaps commit something to the bp 
scripts. In any case it won't be before Monday; I have deadlines to meet.
Stefan
Edward WIJAYA wrote:
>
> Hi Stefan,
>> I believe you can use Bio::Graphics for this. I have done so in the 
>> past and I find it quite straightforward.
> Do you still have that sample script? I don't find it simple to do.
> I was thinking of doing something like this:
>
> http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg 
>
>
> Appreciate if you can share it with us.
>
> -- 
> Edward
>
>
>>
>>
>> Edward WIJAYA wrote:
>>> Dear all,
>>>
>>> How do you usually construct a graph for TFBS (binding sites) position
>>> within their sequences? I was thinking to build something like this 
>>> kind of
>>> visualization tool:
>>>
>>> http://research.i2r.a-star.edu.sg/Dragon/Motif_Search/cgi-bin/tmp/29740M1.html 
>>>
>>>
>>> or
>>>
>>> http://wingless.cs.washington.edu:8080/assessment/servlet?filenameID=submission/SPACE.D9F26D506DE90E9A0A0010BB6BCCAEF3&pageType=visualizationForm&action=Visualize+It 
>>>
>>>
>>> Is there a BioPerl module to do that?
>>>
>>> -- 
>>> Edward
>>>


From shameer at ncbs.res.in  Fri Apr 20 07:45:23 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Fri, 20 Apr 2007 17:15:23 +0530 (IST)
Subject: [Bioperl-l] Protparam using BioPerl
In-Reply-To: <200704180718.48811.sdavis2@mail.nih.gov>
References: <4624E32A.6010704@bms.com>
	<36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
	<200704180718.48811.sdavis2@mail.nih.gov>
Message-ID: <45682.192.168.1.1.1177069523.squirrel@mail.ncbs.res.in>

Hi,

I would like to know whether Bioperl has a wrapper for ProtParam from
ExPASy.
I need to calculate the instability index using the Guruprasad et al. (1990) values
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2075190&dopt=Abstract)
for 100 sequences. I did some googling but didn't find any useful
information.

Thanks,
-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From basu at pharm.sunysb.edu  Fri Apr 20 12:37:57 2007
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Fri, 20 Apr 2007 12:37:57 -0400
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
 Enable Connector
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
Message-ID: <4628EC65.7070505@pharm.sunysb.edu>

Hi,

Wijaya Edward wrote:
> Dear expert,
> 
> My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
> is created with the script (down below).
> 
> How can I modify the script such that:
> 
> 1. The arrow track is represented in negative form.
>    I.e. instead of 1 to 300, we use -300 to 0.
> 
> I tried this, but won't do:
> 
> my $flen = Bio::SeqFeature::Generic->new(
>        -start => -300,
>        -end => 0, );

It works if you pass the SeqFeature object to the -segment option of 
Bio::Graphics::Panel (so $flen has to be created before the panel):

my $panel = Bio::Graphics::Panel->new(
    -length    => 300,
    -width     => 500,
    -pad_left  => 70,
    -pad_right => 70,
    -key_style => 'left',
    -connector => 'solid',
    -segment   => $flen,
);

For more, see this earlier posting:
http://article.gmane.org/gmane.comp.lang.perl.bio.general/1721/match=negative+seqfeature
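
On the second question (the connector), one thing worth checking in the posted script, offered as an observation and not a confirmed diagnosis: in the add_track() call for the graded_segments tracks there is no comma after -connector => "dashed". The sketch below, in core Perl with no Bio::Graphics involved, shows what Perl makes of such an option list. The array names are just for illustration.

```perl
use strict;
use warnings;

# As posted: no comma after "dashed", so Perl reads  "dashed" - label
# as a numeric subtraction of two non-numeric strings.
my @as_posted;
{
    no warnings 'numeric';    # silence "isn't numeric" noise for this demo
    @as_posted = ( -connector => "dashed" -label => 1 );
}

# With the comma restored.
my @fixed = ( -connector => "dashed", -label => 1 );

print scalar(@as_posted), " elements as posted: @as_posted\n";
print scalar(@fixed),     " elements when fixed: @fixed\n";
```

As posted, the list has an odd number of elements and the value after -connector is 0 instead of "dashed", so the options after it are shifted out of their key/value pairs. Restoring the comma passes the intended options; whether that alone makes the connector appear for features added one at a time is something I have not verified.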

-siddhartha



From bosborne11 at verizon.net  Fri Apr 20 15:47:30 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 20 Apr 2007 15:47:30 -0400
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
In-Reply-To: <200704191738.l3JHc5s10658@cricket.bio.indiana.edu>
Message-ID: <C24E9112.DD2B%bosborne11@verizon.net>

Applied.


On 4/19/07 1:38 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu> wrote:

> 
> I'm not sure what kind of test data would have bad Target strings,
> but this should clear up those warnings -- insert the '+' line:
> 
>   sub _gff3_string:
>     for my $tag ( @all_tags ) {
>        ##dgg.patch.was# next if $tag eq 'Target';
>       if ($tag eq 'Target'
>          and ! $origfeat->isa('Bio::SeqFeature::FeaturePair'))
>        {  
>        my($target_id, $b,$e,$strand)= $feat->get_tag_values($tag);
> +       next unless(defined($e) && defined($b) && $target_id);
>        ($b,$e)= ($e,$b) if(defined $strand && $strand<0);
>        $target_id =~ s/([\t\n\r%&\=;,])/sprintf("%%%X",ord($1))/ge;
>        push @groups, sprintf("Target=%s %d %d", $target_id,$b,$e);
>        next;
>        }
> 
> -- Don


From ewijaya at i2r.a-star.edu.sg  Sat Apr 21 10:44:08 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 21 Apr 2007 22:44:08 +0800
Subject: [Bioperl-l] Getting Gene Sequences with Bioperl
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06168D@mailbe01.teak.local.net>


Hi all,
 
Is there a BioPerl module that lets us extract
gene sequences given a list of gene names (gene symbols)?

In particular, we would pass a window size and get back the
upstream, downstream, or ORF sequences for that list of genes.
We would also like to restrict the search to a specific organism or to all organisms.

Is there also a freely downloadable gene database that
BioPerl supports for this task?
Thanks and hope to hear from you again.
 
--
Edward WIJAYA
SINGAPORE


From hlapp at gmx.net  Sat Apr 21 13:14:10 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 21 Apr 2007 13:14:10 -0400
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
Message-ID: <19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>

I haven't kept track of this - did this go anywhere? Do we not have  
an -m10 fasta output parser in SearchIO? (I.e., my first thought  
would be that that would be the desired solution; am I misled in this?)

	-hilmar

On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:

> I have reported it as a bug on the bugzilla but due to bugzilla problems
> I was not able to attach my code and/or sample m10 files.
> Nevertheless, here is the code that converts an m10 fasta output to an m8
> BLAST output, which is parseable by the vast majority of software.
>
> <----------- CODE BEGINS HERE ------------------->
>
> #!/usr/bin/perl -w
>
> =head1 NAME
>
> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular output
>
> =head1 SYNOPSIS
>
>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>
> =head1 DESCRIPTION
>
> Command line options:
>   --header                -- boolean flag to print column header
>   -o/--out                -- optional outputfile to write data,
>                              otherwise will write to STDOUT
>   -h/--help               -- show this documentation
>
> Not technically a SearchIO script, as this doesn't use any Bioperl
> components, but it is useful and fast.  The output is tabular output
> with the standard NCBI -m8 columns.
>
>  queryname
>  hit name
>  percent identity
>  alignment length
>  number mismatches
>  number gaps
>  query start  (if on rev-strand start > end)
>  query end
>  hit start (if on rev-strand start > end)
>  hit end
>  evalue
>  bit score
>
> Additionally 5 more columns are provided
>  percent similar
>  query length
>  hit length
>  query gaps
>  hit gaps
>
> =head1 AUTHOR - Ioannis Kirmitzoglou
>
> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>
> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>
> Headers as well as portions of code were taken
> from fastam9_to_table.pl by Jason Stajich
>
> =head1 DISCLAIMER
>
> Copyright (c) <2007> <Ioannis Kirmitzoglou>
>
> Permission to use, copy, modify, merge, publish and distribute
> this software and its documentation, with or without modification,
> for any purpose, and without fee or royalty to the copyright holder(s)
> is hereby granted with no restrictions and/or prerequisites.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>
> =cut
>
> use strict;
> use Getopt::Long;
>
> my %data=();
>
> my $outfile=''; my $header='';
> GetOptions(
>     'header'              => \$header,
>     'o|out|outfile:s'     => \$outfile,
>     'h|help'              => sub { exec('perldoc',$0); exit; }
>        );
>
> my $outfh;
> if( $outfile ) {
>     open($outfh, '>', $outfile) || die("$outfile: $!");
> } else {
>     $outfh = \*STDOUT;
> }
>
>
> $/="\n>>>";
>
> my @fields = qw(qname hname percid alen mmcount gapcount
>         qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);
>
> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)), "\n"
>     if $header;
>
> while (<>) {
>
>         chomp;
>         if ($_=~/^>/ || $_=~/^\#/) {next;}
>         my @hits = split(/\d+>>/, $_);
>         @hits= split("\n>>", $hits[0]);
>
>         my $hit = shift @hits;
>
>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
>
>         foreach my $align (@hits) {
>
>             my @details= split ("\n>", $align);
>            my $detail = shift @details;
>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>             $data{'bits'}=$1;
>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>             $data{'evalue'}=$1;
>
>             my $term = quotemeta("; sw_score");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'score'}=$1;
>
>             $term = quotemeta("; sw_ident:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'percid'}=$1;
>
>             $term = quotemeta("; sw_sim:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'percsim'}=$1;
>
>             $term = quotemeta("; sw_overlap:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'alen'}=$1;
>
>             $detail = shift @details;
>
>             $term = quotemeta("; al_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'qstart'}=$1;
>
>             $term = quotemeta("; al_stop:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'qend'}=$1;
>
>             $term = quotemeta("; al_display_start:");
>             $term =~ s/\\ /\\s/;
>             my $lakis ='';
>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>
>             $data{'qgap'}=($1 =~ tr/\-//);
>
>             $detail = shift @details;
>
>             $term = quotemeta("; sq_len:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hlen'}=$1;
>
>             $term = quotemeta("; al_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hstart'}=$1;
>
>             $term = quotemeta("; al_stop:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hend'}=$1;
>
>             $term = quotemeta("; al_display_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>             $data{'hgap'}=($1 =~ tr/-//);
>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>             $data{'mmcount'} = $data{'alen'}
>                 - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );
>
> for ( $data{'percid'}, $data{'percsim'} ) {
>     $_ = sprintf("%.2f",$_*100);
> }
>
>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
>         }
>
> }
>
> <----------------- CODE ENDS HERE ---------------------->
>
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Sat Apr 21 13:44:00 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 21 Apr 2007 10:44:00 -0700
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
Message-ID: <E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>

We don't have one yet. This is a new format introduced in the most  
recent release of FASTA.  Hopefully someone can make some time to add  
some code to SearchIO::fasta for it.

I do find that when I need a fast FASTA to TAB converter, the
simple script (fastam9_to_table) is more efficient than the SearchIO
framework, so Ioannis is making a parallel one for the new m10
output.  So I think having both is useful.

-jason
On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:

> I haven't kept track of this - did this go anywhere? Do we not have
> an -m10 fasta output parser in SearchIO? (I.e., my first thought
> would be that that would be the desired solution; am I misled in  
> this?)
>
> 	-hilmar
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From akozik at atgc.org  Sat Apr 21 13:40:47 2007
From: akozik at atgc.org (Alexander Kozik)
Date: Sat, 21 Apr 2007 10:40:47 -0700
Subject: [Bioperl-l] ncbi blast -V T option
Message-ID: <462A4C9F.8010902@atgc.org>

Hi all,

There have been many postings about problems parsing stand-alone (local) NCBI
Blast output of version 2.2.15 or later. Recently, I (re?)-discovered
that the Blast option '-V T' fixes the problem with the old parsers I have.
Option '-V T' generates detailed statistics after _each_ query sequence
in the Blast output, like:
... ... ...
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 17,650,109
Number of Sequences: 26534
Number of extensions: 430364
Number of successful extensions: 1496
Number of sequences better than 1.0e-020: 1
Number of HSP's better than  0.0 without gapping: 1400
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 1495
length of database: 11,047,616
effective HSP length: 96
effective length of database: 8,500,352
effective search space used: 1275052800
frameshift window, decay const: 40,  0.1
... ... ...

Option '-V F' (default) will generate statistics only once, at the end of the
batch Blast output, summarizing all query hits together.
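
One rough way to tell which layout a given report has is to compare the
number of query headers with the number of statistics blocks. The sketch
below assumes that each query section starts with a line beginning
"Query=" and that each statistics block contains the "effective search
space used:" line shown above. The sub name count_blocks and the sample
report text are made up for illustration; this is not part of any BioPerl
module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count query headers and statistics blocks in a BLAST text report.
# Assumption: queries start with "Query=" at the beginning of a line.
sub count_blocks {
    my ($report) = @_;
    my $queries = () = $report =~ /^Query=/mg;
    my $stats   = () = $report =~ /^effective search space used:/mg;
    return ($queries, $stats);
}

# Abbreviated, made-up report text for illustration only.
my $report = <<'END';
Query= seq1
effective search space used: 1275052800
Query= seq2
effective search space used: 1275052800
END

my ($queries, $stats) = count_blocks($report);
print $queries == $stats
    ? "statistics follow every query\n"
    : "statistics are summarized for the batch\n";
```

If the two counts match, the report was most likely written with
per-query statistics, which is the layout the old parsers expect.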

Did I miss something from previous postings?
Sorry, if it was already discussed.

-Alex

-- 
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/


From gdorjee at hotmail.com  Sat Apr 21 15:14:05 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sat, 21 Apr 2007 12:14:05 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
	<10024333.post@talk.nabble.com>
	<54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>
Message-ID: <10120148.post@talk.nabble.com>


hi
how do i check whether i've installed bioperl on my system properly? i
think i installed the bioperl-1.5.2_101 version, but i can't say for sure.
although i can use some of the modules like Bio::SearchIO and
Bio::SearchIO, i can't seem to get the remote blast working for some reason.
does this have something to do with the bioperl installation? i'm using perl v5.6.1
built for sun4-solaris-64int.
i tried to install the same bioperl version on my Linux machine, which has
perl v5.8.5 built for i386-linux-thread-multi, and it seems to give me the
same problem with the remote blast.
your help would be much appreciated.
thanks


Chris Fields wrote:
> 
> What version of bioperl are you using?  I get an error but it is b/c  
> the ID doesn't exist.
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: acc KPYK_ECOLI does not exist
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /Users/cjfields/src/bioperl-live/Bio/DB/WebDBSeqI.pm:181
> STACK: genpept.pl:21
> -----------------------------------------------------------
> 
> The actual accession is 'KPYK1_ECOLI'.
> 
> chris
> 
> On Apr 16, 2007, at 3:42 PM, DeeGee wrote:
> 
>>
>> hi
>> i tried the following code just to check the network, and it worked fine
>> except for the SwissProt part, for which i got the error message
>> instead of the sequence:
>>
>> ------------- EXCEPTION  -------------
>> MSG: swissprot stream with no ID. Not swissprot in my book
>> STACK Bio::SeqIO::swiss::next_seq
>> /usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
>> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
>> /usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
>> STACK toplevel bbbbb.pl:21
>> --------------------------------------
>>
>> #### check #####
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::GenBank;
>> use Bio::DB::SwissProt;
>> use Bio::DB::GenPept;
>> use Bio::SeqIO;
>>
>> my $genpeptdb = new Bio::DB::GenPept();
>> my $genbankdb = new Bio::DB::GenBank();
>> my $swissdb = new Bio::DB::SwissProt();
>>
>> my $seqio = new Bio::SeqIO(-format => 'fasta',
>>                            -fh     => \*STDOUT);
>>
>> my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
>> $seqio->write_seq($protseq);
>>
>> my $seq = $genbankdb->get_Seq_by_acc('AF303112');
>> $seqio->write_seq($seq);
>>
>> $protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
>> $seqio->write_seq($protseq);
>>
>> thanks a lot.
>>
>>
>> Chris Fields wrote:
>>>
>>> The 'verbose' setting doesn't change the way the BLAST query is sent,
>>> it just sends the raw output from the repeated attempts to retrieve
>>> the report (using the RID) to STDERR.  The error you saw won't be
>>> fixed by doing so.
>>>
>>> What I was interested in was the raw HTML output dumped to the
>>> screen.  If it is querying the NCBI server it should dump stuff that
>>> includes something like this:
>>>
>>> ...
>>> <HTML>
>>> <p></p>
>>> <!--
>>> QBlastInfoBegin
>>>          Status=WAITING
>>> QBlastInfoEnd
>>> --><p></p>
>>> <SCRIPT LANGUAGE="JavaScript"><!--
>>> ...
>>>
>>> which indicates you have a request in the BLAST queue.  If you aren't
>>> seeing anything then the problem is likely network-related on your
>>> end, so getting the latest RemoteBlast won't help.  Do any other
>>> BioPerl modules requiring network access work (Bio::DB::GenBank, for
>>> instance)?  If not it could be a proxy issue...
>>>
>>> Just in case, here's the browsable CVS location for RemoteBlast:
>>>
>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl
>>>
>>> Click on the download link and save over your local version.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
>>>
>>>>
>>>> hi Chris,
>>>> thanks for your reply. i set the RemoteBlast factory to a verbosity of 1,
>>>> and i get the same error message. i'm new to all these. so, could you plz
>>>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>>>
>>>> cheers!!!
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10024333
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10120148
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Sat Apr 21 16:09:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 21 Apr 2007 15:09:48 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
Message-ID: <A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>

Ioannis's fastam10_to_table script is available in the bugzilla
enhancement request (as an attachment) if anyone's interested:

http://bugzilla.open-bio.org/show_bug.cgi?id=2278

I haven't had a chance to really look into m10 output yet, but it
looks easy enough to parse; it may not be hard to get something
SearchIO-based up and running.
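
As a rough illustration of why it looks easy to parse: the annotation
lines that Ioannis's script picks apart with one regex per key all have
the form "; key: value" (fa_bits, fa_expect, sw_ident, sw_sim, sw_overlap
and so on), so a single generic pattern can collect them into a hash.
This is only a sketch under that assumption, not SearchIO code;
parse_m10_fields is a hypothetical helper and the sample block uses
made-up values rather than real FASTA output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the "; key: value" annotation lines of one m10 block into a hash.
# Lines that do not match (hit name, sequence data) are skipped.
sub parse_m10_fields {
    my ($block) = @_;
    my %field;
    for my $line (split /\n/, $block) {
        next unless $line =~ /^;\s+(\S+):\s+(\S+)/;
        $field{$1} = $2;
    }
    return \%field;
}

# Illustrative hit block; the name and values are invented.
my $block = join "\n",
    'hit1 hypothetical protein',
    '; fa_bits: 50.2',
    '; fa_expect: 1e-10',
    '; sw_ident: 0.650',
    '; sw_sim: 0.800',
    '; sw_overlap: 100';

my $f = parse_m10_fields($block);

# Print a few -m 8 style columns: name, percent identity, alignment
# length, evalue, bit score.
printf "%s\t%.2f\t%d\t%s\t%s\n",
    'hit1', $f->{sw_ident} * 100, $f->{sw_overlap},
    $f->{fa_expect}, $f->{fa_bits};
```

A SearchIO-based parser would still need to track which block (hit,
query sequence, hit sequence) each key belongs to, since keys such as
al_start appear once for the query and once for the hit.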

chris

On Apr 21, 2007, at 12:44 PM, Jason Stajich wrote:

> We don't have one yet. This is a new format introduced in the most
> recent release of FASTA.  Hopefully someone can make some time to add
> some code to SearchIO::fasta for it.
>
> I do find that when I need a fast FASTA to TAB converter, the
> simple script (fastam9_to_table) is more efficient than the SearchIO
> framework, so Ioannis is making a parallel one for the new m10
> output.  So I think having both is useful.
>
> -jason
> On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
>
>> I haven't kept track of this - did this go anywhere? Do we not have
>> an -m10 fasta output parser in SearchIO? (I.e., my first thought
>> would be that that would be the desired solution; am I misled in
>> this?)
>>
>> 	-hilmar
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at i2r.a-star.edu.sg  Sun Apr 22 07:59:28 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sun, 22 Apr 2007 19:59:28 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	
	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com><3ACF03E372996
	C4EACD542EA8A05E66A061684@mailbe01.teak.local.net><AAF82F3A-3C75-4D51-AFD4-
	FDE358391A03@fruitfly.org>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061690@mailbe01.teak.local.net>


Hi Chris,
 
I've downloaded the GO Database.
Which of these should we install in our MySQL database
so that it can be used for the GO::AppHandle task below?
 
-rw-rw-r--   1 ewijaya ewijaya 1.6G Apr  9 12:23 go_200704-assocdb-data
-rw-rw-r--   1 ewijaya ewijaya 483M Apr  9 12:23 go_200704-assocdb.rdf-xml
-rw-rw-r--   1 ewijaya ewijaya 3.2K Apr  9 12:23 go_200704-assocdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  7 00:41 go_200704-assocdb-tables
-rw-rw-r--   1 ewijaya ewijaya 3.3K Apr  9 12:23 go_200704-obo-xml.dtd
-rw-rw-r--   1 ewijaya ewijaya 4.5K Apr  9 12:23 go_200704-rdf.dtd
-rw-rw-r--   1 ewijaya ewijaya  29K Apr  9 12:23 go_200704-schema-mysql.sql
-rw-rw-r--   1 ewijaya ewijaya 3.1G Apr  9 12:25 go_200704-seqdb-data
-rw-rw-r--   1 ewijaya ewijaya  93M Apr  9 12:26 go_200704-seqdb.fasta
-rw-rw-r--   1 ewijaya ewijaya 3.2K Apr  9 12:25 go_200704-seqdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  8 05:38 go_200704-seqdb-tables
-rw-rw-r--   1 ewijaya ewijaya  51M Apr  9 12:26 go_200704-termdb-data
-rw-rw-r--   1 ewijaya ewijaya  18M Apr  9 12:26 go_200704-termdb.obo-xml
-rw-rw-r--   1 ewijaya ewijaya  39M Apr  9 12:26 go_200704-termdb.owl
-rw-rw-r--   1 ewijaya ewijaya  29M Apr  9 12:26 go_200704-termdb.rdf-xml
-rw-rw-r--   1 ewijaya ewijaya  749 Apr  9 12:26 go_200704-termdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  2 00:31 go_200704-termdb-tables
drwxrwxr-x  22 ewijaya ewijaya 4.0K Apr  1 23:35 go_200704-utilities-src

Or is there a way to load all of them into a MySQL database automatically?
Thanks, and I hope to hear from you again.
 
--
Edward
 

________________________________

From: Chris Mungall [mailto:cjm at fruitfly.org]
Sent: Tue 4/17/2007 2:49 AM
To: Wijaya Edward
Cc: spiros at lokku.com; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


Download:
http://search.cpan.org/~cmungall/go-db-perl

or do:

cpan GO::AppHandle

The API call you want is here:
http://search.cpan.org/~cmungall/go-db-perl/GO/AppHandle.pm#get_deep_products

Here is an example snippet:

   use GO::AppHandle;
   my $apph=GO::AppHandle->connect(@ARGV);
   my $go_acc = shift @ARGV;
   my $gps = $apph->get_deep_products({term=>{acc=>$go_acc}});
   foreach my $gp (@$gps) {
     printf "%s %s\n", $gp->xref->acc, $gp->symbol;
   }

You will need to download the GO Database.

Cheers
Chris

On Apr 16, 2007, at 8:14 AM, Wijaya Edward wrote:

>
> Hi Spiros,
>
> Thanks for your reply. I am interested in applying it to
> all kinds of organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) 
> with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Does anybody have experience with that?
>> I've looked through the GO modules on CPAN, but can't seem
>> to find any tool that facilitates that search.
>>
>> I look forward very much to your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>> ------------ Institute For Infocomm Research - Disclaimer 
>> -------------
>> This email is confidential and may be privileged.  If you are not 
>> the intended recipient, please delete it and notify us 
>> immediately. Please do not copy or use it for any purpose, or 
>> disclose its contents to any other person. Thank you.
>> --------------------------------------------------------
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>



From ioanniskirmitzoglou at gmail.com  Sun Apr 22 13:11:35 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Sun, 22 Apr 2007 20:11:35 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
Message-ID: <b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>

I agree with Jason. Both scripts (fastam9_to_table and fastam10_to_table)
are much faster and easier to use than SearchIO. Still, there are many
cases where SearchIO support for m10 would be useful (e.g. when trying to
represent the alignment graphically).
Nevertheless, I do think that FASTA needs an output format as compact as
BLAST's m8. Although I haven't tried it yet, I believe that both scripts
can read piped input, so one easy and fairly fast way to produce tabular
output from FASTA would be to pipe its output directly into one of the
scripts.
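
A minimal sketch of why piping should work, assuming the converter reads
its input through Perl's <> operator as the posted script does: <> falls
back to STDIN when no file names are given, so the same loop serves files
and pipes alike. The sub name count_query_blocks and the sample text are
made up for illustration; only the "\n>>>" record separator and the
skip-on-'#' test are taken from the script.

```perl
use strict;
use warnings;

# Count records on any input handle, using the same record separator
# ("\n>>>") that the posted script assigns to $/.
sub count_query_blocks {
    my ($fh) = @_;
    local $/ = "\n>>>";
    my $n = 0;
    while ( my $block = <$fh> ) {
        chomp $block;                # strips a trailing "\n>>>", if present
        next if $block =~ /^\#/;     # skip the leading command-echo block
        $n++;
    }
    return $n;
}

# Illustrative input only, not real FASTA output.
my $sample = "# fasta34 -m10 q.faa db.faa\n"
           . ">>>q1, 10 aa vs db\nbody\n"
           . ">>>q2, 12 aa vs db\nbody\n";
open( my $fh, '<', \$sample ) or die "in-memory open failed: $!";
print count_query_blocks($fh), " query blocks\n";
```

From the shell, the equivalent would presumably be something like
"fasta34 -m10 -Q test.faa test.faa | fastam10_to_table --header"; that
command line is untested here.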
-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


On 21/04/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> Ioannis's fastam10_to_table script is available in the bugzilla
> enhancement request (as an attachment) if anyone's interested:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2278
>
> I haven't had a chance to really look into m10 output yet but it
> looks easy enough to parse; may not be hard to get something SearchIO-
> based up and running.
>
> chris
>
> On Apr 21, 2007, at 12:44 PM, Jason Stajich wrote:
>
> > We don't have one yet. This is a new format introduced in the most
> > recent release of FASTA.  Hopefully someone can make some time to add
> > some code to SearchIO::fasta for it.
> >
> > I do find that when I need a fast FASTA-to-TAB converter, the
> > simple script (fastam9_to_table) is more efficient than the SearchIO
> > framework, so Ioannis is making a parallel one for the new m10
> > output.  So I think having both is useful.
> >
> > -jason
> > On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
> >
> >> I haven't kept track of this - did this go anywhere? Do we not have
> >> an -m10 fasta output parser in SearchIO? (I.e., my first thought
> >> would be that that would be the desired solution; am I misled in
> >> this?)
> >>
> >>      -hilmar
> >>
> >> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
> >>
> >>> I have reported it as a bug on the bugzilla but due to bugzilla
> >>> problems I
> >>> was not able to attach my code and/or sample m10 files.
> >>> Nevertheless here is the code that converts an m10 fasta output to
> >>> an m8
> >>> BLAST output which is parseable by the vast majority of software.
> >>>
> >>> <----------- CODE BEGINS HERE ------------------->
> >>>
> >>> #!/usr/bin/perl -w
> >>>
> >>> =head1 NAME
> >>>
> >>> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular
> >>> output
> >>>
> >>> =head1 SYNOPSIS
> >>>
> >>>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
> >>>
> >>> =head1 DESCRIPTION
> >>>
> >>> Command line options:
> >>>   --header                -- boolean flag to print column header
> >>>   -o/--out                -- optional outputfile to write data,
> >>>                              otherwise will write to STDOUT
> >>>   -h/--help               -- show this documentation
> >>>
> > >>> Not technically a SearchIO script, as this doesn't use any Bioperl
> > >>> components, but it is useful and fast.  The output is tabular output
> > >>> with the standard NCBI -m8 columns.
> >>>
> >>>  queryname
> >>>  hit name
> >>>  percent identity
> >>>  alignment length
> >>>  number mismatches
> >>>  number gaps
> >>>  query start  (if on rev-strand start > end)
> >>>  query end
> >>>  hit start (if on rev-strand start > end)
> >>>  hit end
> >>>  evalue
> >>>  bit score
> >>>
> > >>> Additionally, 5 more columns are provided:
> >>>  percent similar
> >>>  query length
> >>>  hit length
> >>>  query gaps
> >>>  hit gaps
> >>>
> >>> =head1 AUTHOR - Ioannis Kirmitzoglou
> >>>
> >>> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
> >>>
> >>> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
> >>>
> > >>> Headers as well as portions of code were taken from
> > >>> fastam9_to_table.pl by Jason Stajich
> >>>
> >>> =head1 DISCLAIMER
> >>>
> > >>> Copyright (c) <2007> <Ioannis Kirmitzoglou>
> > >>>
> > >>> Permission to use, copy, modify, merge, publish and distribute
> > >>> this software and its documentation, with or without modification,
> > >>> for any purpose, and without fee or royalty to the copyright
> > >>> holder(s) is hereby granted with no restrictions and/or prerequisites.
> >>>
> >>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >>> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> >>> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> >>> NONINFRINGEMENT.
> >>> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> >>> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> >>> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> >>> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> >>>
> >>> =cut
> >>>
> >>> use strict;
> >>> use Getopt::Long;
> >>>
> >>> my %data=();
> >>>
> >>> my $outfile=''; my $header='';
> >>> GetOptions(
> >>>     'header'              => \$header,
> >>>     'o|out|outfile:s'     => \$outfile,
> >>>     'h|help'              => sub { exec('perldoc',$0); exit; }
> >>>        );
> >>>
> >>> my $outfh;
> >>> if( $outfile ) {
> >>>     open($outfh, ">$outfile") || die("$outfile: $!");
> >>> } else {
> >>>     $outfh = \*STDOUT;
> >>> }
> >>>
> >>>
> >>> $/="\n>>>";
> >>>
> >>> my @fields = qw(qname hname percid alen mmcount gapcount
> >>>         qstart qend hstart hend evalue bits percsim qlen hlen qgap
> >>> hgap);
> >>>
> >>> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),
> >>> "\n" if
> >>> $header;
> >>>
> >>> while (<>) {
> >>>
> >>>         chomp;
> >>>         if ($_=~/^>/ || $_=~/^\#/) {next;}
> >>>         my @hits = split(/\d+>>/, $_);
> >>>         @hits= split("\n>>", $hits[0]);
> >>>
> >>>         my $hit = shift @hits;
> >>>
> > >>>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
> >>>
> >>>         foreach my $align (@hits) {
> >>>
> >>>             my @details= split ("\n>", $align);
> >>>            my $detail = shift @details;
> >>>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
> >>>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
> >>>             $data{'bits'}=$1;
> >>>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
> >>>             $data{'evalue'}=$1;
> >>>
> > >>>             my $term = quotemeta("; sw_score:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'score'}=$1;
> >>>
> >>>             $term = quotemeta("; sw_ident:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'percid'}=$1;
> >>>
> >>>             $term = quotemeta("; sw_sim:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'percsim'}=$1;
> >>>
> >>>             $term = quotemeta("; sw_overlap:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'alen'}=$1;
> >>>
> >>>             $detail = shift @details;
> >>>
> >>>             $term = quotemeta("; al_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'qstart'}=$1;
> >>>
> >>>             $term = quotemeta("; al_stop:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'qend'}=$1;
> >>>
> >>>             $term = quotemeta("; al_display_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             my $lakis ='';
> >>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
> >>>
> >>>             $data{'qgap'}=($1 =~ tr/\-//);
> >>>
> >>>             $detail = shift @details;
> >>>
> >>>             $term = quotemeta("; sq_len:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'hlen'}=$1;
> >>>
> >>>             $term = quotemeta("; al_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'hstart'}=$1;
> >>>
> >>>             $term = quotemeta("; al_stop:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'hend'}=$1;
> >>>
> >>>             $term = quotemeta("; al_display_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
> >>>             $data{'hgap'}=($1 =~ tr/-//);
> >>>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
> > >>>             $data{'mmcount'} = $data{'alen'} - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'});
> >>>
> >>> for ( $data{'percid'}, $data{'percsim'} ) {
> >>>     $_ = sprintf("%.2f",$_*100);
> >>> }
> >>>
> >>>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
> >>>         }
> >>>
> >>> }
> >>>
> >>> <----------------- CODE ENDS HERE ---------------------->
> >>>
> >>> --
> >>>
> >>> *Ioannis Kirmitzoglou*, MSc
> >>> PhD. Student,
> >>> Bioinformatics Research Laboratory
> >>> Department of Biological Sciences
> >>> University of Cyprus
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > jason at bioperl.org
> > http://jason.open-bio.org/
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

From jason at bioperl.org  Sun Apr 22 16:24:23 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Apr 2007 13:24:23 -0700
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
Message-ID: <69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>

I do think that m9 is pretty compact if you don't need to see the
alignment and just want the pairwise statistics; it is analogous to the
BLAST m8/m9 format.  I typically just use that plus fastam9_to_table as
input to MCL and other systems that can process tabular formats.

I cleaned up a few things in SearchIO::fasta but have not been able  
to see whether we can auto-detect m10 format and insert the necessary  
code just yet.

-jason
On Apr 22, 2007, at 10:11 AM, Ioannis Kirmitzoglou wrote:

> I agree with Jason. Both scripts (fastam9_to_table and  
> fastam10_to_table)
> are way faster and easier to use than the searchIO. Still, there  
> are a lot
> of cases where searchIO support for m10 would be useful (e.g when  
> trying to
> represent the alignment in a graphical way).
> Nevertheless I do think that FASTA needs an output similar to the  
> BLAST m8
> one which is really compact. Although I haven't tried it yet I do  
> believe
> that both scripts can be piped, so one easy and rather fast way to  
> produce
> an tabular output from FASTA would be to pipe its output directly  
> to one of
> the scripts.
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From ioanniskirmitzoglou at gmail.com  Mon Apr 23 05:45:53 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Mon, 23 Apr 2007 12:45:53 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
References: <10034698.post@talk.nabble.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
	<69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
Message-ID: <b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>

I don't know about older versions, but the latest version of FASTA starts its
output with a line similar to one of these:
# fasta34.exe -m9 -d0 -Q test.faa test.faa
or
# fasta34.exe -m10 -Q test.faa test.faa

This very first line is also the only one in the output that starts with
'#'.
Isn't this an easy way to determine the output type?
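
A rough sketch of that check, assuming the first line really is a command
echo of the form shown above ('#', the program name, then the options).
The sub name fasta_mode_from_echo and the 'default' return value are
invented for this example and are not part of SearchIO.

```perl
use strict;
use warnings;

# Guess the -m output mode from the command echo on the first line,
# e.g. "# fasta34.exe -m10 -Q test.faa test.faa".
# Returns the mode digits, 'default' when the echo carries no -m option,
# or undef when the line is not a command echo at all.
sub fasta_mode_from_echo {
    my ($first_line) = @_;
    return unless defined $first_line && $first_line =~ /^\#\s*\S/;
    return $1 if $first_line =~ /\s-m\s*(\d+)\b/;
    return 'default';
}

for my $line (
    '# fasta34.exe -m9 -d0 -Q test.faa test.faa',
    '# fasta34.exe -m10 -Q test.faa test.faa',
  )
{
    my $mode = fasta_mode_from_echo($line);
    print "mode $mode: $line\n";
}
```

A parser could read the first line, call this once, and choose the m9 or
m10 code path accordingly; when it returns undef the caller would have to
guess the format some other way.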


-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus

From cjfields at uiuc.edu  Mon Apr 23 08:46:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Apr 2007 07:46:40 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
	<69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
	<b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>
Message-ID: <333A7BEF-71E3-4E15-B2EC-384AEBAA13B7@uiuc.edu>

That's true, but older versions of fasta don't do this.  For  
instance, the example files in the bioperl distribution in t/data  
(HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this  
line.

 From the fasta changelog:

-------------------------------------------------------------
 >>Nov 14-22, 2002  CVS fa34t20b6

Include compile-time define (-DPGM_DOC) that causes all the fasta
programs to provide the same command line echo that is provided by the
PVM and MPI parallel programs.  Thus, if you run the program:

     fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12

the first lines of output from FASTA will be:

     # fasta34_t -q gtt1_drome.aa /slib/swissprot
      FASTA searches a protein or DNA sequence data bank
      version 3.4t20 Nov 10, 2002
     Please cite:
      W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

This has been turned on by default in most FASTA Makefiles.
-------------------------------------------------------------

We could support only newer fasta output (newer than the above
version), since there have been several bug fixes and changes to
output; not sure how everyone else feels about this.
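
A minimal sketch of the first-line check discussed in this thread, in plain Perl. The subroutine name detect_m_format is hypothetical and not part of SearchIO. It assumes the command-line echo is the first line and that the option is written either as "-m10" or "-m 10". It returns the -m value, 0 when the echo is present without -m, and undef when there is no echo line (older FASTA output, or a build without -DPGM_DOC).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): inspect the first line of a
# FASTA report and return the requested -m output format, if any.
sub detect_m_format {
    my ($first_line) = @_;
    # No command-line echo: older FASTA, or built without -DPGM_DOC.
    return unless defined $first_line && $first_line =~ /^\s*#/;
    # Accept both "-m10" and "-m 10".
    return $1 if $first_line =~ /\s-m\s*(\d+)\b/;
    return 0;    # echo present, default output format
}

for my $line ('# fasta34.exe -m10 -Q test.faa test.faa',
              '# fasta34.exe -m9 -d0 -Q test.faa test.faa',
              '# fasta34_t -q gtt1_drome.aa /slib/swissprot',
              ' FASTA searches a protein or DNA sequence data bank') {
    my $m = detect_m_format($line);
    printf "%-8s %s\n", defined $m ? "m=$m" : 'no echo', $line;
}
```

An undef return only says the echo line is absent; it does not prove the report is in the default format.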

chris

On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:

> I don't know about older versions but the latest version of FASTA  
> starts its
> output with a line similar to those:
> # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
> # fasta34.exe -m10 -Q test.faa test.faa
>
> This very first line is also the only one in the output that starts  
> with
> '#'.
> Isn't this an easy way to determine the output type?
>
>
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 23 09:49:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Apr 2007 08:49:45 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>
References: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>
Message-ID: <12707EA8-F245-4AE7-BFD1-EE861F431F3D@uiuc.edu>

Aaron,

I find -m 10 defined way back in fasta2 notes:

--------------------------------------------------------------
Changes with 2.0x4  (January, 1996)

The major change in with 2.0x4 is the ability to get a parseable
output from FASTA/TFASTA/SSEARCH.  This can be done using output
option -m 10.  ...
--------------------------------------------------------------

It goes on to define it in more detail (which is nice to have
around!).  It's possible it wasn't implemented until recently for
fasta3, but I find references to it in the various fasta3 notes going
back to at least 2001, so maybe it wasn't compiled by default
until recently?  The extra '#' line was added in 2002 to all output
as far as I can tell.

We could just have SearchIO::fasta fall back to default parsing if
'#' isn't present.  The default format and m10 are different enough
that we probably want to separate m10 parsing into its own parser
subroutine so we don't screw with the default parsing too much.
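
A rough sketch of that fallback, in plain Perl. The subroutines parse_default, parse_m10 and parse_report are hypothetical stand-ins, not the actual SearchIO::fasta code; the sketch only shows choosing a parser from the first line and falling back to default parsing when the '#' echo is missing.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two parsing paths; real parsers would
# build result objects instead of returning a label.
sub parse_default { my ($lines) = @_; return 'default' }
sub parse_m10     { my ($lines) = @_; return 'm10' }

# Choose a parser from the first line of the report. Without a leading
# '#' echo line, or without -m10 in it, fall back to default parsing.
sub parse_report {
    my ($lines) = @_;
    my $first = @$lines ? $lines->[0] : '';
    if ($first =~ /^#/ && $first =~ /\s-m\s*10\b/) {
        return parse_m10($lines);
    }
    return parse_default($lines);
}

print parse_report(['# fasta34.exe -m10 -Q test.faa test.faa']), "\n";
print parse_report([' FASTA searches a protein or DNA sequence data bank']), "\n";
```

Keeping the m10 path in its own subroutine means the default path is untouched for reports that lack the echo line.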

chris

On Apr 23, 2007, at 8:29 AM, aaron.j.mackey at gsk.com wrote:

> Since -m10 is newer than PGM_DOC, you should be fine to use the  
> first line
> as a detection for m10, when that first line exists (when it does  
> not, the
> format cannot be m10, unless someone has re-compiled FASTA with an
> undefined PGM_DOC).
>
> -Aaron
>
> bioperl-l-bounces at lists.open-bio.org wrote on 04/23/2007 08:46:40 AM:
>
>> That's true, but older versions of fasta don't do this.  For
>> instance, the example files in the bioperl distribution in t/data
>> (HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this
>> line.
>>
>>  From the fasta changelog:
>>
>> -------------------------------------------------------------
>>>> Nov 14-22, 2002  CVS fa34t20b6
>>
>> Include compile-time define (-DPGM_DOC) that causes all the fasta
>> programs to provide the same command line echo that is provided by  
>> the
>> PVM and MPI parallel programs.  Thus, if you run the program:
>>
>>      fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12
>>
>> the first lines of output from FASTA will be:
>>
>>      # fasta34_t -q gtt1_drome.aa /slib/swissprot
>>       FASTA searches a protein or DNA sequence data bank
>>       version 3.4t20 Nov 10, 2002
>>      Please cite:
>>       W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
>>
>> This has been turned on by default in most FASTA Makefiles.
>> -------------------------------------------------------------
>>
>> We could only support newer fasta output (newer than the above
>> version) since there have been several bug fixes and changes to
>> output; not sure how everyone else feels about this.
>>
>> chris
>>
>> On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:
>>
>>> I don't know about older versions but the latest version of FASTA
>>> starts its
>>> output with a line similar to those:
>>> # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
>>> # fasta34.exe -m10 -Q test.faa test.faa
>>>
>>> This very first line is also the only one in the output that starts
>>> with
>>> '#'.
>>> Isn't this an easy way to determine the output type?
>>>
>>>
>>> -- 
>>>
>>> *Ioannis Kirmitzoglou*, MSc
>>> PhD. Student,
>>> Bioinformatics Research Laboratory
>>> Department of Biological Sciences
>>> University of Cyprus
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From aaron.j.mackey at gsk.com  Mon Apr 23 09:29:39 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 23 Apr 2007 09:29:39 -0400
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <333A7BEF-71E3-4E15-B2EC-384AEBAA13B7@uiuc.edu>
Message-ID: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>

Since -m10 is newer than PGM_DOC, you should be fine to use the first line 
as a detection for m10, when that first line exists (when it does not, the 
format cannot be m10, unless someone has re-compiled FASTA with an 
undefined PGM_DOC).

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 04/23/2007 08:46:40 AM:

> That's true, but older versions of fasta don't do this.  For 
> instance, the example files in the bioperl distribution in t/data 
> (HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this 
> line.
> 
>  From the fasta changelog:
> 
> -------------------------------------------------------------
>  >>Nov 14-22, 2002  CVS fa34t20b6
> 
> Include compile-time define (-DPGM_DOC) that causes all the fasta
> programs to provide the same command line echo that is provided by the
> PVM and MPI parallel programs.  Thus, if you run the program:
> 
>      fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12
> 
> the first lines of output from FASTA will be:
> 
>      # fasta34_t -q gtt1_drome.aa /slib/swissprot
>       FASTA searches a protein or DNA sequence data bank
>       version 3.4t20 Nov 10, 2002
>      Please cite:
>       W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
> 
> This has been turned on by default in most FASTA Makefiles.
> -------------------------------------------------------------
> 
> We could only support newer fasta output (newer than the above 
> version) since there have been several bug fixes and changes to 
> output; not sure how everyone else feels about this.
> 
> chris
> 
> On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:
> 
> > I don't know about older versions but the latest version of FASTA 
> > starts its
> > output with a line similar to those:
> > # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
> > # fasta34.exe -m10 -Q test.faa test.faa
> >
> > This very first line is also the only one in the output that starts 
> > with
> > '#'.
> > Isn't this an easy way to determine the output type?
> >
> >
> > -- 
> >
> > *Ioannis Kirmitzoglou*, MSc
> > PhD. Student,
> > Bioinformatics Research Laboratory
> > Department of Biological Sciences
> > University of Cyprus
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From bix at sendu.me.uk  Tue Apr 24 06:21:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Apr 2007 11:21:29 +0100
Subject: [Bioperl-l] WrapperBase / StandAloneBlast executable() method
	confusion
Message-ID: <462DDA29.4090104@sendu.me.uk>

Hi,

I'm a little unsure of the intent for executable() in wrapper modules. 
The WrapperBase version of the method and the StandAloneBlast version 
have the same POD but different implementations.

WrapperBase takes as a first arg an 'exe' which it will blindly trust is 
the path to a working executable. (That doesn't seem sensible already.) 
It is only capable of storing one such path.

If no arg is supplied it uses program_path() (which uses program_name()) 
to find the executable. Failing that it does a further direct test on 
program_name() to see if it's executable.


StandAloneBlast takes as a first arg merely the name of your exe and 
also (undocumented) the path to the corresponding executable (which is 
tested to see if it is really executable). It can store executable paths 
for multiple different exenames (corresponding better with the docs for 
the first arg: "name of executable to set path to").

If no second arg is supplied it does something similar to WrapperBase, 
except that it uses the first arg exename (or a default if that wasn't 
supplied) in place of program_name().


I'm trying to generalize this so StandAloneBlast can just use the 
WrapperBase version (and so other wrappers can then store executable 
paths for different sub-programs). Any suggestions for a good way of 
melding these two together whilst somehow retaining backward compatibility?
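
One possible shape for a merged method, sketched in plain Perl. The class ExecutableSketch and its behaviour are assumptions for illustration only, not the WrapperBase or StandAloneBlast code. The one-argument form is treated as a path when it names an executable file and as an exe name otherwise; the two-argument form stores a path under a name.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

package ExecutableSketch;   # hypothetical class, not a BioPerl module

sub new {
    my ($class, %args) = @_;
    # program_name plays the role of the wrapper's default executable name
    return bless { program_name => $args{program_name}, exe => {} }, $class;
}

# executable()             -> stored path for the default program
# executable($name)        -> stored path for $name
# executable($name, $path) -> store $path for $name
# executable($path)        -> store $path for the default program,
#                             if $path names an executable file
sub executable {
    my ($self, $first, $path) = @_;
    my $name = $self->{program_name};
    if (defined $path) {
        $name = $first;
    }
    elsif (defined $first) {
        if (-f $first && -x _) { $path = $first }   # looks like a real path
        else                   { $name = $first }   # treat as an exe name
    }
    if (defined $path) {
        die "'$path' is not an executable file\n" unless -f $path && -x _;
        $self->{exe}{$name} = $path;
    }
    return $self->{exe}{$name};
}

package main;

# A temporary file with the execute bit set stands in for a real binary.
my ($fh, $fake_exe) = tempfile(UNLINK => 1);
close $fh;
chmod 0755, $fake_exe;

my $wrapper = ExecutableSketch->new(program_name => 'blastall');
$wrapper->executable($fake_exe);               # path for the default program
$wrapper->executable('formatdb', $fake_exe);   # path for a named sub-program
print $wrapper->executable('formatdb'), "\n";
```

The one-argument heuristic is ambiguous when an exe name also matches an executable file in the current directory, so a real implementation would need a clearer rule.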

From cjfields at uiuc.edu  Tue Apr 24 08:55:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 24 Apr 2007 07:55:43 -0500
Subject: [Bioperl-l] WrapperBase / StandAloneBlast executable() method
	confusion
In-Reply-To: <462DDA29.4090104@sendu.me.uk>
References: <462DDA29.4090104@sendu.me.uk>
Message-ID: <8F1427D6-8654-461E-B9AA-E51CC3A20318@uiuc.edu>

I'm not sure, but you might want to bring Torsten in on this as he  
took over maintaining StandAloneBlast.  Much of the confusion may  
stem from the independent evolution of StandAloneBlast and WrapperBase.

Also, (a bit unrelated), there were plans for unifying the  
Bio::Tools::Run BLAST modules described here:

http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

Seemed like there was a general consensus at the time on the need to  
refactor StandAloneBlast and RemoteBlast code, so maybe the best  
place to start is StandAloneBlast (the others could be added in from  
there).  We could just deprecate use of the older modules at some  
point in favor of the new scheme.

chris

On Apr 24, 2007, at 5:21 AM, Sendu Bala wrote:

> Hi,
>
> I'm a little unsure of the intent for executable() in wrapper modules.
> The WrapperBase version of the method and the StandAloneBlast version
> have the same POD but different implementations.
>
> WrapperBase takes as a first arg an 'exe' which it will blindly  
> trust is
> the path to a working executable. (That doesn't seem sensible  
> already.)
> It is only capable of storing one such path.
>
> If no arg is supplied it uses program_path() (which uses  
> program_name())
> to find the executable. Failing that it does a further direct test on
> program_name() to see if it's executable.
>
>
> StandAloneBlast takes as a first arg merely the name of your exe and
> also (undocumented) the path to the corresponding executable (which is
> tested to see if it is really executable). It can store executable paths
> for multiple different exenames (corresponding better with the docs  
> for
> the first arg: "name of executable to set path to").
>
> If no second arg is supplied it does something similar to WrapperBase,
> except that it uses the first arg exename (or a default if that wasn't
> supplied) in place of program_name().
>
>
> I'm trying to generalize this so StandAloneBlast can just use the
> WrapperBase version (and so other wrappers can then store executable
> paths for different sub-programs). Any suggestions for a good way of
> melding these two together whilst somehow retaining backward  
> compatibility?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From avilella at gmail.com  Tue Apr 24 12:10:19 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 24 Apr 2007 17:10:19 +0100
Subject: [Bioperl-l] lack of markers for some genotypes in some
	Bio::PopGen::Statistics methods
Message-ID: <358f4d650704240910u4c90864cqd6c4e38ecedef4c5@mail.gmail.com>

Hi,

I have some genotype data where some individuals don't have a given marker
in the population.

This means that some methods in Bio::PopGen::Statistics will fail when
trying to get them, so I've added a couple of "next unless (defined($sth));"
guards to work around this. But I am not sure whether this breaks any
assumption made when implementing the methods.

Anyone able to check this?
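
To make the effect of the guard concrete, here is a small plain-Perl sketch. It uses plain hashes with made-up data as a stand-in for individuals and genotypes; it does not call Bio::PopGen, and the names allele_counts, @individuals and @marker_names are hypothetical.

```perl
use strict;
use warnings;

# Plain-hash stand-in for individuals and their genotypes (hypothetical
# data; the real code works on Bio::PopGen objects).
my @individuals = (
    { id => 'ind1', genotypes => { m1 => ['A', 'G'], m2 => ['C', 'C'] } },
    { id => 'ind2', genotypes => { m1 => ['A', 'A'] } },   # m2 missing
);
my @marker_names = ('m1', 'm2');

# Count alleles per marker, skipping individuals that lack a marker.
sub allele_counts {
    my ($inds, $markers) = @_;
    my %data;
    for my $ind (@$inds) {
        for my $m (@$markers) {
            my $genotype = $ind->{genotypes}{$m};
            next unless defined $genotype;   # the guard under discussion
            $data{$m}{$_}++ for @$genotype;
        }
    }
    return \%data;
}

my $counts = allele_counts(\@individuals, \@marker_names);
for my $m (sort keys %$counts) {
    print "$m: ",
        join(', ', map { "$_=$counts->{$m}{$_}" } sort keys %{ $counts->{$m} }),
        "\n";
}
```

In this toy data, marker m2 ends up with two counted alleles although there are two individuals (four allele slots), so any statistic that assumes a fixed sample size per marker would have to use the per-marker count instead.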

Thanks,

    Albert.

avilella at magneto:~$ diff -u
/home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm.modif
/home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm
---
/home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm.modif
2007-04-24 15:05:51.000000000 +0100
+++
/home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm
2007-04-22 16:03:24.000000000 +0100
@@ -546,7 +546,6 @@
        # separate genotypes into 'chromosomes'
        for my $marker_name( @marker_names ) {
           my ($genotype) = $ind->get_Genotypes(-marker => $marker_name);
-           next unless defined($genotype); #FIXME -- is this correct?
           my $i =0;
           for my $allele ( $genotype->get_Alleles ) {
               push @{$chromosomes[$i]},

avilella at magneto:~$ diff -u
/home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm.modif
/home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm
---
/home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm.modif
2007-04-24 15:04:51.000000000 +0100
+++
/home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm
2007-04-22 16:03:24.000000000 +0100
@@ -656,8 +656,6 @@
                return 0;
            }
            foreach my $m ( @marker_names ) {
-              my $genotype = $ind->get_Genotypes($m);
-              next unless defined($genotype); #FIXME -- is this correct?
                foreach my $allele (map { $_->get_Alleles}
                               $ind->get_Genotypes($m) ) {
                    $data{$m}->{$allele}++;

From MEC at stowers-institute.org  Thu Apr 26 12:48:45 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 26 Apr 2007 11:48:45 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
Message-ID: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>

Lincoln, et al,

I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
from a Bio::DB::SeqFeature::Store that were initially created with
-segments (i.e. whose location was discontiguous) does not display any
attributes in column 9 other than "Name".

What do you think of the following patch to Bio::Graphics::FeatureBase,
whose effect is to "contrive to return (duplicated) common group values"
(which otherwise get lost when "collapsing" "homogeneous" parent/child
features)?

Another approach would be to copy the attributes from the parent to the
children when the -segments are first created.

Another approach would be to use Bio::SeqFeature::Generic as the db's
-seqfeature_class and save with -location being a Bio::Location::Split,
but this was fraught with other problems.

Any other suggestions?  Do you want me to commit this patch?

Cheers,

Malcolm
 
Patch follows:


Index: FeatureBase.pm
===================================================================
RCS file:
/home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
retrieving revision 1.29
diff -c -r1.29 FeatureBase.pm
*** FeatureBase.pm	16 Apr 2007 19:55:33 -0000	1.29
--- FeatureBase.pm	26 Apr 2007 16:30:23 -0000
***************
*** 581,587 ****
      foreach (@children) { 
        s/Parent=/ID=/g; 
      } # replace Parent tag with ID
!     return join "\n",@children;
    }
  
    return join("\n",$p,@children);
--- 581,589 ----
      foreach (@children) { 
        s/Parent=/ID=/g; 
      } # replace Parent tag with ID
!     #return join "\n",@children;
!     # Instead of above, additionally, contrive to return (duplicated) common group values
!     return(join("$group\n",@children) . $group);
    }
  
    return join("\n",$p,@children);
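
For illustration, a plain-Perl sketch of what the changed return expression produces. The child lines and the contents of $group are made up; in particular, the leading ';' in $group is an assumption, since the patch excerpt does not show how $group is built.

```perl
use strict;
use warnings;

# Hypothetical child lines and group string, for illustration only.
my @children = (
    "chr1\t.\texon\t100\t200\t.\t+\t.\tID=gene1",
    "chr1\t.\texon\t300\t400\t.\t+\t.\tID=gene1",
);
my $group = ';Name=gene1;Note=example';   # assumed to start with a separator

# Original expression: children only, shared attributes are not repeated.
my $before = join "\n", @children;

# Patched expression: $group is appended to every child line.
my $after = join("$group\n", @children) . $group;

print "$after\n";
```

Each child line then carries the same attribute string, which is the "duplicated common group values" behaviour the patch comment describes.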


From emeric.sevin at univ-rennes1.fr  Thu Apr 26 04:48:37 2007
From: emeric.sevin at univ-rennes1.fr (Emeric Sevin)
Date: Thu, 26 Apr 2007 10:48:37 +0200
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
	<7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>
Message-ID: <4ef54906af35b3cbf231303285527055@univ-rennes1.fr>

Hi! Sorry for the delay, I took a little vacation ;-)

Indeed, I don't see any trouble in coding a supplementary test; I'm just 
not at all familiar with the patch release/bioperl package update process 
and would prefer to leave that to you. For that purpose I'll take care of 
that bug post in the coming hours!
Thank you very much
Emeric

On 13 Apr 2007, at 22:13, Jason Stajich wrote:

> I think it just needs an edit to the code in to_string, which checks
> for the type of algorithm.  You'd need to add to the if/elsif cascade
> and add something for the RPSBLAST type that codes the query and
> target dbs and query and target sequence types properly.  This would
> be very trivial to code in; have you tried adding this to see if it
> works?
>
> If you submit a bug with an example report we'd be able to make
> appropriate changes faster.
>
> -jason
> On Apr 11, 2007, at 6:32 AM, Emeric Sevin wrote:
>
>> Hi everybody,
>>
>> I'm sorry to bug, but either I missed something so obvious nobody
>> bothered to answer, or I'm being a little boycotted here...
>> A little help would be very much appreciated
>>
>> On 22 Mar 2007, at 16:07, Emeric Sevin wrote:
>>
>>> Hello,
>>>
>>> I am new to this community, and apologize if this subject has been
>>> posted before.
>>>
>>> I want to print out only selected results from a multiple blast-
>>> alignments results file. Problem is, the algorithm used is
>>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the
>>> actual writing task yields "unclean" warnings. Although an output
>>> is actually written, the writer
>>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by
>>> the fact rpsblast DBs are not labeled with
>>> "protein"/"nucleic"/"translated".
>>> Does anybody know of an easy fix to that bug, or of another way to
>>> come around it?
>>>
>>> Thank you very much
>>>
>>> Emeric SEVIN
>>> Université de Rennes 1
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From budd at embl-heidelberg.de  Thu Apr 26 06:18:11 2007
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Thu, 26 Apr 2007 12:18:11 +0200 (CEST)
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
Message-ID: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>

Hi Bioperlers,

I'm trying to parse a FASTA search output file (see attached .out file) 
using Bioperl 1.4. My Bioperl installation has otherwise been working 
fine; however, I currently get the following error when running a simple 
script that attempts to access results from this output file via Bioperl.

Is this a problem with the parser?
Or have I executed FASTA wrongly creating output that isn't covered by the 
parser?

Any suggestions on how to deal with this much appreciated.

Best wishes,

Aidan

Script:

#!/usr/bin/perl -w
$^W=1;
use strict;
use Bio::SearchIO;

my $fasta_report = new Bio::SearchIO ('-format' => 'fasta',
                                      '-file'   => $ARGV[0]);
                                      
my $result = $fasta_report->next_result();            

Errors:

Use of uninitialized value in concatenation (.) or string at 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/GenericHSP.pm 
line 231, <GEN3> line 47.

------------- EXCEPTION  -------------
MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm 
FASTP -score 62.4 -hit_frame 0 -hsp_length 180 -hit_seq  -hit_length 0 
-query_length 128 -query_frame 0 -swscore 122 -rank 1 -query_seq 
GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS--PALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD-LYCHKSD 
-homology_seq                              
MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKRAKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RKCSL...LL.SVNL.K.ADHE.A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTMEPATLSPKSMR 
-hit_name YFL031W -bits 19.4 -query_name CREB1_MONKEY -evalue 1.1 (qs='
STACK Bio::Search::HSP::GenericHSP::new 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/GenericHSP.pm:231
STACK Bio::Search::HSP::FastaHSP::new 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/FastaHSP.pm:97
STACK Bio::Factory::ObjectFactory::create_object 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Factory/ObjectFactory.pm:150
STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/SearchResultEventBuilder.pm:275
STACK Bio::SearchIO::fasta::end_element 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/fasta.pm:872
STACK Bio::SearchIO::fasta::next_result 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/fasta.pm:403
STACK toplevel 
/Users/budd/scripts/test_scripts/test_parsing_fasta_output.pl:22

--------------------------------------

-- 
----------------------------------------------------------------------
Aidan Budd, PhD                               tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany

URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html
-------------- next part --------------
# fasta34 -m 2 creb1_human.fasta yeast_bzips_from_ensembl.fasta
FASTA searches a protein or DNA sequence data bank
 version 34.26 January 12, 2007
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query library creb1_human.fasta vs yeast_bzips_from_ensembl.fasta library
searching yeast_bzips_from_ensembl.fasta library

  1>>>CREB1_MONKEY 341 aa - 341 aa
 vs  yeast_bzips_from_ensembl.fasta library

   3683 residues in    10 sequences
 MLE_cen statistics: Lambda= 0.0338;  K=8.757e-05 (cen=0)

FASTA (3.5 Sept 2006) function [optimized, BL50 matrix (15:-5)] ktup: 2
 join: 37, opt: 25, open/ext: -10/-2, width:  16
 Scan time:  0.000
The best scores are:                                      opt bits E(10)
YFL031W                                            ( 238)  122 19.4     1.1
YEL009C                                            ( 281)  121 19.4     1.3
YIL036W                                            ( 587)  129 19.8       2
YIR017C                                            ( 187)   83 17.5     2.9
YVNL167C                                           ( 647)  119 19.3     2.9
YIR018W                                            ( 245)   67 16.7     5.3
YER045C                                            ( 489)   73 17.0     7.1
YDR259C                                            ( 383)   62 16.5     7.5
YOR028C                                            ( 296)   41 15.5     8.9
YHL009C                                            ( 330)   33 15.1     9.6

>>YFL031W                                                 (238 aa)
 initn: 107 init1: 107 opt: 122  Z-score: 62.4  bits: 19.4 E():  1.1
Smith-Waterman score: 122;  27.660% identity (63.830% similar) in 94 aa overlap (248-337:2-95)

       220       230       240       250       260       270       
CREB1_ GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS--PALP
YFL031                              MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKR

         280       290       300       310       320        330    
CREB1_ TQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD
YFL031 AKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RKCSL...LL.SVNL.K.ADHE.

           340                                                     
CREB1_ -LYCHKSD                                                    
YFL031 A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTMEPATLSPKSMR

>>YEL009C                                                 (281 aa)
 initn: 138 init1:  83 opt: 121  Z-score: 60.8  bits: 19.4 E():  1.3
Smith-Waterman score: 121;  29.412% identity (55.462% similar) in 119 aa overlap (219-335:165-277)

      190       200       210       220       230       240        
CREB1_ GAIQLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGD
YEL009 VSLADKAIESTEEVSLVPSNLEVSTTSFLP.PV.ED.KL.QTRKVKK.NS--..KKSHHV

      250       260       270         280       290       300      
CREB1_ VQTYQIRTAPTSTIAPGVVMASSPALPTQP--AEEAARKREVRLMKNREAARECRRKKKE
YEL009 GKDDES.LDHLGVV.YNRKQR.I.LS.IV.ESSDP..L..----AR.T....RS.AR.LQ

        310       320       330       340 
CREB1_ YVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
YEL009 RM.Q..DK.EE.LSK.YH.EN.VAR..K.VGER  

>>YIL036W                                                 (587 aa)
 initn: 132 init1:  70 opt: 129  Z-score: 57.2  bits: 19.8 E():    2
Smith-Waterman score: 129;  18.750% identity (55.682% similar) in 352 aa overlap (2-335:137-477)

                                            10        20           
CREB1_                              MTMESGAENQQSGDAAVTEAENQQM--TVQA
YIL036 RVVKPSANSNYQQAAYLRQQQQQDQRQQSPS.KTEE.S.LY..ILMNSGVV.D.HQNLAT

      30        40        50        60        70        80         
CREB1_ QPQIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGVIQAAQPSVIQSPQVQTVQSS
YIL036 HTNLSQ.SSTRKS.PNDSTT...-NASNIA.--.AS.NKQMYFMNMNMNNN.HALNDP.I

      90         100       110         120       130       140     
CREB1_ CKDLKRLFS--GTQISTIAESEDS--QESVDSVTDSQKRREILSRRPSYRKILNDL----
YIL036 LET.SPF.QPF.VDVAHLPMTNPPIF.S.LPGCDEPIR..R.SISNGQISQLGE.IETLE

                150       160          170        180       190    
CREB1_ ---SSDAPGVPRIEEEKSEEET---SAPAITTVTVP-TPIYQTSSGQYIAITQGGAIQLA
YIL036 NLHNTQP.PM.NFHNYNGLSQ.RNV.NKPVFNQA..VSS.P.YNAKKV.NP.KDS.--.G

          200       210       220       230       240       250    
CREB1_ NNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQI
YIL036 DQSVIYSKSQ.RNFVNAPSKNT.AES.----SDLE.MTTFA.TTGGENRGK.ALRESHSN

           260       270       280       290       300       310   
CREB1_ RT-APTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLEN
YIL036 PSFT.K.QGSHLNLA.NTQGN.I-.GT-T.W..ARL.ER..I..SK..QR..VAQLQ.QK

           320       330       340                                 
CREB1_ RVAVLENQNKTLIEELKALKDLYCHKSD                                
YIL036 EFNEIKDE.RI.LKK.NYYEK.ISKFKKFSKIHLREHEKLNKDSDNNVNGTNSSNKNESM

>>YIR017C                                                 (187 aa)
 initn:  43 init1:  43 opt:  83  Z-score: 54.0  bits: 17.5 E():  2.9
Smith-Waterman score: 84;  22.785% identity (56.962% similar) in 158 aa overlap (176-330:9-148)

         150       160       170       180       190       200     
CREB1_ PGVPRIEEEKSEEETSAPAITTVTVPTPIYQTSSGQYIAITQGGAIQLANNGTDGVQGLQ
YIR017                       MSAKQGWEKK.TNID..SRK.MNV---..LSEHL.N.I

         210       220       230       240        250       260    
CREB1_ TLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASG-DVQTYQIRTAPTS--TI
YIR017 S------SDSEL.SRL.SLLLVSS.N-----AEELISMINN.Q..SQFKKLRE.RKGKVA

            270       280       290       300       310       320  
CREB1_ APGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQN
YIR017 .TTA.VVKEEEA.VSTSN.LDKIKQE.RR..T..SQRF.IR..Q--.NF..-MNK.Q.L.

            330       340                             
CREB1_ KTLIEELKALKDLYCHKSD                            
YIR017 -.Q.NK.RDRIEQLNKENEFWKAKLNDINEIKSLKLLNDIKRRNMGR

>>YVNL167C                                                (647 aa)
 initn: 142 init1: 119 opt: 119  Z-score: 53.8  bits: 19.3 E():  2.9
Smith-Waterman score: 119;  39.623% identity (62.264% similar) in 53 aa overlap (280-332:426-478)

     250       260       270       280       290       300         
CREB1_ QTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVK
YVNL16 RKNSAVTTAPAQKDDVENNKISNNVTLDEN..QE...KEF.ER..V..SKF.KR....I.

     310       320       330       340                             
CREB1_ CLENRVAVLENQNKTLIEELKALKDLYCHKSD                            
YVNL16 KI..DLQFY.SEYDD.TQVIGK.CGIIPSSSSNSQFNVNVSTPSSSSPPSTSLIALLESS

>>YIR018W                                                 (245 aa)
 initn:  61 init1:  61 opt:  67  Z-score: 47.6  bits: 16.7 E():  5.3
Smith-Waterman score: 67;  25.455% identity (61.818% similar) in 55 aa overlap (280-334:55-109)

     250       260       270       280       290       300         
CREB1_ QTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVK
YIR018 SKNWKLPPRLPHRAAQRRKRVHRLHEDYET..NDEELQKKKRQ..D.Q.AY.ER.NNKLQ

     310       320       330       340                             
CREB1_ CLENRVAVLENQNKTLIEELKALKDLYCHKSD                            
YIR018 V..ETIES.SKVV.NYETK.NR.QNELQAKESENHALKQKLETLTLKQASVPAQDPILQN

>>YER045C                                                 (489 aa)
 initn: 111 init1:  70 opt:  73  Z-score: 43.8  bits: 17.0 E():  7.1
Smith-Waterman score: 97;  22.826% identity (67.391% similar) in 92 aa overlap (3-92:210-300)

                                           10        20         30 
CREB1_                             MTMESGAENQQSGDAAVTEAE-NQQMTVQAQP
YER045 QTGSKNIYAAMTPYDSNIKLNIPAVAATCDIP.ATPSIP...STMNQ.YI.M.LRL...M

              40        50        60         70        80        90
CREB1_ QIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGV-IQAAQPSVIQSPQVQTVQSSC
YER045 .TKAWKNAQL-NV.PCTP.SNSSVSSSSSC.NIND.NIEN.SVHS.ISHGVNHH..NN..

              100       110       120       130       140       150
CREB1_ KDLKRLFSGTQISTIAESEDSQESVDSVTDSQKRREILSRRPSYRKILNDLSSDAPGVPR
YER045 QNAELNISSSLPYESKCPDVNLTHANSKPQYKDATSALKNNINSEKDVHTAPFSSMHTTA

>>YDR259C                                                 (383 aa)
 initn:  84 init1:  52 opt:  62  Z-score: 42.8  bits: 16.5 E():  7.5
Smith-Waterman score: 81;  33.333% identity (64.583% similar) in 48 aa overlap (289-330:227-274)

      260       270       280       290       300       310        
CREB1_ TSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVL
YDR259 NDNNDNVTKPVPDKDTQLISSSGKTLRNTR.AAQ..T.QKAF.QR.EK.I.N..QKSKIF

           320        330       340                                
CREB1_ -----ENQN-KTLIEELKALKDLYCHKSD                               
YDR259 DDLLA..N.F.S.NDS.RNDNNILIAQHEAIRNAITMLRSEYDVLCNENNMLKNENSIIK

>>YOR028C                                                 (296 aa)
 initn:  35 init1:  35 opt:  41  Z-score: 39.3  bits: 15.5 E():  8.9
Smith-Waterman score: 80;  33.962% identity (66.038% similar) in 53 aa overlap (289-334:243-295)

      260       270       280       290       300       310        
CREB1_ TSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVL
YOR028 LSEQVFNEGERYNNDGQLIGKTGKPLRNTK.AAQ..S.QKAF.QRREK.I.N..EKSKLF

           320        330        340 
CREB1_ -----ENQN-KTLIEELKA-LKDLYCHKSD
YOR028 DGLMK..SEL.KM..S..SK..E*      

>>YHL009C                                                 (330 aa)
 initn:  33 init1:  33 opt:  33  Z-score: 36.4  bits: 15.1 E():  9.6
Smith-Waterman score: 91;  21.667% identity (57.500% similar) in 120 aa overlap (222-333:79-194)

             200       210       220       230             240     
CREB1_ QLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQI-LVP-----SNQVVVQAA
YHL009 EQTAPFPILEDQCPALNLDRSNNDLLLQNNISFPKGS.L.A.Q.T.ISGDY.TY.MADNN

         250         260       270       280       290       300   
CREB1_ SGDVQTYQIRT--APTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRK
YHL009 NN.NDS.SNTNYFSKNNG.S.SSRSP.VAHNENV.DDSK.K.KA----Q..A.QKAF.ER

           310       320       330       340                       
CREB1_ KKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD                      
YHL009 .EARM.E.QDKLLES.RNRQS.LK.IEE.RKANTEINAENRLLLRSGNENFSKDIEDDTN


341 residues in 1 query   sequences
3683 residues in 10 library sequences
 Scomplib [34.26]
 start: Thu Apr 26 11:52:16 2007 done: Thu Apr 26 11:52:16 2007
 Total Scan time:  0.000 Total Display time:  0.010

Function used was FASTA [version 34.26 January 12, 2007]
-------------- next part --------------
>CREB1_MONKEY
MTMESGAENQQSGDAAVTEAENQQMTVQAQPQIATLAQVSMPAAHATSSAPTVTLVQLPN
GQTVQVHGVIQAAQPSVIQSPQVQTVQSSCKDLKRLFSGTQISTIAESEDSQESVDSVTD
SQKRREILSRRPSYRKILNDLSSDAPGVPRIEEEKSEEETSAPAITTVTVPTPIYQTSSG
QYIAITQGGAIQLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQV
VVQAASGDVQTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAAREC
RRKKKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
-------------- next part --------------
>YIL036W
MFTGQEYHSVDSNSNKQKDNNKRGIDDTSKILNNKIPHSVSDTSAAATTTSTMNNSALSR
SLDPTDINYSTNMAGVVDQIHDYTTSNRNSLTPQYSIAAGNVNSHDRVVKPSANSNYQQA
AYLRQQQQQDQRQQSPSMKTEEESQLYGDILMNSGVVQDMHQNLATHTNLSQLSSTRKSA
PNDSTTAPTNASNIANTASVNKQMYFMNMNMNNNPHALNDPSILETLSPFFQPFGVDVAH
LPMTNPPIFQSSLPGCDEPIRRRRISISNGQISQLGEDIETLENLHNTQPPPMPNFHNYN
GLSQTRNVSNKPVFNQAVPVSSIPQYNAKKVINPTKDSALGDQSVIYSKSQQRNFVNAPS
KNTPAESISDLEGMTTFAPTTGGENRGKSALRESHSNPSFTPKSQGSHLNLAANTQGNPI
PGTTAWKRARLLERNRIAASKCRQRKKVAQLQLQKEFNEIKDENRILLKKLNYYEKLISK
FKKFSKIHLREHEKLNKDSDNNVNGTNSSNKNESMTVDSLKIIEELLMIDSDVTEVDKDT
GKIIAIKHEPYSQRFGSDTDDDDIDLKPVEGGKDPDNQSLPNSEKIK
>YIR017C
MSAKQGWEKKSTNIDIASRKGMNVNNLSEHLQNLISSDSELGSRLLSLLLVSSGNAEELI
SMINNGQDVSQFKKLREPRKGKVAATTAVVVKEEEAPVSTSNELDKIKQERRRKNTEASQ
RFRIRKKQKNFENMNKLQNLNTQINKLRDRIEQLNKENEFWKAKLNDINEIKSLKLLNDI
KRRNMGR
>YVNL167C
MSSEERSRQPSTVSTFDLEPNPFEQSFASSKKALSLPGTISHPSLPKELSRNNSTSTITQ
HSQRSTHSLNSIPEENGNSTVTDNSNHNDVKKDSPSFLPGQQRPTIISPPILTPGGSKRL
PPLLLSPSILYQANSTTNPSQNSHSVSVSNSNPSAIGVSSTSGSLYPNSSSPSGTSLIRQ
PRNSNVTTSNSGNGFPTNDSQMPGFLLNLSKSGLTPNESNIRTGLTPGILTQSYNYPVLP
SINKNTITGSKNVNKSVTVNGSIENHPHVNIMHPTVNGTPLTPGLSSLLNLPSTGVLANP
VFKSTPTTNTTDGTVNNSISNSNFSPNTSTKAAVKMDNPAEFNAIEHSAHNHKENENLTT
QIENNDQFNNKTRKRKRRMSSTSSTSKASRKNSISRKNSAVTTAPAQKDDVENNKISNNV
TLDENEEQERKRKEFLERNRVAASKFRKRKKEYIKKIENDLQFYESEYDDLTQVIGKLCG
IIPSSSSNSQFNVNVSTPSSSSPPSTSLIALLESSISRSDYSSAMSVLSNMKQLICETNF
YRRGGKNPRDDMDGQEDSFNKDTNVVKSENAGYPSVNSRPIILDKKYSLNSGANISKSNT
TTNNVGNSAQNIINSCYSVTNPLVINANSDTHDTNKHDVLSTLPHNN
>YER045C
MDYKHNFATSPDSFLDGRQNPLLYTDFLSSNKELIYKQPSGPGLVDSAYNFHHQNSLHDR
SVQENLGPMFQPFGVDISHLPITNPPIFQSSLPAFDQPVYKRRISISNGQISQLGEDLET
VENLYNCQPPILSSKAQQNPNPQQVANPSAAIYPSFSSNELQNVPQPHEQATVIPEAAPQ
TGSKNIYAAMTPYDSNIKLNIPAVAATCDIPSATPSIPSGDSTMNQAYINMQLRLQAQMQ
TKAWKNAQLNVHPCTPASNSSVSSSSSCQNINDHNIENQSVHSSISHGVNHHTVNNSCQN
AELNISSSLPYESKCPDVNLTHANSKPQYKDATSALKNNINSEKDVHTAPFSSMHTTATF
QIKQEARPQKIENNTAGLKDGAKAWKRARLLERNRIAASKCRQRKKMSQLQLQREFDQIS
KENTMMKKKIENYEKLVQKMKKISRLHMQECTINGGNNSYQSLQNKDSDVNGFLKMIEEM
IRSSSLYDE
>YIR018W
MALPLIKPKESEESHLALLSKIHVSKNWKLPPRLPHRAAQRRKRVHRLHEDYETEENDEE
LQKKKRQNRDAQRAYRERKNNKLQVLEETIESLSKVVKNYETKLNRLQNELQAKESENHA
LKQKLETLTLKQASVPAQDPILQNLIENFKPMKAIPIKYNTAIKRHQHSTELPSSVKCGF
CNDNTTCVCKELETDHRKSDDGVATEQKDMSMPHAECNNKDNPNGLCSNCTNIDKSCIDI
RSIIH
>YHL009C
MTPSNMDDNTSGFMKFINPQCQEEDCCIRNSLFQEDSKCIKQQPDLLSEQTAPFPILEDQ
CPALNLDRSNNDLLLQNNISFPKGSDLQAIQLTPISGDYSTYVMADNNNNDNDSYSNTNY
FSKNNGISPSSRSPSVAHNENVPDDSKAKKKAQNRAAQKAFRERKEARMKELQDKLLESE
RNRQSLLKEIEELRKANTEINAENRLLLRSGNENFSKDIEDDTNYKYSFPTKDEFFTSMV
LESKLNHKGKYSLKDNEIMKRNTQYTDEAGRHVLTVPATWEYLYKLSEERDFDVTYVMSK
LQGQECCHTHGPAYPRSLIDFLVEEATLNE
>YOR028C
MLMQIKMDNHPFNFQPILASHSMTRDSTKPKKMTDTAFVPSPPVGFIKEENKADLHTISV
VASNVTLPQIQLPKIATLEEPGYESRTGSLTDLSGRRNSVNIGALCEDVPNTAGPHIARP
VTINNLIPPSLPRLNTYQLRPQLSDTHLNCHFNSNPYTTASHAPFESSYTTASTFTSQPA
ASYFPSNSTPATRKNSATTNLPSEERRRVSVSLSEQVFNEGERYNNDGQLIGKTGKPLRN
TKRAAQNRSAQKAFRQRREKYIKNLEEKSKLFDGLMKENSELKKMIESLKSKLKE*
>YEL009C
MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
TSLFDNDIPVTTDDVSLADKAIESTEEVSLVPSNLEVSTTSFLPTPVLEDAKLTQTRKVK
KPNSVVKKSHHVGKDDESRLDHLGVVAYNRKQRSIPLSPIVPESSDPAALKRARNTEAAR
RSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
>YDR259C
MQNPPLIRPDMYNQGSSSMATYNASEKNLNEHPSPQIAQPSTSQKLPYRINPTTTNGDTD
ISVNSNPIQPPLPNLMHLSGPSDYRSMHQSPIHPSYIIPPHSNERKQSASYNRPQNAHVS
IQPSVVFPPKSYSISYAPYQINPPLPNGLPNQSISLNKEYIAEEQLSTLPSRNTSVTTAP
PSFQNSADTAKNSADNNDNNDNVTKPVPDKDTQLISSSGKTLRNTRRAAQNRTAQKAFRQ
RKEKYIKNLEQKSKIFDDLLAENNNFKSLNDSLRNDNNILIAQHEAIRNAITMLRSEYDV
LCNENNMLKNENSIIKNEHNMSRNENENLKLENKRFHAEYIRMIEDIENTKRKEQEQRDE
IEQLKKKIRSLEEIVGRHSDSAT
>YFL031W
MEMTDFELTSNSQSNLAIPTNFKSTLPPRKRAKTKEEKEQRRIERILRNRRAAHQSREKK
RLHLQYLERKCSLLENLLNSVNLEKLADHEDALTCSHDAFVASLDEYRDFQSTRGASLDT
RASSHSSSDTFTPSPLNCTMEPATLSPKSMRDSASDQETSWELQMFKTENVPESTTLPAV
DNNNLFDAVASPLADPLCDDIAGNSLPFDNSIDLDNWRNPEAQSGLNSFELNDFFITS

From jason at bioperl.org  Thu Apr 26 15:27:24 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Apr 2007 12:27:24 -0700
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
In-Reply-To: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
References: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
Message-ID: <7C782DA2-5A80-413A-9B5A-94EEBEA9EF6E@bioperl.org>

Unfortunately there are some changes in the FASTA output in that  
version. The latest version of Bioperl, 1.5.2, can handle it, though,  
so you'll need to upgrade Bioperl.

-jason
On Apr 26, 2007, at 3:18 AM, Aidan Budd wrote:

> Hi Bioperlers,
>
> I'm trying to parse a FASTA search output file (see attached .out  
> file)
> using Bioperl 1.4. My Bioperl installation has otherwise been working
> fine, however I currently get the following error when running a  
> simple
> script that attempts to access result from this outfile via bioperl.
>
> Is this a problem with the parser?
> Or have I executed FASTA wrongly creating output that isn't covered  
> by the
> parser?
>
> Any suggestions on how to deal with this much appreciated.
>
> Best wishes,
>
> Aidan
>
> Script:
>
> #!/usr/bin/perl -w
> $^W=1;
> use strict;
> use Bio::SearchIO;
>
> my $fasta_report = new Bio::SearchIO ('-format' => 'fasta',
>                                       '-file'   => $ARGV[0]);
>
> my $result = $fasta_report->next_result();
>
> Errors:
>
> Use of uninitialized value in concatenation (.) or string at
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/ 
> GenericHSP.pm
> line 231, <GEN3> line 47.
>
> ------------- EXCEPTION  -------------
> MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm
> FASTP -score 62.4 -hit_frame 0 -hsp_length 180 -hit_seq  -hit_length 0
> -query_length 128 -query_frame 0 -swscore 122 -rank 1 -query_seq
> GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS-- 
> PALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD- 
> LYCHKSD
> -homology_seq
> MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKRAKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RK 
> CSL...LL.SVNL.K.ADHE.A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTM 
> EPATLSPKSMR
> -hit_name YFL031W -bits 19.4 -query_name CREB1_MONKEY -evalue 1.1  
> (qs='
> STACK Bio::Search::HSP::GenericHSP::new
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/ 
> GenericHSP.pm:231
> STACK Bio::Search::HSP::FastaHSP::new
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/ 
> FastaHSP.pm:97
> STACK Bio::Factory::ObjectFactory::create_object
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Factory/ 
> ObjectFactory.pm:150
> STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/ 
> SearchResultEventBuilder.pm:275
> STACK Bio::SearchIO::fasta::end_element
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/ 
> fasta.pm:872
> STACK Bio::SearchIO::fasta::next_result
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/ 
> fasta.pm:403
> STACK toplevel
> /Users/budd/scripts/test_scripts/test_parsing_fasta_output.pl:22
>
> --------------------------------------
>
> -- 
> ----------------------------------------------------------------------
> Aidan Budd, PhD                               tel:+49 (0)6221 387 8530
> EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
> Meyerhofstr. 1, 69117 Heidelberg, Germany
>
> URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html
> <creb_vs_yeast_manual_fasta_changed_infile_formats.out>
> <creb1_human.fasta>
> <yeast_bzips_from_ensembl.fasta>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From dmessina at wustl.edu  Thu Apr 26 15:42:02 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 26 Apr 2007 14:42:02 -0500
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
In-Reply-To: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
References: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
Message-ID: <D41F5BDD-B992-4787-91C5-732B41683908@wustl.edu>

Hi Aidan,

Bioperl 1.4 is ~3 years old now, and FASTA output has probably  
changed since then. Your code should work if you install Bioperl  
1.5.2, the latest release.

	http://www.bioperl.org/wiki/Installing_BioPerl

Please let us know if that doesn't solve the problem.
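
Before and after upgrading, it can help to confirm which BioPerl version Perl actually loads. The following is a minimal sketch, assuming the installation provides the Bio::Root::Version module (the 1.5.x releases do; older installations may not, in which case the sketch reports BioPerl as missing even though it is present). The helper name bioperl_version is made up for this example.

```perl
use strict;
use warnings;

# Report the BioPerl version that this Perl finds on @INC,
# or undef if Bio::Root::Version cannot be loaded.
sub bioperl_version {
    my $ok = eval { require Bio::Root::Version; 1 };
    return unless $ok;
    no warnings 'once';
    return $Bio::Root::Version::VERSION;
}

my $version = bioperl_version();
print defined $version
    ? "BioPerl version: $version\n"
    : "Bio::Root::Version could not be loaded\n";
```

If the reported version is still the old one after an upgrade, an older copy is probably earlier on @INC (for example via PERL5LIB or a "use lib" line).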

Dave

From gopu_36 at yahoo.com  Thu Apr 26 21:29:03 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Thu, 26 Apr 2007 18:29:03 -0700 (PDT)
Subject: [Bioperl-l] check for the continous segments to extract the
	sequences
Message-ID: <10211951.post@talk.nabble.com>


As a newbie to programming, thanks for the support from this group. Please
ignore this message if it is not relevant to this group, as my
problem may be a typical computer science recursion problem (I am not sure).

I have an array like @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
6001, 7000, 7001, 8000, 12001, 13000);
The array gives the positions of sequences: the first element, '1', is the
start position, the second element, '1000', is the end of that sequence, and
so on. All the even indices (0, 2, 4, ...) hold the start points of the
sequences, and the odd indices (1, 3, 5, ...) hold the END positions (1000,
2000, 5000, ...) of the sequences to be retrieved. Basically, I have to see
whether any contiguous segments lie in the list and join them together to
form one whole chunk.
For example, 1-1000 and 1001-2000 can be joined together to extract the
sequence from 1-2000. In the same way, 4001-8000 should be extracted, then
12001-13000, and so on. As I said earlier, after checking the positions, I
will be able to extract that part of the sequence from a whole genome. Thanks
for taking your time. Any tip or help would be greatly appreciated.

Regards
Gopu 
-- 
View this message in context: http://www.nabble.com/check-for-the-continous-segments-to-extract-the-sequences-tf3655281.html#a10211951
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Thu Apr 26 21:54:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Apr 2007 18:54:59 -0700
Subject: [Bioperl-l] check for the continous segments to extract the
	sequences
In-Reply-To: <10211951.post@talk.nabble.com>
References: <10211951.post@talk.nabble.com>
Message-ID: <EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>

You want a connectivity algorithm.  One can be found on perlmonks.org,  
and another in Bio::Search::SearchUtils as the method collapse_nums().  
You'll have to modify aspects of it to deal with ranges.
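
For the range case, the merging step can be sketched in plain Perl without any modules. This is not the collapse_nums() code, only an illustration under the assumption that the input is a flat (start, end, start, end, ...) list and that two ranges should be joined when the next start is at most one past the current end. The name merge_ranges is made up for this example.

```perl
use strict;
use warnings;

# Merge ranges given as a flat (start, end, start, end, ...) list.
# Ranges are joined when they overlap or are directly adjacent.
sub merge_ranges {
    my @flat = @_;
    die "need an even number of coordinates\n" if @flat % 2;

    # Pair up the flat list and sort the pairs by start position.
    my @pairs;
    while (@flat) {
        my ($start, $end) = splice(@flat, 0, 2);
        push @pairs, [$start, $end];
    }
    @pairs = sort { $a->[0] <=> $b->[0] } @pairs;

    my @merged;
    for my $pair (@pairs) {
        my ($start, $end) = @$pair;
        if (@merged && $start <= $merged[-1][1] + 1) {
            # Extend the previous range if this one reaches further.
            $merged[-1][1] = $end if $end > $merged[-1][1];
        }
        else {
            push @merged, [$start, $end];
        }
    }
    return @merged;
}

my @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
             6001, 7000, 7001, 8000, 12001, 13000);
print join(",", map { "$_->[0]-$_->[1]" } merge_ranges(@array)), "\n";
# prints 1-2000,4001-8000,12001-13000
```

Sorting first means the input does not have to arrive in order; each merged [start, end] pair can then be passed to whatever extracts the subsequence.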

Good luck.
-jason
On Apr 26, 2007, at 6:29 PM, gopu_36 wrote:

>
> As a newbee to programming, thx for the support from this group.  
> Please
> ignore the message if this message is not relevant to this group as my
> problem may be a typical computer science recursive one! (as I am  
> not aware)
>
> I have an array like @array = (1, 1000, 1001, 2000, 4001, 5000,  
> 5001, 6000,
> 6001, 7000, 7001, 8000, 12001, 13000);
> The above array gives the posiiton of sequences like '1' shows the  
> start
> position and the second element '1000' gives the end of the  
> sequence and so
> on. All the even positions like 0,2,4... shows the starting points  
> of the
> sequence and odd positions like 1000, 2000, 5000 gives the END  
> positions of
> the sequences to be retrieved. basically I have to see whwther any  
> continous
> segments lie in the list and add them together to form a one whole  
> chunk.
> For example 1-1000 and 1001-2000 can be joined together to extract  
> sequences
> from 1-2000. In the same way 4001-8000 should be extracted and  
> 12001-13000
> and so on. As I said earlier, after checking the position, I will  
> be able to
> extract that part of sequence from a whole genome. Thanks for  
> taking ur
> time. Any tip or help would be greatly appreciated.
>
> Regards
> Gopu
> -- 
> View this message in context: http://www.nabble.com/check-for-the- 
> continous-segments-to-extract-the-sequences-tf3655281.html#a10211951
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From MEC at stowers-institute.org  Fri Apr 27 09:52:10 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 27 Apr 2007 08:52:10 -0500
Subject: [Bioperl-l] check for the continous segments to extract
	thesequences
In-Reply-To: <EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>
References: <10211951.post@talk.nabble.com>
	<EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>
Message-ID: <CED81D34E37D5043A1211565277A51E507E22F28@exchkc02.stowers-institute.org>

Gopu/Jason,

Another option is Set::IntSpan, available on CPAN at
http://search.cpan.org/~swmcd/Set-IntSpan-1.11/IntSpan.pm

Here's a perl one-liner that shows you how easy it is:

perl -MSet::IntSpan -e 'my @array = ( 1, 1000, 1001, 2000, 4001, 5000,
5001, 6000, 6001, 7000, 7001, 8000, 12001, 13000); my $is =
Set::IntSpan->new;  while (@array) {$is->U(shift(@array) . "-" .
shift(@array))}; print $is;'
1-2000,4001-8000,12001-13000

I use it all the time to great effect and have utility functions that
convert between bioperl split locations and IntSpans.

There is another module which extends it nicely, Set::IntSpan::Island,
worth a gander.

Cheers,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  



From lstein at cshl.edu  Fri Apr 27 13:44:59 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 27 Apr 2007 13:44:59 -0400
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>

Hi Malcolm,

This is absolutely ok and you can go ahead and commit. Thanks for figuring
this out!

Lincoln

On 4/26/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Lincoln, et al,
>
> I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
> from a Bio::DB::SeqFeature::Store that were initially created with
> -segments (i.e. whose location was discontiguous) does not display any
> attributes in column 9 other than "Name".
>
> What do you think of the following patch to Bio::Graphics::FeatureBase,
> whose effect is to "contrive to return (duplicated) common group values"
> (which otherwise get lost when "collapsing" "homogeneous" parent/child
> features)?
>
> Another approach would be to copy the attributes from the parent to the
> children when the -segments are first created.
>
> Another approach would be to use Bio::SeqFeature::Generic as the db's
> -seqfeature_class and save with -location being a Bio::Location::Split,
> but this was fraught with other problems.
>
> Any other suggestions?  Do you want me to commit this patch?
>
> Cheers,
>
> Malcolm
>
> Patch follows:
>
>
>
>
> Index: FeatureBase.pm
> ===================================================================
> RCS file:
> /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
> retrieving revision 1.29
> diff -c -r1.29 FeatureBase.pm
> *** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
> --- FeatureBase.pm      26 Apr 2007 16:30:23 -0000
> ***************
> *** 581,587 ****
>       foreach (@children) {
>         s/Parent=/ID=/g;
>       } # replace Parent tag with ID
> !     return join "\n",@children;
>     }
>
>     return join("\n",$p,@children);
> --- 581,589 ----
>       foreach (@children) {
>         s/Parent=/ID=/g;
>       } # replace Parent tag with ID
> !     #return join "\n",@children;
> !     # Instead of above, additionally, contrive to return (duplicated) common group values
> !     return(join("$group\n",@children) . $group);
>     }
>
>     return join("\n",$p,@children);
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu

From MEC at stowers-institute.org  Fri Apr 27 14:45:04 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 27 Apr 2007 13:45:04 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
	<6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E507E22F59@exchkc02.stowers-institute.org>

Hi Lincoln,
 
Cool.
 
The principle of what I figured out still holds, I think, but the
implementation is slightly broken.  Improved patch forthcoming next week.
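
For readers following the patch, here is a small stand-alone sketch of what the expression join("$group\n", @children) . $group produces. The child lines and the value of $group are hypothetical, made up for illustration; they are not taken from FeatureBase.pm, and this is not necessarily the breakage mentioned above.

```perl
use strict;
use warnings;

# Hypothetical data: two child GFF3 lines and a shared attribute string.
my @children = ("chr1\t.\texon\t1\t10\t.\t+\t.\tID=f1",
                "chr1\t.\texon\t20\t30\t.\t+\t.\tID=f1");
my $group    = ";Note=shared";

# Form used in the posted patch: the separator carries $group, and the
# trailing concatenation covers the last child.
my $patched = join("$group\n", @children) . $group;

# For a non-empty list this is the same as appending $group to each child.
my $mapped  = join("\n", map { $_ . $group } @children);

print $patched eq $mapped ? "same\n" : "different\n";

# With no children the two forms differ: the patch form yields $group alone,
# while the map form yields the empty string.
my @none        = ();
my $patched_len = length(join("$group\n", @none) . $group);
my $mapped_len  = length(join("\n", map { $_ . $group } @none));
print "$patched_len vs $mapped_len\n";
```

The map form may be easier to read, since every child is treated the same way and an empty child list yields an empty string.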
 

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  



From bernd at kirx.de  Sat Apr 28 10:36:07 2007
From: bernd at kirx.de (Bernd Mueller)
Date: Sat, 28 Apr 2007 16:36:07 +0200
Subject: [Bioperl-l] bioperl::db
Message-ID: <46335BD7.8040306@kirx.de>

Hi,

I followed the instructions on bioperl.org for installing bioperl via 
cpan, but I have not been able to install the bioperl::db 
module.

How does this work?

Moreover, none of the BIRNEY distributions is installable on my system. 
After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always 
fail. So I have to install the CRAFFI bundle, but the Bio::DB module 
does not seem to be included in this bundle, because my programs that 
use that module do not work.

Help would be appreciated :)

Cheers,
Bernd

Appendix:

cpan[6]> d /bioperl/
Distribution    BIRNEY/bioperl-1.2.1.tar.gz
Distribution    BIRNEY/bioperl-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-1.2.3.tar.gz
Distribution    BIRNEY/bioperl-1.2.tar.gz
Distribution    BIRNEY/bioperl-1.4.tar.gz
Distribution    BIRNEY/bioperl-db-0.1.tar.gz
Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-run-1.4.tar.gz
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
12 items found


-- 
Dipl.-Inform.(FH)
Bernd Mueller
phone: +49 179 2336692
email: bernd at kirx.de


From cydeweys at gmail.com  Sun Apr 29 09:43:55 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 09:43:55 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
Message-ID: <4634A11B.6090809@umd.edu>

I'm trying to load up a table of codon usage frequencies I've downloaded
from the web using Bio::CodonUsage::IO.  My code looks like this:

    use Bio::CodonUsage::Table;
    use Bio::CodonUsage::IO;
    # ...
    my $io = Bio::CodonUsage::IO->new(-file=>$freqFile);
    my $codonTable = $io->next_data();

Unfortunately, I can't seem to find any documentation on what format the
codon usage table file is expected to be in, and all of my best guesses
seem to be invalid, yielding the following error message:

-------------------- WARNING ---------------------
MSG: probable parsing error - should be 21 entries for 20aa + stop codon
---------------------------------------------------

I've tried using both formats that are available from the Codon Usage
Database (easily the largest source of codon usage frequencies),
available here: http://www.kazusa.or.jp/codon/

The two formats I've tried and failed look like this:

UUU 32.5( 45732)  UCU 15.3( 21588)  UAU 27.8( 39146)  UGU  6.3(  8796)
UUC 14.3( 20101)  UCC  3.2(  4458)  UAC  9.3( 13016)  UGC  2.1(  2971)
...


AND

AmAcid  Codon      Number    /1000     Fraction   ..

Gly     GGG     13198.00      9.38      0.14
Gly     GGA     34123.00     24.26      0.36
...


So, anyone know how to get this downloaded codon usage data loaded up
into a Bio::CodonUsage::Table object?  Bio::CodonUsage::IO doesn't seem
to like parsing the standard formats.  Thanks.

From cjfields at uiuc.edu  Sun Apr 29 10:05:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 09:05:59 -0500
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <4634A11B.6090809@umd.edu>
References: <4634A11B.6090809@umd.edu>
Message-ID: <469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>

One example file (MmCT) can be found in the test data directory in  
the bioperl distribution (t/data directory) and some tests relevant  
to codon table usage are found in DBCUTG.t.

chris

On Apr 29, 2007, at 8:43 AM, Ben McIlwain wrote:

> I'm trying to load up a table of codon usage frequencies I've  
> downloaded
> from the web using Bio::CodonUsage::IO.  My code looks like this:
>
>     use Bio::CodonUsage::Table;
>     use Bio::CodonUsage::IO;
>     # ...
>     my $io = Bio::CodonUsage::IO->new(-file=>$freqFile);
>     my $codonTable = $io->next_data();
>
> Unfortunately, I can't seem to find any documentation on what  
> format the
> codon usage table file is expected to be in, and all of my best  
> guesses
> seem to be invalid, yielding the following error message:
>
> -------------------- WARNING ---------------------
> MSG: probable parsing error - should be 21 entries for 20aa + stop  
> codon
> ---------------------------------------------------
>
> I've tried using both formats that are available from the Codon Usage
> Database (easily the largest source of codon usage frequencies),
> available here: http://www.kazusa.or.jp/codon/
>
> The two formats I've tried and failed look like this:
>
> UUU 32.5( 45732)  UCU 15.3( 21588)  UAU 27.8( 39146)  UGU  6.3(  8796)
> UUC 14.3( 20101)  UCC  3.2(  4458)  UAC  9.3( 13016)  UGC  2.1(  2971)
> ...
>
>
> AND
>
> AmAcid  Codon      Number    /1000     Fraction   ..
>
> Gly     GGG     13198.00      9.38      0.14
> Gly     GGA     34123.00     24.26      0.36
> ...
>
>
> So, anyone know how to get this downloaded codon usage data loaded up
> into a Bio::CodonUsage::Table object?  Bio::CodonUsage::IO doesn't  
> seem
> to like parsing the standard formats.  Thanks.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cydeweys at gmail.com  Sun Apr 29 10:06:12 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 10:06:12 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
Message-ID: <4634A654.7010708@gmail.com>

Chris Fields wrote:
> One example file (MmCT) can be found in the test data directory in the
> bioperl distribution (t/data directory) and some tests relevant to codon
> table usage are found in DBCUTG.t.

I still get the same warning message even when running on the given test
data?  That doesn't sound right.

-------------------- WARNING ---------------------
MSG: probable parsing error - should be 21 entries for 20aa + stop codon
---------------------------------------------------


From cjfields at uiuc.edu  Sun Apr 29 17:50:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 16:50:15 -0500
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <4634A654.7010708@gmail.com>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
	<4634A654.7010708@gmail.com>
Message-ID: <DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>

Odd, when I run 'perl -I. t/DBCUTG.t' from CVS it works fine.  Of  
course, I am assuming that you are running the latest release (1.5.2).

Could you post a bug report with a script that generates the error?

chris

On Apr 29, 2007, at 9:06 AM, Ben McIlwain wrote:

> Chris Fields wrote:
>> One example file (MmCT) can be found in the test data directory in  
>> the
>> bioperl distribution (t/data directory) and some tests relevant to  
>> codon
>> table usage are found in DBCUTG.t.
>
> I still get the same warning message even when running on the given  
> test
> data?  That doesn't sound right.
>
> -------------------- WARNING ---------------------
> MSG: probable parsing error - should be 21 entries for 20aa + stop  
> codon
> ---------------------------------------------------

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cydeweys at gmail.com  Sun Apr 29 18:15:32 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 18:15:32 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
	<4634A654.7010708@gmail.com>
	<DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>
Message-ID: <46351904.4070202@gmail.com>

Chris Fields wrote:
> Odd, when I run 'perl -I. t/DBCUTG.t' from CVS it works fine.  Of
> course, I am assuming that you are running the latest release (1.5.2).
> 
> Could you post a bug report with a script that generates the error?

Sorry, it was my mistake.  I had turned off warnings and strict earlier
for debugging purposes and then forgot to turn them back on.  It turns
out I was trying to read in the codon frequencies when the filename was
an uninitialized string variable (I typoed the name).  Whoops.  Now that
I've spelled the variable name correctly, it is working.
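
For the archive, a minimal sketch of a guard that catches this kind of mistake before the parser is ever called. The helper name checked_filename is made up for illustration and is not part of BioPerl; the script simply takes a file name from the command line.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): fail early, with a clear
# message, when a filename is undefined, empty, or not a readable file.
sub checked_filename {
    my ($file) = @_;
    die "No filename given (undefined or empty variable?)\n"
        unless defined $file && length $file;
    die "Cannot read file '$file'\n" unless -f $file && -r _;
    return $file;
}

# With 'use strict' a mistyped variable name is a compile-time error, and
# an undefined value is rejected here before any parsing starts.
my $codon_file = $ARGV[0];
if (defined $codon_file) {
    print "Would parse: ", checked_filename($codon_file), "\n";
}
```

The returned name can then be handed to whatever reader is in use, so a bad path fails at one obvious place instead of surfacing as a parsing warning.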

From bernd at kirx.de  Sun Apr 29 18:57:53 2007
From: bernd at kirx.de (Bernd Mueller)
Date: Mon, 30 Apr 2007 00:57:53 +0200
Subject: [Bioperl-l] bioperl::db
In-Reply-To: <46335BD7.8040306@kirx.de>
References: <46335BD7.8040306@kirx.de>
Message-ID: <463522F1.2010406@kirx.de>

Hello list,

I figured out my problem. It was caused by confusion over the 
versioning of bioperl. The instructions describe how to list the 
versions of bioperl available in CPAN, but afterwards they describe 
installing a much higher version which is not listed as a distribution 
in CPAN. So it works fine now. Thanks anyway. Proficiency in reading 
results in success ;-)

But I have another question: Does anyone know how to retrieve free 
fulltext documents with EUtilities from Pubmed Central? All my queries 
return a mix of free and non-free articles.

Thanks and regards,

Bernd


Bernd Mueller wrote:
> Hi,
> 
> I followed those instructions on bioperl.org for installing bioperl via 
> cpan. But actually it is impossible for me to install the bioperl::db 
> module.
> 
> How does this work?
> 
> Moreover none of these Birney distribution are installable on my system. 
> After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always 
> fail. So I have to install the CRAFFI bundle but it does not seem that 
> Bio::DB module is included in this bundle because my programs using that 
> module do not work.
> 
> Help would be appreciated :)
> 
> Cheers,
> Bernd
> 
> Appendix:
> 
> cpan[6]> d /bioperl/
> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
> Distribution    BIRNEY/bioperl-1.2.tar.gz
> Distribution    BIRNEY/bioperl-1.4.tar.gz
> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
> 12 items found
> 
> 

-- 
Dipl.-Inform.(FH)
Bernd Mueller
phone: +49 179 2336692
email: bernd at kirx.de


From cjfields at uiuc.edu  Sun Apr 29 20:16:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:16:11 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
Message-ID: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>

Allen (or anyone),

What is the status of this module?  It requires a module not listed  
in the dependencies (WWW::Mechanize) and has no tests.

chris

From allenday at ucla.edu  Sun Apr 29 20:21:19 2007
From: allenday at ucla.edu (Allen Day)
Date: Sun, 29 Apr 2007 17:21:19 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
Message-ID: <5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>

Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
ago.  I only implemented for a few journals, so it never worked for a
large fraction of publications.  Probably it barely works or does not
work at all now b/c of how the PDF are scraped out of the HTML.  The
publisher sites are always modifying their HTML, presumably trying to
prevent automated download like this.

-Allen

On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Allen (or anyone),
>
> What is the status of this module?  It requires a module not listed
> in the dependencies (WWW:Mechanize) and has no tests.
>
> chris
>

From cjfields at uiuc.edu  Sun Apr 29 20:28:47 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:28:47 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
Message-ID: <29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>

Quick response!  Yep, I've run into this with a few publishers.   
Though they're supposed to have 'permanent' links for those of us who  
like to link to our pubs they frequently change (scary if that's  
their definition of permanent).

Did you want us to remove the code from CVS?

chris

On Apr 29, 2007, at 7:21 PM, Allen Day wrote:

> Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
> ago.  I only implemented for a few journals, so it never worked for a
> large fraction of publications.  Probably it barely works or does not
> work at all now b/c of how the PDF are scraped out of the HTML.  The
> publisher sites are always modifying their HTML, presumably trying to
> prevent automated download like this.
>
> -Allen
>
> On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> Allen (or anyone),
>>
>> What is the status of this module?  It requires a module not listed
>> in the dependencies (WWW:Mechanize) and has no tests.
>>
>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Apr 29 20:31:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:31:15 -0500
Subject: [Bioperl-l] PMC and EUtilities, was  bioperl::db
In-Reply-To: <463522F1.2010406@kirx.de>
References: <46335BD7.8040306@kirx.de> <463522F1.2010406@kirx.de>
Message-ID: <01DC1D72-8AFA-4C11-8795-D4787506C602@uiuc.edu>

There may be a way to limit the initial query to full text docs from  
esearch, then use the history to retrieve only the XML docs you  
want.  Is that what you mean?
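
A rough sketch of that two-step flow at the level of the raw E-utilities URLs, without fetching anything. It deliberately avoids the Bio::DB::EUtilities API. The helper names, the search word "kinase" and the WebEnv/query_key placeholder values are made up for illustration, and the filter term is simply the one mentioned in this thread, not a verified PMC filter.

```perl
use strict;
use warnings;

my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode everything except unreserved URL characters.
sub url_escape {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build a URL from key/value pairs, keeping the order given.
sub build_url {
    my ($script, @pairs) = @_;
    my @parts;
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        push @parts, url_escape($key) . '=' . url_escape($value);
    }
    return "$base/$script?" . join('&', @parts);
}

# Step 1: esearch with usehistory=y stores the result set on the server.
my $esearch = build_url('esearch.fcgi',
    db         => 'pmc',
    term       => 'kinase AND free fulltext[filter]',
    usehistory => 'y',
);

# Step 2: the esearch reply carries a WebEnv string and a query key;
# the values below are placeholders, not real server output.
my ($webenv, $query_key) = ('WEBENV_FROM_ESEARCH', 1);
my $efetch = build_url('efetch.fcgi',
    db        => 'pmc',
    query_key => $query_key,
    WebEnv    => $webenv,
    retstart  => 0,
    retmax    => 100,
);

print "$esearch\n$efetch\n";
```

Raising retstart in steps of retmax walks through the stored result set in batches.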

BioPerl-based access to PMC is limited at best.  Bio::DB::EUtilities  
only returns raw PMC XML with no post-processing of raw data (for  
good reason, as EUtilities is meant to be an intermediate step).   
Allen Day's Bio::DB::Biblio::eutils module supposedly allows PMC  
queries.  I'm also pretty sure that PubMedXML != PMC XML, in other  
words the Bio::Biblio XML format parsers may not work on PMC XML.

chris

On Apr 29, 2007, at 5:57 PM, Bernd Mueller wrote:

> Hello list,
>
> I figured out my problem. Actually it was because of problems in the
> versioning of bioperl. It is described to figure out the available
> versions of bioperl in CPAN but afterwards it is described to  
> install a
> much higher version wich is not listed as distribution in CPAN. So it
> works fine now. Thanks anyway. Proficiency in reading results in  
> success ;-)
>
> But I have another question: Does anyone know how to retrieve free
> fulltext documents with EUtilities from Pubmed Central? All my queries
> result in a corpora of free and non-free articles.
>
> Thanks and regards,
>
> Bernd
>
>
> Bernd Mueller wrote:
>> Hi,
>>
>> I followed those instructions on bioperl.org for installing  
>> bioperl via
>> cpan. But actually it is impossible for me to install the bioperl::db
>> module.
>>
>> How does this work?
>>
>> Moreover none of these Birney distribution are installable on my  
>> system.
>> After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always
>> fail. So I have to install the CRAFFI bundle but it does not seem  
>> that
>> Bio::DB module is included in this bundle because my programs  
>> using that
>> module do not work.
>>
>> Help would be appreciated :)
>>
>> Cheers,
>> Bernd
>>
>> Appendix:
>>
>> cpan[6]> d /bioperl/
>> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
>> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
>> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
>> Distribution    BIRNEY/bioperl-1.2.tar.gz
>> Distribution    BIRNEY/bioperl-1.4.tar.gz
>> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
>> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
>> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
>> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
>> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
>> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
>> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
>> 12 items found
>>
>>
>
> -- 
> Dipl.-Inform.(FH)
> Bernd Mueller
> phone: +49 179 2336692
> email: bernd at kirx.de

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From allenday at ucla.edu  Sun Apr 29 20:57:55 2007
From: allenday at ucla.edu (Allen Day)
Date: Sun, 29 Apr 2007 17:57:55 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
Message-ID: <5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>

Doesn't matter to me if it stays or not.  If you're cleaning house
feel free to get rid of it.

-Allen

On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Quick response!  Yep, I've run into this with a few publishers.
> Though they're supposed to have 'permanent' links for those of us who
> like to link to our pubs they frequently change (scary if that's
> their definition of permanent).
>
> Did you want us to remove the code from CVS?
>
> chris
>
> On Apr 29, 2007, at 7:21 PM, Allen Day wrote:
>
> > Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
> > ago.  I only implemented for a few journals, so it never worked for a
> > large fraction of publications.  Probably it barely works or does not
> > work at all now b/c of how the PDF are scraped out of the HTML.  The
> > publisher sites are always modifying their HTML, presumably trying to
> > prevent automated download like this.
> >
> > -Allen
> >
> > On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> Allen (or anyone),
> >>
> >> What is the status of this module?  It requires a module not listed
> >> in the dependencies (WWW:Mechanize) and has no tests.
> >>
> >> chris
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>

From cjfields at uiuc.edu  Mon Apr 30 11:15:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Apr 2007 10:15:16 -0500
Subject: [Bioperl-l] PMC and EUtilities, was  bioperl::db
In-Reply-To: <4635B1BD.9030402@kirx.de>
References: <46335BD7.8040306@kirx.de> <463522F1.2010406@kirx.de>
	<01DC1D72-8AFA-4C11-8795-D4787506C602@uiuc.edu>
	<4635B1BD.9030402@kirx.de>
Message-ID: <D11CE380-EDEC-4F7F-80EA-09D915EA79F0@uiuc.edu>

Bernd,

As a preface to this discussion, I am in the middle of refactoring  
EUtilities; the next incarnation should have a similar API but will  
likely set parameters via simpler methods (no need for all the getter/ 
setters).

You'll likely have to parse out the tags yourself, AFAIK there is no  
BioPerl XML parser for PMC XML and a quick grep search turns up  
nothing but PubMed parsers.  If you aren't familiar with XML parsing  
you could try XML::Simple to get at what you want.  I would pass the  
XML in as small chunks (maybe by retrieving them in batches of 100 or  
less) and initially use Data::Dumper to determine the data structure  
XML::Simple returns (PMC XML has attributes and elements, so the  
structure will be more complex).  Then just iterate through articles  
and grab what you want.
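
A small sketch of that approach, under stated assumptions: the XML below is a made-up, heavily simplified stand-in (real PMC XML is far more nested), and the batches helper is hypothetical. XML::Simple is loaded at run time so the script still runs where the module is not installed.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper: split a list of IDs into batches of at most $size
# (100 or fewer), so each fetched XML chunk stays small.
sub batches {
    my ($size, @ids) = @_;
    die "Batch size must be at least 1\n" unless $size >= 1;
    my @out;
    push @out, [ splice(@ids, 0, $size) ] while @ids;
    return @out;
}

# Made-up stand-in for one fetched chunk; not real PMC markup.
my $xml = <<'XML';
<articles>
  <article id="PMC0000001"><title>First example title</title></article>
  <article id="PMC0000002"><title>Second example title</title></article>
</articles>
XML

my $data;
if (eval { require XML::Simple; 1 }) {
    # ForceArray keeps 'article' a list even when only one comes back;
    # KeyAttr => [] stops the list being folded into a hash keyed on id.
    my $parser = XML::Simple->new(ForceArray => ['article'], KeyAttr => []);
    $data = $parser->XMLin($xml);

    print Dumper($data);    # look at the structure before writing accessors

    for my $article (@{ $data->{article} }) {
        print "$article->{id}\t$article->{title}\n";
    }
}
else {
    warn "XML::Simple is not installed; skipping the parsing demo\n";
}
```

Attributes and child elements end up as keys of the same hash, which is why dumping the structure first saves guesswork.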

I think most, if not all, of the articles in PubMed Central are  
free full text:

http://www.pubmedcentral.nih.gov/about/faq.html#q9

You can retrieve them via ftp:

ftp://ftp.ncbi.nlm.nih.gov/pub/pmc

which contains an index file of all articles and their dir. location  
(the readme gives more info).

chris

On Apr 30, 2007, at 4:07 AM, Bernd Mueller wrote:

> Hello,
>
> I think so. The ids from my wanted articles are retrieved by  
> Bio::DB::EUtilities::esearch. Afterwards I download the articles  
> with Bio::DB::EUtilities::efetch. It is only possible to download  
> in XML format from PMC. So post processing is actually needed  
> because I want the articles in plain format.
>
> But I don't know why I have results of non-free articles, i.e.  
> abstracts where full articles should be found with a query  
> constraining to only free fulltext. In the query I limit the search  
> with the filter "AND free fulltext[filter]".Probably this is a  
> matter concerning not directly bioperl but the eutilities interface  
> of PMC.
>
> Regards,
> Bernd


From allenday at ucla.edu  Mon Apr 30 12:44:12 2007
From: allenday at ucla.edu (Allen Day)
Date: Mon, 30 Apr 2007 09:44:12 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <4635FDD8.8030704@jouy.inra.fr>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
	<5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>
	<4635FDD8.8030704@jouy.inra.fr>
Message-ID: <5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>

DOI is definitely the right way to do this.  It wasn't implemented
widely at the time I wrote this module.

-Allen

On 4/30/07, Stéphane Téletchéa <steletch at jouy.inra.fr> wrote:
> Allen Day wrote:
> > Doesn't matter to me if it stays or not.  If you're cleaning house
> > feel free to get rid of it.
> >
> > -Allen
> >
>
> I've worked on something on the other way around: get information about
> a pdf from the DOI if present. Most recent publications do have a doi,
> and i use this as a target for my request.
>
> This does not solve the problem, but may help others, feel free to ask
> if it can help the ongoing work, the code is quite dirty ...
>
> Stéphane
>
>
> --
> Stéphane Téletchéa, PhD.                  http://www.steletch.org
> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>


From cjfields at uiuc.edu  Mon Apr 30 13:55:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Apr 2007 12:55:01 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
	<5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>
	<4635FDD8.8030704@jouy.inra.fr>
	<5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>
Message-ID: <34F19F02-1B7B-41A1-90B1-F373C49BC012@uiuc.edu>

Agreed; even some seq. records may have DOI now.  PubMed and PMC XML  
contain this, so it is possible to parse the DOI out if one were  
inclined to incorporate this into Bio::Biblio (I added a doi() getter/ 
setter into Bio::Annotation::Reference a few months back).
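
As an illustration only, a minimal sketch of pulling the DOI out of PubMed XML with a regular expression. The function name and the sample fragment (including the DOI value) are made up; a real XML parser would be more robust than a regex, and the result could then be stored with the doi() accessor mentioned above.

```perl
use strict;
use warnings;

# Hypothetical helper: return every DOI found in ArticleId elements
# of a PubMed XML string.
sub extract_dois {
    my ($xml) = @_;
    my @dois = $xml =~
        m{<ArticleId\s+IdType="doi"\s*>\s*([^<\s]+)\s*</ArticleId>}g;
    return @dois;
}

# Made-up fragment in the shape PubMed XML uses for article identifiers.
my $sample = <<'XML';
<ArticleIdList>
  <ArticleId IdType="pubmed">00000000</ArticleId>
  <ArticleId IdType="doi">10.1000/example.doi</ArticleId>
</ArticleIdList>
XML

print "$_\n" for extract_dois($sample);
```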

chris

On Apr 30, 2007, at 11:44 AM, Allen Day wrote:

> DOI is definitely the right way to do this.  It wasn't implemented
> widely at the time I wrote this module.
>
> -Allen
>
> On 4/30/07, Stéphane Téletchéa <steletch at jouy.inra.fr> wrote:
>> Allen Day wrote:
>>> Doesn't matter to me if it stays or not.  If you're cleaning house
>>> feel free to get rid of it.
>>>
>>> -Allen
>>>
>>
>> I've worked on something on the other way around: get information  
>> about
>> a pdf from the DOI if present. Most recent publications do have a  
>> doi,
>> and i use this as a target for my request.
>>
>> This does not solve the problem, but may help others, feel free to  
>> ask
>> if it can help the ongoing work, the code is quite dirty ...
>>
>> Stéphane
>>
>>
>> --
>> Stéphane Téletchéa, PhD.                  http://www.steletch.org
>> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
>> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
>> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr 30 16:05:45 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 30 Apr 2007 13:05:45 -0700 (PDT)
Subject: [Bioperl-l] generate a fasta file from the blast report
Message-ID: <10259461.post@talk.nabble.com>


Hi all,
If I have the following script working on my BLAST report, can anyone please
tell me how I can generate a FASTA-format file of just the hit (subject)
sequences?
Thanks a lot.
 
use strict;
 use Bio::SearchIO;
   
    my $in = new Bio::SearchIO(-format => 'blast', 
                               -file   => 'report.bls');
    while( my $result = $in->next_result ) {
      while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
          if( $hsp->length('total') > 100 &&
              $hsp->percent_identity >= 75 ) {
              print "Hit= ", $hit->name, 
                    ", len=",$hsp->length('total'), 
                    ", percent_id=", $hsp->percent_identity, "\n";
          }
        }  
      }
    }


From Francoise.LECOMTE at biogemma.com  Mon Apr 30 06:35:03 2007
From: Francoise.LECOMTE at biogemma.com (Francoise.LECOMTE at biogemma.com)
Date: Mon, 30 Apr 2007 12:35:03 +0200
Subject: [Bioperl-l] Pb makefile
Message-ID: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>

Hi
I am trying to install BioPerl 1.4 on a Tru64 platform and I get the message 
"make:line too long" when I run the command make install.
How can I solve it? How can I disable man page installation in Makefile.PL, if 
that would solve this problem?

Best regards 

Françoise Lecomte 


From torsten.seemann at infotech.monash.edu.au  Mon Apr 30 20:22:35 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 1 May 2007 10:22:35 +1000
Subject: [Bioperl-l] generate a fasta file from the blast report
In-Reply-To: <10259461.post@talk.nabble.com>
References: <10259461.post@talk.nabble.com>
Message-ID: <a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>

> if i have the following script working on my blast report, can anyone plz
> tell me how can i generate a fasta format file of just the hits (subject)
> sequence.

Do you want the WHOLE subject sequence, or just the region that hit the query?

The hit is available as $hsp->hit_string();
http://doc.bioperl.org/bioperl-live/Bio/Search/HSP/GenericHSP.html#CODE11

The whole subject sequence would require the original Fasta input file.
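
A rough sketch of the first option, writing the aligned hit region of each HSP as FASTA. It reuses the same length and identity cut-offs as the posted script. The helper name fasta_record and the ID scheme (hit name plus hit coordinates) are made up for illustration, and the BioPerl part only runs when a report file is given on the command line.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one aligned hit string into a FASTA record by
# dropping alignment gaps and wrapping the sequence at 60 columns.
sub fasta_record {
    my ($id, $aligned) = @_;
    (my $seq = $aligned) =~ s/-//g;
    $seq =~ s/(.{1,60})/$1\n/g;
    return ">$id\n$seq";
}

# Only runs when a BLAST report is named on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $ARGV[0]);
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                next unless $hsp->length('total') > 100
                         && $hsp->percent_identity >= 75;
                # Made-up ID scheme: hit name plus coordinates on the hit.
                my $id = join '_', $hit->name,
                                   $hsp->start('hit'), $hsp->end('hit');
                print fasta_record($id, $hsp->hit_string);
            }
        }
    }
}
```

Redirecting the output to a file gives the FASTA file; a hit with several qualifying HSPs produces one record per HSP.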

By the way, are your questions for work related issues, or is this
your homework or assignment for a study course?

--Torsten

From aharry2001 at yahoo.com  Mon Apr  2 09:56:33 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 06:56:33 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>
Message-ID: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>


Hello ALL,

I have the code below, which parses my KEGG files. Many of the files are parsed and the information is inserted into my database, but unfortunately after the program has run for some hours it stops with an out-of-memory message. I assume that this happens because the BioPerl object is too big. Please check the code below.

best regards Ambrose


#!/usr/local/ActivePerl/bin/perl
#
#

use strict;
use Bio::SeqIO;
use Bio::FASTASequence;
use DBI;
use Benchmark  qw(:all) ;

my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1,$pathway_id1,$pathway_name1,$ec_num);
my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
my( @kg_id);
my $db="gbdb";
my $host="localhost";
my $userid="root";
my $passwd="ubuntu";
my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/mysqld.sock";
my ($t1,$t2);
my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
my $time_used;
 
 
 eval { $dbh->do("DROP TABLE kegginfo") };
 print "Dropping kegginfo failed: $@\n" if $@;
 $dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL AUTO_INCREMENT,
                                   up_id INT UNSIGNED REFERENCES uniprotinfo(up_id),
                                                                  filename VARCHAR(50),
                                                    kegg_id VARCHAR(50),
                                   keggaccn VARCHAR(50),
                                                                  description VARCHAR(250),
                                   ec_numbers VARCHAR(250),
                                              pathway_id VARCHAR(250),
                                              pathway_name VARCHAR(250),
                                              crc64 VARCHAR(50),
                                   ko_id VARCHAR(50),
                                   pfam_id VARCHAR(50),
                                   ncbigi_id VARCHAR(50),
                                   ncbigeneid_id VARCHAR(50),
                                   uniprot_id VARCHAR(50),
                                   prosite_id VARCHAR(50),
                                   PRIMARY KEY (kg_id)
                                 )");
                                 

eval { $dbh->do("DROP TABLE keggntsequence") };
print "Dropping keggntsequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                                    keggaccn VARCHAR(50),
                                  nucleotidesequence text
                                 )");

eval { $dbh->do("DROP TABLE keggaasequence") };
print "Dropping keggaasequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                                    keggaccn VARCHAR(50),
                                                    crc64 VARCHAR(50),
                                  aminoacidsequence text
                                 )");
eval { $dbh->do("DROP TABLE timestable") };
print "Dropping timestable failed: $@\n" if $@;
$dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT NULL AUTO_INCREMENT,
                                   genome VARCHAR(100),
                                    totaltime_seconds int(100),
                                                                  PRIMARY KEY(aut_id))");


open (LIST, "genomes.list") || die "Cannot open input kegg genomes file genomes.list\n $! \n";
$t1=new Benchmark;
my @genomelist = ();
while (my $line=<LIST>) {
    #ignore comment lines
    if ($line !~ /^#/) {
        chomp $line;
                
        push (@genomelist, $line); #store the filename
    }
}

close LIST;
my $count=0;
foreach my $genomefile (@genomelist) {

    #in case the user fails to remove some strange files from
    #the genomes.list file.. check for the KEGG format
    my $check=checkKeggFormat($genomefile);
    if ($check==0) {
        #if file is not kegg, start with the next one...
        print "ERROR: $genomefile doesn't look like a KEGG file to me! \n";
        #<stdin>;
        next;
    }
#print $genomefile,"\n";
    my $stream = Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG');

    while ( my $seq = $stream->next_seq() ) {

        my $primary_id = $seq->primary_id;
        my $display_id = $seq->display_id; #name
        my $keggaccn   = $seq->accession; #accn
        my @description = $seq->annotation->get_Annotations('description');
        
        my @dblinks     = $seq->annotation->get_Annotations('dblink');
        my @orthologs   = $seq->annotation->get_Annotations('ortholog');
        my @orthologs   = grep {$_->database eq 'KO'} $seq->annotation->get_Annotations('dblink');
        my @class       = $seq->annotation->get_Annotations('pathway');
         $ntseq{$keggaccn} = $seq->seq;
         $aaseq{$keggaccn} = $seq->translate->seq; 
         $aaseq{$keggaccn} =~s /\*$//;
                 my $fasta = ">".$count."\n".$aaseq{$keggaccn};
         my $newseq = Bio::FASTASequence->new($fasta);
         $crc64{$keggaccn}=$newseq->getCrc64();
#print $keggaccn,"crc64:$crc64{$keggaccn}\n";
        
        $count++;
        if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no keggaccn\n";
        next;}    

        if(@dblinks)
        {
                my @dblink_KO=();
                my @dblink_Pfam=();
                my @dblink_PROSITE=();
                my @dblink_NCBIGI=();
                my @dblink_NCBIGENEID=();
                my @dblink_UniProt=();
        
                foreach my $ele (@dblinks) {
                    if ($ele =~ /^KO:/){
                        $ele=~s/KO://;
                        push (@dblink_KO,$ele);
                        $dblink_KO{$keggaccn}=$ele;
                        next;
                    }
                        #parse Pfam: dblink
                    if ($ele =~ /^Pfam:/){
                        $ele=~s/Pfam://;
                        push (@dblink_Pfam,$ele);
                        $dblink_Pfam{$keggaccn}=$ele;
                        next;
                    }
                        #parse PROSITE: dblink
                    if ($ele =~ /^PROSITE:/){
                        $ele=~s/PROSITE://;
                        push (@dblink_PROSITE,$ele);
                        $dblink_PROSITE{$keggaccn}=$ele;
                        next;
                    }
                        #parse NCBI-GI: dblink
                    if ($ele =~ /^NCBI-GI:/){
                        $ele=~s/NCBI-GI://;
                        push (@dblink_NCBIGI,$ele);
                        $dblink_NCBIGI{$keggaccn}=$ele;
                        next;
                    }
                        #parse NCBI-GeneID: dblink
                    if ($ele =~ /^NCBI-GeneID:/){
                        $ele=~s/NCBI-GeneID://;
                        push (@dblink_NCBIGENEID,$ele);
                        $dblink_NCBIGENEID{$keggaccn}=$ele;
                        next;
                        }
                        #parse UniProt: dblink
                    if ($ele =~ /^UniProt:/){
                        $ele=~s/UniProt://;
                        push (@dblink_UniProt,$ele);
                        $dblink_UniProt{$keggaccn}=$ele;
                        next;
                    }
            
                }#end foreach     #finished parsing all dblinks    
        }#end if @dblinks
        if(@class)
        {
            foreach my $pathway (@class) {
    
                $pathway=~s/^\s+|\s+$//;
                my @arr = split (/\s+/,$pathway);
                my $pathway_id = $arr[0];
                shift @arr;
                my $pathway_name = join(" ",@arr);
                $pathway_name{$keggaccn}=$pathway_name;
                $pathway_id{$keggaccn}=$pathway_id;
                #print $pathway_id{$keggaccn},"\t",$pathway_name{$keggaccn},"\n";
                                    
            }
            
        }
        
        my @ecnumbers=();
        @ecnumbers = extractECnumbers(@description);
        if(@ecnumbers)
        {
                if (@ecnumbers!=0) 
                {
                    foreach my $ecn (@ecnumbers) 
                    {
                       $ecnumbers{$keggaccn}=$ecn;
                    }#end foreach
                }
                else {
                    #print "ECnumbers:\n";
                     }
        }
        
        
#         print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t",$dblink_NCBIGENEID{$keggaccn},
#                 "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers{$keggaccn}","\t",
#                  "p1:$pathway_id{$keggaccn}","\t","p2:$pathway_name{$keggaccn}","\n";
#         
                $dbh->do("INSERT INTO kegginfo VALUES (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
         undef,"NULL","NULL",$genomefile,$display_id,$keggaccn,@description,$ecnumbers{$keggaccn},
                  $pathway_id{$keggaccn},$pathway_name{$keggaccn},$crc64{$keggaccn},$dblink_KO{$keggaccn},
                 $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn},$dblink_NCBIGENEID{$keggaccn},
                 $dblink_UniProt{$keggaccn},$dblink_PROSITE{$keggaccn});
         

        $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
            undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});
                        

        $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
            undef,"",$keggaccn,$ntseq{$keggaccn});
                
               
    }
     $t2=new Benchmark;
    $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
    $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
    undef,"NULL",$genomefile,$time_used);
 
}


$dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
print "Index created on kegginfo\n";

$dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence (kg_id,keggaccn)");
print "Index created on keggaasequence\n";

$dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence (kg_id,keggaccn)");
print "Index created on keggntsequence\n";


print"Updating the tables................\n";

    
$dbh->do("update kegginfo,keggaasequence set keggaasequence.kg_id=kegginfo.kg_id 
         where 
                kegginfo.keggaccn=keggaasequence.keggaccn");
        print " keggaasequence kg_id\n";

$dbh->do("update kegginfo,keggntsequence set keggntsequence.kg_id=kegginfo.kg_id 
         where 
                kegginfo.keggaccn=keggntsequence.keggaccn");
        print " keggntsequence kg_id\n";


sub extractECnumbers ($) {
    #sample description lines
     #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]
    #ATP synthase F0 subunit c [EC:3.6.3.14]

    my @desc=shift;
    my $description = join ("",@desc);
    my @ecnumbers=();
    #print "parsing ec for $description..\n";
    #check if EC number exists
    if ($description=~/\[EC:/) {
        
        my @array = split (/\[EC:/,$description);
        $array[1]=~s/]//g;
        shift @array; #remove the annotation , only EC numbers remain
        foreach my $ele (@array) {
            $ele=~s/^\s+|\s+$//g;
            $ele= "EC:".$ele;
            push (@ecnumbers,$ele);
        }    
        return @ecnumbers;
    }
    else {
        #return an empty value
        return ;

    }

}


sub checkKeggFormat ($) {
=head2

checkKeggFormat

make sure that the file is a valid KEGG file
function checks the first line only,
which must start with ENTRY
(the second-line check is commented out below)

returns 0 or 1

=cut
    my $genomefile=shift;

    open (TEST,$genomefile) || die "Cannot open file $genomefile for reading \n";
    my $testline=<TEST>;
#print "$testline\n";
    if ($testline=~/^ENTRY/) {
        #continue
        #$testline=<TEST>;#double check
        #if ($testline=~/^NAME/) {
            #this looks like a valid kegg file
            close TEST;
            return 1;
        #}
        #else {
        #    close TEST;
        #    return 0;
        #}
    }
    else {
        close TEST;    
        return 0;
    }

}

sub timeThis ($$$) 
{
    my ($start,$end,$message) = @_;
    my $td = timediff($end, $start);
    my $t = timestr($td);    
        print "$message : ",$t,"\n";
        #split ' ' skips the leading space timestr puts before one-digit counts
        my @array = split (' ',$t);
#20 wallclock secs (14.23 usr +  0.84 sys = 15.07 CPU)
        return $array[0]; #return the no. of seconds.
}

   


From e-just at northwestern.edu  Mon Apr  2 10:12:33 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:12:33 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package
	"Bio::DB::GenBank"
Message-ID: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>

Hello,

I am getting this error while running a bioperl script that I had been using
in bioperl 1.4.  On upgrading to bioperl 1.5.2 I get the following fatal
error:

Can't locate object method "seq_start" via package "Bio::DB::GenBank"

My script is as follows:


use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $gb = new Bio::DB::GenBank();

my $query = Bio::DB::Query::GenBank->new(
      -query   =>'txid44689[Organism:noexp]',
      -reldate => 60,
      -db      => 'nucleotide'

);

my $in = $gb->get_Stream_by_query($query);

while ( my $seq = $in->next_seq()) {
      print "do something";
      #....
}


I noticed that seq_start is created in the BEGIN block of
Bio::DB::NCBIHelper (inherited by Bio::DB::GenBank), but I do not have
experience troubleshooting this kind of autoloaded method.  Any idea where
to start?

Thanks

Eric


From e-just at northwestern.edu  Mon Apr  2 10:15:28 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:15:28 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package
	"Bio::DB::GenBank"
In-Reply-To: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>
References: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>
Message-ID: <fa1fe35c0704020715u1f14f273n100d4e21f848603d@mail.gmail.com>

Sorry about that.

As soon as I sent the email I found my problem (an old NCBIHelper in my
inheritance path).  There is no bug here.
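
For anyone who hits the same symptom, one way to check for a stale copy is to ask Perl which file a module was actually loaded from. This is only a minimal sketch: loaded_from is a hypothetical helper name, and File::Spec stands in for Bio::DB::NCBIHelper so the snippet runs without BioPerl installed.

```perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it is not loaded.
# %INC is Perl's own record of loaded files, keyed like "Bio/DB/NCBIHelper.pm".
sub loaded_from {
    my ($module) = @_;
    (my $key = "$module.pm") =~ s{::}{/}g;
    return $INC{$key};
}

require File::Spec;    # stand-in; use Bio::DB::NCBIHelper in a real check
print "File::Spec loaded from ", loaded_from('File::Spec'), "\n";
```

If the path printed for the real module points into an old installation, that directory is probably ahead of the new one in @INC.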

Eric


On 4/2/07, Eric Just <e-just at northwestern.edu> wrote:
>
> Hello,
>
> I am getting this error while running a bioperl script that I had been
> using in bioperl 1.4.  On upgrading to bioperl 1.5.2 I get the following
> fatal error
>
> Can't locate object method "seq_start" via package "Bio::DB::GenBank"
>
> My script is as follows:
>
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $gb = new Bio::DB::GenBank();
>
> my $query = Bio::DB::Query::GenBank->new(
>       -query   =>'txid44689[Organism:noexp]',
>       -reldate => 60,
>       -db      => 'nucleotide'
>
> );
>
> my $in = $gb->get_Stream_by_query($query);
>
> while ( my $seq = $in->next_seq()) {
>       print "do something";
>       #....
> }
>
>
>
> I noticed that seq_start is created in the begin block of
> Bio::DB::NCBIHelper (inherited by Bio::DB::GenBank), but I do not have
> experience troubleshooting this kind of autoloaded method.  Any idea where
> to start?
>
> Thanks
>
> Eric
>


From cjfields at uiuc.edu  Mon Apr  2 11:32:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 10:32:59 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
References: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
Message-ID: <38475C93-FB21-4BC4-BF5D-7F48493E8EE2@uiuc.edu>

Ambrose,

Data is persisting in your hashes (in particular DBLink objects),  
which is eating away at your memory.  If I take a sample KEGG gene  
file and simply parse it:

while (my $seq = $io->next_seq) {
     print $seq->accession,"\n";
}

there are no memory issues, but if I store the data in hashes  
declared outside the loop:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,
   %dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

while (my $seq = $io->next_seq) {
     # store Bio::Seq data in hashes
}

I see problems even with a single genome file of KEGG records.  You'll  
definitely run into memory issues if you are parsing many genome  
files, which you appear to be doing:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,
   %dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

for my $genomefile (@genomelist) {
     # one parser per genome file, as in your script
     my $io = Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG');
     while (my $seq = $io->next_seq) {
         # store Bio::Seq data in hashes
     }
}

Localizing the hashes to the genome or sequence loops should prevent  
the memory problem.

Note that the DBLink Annotation objects are overloaded so they act  
like a string ($ele =~ /^KO:/) but are actually  
Bio::Annotation::DBLink objects, something we will likely get rid of  
in the near future.
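
As a rough sketch of what "localizing" could look like (not a drop-in replacement for the script): the hash is declared inside the record loop, and each dblink is copied into a plain string before it is stored. The %genomes data and the @rows array are made-up stand-ins for the Bio::SeqIO stream and the DBI inserts, so the snippet runs without BioPerl or a database.

```perl
use strict;
use warnings;

# Hypothetical stand-in for parsed KEGG records: file name => list of
# [accession, dblink strings...]. In the real script these would come
# from $stream->next_seq and get_Annotations('dblink').
my %genomes = (
    'genome_a.ent' => [ [ 'a1', 'KO:K00001', 'Pfam:PF00001' ],
                        [ 'a2', 'KO:K00002' ] ],
    'genome_b.ent' => [ [ 'b1', 'UniProt:P12345' ] ],
);

my @rows;    # stands in for the $dbh->do("INSERT ...") calls

for my $genomefile (sort keys %genomes) {
    for my $record (@{ $genomes{$genomefile} }) {
        my ($accn, @dblinks) = @$record;

        # Declared per record, so nothing accumulates across records or files.
        my %dblink;
        for my $ele (@dblinks) {
            # "$ele" forces a plain string copy; with overloaded annotation
            # objects this avoids keeping the object itself in the hash.
            my ($db, $id) = split /:/, "$ele", 2;
            $dblink{$db} = $id;
        }

        push @rows, join "\t", $genomefile, $accn,
            map { defined $dblink{$_} ? $dblink{$_} : '' } qw(KO Pfam UniProt);
    }
}

print "$_\n" for @rows;
```

The same idea applies to %ntseq, %aaseq and %crc64: since each value is used only for the current record's inserts, ordinary "my" scalars inside the while loop would do.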

chris

On Apr 2, 2007, at 8:56 AM, Ambrose wrote:

>
>
> Hello ALL,
>
> I have the code below,which parses my kegg files.A host of the  
> files are parsed and the information is inserted into my databases  
> but unfortunate after the program runs for some hours it stops  
> showing the message out of memory.I assume that this happens  
> because the bioperl object is too big.Please just check the code below
>
> best regards Ambrose
>
>
> #!/usr/local/ActivePerl/bin/perl
> #
> #
>
> use strict;
> use Bio::SeqIO;
> use Bio::FASTASequence;
> use DBI;
> use Benchmark  qw(:all) ;
>
> my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1, 
> $pathway_id1,$pathway_name1,$ec_num);
> my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,% 
> dblink_NCBIGENEID,%dblink_UniProt);
> my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
> my( @kg_id);
> my $db="gbdb";
> my $host="localhost";
> my $userid="root";
> my $passwd="ubuntu";
> my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/ 
> mysqld.sock";
> my ($t1,$t2);
> my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
> my $time_used;
>
>
>
>  eval { $dbh->do("DROP TABLE kegginfo") };
>  print "Dropping kegginfo failed: $@\n" if $@;
>  $dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL  
> AUTO_INCREMENT,
>                                    up_id INT UNSIGNED REFERENCES  
> uniprotinfo(up_id),
>                                                                    
> filename VARCHAR(50),
>                                                     kegg_id VARCHAR 
> (50),
>                                    keggaccn VARCHAR(50),
>                                                                    
> description VARCHAR(250),
>                                    ec_numbers VARCHAR(250),
>                                               pathway_id VARCHAR(250),
>                                               pathway_name VARCHAR 
> (250),
>                                               crc64 VARCHAR(50),
>                                    ko_id VARCHAR(50),
>                                    pfam_id VARCHAR(50),
>                                    ncbigi_id VARCHAR(50),
>                                    ncbigeneid_id VARCHAR(50),
>                                    uniprot_id VARCHAR(50),
>                                    prosite_id VARCHAR(50),
>                                    PRIMARY KEY (kg_id)
>                                  )");
>
>
> eval { $dbh->do("DROP TABLE keggntsequence") };
> print "Dropping keggntsequence failed: $@\n" if $@;
> $dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED  
> REFERENCES uniprotinfo(kg_id),
>                                                     keggaccn VARCHAR 
> (50),
>                                   nucleotidesequence text
>                                  )");
>
> eval { $dbh->do("DROP TABLE keggaasequence") };
> print "Dropping keggaasequence failed: $@\n" if $@;
> $dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED  
> REFERENCES uniprotinfo(kg_id),
>                                                     keggaccn VARCHAR 
> (50),
>                                                     crc64 VARCHAR(50),
>                                   aminoacidsequence text
>                                  )");
> eval { $dbh->do("DROP TABLE timestable") };
> print "Dropping timestable failed: $@\n" if $@;
> $dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT  
> NULL AUTO_INCREMENT,
>                                    genome VARCHAR(100),
>                                     totaltime_seconds int(100),
>                                                                    
> PRIMARY KEY(aut_id))");
>
>
>
> open (LIST, "genomes.list") || die "Cannot open input kegg genomes  
> file genomes.list\n $! \n";
> $t1=new Benchmark;
> my @genomelist = ();
> while (my $line=<LIST>) {
>     #ignore comment lines
>     if ($line !~ /^#/) {
>         chomp $line;
>
>         push (@genomelist, $line); #store the filename
>     }
> }
>
> close LIST;
> my $count=0;
> foreach my $genomefile (@genomelist) {
>
>     #in case the user fails to remove some strange files from
>     #the genomes.list file.. check for the KEGG format
>     my $check=checkKeggFormat($genomefile);
>     if ($check==0) {
>         #if file is not kegg, start with the next one...
>         print "ERROR: $genomefile doesn't look like a KEGG file to  
> me! \n";
>         #<stdin>;
>         next;
>     }
> #print $genomefile,"\n";
>     my $stream = Bio::SeqIO->new(-file => $genomefile, -format =>  
> 'KEGG');
>
>     while ( my $seq = $stream->next_seq() ) {
>
>         my $primary_id = $seq->primary_id;
>         my $display_id = $seq->display_id; #name
>         my $keggaccn   = $seq->accession; #accn
>         my @description = $seq->annotation->get_Annotations 
> ('description');
>
>         my @dblinks     = $seq->annotation->get_Annotations('dblink');
>         my @orthologs   = $seq->annotation->get_Annotations 
> ('ortholog');
>         my @orthologs   = grep {$_->database eq 'KO'} $seq- 
> >annotation->get_Annotations('dblink');
>         my @class       = $seq->annotation->get_Annotations 
> ('pathway');
>          $ntseq{$keggaccn} = $seq->seq;
>          $aaseq{$keggaccn} = $seq->translate->seq;
>          $aaseq{$keggaccn} =~s /\*$//;
>                  my $fasta = ">".$count."\n".$aaseq{$keggaccn};
>          my $newseq = Bio::FASTASequence->new($fasta);
>          $crc64{$keggaccn}=$newseq->getCrc64();
> #print $keggaccn,"crc64:$crc64{$keggaccn}\n";
>
>         $count++;
>         if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no  
> keggaccn\n";
>         next;}
>
>         if(@dblinks)
>         {
>                 my @dblink_KO=();
>                 my @dblink_Pfam=();
>                 my @dblink_PROSITE=();
>                 my @dblink_NCBIGI=();
>                 my @dblink_NCBIGENEID=();
>                 my @dblink_UniProt=();
>
>                 foreach my $ele (@dblinks) {
>                     if ($ele =~ /^KO:/){
>                         $ele=~s/KO://;
>                         push (@dblink_KO,$ele);
>                         $dblink_KO{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse Pfam: dblink
>                     if ($ele =~ /^Pfam:/){
>                         $ele=~s/Pfam://;
>                         push (@dblink_Pfam,$ele);
>                         $dblink_Pfam{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse PROSITE: dblink
>                     if ($ele =~ /^PROSITE:/){
>                         $ele=~s/PROSITE://;
>                         push (@dblink_PROSITE,$ele);
>                         $dblink_PROSITE{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse NCBI-GI: dblink
>                     if ($ele =~ /^NCBI-GI:/){
>                         $ele=~s/NCBI-GI://;
>                         push (@dblink_NCBIGI,$ele);
>                         $dblink_NCBIGI{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse NCBI-GeneID: dblink
>                     if ($ele =~ /^NCBI-GeneID:/){
>                         $ele=~s/NCBI-GeneID://;
>                         push (@dblink_NCBIGENEID,$ele);
>                         $dblink_NCBIGENEID{$keggaccn}=$ele;
>                         next;
>                         }
>                         #parse UniProt: dblink
>                     if ($ele =~ /^UniProt:/){
>                         $ele=~s/UniProt://;
>                         push (@dblink_UniProt,$ele);
>                         $dblink_UniProt{$keggaccn}=$ele;
>                         next;
>                     }
>
>                 }#end foreach     #finished parsing all dblinks
>         }#end if @dblinks
>         if(@class)
>         {
>             foreach my $pathway (@class) {
>
>                 $pathway=~s/^\s+|\s+$//;
>                 my @arr = split (/\s+/,$pathway);
>                 my $pathway_id = $arr[0];
>                 shift @arr;
>                 my $pathway_name = join(" ",@arr);
>                 $pathway_name{$keggaccn}=$pathway_name;
>                 $pathway_id{$keggaccn}=$pathway_id;
>                 #print $pathway_id{$keggaccn},"\t",$pathway_name 
> {$keggaccn},"\n";
>
>             }
>
>         }
>
>         my @ecnumbers=();
>         @ecnumbers = extractECnumbers(@description);
>         if(@ecnumbers)
>         {
>                 if (@ecnumbers!=0)
>                 {
>                     foreach my $ecn (@ecnumbers)
>                     {
>                        $ecnumbers{$keggaccn}=$ecn;
>                     }#end foreach
>                 }
>                 else {
>                     #print "ECnumbers:\n";
>                      }
>         }
>
>
> #         print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t", 
> $dblink_NCBIGENEID{$keggaccn},
> #                 "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers 
> {$keggaccn}","\t",
> #                  "p1:$pathway_id{$keggaccn}","\t","p2: 
> $pathway_name{$keggaccn}","\n";
> #
>                 $dbh->do("INSERT INTO kegginfo VALUES  
> (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
>          undef,"NULL","NULL",$genomefile,$display_id, 
> $keggaccn,@description,$ecnumbers{$keggaccn},
>                   $pathway_id{$keggaccn},$pathway_name{$keggaccn}, 
> $crc64{$keggaccn},$dblink_KO{$keggaccn},
>                  $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn}, 
> $dblink_NCBIGENEID{$keggaccn},
>                  $dblink_UniProt{$keggaccn},$dblink_PROSITE 
> {$keggaccn});
>
>
>         $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
>             undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});
>
>
>         $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
>             undef,"",$keggaccn,$ntseq{$keggaccn});
>
>
>     }
>      $t2=new Benchmark;
>     $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
>     $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
>     undef,"NULL",$genomefile,$time_used);
>
> }
>
>
> $dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
> print "Index created on kegginfo\n";
>
> $dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence  
> (kg_id,keggaccn)");
> print "Index created on keggaasequence\n";
>
> $dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence  
> (kg_id,keggaccn)");
> print "Index created on keggntsequence\n";
>
>
> print"Updating the tables................\n";
>
>
> $dbh->do("update kegginfo,keggaasequence set  
> keggaasequence.kg_id=kegginfo.kg_id
>          where
>                 kegginfo.keggaccn=keggaasequence.keggaccn");
>         print " keggaasequence kg_id\n";
>
> $dbh->do("update kegginfo,keggntsequence set  
> keggntsequence.kg_id=kegginfo.kg_id
>          where
>                 kegginfo.keggaccn=keggntsequence.keggaccn");
>         print " keggaasequence kg_id\n";
>
>
>
> sub extractECnumbers ($) {
>     #sample description lines
>      #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26  
> 2.7.7.2]
>     #ATP synthase F0 subunit c [EC:3.6.3.14]
>
>     my @desc=shift;
>     my $description = join ("",@desc);
>     my @ecnumbers=();
>     #print "parsing ec for $description..\n";
>     #check if EC number exists
>     if ($description=~/\[EC:/) {
>
>         my @array = split (/\[EC:/,$description);
>         $array[1]=~s/]//g;
>         shift @array; #remove the annotation , only EC numbers remain
>         foreach my $ele (@array) {
>             $ele=~s/^\s+|\s+$//g;
>             $ele= "EC:".$ele;
>             push (@ecnumbers,$ele);
>         }
>         return @ecnumbers;
>     }
>     else {
>         #return an empty value
>         return ;
>
>     }
>
> }
>
>
>
>
>
>
>
> sub checkKeggFormat ($) {
> =head2
>
> checkKeggFormat
>
> make sure that the file is a valid KEGG file
> function checks the first two lines,
> 1st must start with ENTRY
> 2nd must start with DEFINITION
>
> returns 0 or 1
>
> =cut
>     my $genomefile=shift;
>
>     open (TEST,$genomefile) || die "Cannot open file $genomefile  
> for reading \n";
>     my $testline=<TEST>;
> #print "$testline\n";
>     if ($testline=~/^ENTRY/) {
>         #continue
>         #$testline=<TEST>;#double check
>         #if ($testline=~/^NAME/) {
>             #this looks like a valid kegg file
>             return 1;
>         #}
>         #else {
>         #    close TEST;
>         #    return 0;
>         #}
>     }
>     else {
>         close TEST;
>         return 0;
>     }
>
> }
>
> sub timeThis ($$$)
> {
>     my ($start,$end,$message) = @_;
>     my $td = timediff($end, $start);
>     my $t = timestr($td);
>         print "$message : ",$t,"\n";
>         my @array = split (/\s+/,$t);
> #20 wallclock secs (14.23 usr +  0.84 sys = 15.07 CPU)
>         return $array[0]; #return the no. of seconds.
> }
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Mon Apr  2 12:19:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 2 Apr 2007 11:19:51 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>

Hi Fahmi,

Please include the list on the reply so that others can comment, too.

Yes, it appears the machine you are installing on does not have an  
internet connection. You probably will want to resolve that problem  
before dealing with Bioperl. Alternatively, you could simply install  
and use Bioperl  on the machine which does have an internet connection.

If you really need to get Bioperl installed on that machine, however,  
probably the easiest way would be to find a machine that does have an  
internet connection, install CPAN::Mini, and use it to make a local  
mirror of CPAN. You could then copy that local mirror over to the  
machine without the internet connection and point that machine's cpan  
at the local mirror (read the CPAN documentation to find out how to  
do this). Also, the BioPerl install instructions list several  
external packages that you will need to use some parts of Bioperl  
(e.g. GD). Again, you can download those distributions using the  
machine with the internet connection and copy them over.

Dave


On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:

> Thank you for your answer. I will give you as much information as I can
> so that you can diagnose the problem:
>
> I use Linux Mandriva 2006.
> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained
> from the URL:
> http://www.bioperl.org/wiki/Release_1.5.2
> After that I ran these commands, which I found at the URL
> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph
> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL')
>
> >gunzip bioperl-1.5.2_102.tar.gz
> >tar xvf bioperl-1.5.2_102.tar
> >cd bioperl-1.5.2_102
> After that I ran the command
> >perl Build.PL
> and got the text
> this package requires Module::Build v0.2805 or greater to install  
> itself
> install Module::Build now from CPAN?[y]
> I pressed Enter and got many lines such as
> System call"/usr/bin/wget -0-"ftp://.perl.org/pub/CPAN/modules/ 
> modlist.data.gz">home/fahmi/.cpan/sources/modules/03modlist.data
> Not connected
> cant access URL ftp://ftp.perl.org/CPAN/modules/modlist.data.gz
> ...
> I'm trying to install Bioperl without an internet connection
> because I don't know why Linux didn't detect my ethernet card.
> Please tell me what I should do.
> Thank you for your help.


From cjfields at uiuc.edu  Mon Apr  2 14:10:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 13:10:30 -0500
Subject: [Bioperl-l] Fwd: BLAST beta, URLAPI, and BioPerl (RemoteBlast users)
References: <CD04BF03C87B6240A342461CDE1DEC0304091DB4@NIHCESMLBX8.nih.gov>
Message-ID: <002E7937-10DF-43CE-96F6-71DC743C1314@uiuc.edu>

This may be of interest to anyone using RemoteBlast.

For anyone who uses RemoteBlast, the new changes to NCBI's BLAST  
interface shouldn't affect anything (Scott tested it out).  If there  
are any abnormalities with RemoteBlast queries over the next few  
weeks let us know.

chris

Begin forwarded message:

> From: "Mcginnis, Scott \(NIH/NLM/NCBI\) [E]"  
> <mcginnis at ncbi.nlm.nih.gov>
> Date: April 2, 2007 12:53:33 PM CDT
> To: "Chris Fields" <cjfields at uiuc.edu>
> Subject: RE: BLAST beta, URLAPI, and BioPerl
>
> Hi Chris:
>
> We are ready to make the new pages the default on April 16th. An
> announcement is going out shortly. There are some very minor
> changes to the URL API, which I have listed below; they will be
> part of the announcement. Please note that we actually tested BioPerl,
> and it seems to work fine with the new pages. If you have a news page on
> your site or a mailing list, you might want to pass this on.
>
> A Note About URLAPI
>
> The new BLAST pages support URLAPI, a protocol that scripts and
> programs use to run BLAST searches and retrieve results over
> HTTP. (For more on URLAPI, see
> http://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html). The following
> information only applies to you if you develop or are responsible
> for software that uses URLAPI.
>
> The new pages have been tested and produce correct results with
> the following URLAPI client programs:
>
> * the BioPERL RemoteBlast module
> * the NCBI demo script http://ncbi.nlm.nih.gov/blast/docs/web_blast.pl
> * various scripts used in-house at NCBI
>
> Users of URLAPI should be aware of the following minor
> changes. In the new interface:
>
> 1. The Request ID (RID) format will be shorter.  The new format
>     is 11 alphanumeric characters (e.g. RDEFEA5012) and will have no
>     internal structure. The previous RID format was 36 or more
>     characters long, including punctuation (e.g.,
>     1175172712-21345-42512597310.BLASTQ3).
>
> 2. BLAST reports will show masked regions as lower-case letters
>     by default (see
>     http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W6,
>     figure 2). The current default behavior is to show masked
>     regions as N's or X's. Users may recover the current behavior
>     by adding &MASK_CHAR=0 to the query string for a URLAPI
>     request.
>
> 3. BLAST reports will show alignments for 100 database sequences
>     by default. The current reports show only 50 alignments by
>     default.
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Mon 3/5/2007 11:50 AM
> To: Mcginnis, Scott (NIH/NLM/NCBI) [E]
> Subject: BLAST beta, URLAPI, and BioPerl
>
> The BioPerl project has several modules and parsers
> which currently parse XML/text/tabular BLAST output, as well as a
> module which is capable of posting BLAST queries via the URLAPI
> interface.  Will any of the BLAST changes affect these (particularly
> URLAPI)?
>
> Thanks!
>
> chris
>
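
For scripts that talk to URLAPI directly rather than through RemoteBlast, changes 1 and 2 above suggest two precautions: treat the RID as an opaque token, and pass MASK_CHAR=0 if you depend on the old N/X masking. The following is only a sketch under assumptions: extract_rid and get_request_query are hypothetical helper names, the "RID = ..." line layout and the CMD/RID/FORMAT_TYPE parameters are my reading of the URLAPI documentation, and only MASK_CHAR=0 comes from Scott's note.

```perl
use strict;
use warnings;

# Pull the RID out of a submission response without assuming its length
# or internal structure (assumed layout: a line of the form "RID = value").
sub extract_rid {
    my ($response_text) = @_;
    return $response_text =~ /^\s*RID\s*=\s*(\S+)/m ? $1 : undef;
}

# Build the query string for retrieving results; MASK_CHAR=0 is the
# setting the note says restores the previous masking behavior.
sub get_request_query {
    my ($rid) = @_;
    my @params = (CMD => 'Get', RID => $rid, FORMAT_TYPE => 'Text', MASK_CHAR => 0);
    my @pairs;
    while (my ($key, $value) = splice @params, 0, 2) {
        # percent-encode anything outside the unreserved URL characters
        $value =~ s/([^A-Za-z0-9_.~-])/sprintf '%%%02X', ord $1/ge;
        push @pairs, "$key=$value";
    }
    return join '&', @pairs;
}

my $sample = "QBlastInfoBegin\n    RID = RDEFEA5012\n    RTOE = 10\nQBlastInfoEnd\n";
print get_request_query(extract_rid($sample)), "\n";
```

Because the pattern accepts any run of non-space characters, it handles both the old and the new RID examples quoted above.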

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From steletch at jouy.inra.fr  Tue Apr  3 08:28:39 2007
From: steletch at jouy.inra.fr (Stéphane Téletchéa)
Date: Tue, 03 Apr 2007 14:28:39 +0200
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
Message-ID: <46124877.4020605@jouy.inra.fr>

Alex Lancaster wrote:
> Hello bioperl,
> 
> I'm new to the bioperl world, having just started a research position
> in which I need to manage a large bioperl-based codebase.  To this
> end, I'm working on packaging bioperl as an official Fedora Package
> (formerly "Fedora Extras") and I'm currently wading through and
> packaging the long laundry list of Perl dependencies (then I'm going
> to try and do the same for biopython).  You can see some of my
> progress (including links to the reviews) here:
> 
> http://fedoraproject.org/wiki/AlexLancaster
> 
> Several issues have arisen during the packaging that I hope the
>

Nice, I was about to do it myself :-)
I'm a Mandriva packager and have been kindly "pushed" into maintaining 
the bioperl package for Mandriva.

You can have a look at the work already done by Mandriva at the addresses:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/

(Happy users of Mandriva just run 'urpmi perl-bioperl', et voilà :-).

Feel free to contact me if you need more input on the dependencies, since 
there are quite a lot of them.

Cheers,
Stéphane

-- 
Stéphane Téletchéa, PhD.                  http://www.steletch.org
Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901


From cjfields at uiuc.edu  Tue Apr  3 10:58:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 09:58:44 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <46124877.4020605@jouy.inra.fr>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
Message-ID: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>

Once these are set up we should add a page to the bioperl wiki to  
describe them in more detail (along with Allen's Biopackages).

chris

On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote:

> Alex Lancaster wrote:
>> Hello bioperl,
>>
>> I'm new to the bioperl world, having just started a research position
>> in which I need to manage a large bioperl-based codebase.  To this
>> end, I'm working on packaging bioperl as an official Fedora Package
>> (formerly "Fedora Extras") and I'm currently wading through and
>> packaging the long laundry list of Perl dependencies (then I'm going
>> to try and do the same for biopython).  You can see some of my
>> progress (including links to the reviews) here:
>>
>> http://fedoraproject.org/wiki/AlexLancaster
>>
>> Several issues have arisen during the packaging that I hope the
>>
>
> Nice, i was on my way to do it :-)
> I'm a Mandriva packager and have been kindly "spushed" for maintaining
> the bioperl package for Mandriva.
>
> You can have a look at the work already done by Mandriva at the addresses:
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/
>
> (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).
>
> Feel free to contact me if you need more input for dependencies, since
> they are quite a lot.
>
> Cheers,
> Stéphane
>
> -- 
> Stéphane Téletchéa, PhD.                  http://www.steletch.org
> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From allenday at gmail.com  Tue Apr  3 13:54:51 2007
From: allenday at gmail.com (Allen Day)
Date: Tue, 3 Apr 2007 10:54:51 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
Message-ID: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>

You can link Biopackages now, it's been done for nearly 2 years.

-Allen

On 4/3/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Once these are set up we should add a page to the bioperl wiki to
> describe them in more detail (along with Allen's Biopackages).
>
> chris


From cjfields at uiuc.edu  Tue Apr  3 14:11:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 13:11:19 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
	<5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
Message-ID: <0802E2EB-5E94-42D2-9CE1-B82DC103A5D1@uiuc.edu>

I added a small piece on Biopackages to the wiki installation page:

http://www.bioperl.org/wiki/Installing_BioPerl

We can move links to RPM (or similar) installations to their own page  
or section in the INSTALL docs when we have time.

chris

On Apr 3, 2007, at 12:54 PM, Allen Day wrote:

> You can link Biopackages now, it's been done for nearly 2 years.
>
> -Allen

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Tue Apr  3 18:18:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:18:56 +0100
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>	<1175258897.2668.21.camel@localhost.localdomain>	<6d648ierkz.fsf@delpy.biol.berkeley.edu>	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
Message-ID: <4612D2D0.7030202@sendu.me.uk>

Chris Fields wrote:
> On Mar 30, 2007, at 11:02 PM, Allen Day wrote:
> 
>> The majority of the Bioperl classes are file parsers, or manipulate
>> data that comes from the file parsers.  Yes there are exceptions like
>> the Eutils and Ensembl-interfacing classes, but they are the minority.
>> The types of files that are worked with are generally either A)
>> primary data sets such as genome data, or B) derivative data, such as
>> sequence alignments that are derived from primary data using an
>> algorithm.
>>
>> If we're in agreement that the primary data sets and
>> libraries/applications for producing derivative data should not be
>> present in Fedora Extras, then it follows that the Bioperl classes for
>> manipulating these primary and derivative data  should also not be
>> present in Fedora Extras as they are of little use without data to
>> manipulate.
>
> I respectfully disagree.

Likewise, but in a slightly different way: for myself and surely many 
others the primary data used either isn't publicly released or isn't in 
some major database and therefore won't be in any kind of repository. 
That doesn't mean I wouldn't want the parser for my files to be 
somewhere convenient.


From bix at sendu.me.uk  Tue Apr  3 18:09:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:09:27 +0100
Subject: [Bioperl-l] installation bioperl
In-Reply-To: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
References: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
Message-ID: <4612D097.9060400@sendu.me.uk>

> On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:
> 
>> Thank you for your answer. I will give you as much information as I can
>> so that you can diagnose the problem:
>>
>> I use Linux Mandriva 2006.
>> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained
>> from the URL:
>> http://www.bioperl.org/wiki/Release_1.5.2
>> After that I ran the commands which I found at the URL
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph
>> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL')
[snip]
>> I'm trying to install bioperl without an internet connection
>> because I don't know why Linux didn't detect my ethernet card.
>> Please tell me what I should do.
>> Thank you for your collaboration.

David's suggestion was a good one, but quite a lot of BioPerl (possibly
all you need) is usable with just the bioperl-1.5.2_102.tar.gz file
you already have.

Just follow the 'hard way' instructions:
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_MODULES_THE_HARD_WAY

Actually, it's not that hard. Just extract the files from the .tar.gz and
have your perl lib point at the directory that contains the resulting Bio
directory.
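
As a rough sketch of that last step: the path below is purely hypothetical
(wherever the tarball happened to be extracted), and Bio::Seq is only used
as a convenient probe module. The point is that the library path names the
directory that holds Bio/, not Bio/ itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical location: the directory that CONTAINS Bio/, not Bio/ itself.
use lib '/home/user/src/bioperl-1.5.2_102';

# Try to load one BioPerl module and report where it was found.
my $probe = 'Bio::Seq';
(my $file = "$probe.pm") =~ s{::}{/}g;    # Bio::Seq -> Bio/Seq.pm
if ( eval { require $file; 1 } ) {
    print "$probe loaded from $INC{$file}\n";
}
else {
    print "$probe not found; check the path given to 'use lib'\n";
}
```

Setting the PERL5LIB environment variable to the same directory should have
the same effect without editing each script.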


From t.r-a_ckright1 at tiscali.co.uk  Wed Apr  4 08:00:12 2007
From: t.r-a_ckright1 at tiscali.co.uk (Michael Pain)
Date: Wed, 4 Apr 2007 13:00:12 +0100
Subject: [Bioperl-l]  Re: read it immediately
Message-ID: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>

I have received three discs but I cannot access the files, as no ID or password was included in the package. I have paid for all this! Can you sort it out?

Regards Michael Pain


From thiago.venancio at gmail.com  Wed Apr  4 14:14:04 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Wed, 4 Apr 2007 15:14:04 -0300
Subject: [Bioperl-l] read it immediately
In-Reply-To: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
References: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
Message-ID: <44255ea80704041114pc284522tef2d3a3944763b90@mail.gmail.com>

I think you emailed the wrong list...

On 4/4/07, Michael Pain <t.r-a_ckright1 at tiscali.co.uk> wrote:
>
> I have received three discs but I cannot access the files, as no ID or
> password was included in the package. I have paid for all this! Can you sort
> it out?
>
> Regards Michael Pain


From gdorjee at hotmail.com  Wed Apr  4 14:17:57 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 11:17:57 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
Message-ID: <9842643.post@talk.nabble.com>


Hi all,
Can anyone please help me with a problem that I've been dealing with for
quite a while now? The following part of my script is not working for
some reason. It is supposed to get the sequence from 'result/fasta.faa' and
run BLAST on it.

###my script ###########
......
my $Seq_in = Bio::SeqIO->new(-file => 'result/fasta.faa', -format => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new(
    'program'   => 'blastp',
    'database'  => '/export/home/database/nr',
    _READMETHOD => 'Blast',
);
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

When I paste the protein sequence into the textarea of my HTML page and save
it as 'result/fasta.faa', so that the above script can run the BLAST,
I get the following error:

Software error:
------------- EXCEPTION  -------------
MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastpgp
/usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
--------------------------------------
I would appreciate your help.
I would also like to add that 'result/fasta.faa' does have the sequence saved
in it.

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9842643
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From gowthaman.ramasamy at sbri.org  Wed Apr  4 14:57:09 2007
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Wed, 4 Apr 2007 11:57:09 -0700
Subject: [Bioperl-l] How to patch something in installed bioperl module
Message-ID: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>


Hi List,
I have been advised to patch the GFF.pm BioPerl module (comment out some lines and add some).
How do I go about it?
I have the latest BioPerl version, 1.5.2, installed via CPAN.

I find GFF.pm in the following location:
/root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm


Do I have to recompile it after editing?
I am completely clueless; I have not done this before.
Can anyone help me do this?

Many thanks in advance.

gowthaman


From dmessina at wustl.edu  Wed Apr  4 15:42:43 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Apr 2007 14:42:43 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>

The code snippet worked fine for me. I believe the problem is that  
'result/fasta.faa' is not getting passed to your code properly. You  
might try specifying a complete path to your input and output file --  
relative paths, especially through a web app, can be tricky.

> when i paste the protein sequence into the textarea of my html page  
> and save
> the same as 'result/fasta.faa', so that the above script would do  
> the blast,

I'm not sure from what you wrote -- did you try running your script  
on the command line (having created 'result/fasta.faa' manually  
first)? If that is working for you, then the problem is with getting  
the data from the webpage into the script, not with the blasting part.

Dave

This is what I did:

% ls test.pl testp*
test.pl       testp.fa

% formatdb -i testp.fa

% ls test.pl testp*
test.pl       testp.fa      testp.fa.phr  testp.fa.pin  testp.fa.psq

% perl test.pl testp.fa
% head -10 out.blast
BLASTP 2.2.10 [Oct-19-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|64654269|gb|AAH96193.1| HOXB1 protein [Homo sapiens]
          (235 letters)


Your code: I changed only the input filename and the input database  
name, and saved the script as test.pl
-----------------------
#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

my $Seq_in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new(
    'program'   => 'blastp',
    'database'  => 'testp.fa',
    _READMETHOD => 'Blast',
);
$factory->outfile("out.blast");
my $blastreport = $factory->blastall($queryin);
-----------------------


From gdorjee at hotmail.com  Wed Apr  4 17:44:27 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 14:44:27 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
References: <9842643.post@talk.nabble.com>
	<35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
Message-ID: <9846257.post@talk.nabble.com>


Thanks for your reply, Dave. I don't think that there's anything wrong with
the open(OUTPUT,">result/fasta.faa"); line, as I do get the 'fasta.faa'
file with the sequence in it; I can see it. It looks like BLAST is not
able to read from result/fasta.faa.
^ ^* 


Dave Messina-2 wrote:
> 
> The code snippet worked fine for me. I believe the problem is that  
> 'result/fasta.faa' is not getting passed to your code properly. You  
> might try specifying a complete path to your input and output file --  
> relative paths, especially through a web app, can be tricky.
> 
>> when i paste the protein sequence into the textarea of my html page  
>> and save
>> the same as 'result/fasta.faa', so that the above script would do  
>> the blast,
> 
> I'm not sure from what you wrote -- did you try running your script  
> on the command line (having created 'result/fasta.faa' manually  
> first)? If that is working for you, then the problem is with getting  
> the data from the webpage into the script, not with the blasting part.
> 
> Dave

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9846257
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 20:17:10 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:17:10 +1000
Subject: [Bioperl-l] How to patch something in installed bioperl module
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>
Message-ID: <a79f6a4b0704041717q160be28eu472d32d3cd704eba@mail.gmail.com>

> I am advised to patch (comment out some lines and add some) GFF.pm bioperl module.
> How do I go about it?

First, make a backup of the original file.
Then just edit the original (add/remove lines).

> I have the latest Bioperl 1.5.2 version installed....via CPAN
> I find GFF.pm in the following location...
> /root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm

This is not where it is installed. That is where the CPAN program
uncompressed it to before installing. It is more likely in a directory
like this:
/usr/lib/perl5/site_perl/5.8.5/Bio/Tools/GFF.pm
But it depends on how your Perl setup arranges things!

> Do I have to recompile it after editing?

No.
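
To see which copy Perl would actually load on a given machine, a small
script along these lines can help. This is only a sketch: the helper name
module_path is made up for illustration, and it simply walks @INC the way
a plain 'use' would.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the first file in @INC that would be loaded for a module name,
# or undef if the module is not found anywhere on the search path.
sub module_path {
    my ($module) = @_;
    (my $rel = "$module.pm") =~ s{::}{/}g;    # Bio::Tools::GFF -> Bio/Tools/GFF.pm
    for my $dir (@INC) {
        next if ref $dir;                     # skip code-ref hooks in @INC
        my $candidate = "$dir/$rel";
        return $candidate if -f $candidate;
    }
    return;
}

my $module = @ARGV ? $ARGV[0] : 'Bio::Tools::GFF';
my $path   = module_path($module);
print defined $path ? "$module => $path\n" : "$module not found in \@INC\n";
```

The file it prints is the one to back up and edit, rather than the copy
under /root/.cpan/build. 'perldoc -l Bio::Tools::GFF' should report the
same location.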

--Torsten


From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 20:22:37 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:22:37 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>

> Software error:
> ------------- EXCEPTION  -------------
> MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
> /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => 'Fasta');

Does this still happen if you give the full path to the FASTA file?
e.g. -file => '/usr/local/apache2/htdocs/result/fasta.faa'
(I'm guessing what the full path is here)

--Torsten


From gilbertd at cricket.bio.indiana.edu  Wed Apr  4 20:59:23 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Wed, 4 Apr 2007 19:59:23 -0500 (EST)
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
Message-ID: <200704050059.l350xNF07452@cricket.bio.indiana.edu>


Dear Bioperl list,

There is a small bug in what I think is the current Bio::Tools::GFF.pm,
that blocks output of Target attributes (in gff3 at least).  See a patch
here

http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 21:34:17 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 11:34:17 +1000
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
Message-ID: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>

Dear all,

I have been migrating all our BLAST infrastructure to use the XML
output mode, the "blastpgp -m 7" option, referred to as 'blastxml' format
in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML report
before, and encountered some issues I hope you can help me with:

1. When loading with Bio::SearchIO(-format=>'blastxml') I get back a
Bio::Search::Result::GenericResult object. This means I cannot use
the PSI-BLAST functions like iterations() and psiblast() provided by
Bio::Search::Result::BlastResult. I'm guessing this is because the
XML output reports itself as a plain BLASTP output:
<BlastOutput_program>blastp</BlastOutput_program>

How do I determine if it is a PSI-BLAST report?

2. Usually a PSI-BLAST report has multiple Iterations. The XML output
has <Iteration> tags but it took me a while to figure out that these
get mapped to Bio::Search::Result objects accessible via
Bio::SearchIO->next_result().

Is this the proper way to process the iterations?

3. I also notice that only the first result (iteration) has the
query_name set. Subsequent ones are empty:
RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=MyProtein , db=uniprot_sprot
RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query= , db=uniprot_sprot

Is this a bug or expected?

I'm guessing a lot of these problems are simply due to limitations of
the NCBI BLAST XML DTD?

--Torsten


From gdorjee at hotmail.com  Wed Apr  4 20:59:08 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 17:59:08 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>
Message-ID: <9848412.post@talk.nabble.com>


Hi Torsten,
Yes, it still gives me the same error even if I give the full path to the
fasta file. This is what I did:

####### part of my script #######
my $Seq_in = Bio::SeqIO->new(
    -file   => '/export/home/local/apache2/htdocs/result/fasta.faa',
    -format => 'Fasta',
);
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new(
    'program'   => 'blastp',
    'database'  => '/export/home/dorjee/database/nrpart',
    _READMETHOD => 'Blast',
);
$factory->outfile("/export/home/local/apache2/htdocs/result/out.blast");
my $blastreport = $factory->blastall($queryin);
....

thanks man.


Torsten Seemann wrote:
> 
>> Software error:
>> ------------- EXCEPTION  -------------
>> MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
>> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
>> /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
>> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
> 
>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
>> 'Fasta');
> 
> Does this still happen if you give the full path to the FASTA file?
> e.g. -file => '/usr/local/apache2/htdocs/result/fasta.faa'
> (I'm guessing what the full path is here)
> 
> --Torsten

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9848412
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 22:57:09 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 12:57:09 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>

DeeGee,

Please add the following lines to help deduce the problem:

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
> 'Fasta');

die "could not open fasta" if not defined $Seq_in;

> my $queryin = $Seq_in->next_seq();

die "could not get seq" if not defined $queryin;

Does anything happen now?

...

Some other comments:

> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp

I'm not sure why it is in the blastpgp() method when you chose
$factory->blastall() ?

>                                                  _READMETHOD => 'Blast'

I don't think this is required anymore in modern Bioperl. Are you
using 1.5.x or bioperl-live ?

> when i paste the protein sequence into the textarea of my html page and
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

So this is a CGI script?
Does the script run as user 'apache' or 'httpd', or as yourself via SuEXEC?
Does 'apache' have permissions to READ/WRITE the result/ directory?

--Torsten


From cjfields at uiuc.edu  Thu Apr  5 00:14:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Apr 2007 23:14:46 -0500
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
In-Reply-To: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>
References: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>
Message-ID: <8EA4D933-9B99-485E-9CEA-AB39297F90B4@uiuc.edu>

On Apr 4, 2007, at 8:34 PM, Torsten Seemann wrote:

> Dear all,
>
> I have been migrating all our BLAST infrastructure to use the XML
> output mode, the "blastpgp -m 7" option, referred to as 'blastxml' format
> in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML report
> before, and encountered some issues I hope you can help me with:
>
> 1. When loading with Bio::SearchIO(-format=>'blastxml') I get back a
> Bio::Search::Result::GenericResult object. This means I cannot use
> the PSI-BLAST functions like iterations() and psiblast() provided by
> Bio::Search::Result::BlastResult. I'm guessing this is because the
> XML output reports itself as a plain BLASTP output:
> <BlastOutput_program>blastp</BlastOutput_program>
>
> How do I determine if it is a PSI-BLAST report?

I don't know if you can very easily, though I haven't tried myself.   
If I remember correctly there wasn't a substantial difference in the  
XML output between regular BLAST XML and PSI-BLAST XML.  We could add  
a parameter to the parser to treat the report as PSI-BLAST.

> 2. Usually a PSI-BLAST report has multiple Iterations. The XML output
> has <Iteration> tags but it took me a while to figure out that these
> get mapped to Bio::Search::Result objects accessible via
> Bio::SearchIO->next_result().
>
> Is this the proper way to process the iterations?

The problem is in the way that NCBI now outputs multiple-query BLAST  
XML reports, which apparently changed sometime in the last year w/o  
notice.  This was also a problem with other Bio* parsers (I remember  
seeing something about it on the BioPython list).  Previously  
multiquery BLAST requests were output like single XML reports  
concatenated together, each with their own XML declaration, etc.  Now  
they are treated like iterations (query 1 = iteration 1, query 2 =  
iteration 2, etc) all in one long BLAST report.  There's an example  
of one in the SearchIO tests which I added to CVS in Jan-Feb,  
post-1.5.2.  The current parser handles both old and new cases.

The current behavior of the parser is to parse everything up front,  
building up the ResultI's and then returning them one-by-one upon  
next_result(), which is horrible on memory if you have tons of XML to  
wade through.  I will probably change that to carve the data up into  
report-sized chunks of XML and parse them piecemeal, but I haven't  
had time to work on it yet.
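
To make the "carve it up" idea concrete, here is a rough sketch of
splitting a report into one <Iteration> block at a time so that only one
block is held in memory. It is not the SearchIO parser and not how any
future version will necessarily work; each_iteration is a made-up helper,
the input is a tiny invented fragment, and it assumes the opening and
closing <Iteration> tags each sit on their own line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Call $callback once per <Iteration>...</Iteration> block read from $fh,
# keeping only one block in memory at a time. Assumes the opening and
# closing tags each sit on their own line (an assumption about the input).
sub each_iteration {
    my ( $fh, $callback ) = @_;
    my ( $chunk, $count ) = ( undef, 0 );
    while ( my $line = <$fh> ) {
        $chunk = '' if $line =~ /<Iteration>/;    # start a new block
        next unless defined $chunk;               # outside any block
        $chunk .= $line;
        if ( $line =~ m{</Iteration>} ) {         # block complete
            $callback->($chunk);
            $count++;
            undef $chunk;
        }
    }
    return $count;
}

# Tiny made-up input, just to exercise the splitter.
my $xml = <<'XML';
<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
</Iteration>
<Iteration>
  <Iteration_iter-num>2</Iteration_iter-num>
</Iteration>
</BlastOutput_iterations>
XML

open my $fh, '<', \$xml or die "cannot open in-memory file: $!";
my @numbers;
my $n = each_iteration(
    $fh,
    sub {
        my ($block) = @_;
        push @numbers, $1
          if $block =~ m{<Iteration_iter-num>(\d+)</Iteration_iter-num>};
    }
);
print "saw $n iteration blocks: @numbers\n";
```

Each block handed to the callback could then be given to a real XML
parser on its own, which is where the memory saving would come from.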

> 3. I also notice that only the first result (iteration) has the
> query_name set. Subsequent ones are empty:
> RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=MyProtein , db=uniprot_sprot
> RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query= , db=uniprot_sprot
>
> Is this a bug or expected?

If you are using 1.5.2 then there is a bug related to that which was  
fixed in CVS a few months back (related to the multiquery issue  
above).  If it isn't let me know.

> I'm guessing a lot of these problems are simply due to limitations of
> the NCBI BLAST XML DTD?
>
> --Torsten

To tell the truth I'm not sure.  One would think they could add some  
designation to the report for PSI-BLAST!

chris


From cjfields at uiuc.edu  Thu Apr  5 13:40:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 12:40:41 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
Message-ID: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>

Roy Chaudhuri has raised an interesting question in a bug report  
filed regarding 'bless'-ing objects into another (similar) class.   
The bug report on this is here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2262

The following code (from the bug report) illustrates the problem.   
Note some of this is taken from the Bio::Seq::Meta::Array POD, though  
the example sequence object is a LocatableSeq (PrimarySeqI) and not a  
SeqI:

use Bio::SeqIO;
use Bio::Seq::Meta::Array;
# $seq isa Bio::SeqI
my $seq=Bio::SeqIO->new(-fh=>\*ARGV, -format=>'genbank')->next_seq;
# $seq is still a Bio::SeqI
bless $seq, 'Bio::Seq::Meta::Array';
Bio::SeqIO->new(-format=>'genbank')->write_seq($seq);

This produces sequence output missing sequence data, a definition,  
and other odds and ends.  $seq is first a Bio::Seq::RichSeq and is  
blessed into a Bio::Seq::Meta::Array; both times $seq remains  
Bio::SeqI.  However, Bio::Seq::Meta::Array has an odd inheritance  
tree which also makes it a Bio::PrimarySeqI and a Bio::Seq::MetaI (ick):

use base qw(Bio::LocatableSeq Bio::Seq Bio::Seq::MetaI);

Bio::LocatableSeq has a seq() method inherited from Bio::PrimarySeq,  
for instance, so using $seq->seq() invokes Bio::PrimarySeq::seq()  
instead of Bio::Seq::seq().  No problem in most cases as long as  
PrimarySeqI is blessed into another PrimarySeqI, but if one blesses a  
Bio::SeqI into a Bio::Seq::Meta::Array (as in the example) then  
PrimarySeq::seq() expects a raw sequence and gets none (since the  
data is stored internally as a PrimarySeq in a different location)  
and no sequence is output.  This happens similarly for other stored  
object data.
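
The effect can be reproduced without BioPerl at all.  The sketch below
uses toy classes (the Toy::* names are invented for illustration and only
mimic the two storage layouts described above; they are not the real
Bio::Seq or Bio::PrimarySeq code):

```perl
use strict;
use warnings;

# Toy stand-ins (hypothetical, NOT the real BioPerl classes) that mimic the
# two storage layouts described above.
package Toy::PrimarySeq;
sub new { my ($class, %args) = @_; return bless { seq => $args{seq} }, $class }
sub seq { my ($self) = @_; return $self->{seq} }      # raw string in {seq}

package Toy::Seq;
sub new {
    my ($class, %args) = @_;
    # the sequence lives one level down, inside a contained PrimarySeq
    return bless { primary_seq => Toy::PrimarySeq->new(%args) }, $class;
}
sub seq { my ($self) = @_; return $self->{primary_seq}->seq }

package Toy::Meta;
# PrimarySeq-like parent listed first, so its seq() wins in method lookup
our @ISA = ('Toy::PrimarySeq', 'Toy::Seq');

package main;

my $obj = Toy::Seq->new(seq => 'ACGT');
print "before bless: ", $obj->seq, "\n";
bless $obj, 'Toy::Meta';                # data layout is unchanged ...
my $after = $obj->seq;                  # ... but Toy::PrimarySeq::seq now runs
print "after bless:  ", (defined $after ? $after : '(undef)'), "\n";
```

bless only changes which package methods are looked up in; it does not
touch the underlying hash, so the accessor that ends up being called has
to agree with how the data was stored in the first place.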

I'm not sure why Bio::Seq::Meta::Array is set up this way.  Do we  
want to support using 'bless $obj, Class' with Bio::SeqI/PrimarySeqI,  
or should Bio::Seq::Meta::Array be changed so that it follows one  
interface or the other?

chris


From hlapp at gmx.net  Thu Apr  5 14:27:39 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Apr 2007 14:27:39 -0400
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
In-Reply-To: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
Message-ID: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>


On Apr 5, 2007, at 1:40 PM, Chris Fields wrote:

> Do we want to support using 'bless $obj, Class'

This smacks of over-clever programming and looks like a sure way to  
obfuscate what you're doing. I'm not sure why we need to support this  
construct.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Apr  5 14:44:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 13:44:38 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
In-Reply-To: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
	<421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
Message-ID: <F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>

I tend to agree on that front as it seems too prone to subtle issues  
with inheritance (as the bug demonstrates).

Related to that, do we want to have Bio::Seq::Meta::Array implement  
either PrimarySeqI or SeqI?  Having it implement both is definitely  
not working as expected.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mkiwala at watson.wustl.edu  Thu Apr  5 15:11:22 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 05 Apr 2007 14:11:22 -0500
Subject: [Bioperl-l] Mixed bless-ings with
	Bio::Seq/Bio::PrimarySeq	(Bio::Seq::Meta::Array)
In-Reply-To: <F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>	<421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
	<F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>
Message-ID: <461549DA.90709@watson.wustl.edu>

My vote is for SeqI.

I was using the SeqWithQuality class and more recently switched over to 
Bio::Seq::Quality as we are upgrading from 1.4 to 1.5.2. The sequences 
I'm working with are destined for GenBank and have features and quality 
values. I've written a module (that I call GenBank::Tbl2Asn) that 
accepts a Bio::Seq::Quality with features and runs tbl2asn on it to 
produce a file that we send to GenBank. I don't know of any other class 
that suits my needs better than Bio::Seq::Quality inheriting from 
Bio::SeqI.



From gdorjee at hotmail.com  Thu Apr  5 17:09:14 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 14:09:14 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
Message-ID: <9864004.post@talk.nabble.com>


Thanks again, Torsten. I tried (die "could not get seq" if not defined
$queryin;) as you suggested, and now I get the following error message:

Software error:
could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.

Does this mean that next_seq() method in 'my $queryin =
$Seq_in->next_seq();' has some problem? How can I fix it? I would appreciate
your help.
Cheers!


Torsten Seemann wrote:
> 
> DeeGee,
> 
> Please add the following lines to help deduce the problem:
> 
>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
>> 'Fasta');
> 
> die "could not open fasta" if not defined $Seq_in;
> 
>> my $queryin = $Seq_in->next_seq();
> 
> die "could not get seq" if not defined $queryin;
> 
> Does anything happen now?
> 
> ...
> 
> Some other comments:
> 
>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>
>> 'blastp',
>> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
> 
> I'm not sure why it is in the blastpgp() method when you chose
> $factory->blastall() ?
> 
>>                                                  _READMETHOD => 'Blast'
> 
> I don't think this is required anymore in modern Bioperl. Are you
> using 1.5.x or bioperl-live ?
> 
>> when i paste the protein sequence into the textarea of my html page and
>> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
> 
> So this is a CGI script?
> Does the script run as user 'apache' or 'httpd', or as yourself via
> SuEXEC?
> Does 'apache' have permissions to READ/WRITE the result/ directory?
> 
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9864004
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Thu Apr  5 19:32:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 18:32:55 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
Message-ID: <3ED7F1E9-FE21-4796-99AC-0CD0EA418563@uiuc.edu>


On Apr 5, 2007, at 4:09 PM, DeeGee wrote:

>
> Thanks again, Torsten. I tried (die "could not get seq" if not defined
> $queryin;) as you suggested, and now I get the following error  
> message:
>
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
>
> Does this mean that next_seq() method in 'my $queryin =
> $Seq_in->next_seq();' has some problem? How can I fix it? I would  
> appreciate
> your help.
> Cheers!

This indicates there is likely some problem with your sequence file  
(either it isn't fasta or something else is wrong), but w/o actually  
seeing it we can't be sure.  I don't think it is a next_seq() issue.  
Also, if there were problems accessing the file the stream object  
should throw an error, so I don't think it is that either...
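
A quick way to narrow this down is to look at the file from inside the
script, just before it is handed to SeqIO.  The following is only a
sketch in core Perl; fasta_problem is a hypothetical helper name, and the
path is the one from the script under discussion:

```perl
use strict;
use warnings;

# Hypothetical diagnostic helper: returns a short description of what is
# wrong with the file, or undef if it at least starts like FASTA.
sub fasta_problem {
    my ($path) = @_;
    return "file does not exist"  unless -e $path;
    return "file is not readable" unless -r $path;
    return "file is empty"        unless -s $path;
    open my $fh, '<', $path or return "cannot open: $!";
    while (defined(my $line = <$fh>)) {
        next unless $line =~ /\S/;            # skip blank lines
        return $line =~ /^>/
            ? undef
            : "first non-blank line does not start with '>'";
    }
    return "file contains only whitespace";
}

my $file = 'result/fasta.faa';   # path taken from the script under discussion
my $problem = fasta_problem($file);
print defined $problem ? "$file: $problem\n" : "$file looks like FASTA\n";
```

An "empty" or "does not exist" answer at that point in the script would
point at how the file is written rather than at the parser.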

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From torsten.seemann at infotech.monash.edu.au  Thu Apr  5 20:40:32 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 6 Apr 2007 10:40:32 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
Message-ID: <a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>

Dorjee,

> thanks alot for your reply again. as per your suggestion (using 'die "could
> not get seq" if not defined $queryin;'), i now get the following error
> message:
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
> i've attached the script. could you plz have a look at it and see where am i
> going wrong.
> cheers mate!

This strongly suggests that your FASTA file is not actually in FASTA format.
http://en.wikipedia.org/wiki/Fasta_format

Does it work if you pass it to blastall on the command line?
eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr

> Saier Lab.
> 858-534-2457

Are you working at UCSD?

--Torsten


From gdorjee at hotmail.com  Thu Apr  5 23:26:16 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 20:26:16 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
Message-ID: <9867402.post@talk.nabble.com>


hi Torsten,  
blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works
perfectly fine on the command line, and the 'fasta.faa' is in fasta format:

>gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
AQGAVAPGPDGGGPFPPWPLG

it seems like i'm just one bloody step away from success. ^ ^* can't figure
out the prob. 
thanks for your help.


-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9867402
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From tuco at pasteur.fr  Fri Apr  6 09:33:08 2007
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Fri, 06 Apr 2007 15:33:08 +0200
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
Message-ID: <46164C14.8040701@pasteur.fr>

Hi folks,

I am seeing strange behavior from Bio::SeqIO::embl.
When I read an EMBL file as input and write it to another one, the tags
in the output file (EMBL format) are not in the same order as in the
original file.
Is this the normal, expected result?

If anyone wants to test it, here it is as a Perl one-liner:

perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "file.embl", -format 
=> "EMBL"); $o = Bio::SeqIO->new(-file => ">new.embl", -format => 
"EMBL"); while($e = $i->next_seq()){ $o->write_seq($e);  }'

I checked the embl.pm code but was unable to find where this behavior 
comes from.

If someone has a solution or any clue, please let me know.

Thanks

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Softwares and data banks
Pasteur Institute
tuco at_ pasteur dot fr	
-------------------------


From dmessina at wustl.edu  Fri Apr  6 11:09:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 6 Apr 2007 10:09:51 -0500
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
In-Reply-To: <46164C14.8040701@pasteur.fr>
References: <46164C14.8040701@pasteur.fr>
Message-ID: <7C67D287-DE2A-488A-8636-01EFF468368D@wustl.edu>

> Is it a normal and expecting result ?

Yes, unfortunately. Due to the complexity of the parsing, it is  
surprisingly difficult to "round-trip" some sequence file formats.

http://www.bioperl.org/wiki/HOWTO:SeqIO#Caveats


Dave


From jason at bioperl.org  Fri Apr  6 11:42:41 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 08:42:41 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9867402.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
Message-ID: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>

When/how are you writing your sequences to this file (result/fasta.faa)?  
Are you using SeqIO or BioPerl to write the sequence to a file?
I'm wondering if this is an I/O buffering problem.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From bernd.web at gmail.com  Fri Apr  6 14:00:18 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 6 Apr 2007 20:00:18 +0200
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <716af09c0704061100n1555915bw18050639d25cbf89@mail.gmail.com>

Hi Dorjee,

Do you now use complete file paths everywhere (instead of the
relative paths that were in your script)?  Did you check all read and
execute permissions (turn r and x on for group and others)?  And regarding
the fasta file, I'd suggest closing the filehandle after you have printed
the fasta sequence to the file.

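# Don't use this relative path; use a full path, and add the "die"
# as was suggested earlier.
open(OUTPUT,">result/fasta.faa") or die "could not open fasta.faa: $!";
# .... your other code lines
print OUTPUT "$desc\n$seqo\n";
close(OUTPUT); # close the file so the buffered data is written to disk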

Also check whether your complete script runs from the command line, to be
sure your problems are not related to the webserver environment.


BTW, I don't think you want to parse your fasta file the way you do:
if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
$fasta_file=~s/[\n\r]//g;
if ($fasta_file =~ /([A-Z]{10}.+)/){$seqo=$1;}

$seqo can end up containing the description as well, since the newlines
are stripped before the sequence is matched, so your sequence may start
with the description.
BioPerl provides code for fasta file parsing too ;-) If you really
want to stick to your code you can catch the $desc and $seqo in one
RegExp, or replace this line:
if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
with
if ($fasta_file =~ s/^(\>.+)\s+//){$desc=$1;}
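
For the "one RegExp" route, something along these lines might do.  It is
a sketch only: split_fasta_text is a hypothetical helper name, and it
assumes the textarea holds a single record whose first line is the ">"
header.

```perl
use strict;
use warnings;

# Hypothetical helper: split pasted FASTA text into (description, sequence).
# Returns an empty list if the text does not look like a single FASTA record.
sub split_fasta_text {
    my ($text) = @_;
    return unless defined $text;
    $text =~ s/\r\n?/\n/g;            # normalise browser line endings
    $text =~ s/^\s+//;                # drop leading blank lines
    # header line first, then everything else as the sequence body
    my ($desc, $body) = $text =~ /\A(>[^\n]*)\n(.*)\z/s
        or return;
    $body =~ s/\s+//g;                # remove newlines and spaces from the body
    return unless length $body;
    return ($desc, $body);
}

my ($desc, $seqo) = split_fasta_text(">test protein\r\nHLSAQKASVG\r\nPESVSGLGTR\r\n");
print "$desc\n$seqo\n" if defined $desc;
```

Because the header is captured before any newlines are removed, the
description can never leak into the sequence.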


I hope you will get your script working now.

Regards,
Bernd



From gdorjee at hotmail.com  Fri Apr  6 13:39:38 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 10:39:38 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <9875685.post@talk.nabble.com>


Following is the part of my script, which is in the 'htdocs' directory:

####### part of my script #############
#generate a new CGI object from the input to the CGI script
my $query=new CGI;

open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");

print STDOUT $query->header();
print STDOUT $query->start_html(-title=>"Response from blast",
-BGCOLOR=>"#FFFFFF");
print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";

#gets the sequence from the html textarea with 'post' method
my $fasta_file=$query->param('sequence');
print OUTPUT $fasta_file;

#Local blast of the input sequence against nr database
my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
'Fasta');
die "could not open fasta" if not defined $Seq_in;
my $queryin = $Seq_in->next_seq();
die "could not get seq" if not defined $queryin;
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                 'database' =>
'/export/home/dorjee/database/nr',
                                                 _READMETHOD => 'Blast'
                                                   );
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

Thank you.


-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9875685
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Fri Apr  6 14:40:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 11:40:42 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9875685.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
	<9875685.post@talk.nabble.com>
Message-ID: <A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>

Looks like you need to deal with buffering:

http://perl.plover.com/FAQs/Buffering.html

So you need to add this:
close(OUTPUT);

Alternatively you can build a sequence object and pass that in to the  
BLAST factory, then you don't have to mess around with creating  
temporary files or run into this sort of problem.
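
To see the buffering effect in isolation, here is a small core-Perl
sketch.  It writes to a throwaway temp file (the file name is made up for
the demo) and checks the size on disk before and after close():

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Core-Perl illustration only; the file below is a throwaway temp file.
my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/fasta.faa";

open my $out, '>', $path or die "cannot write $path: $!";
print {$out} ">query\nACDEFGHIKL\n";

# The data is most likely still sitting in Perl's output buffer here,
# so anything that reads the file now may find it empty.
my $size_before = (-s $path) || 0;

close $out or die "cannot close $path: $!";   # flushes the buffer to disk
my $size_after = -s $path;

print "bytes on disk before close: $size_before, after close: $size_after\n";
```

Reading the file back with SeqIO before the close() is the same situation
as the "before" measurement: the file exists, but the sequence has not
reached the disk yet.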

-jason
On Apr 6, 2007, at 10:39 AM, DeeGee wrote:

>
> Following is the part of my script, which is in the 'htdocs'  
> directory:
>
> ####### part of my script #############
> #generate a new CGI object from the input to the CGI script
> my $query=new CGI;
>
> open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");
>
> print STDOUT $query->header();
> print STDOUT $query->start_html(-title=>"Response from blast",
> -BGCOLOR=>"#FFFFFF");
> print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";
>
> #gets the sequence from the html textarea with ?post? method
> my $fasta_file=$query->param('sequence');
> print OUTPUT $fasta_file;
>
close(OUTPUT);
> #Local blast of the input sequence against nr database
> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
> 'Fasta');
> die "could not open fasta" if not defined $Seq_in;
> my $queryin = $Seq_in->next_seq();
> die "could not get seq" if not defined $queryin;
> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>  
> 'blastp',
>                                                  'database' =>
> '/export/home/dorjee/database/nr',
>                                                  _READMETHOD =>  
> 'Blast'
>                                                    );
> $factory->outfile("result/out.blast");
> my $blastreport = $factory->blastall($queryin);
> .....
>
> Thank you.
>
>
>
> Jason Stajich-3 wrote:
>>
>> When/How are are you writing your sequences to this file result.faa?
>> are you using seqIO or bioperl to write the sequence  to a file?
>> I'm wondering if this is I/O buffering problem.
>>
>> On Apr 5, 2007, at 8:26 PM, DeeGee wrote:
>>
>>>
>>> hi Torsten,
>>> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
>>> works
>>> perfectly fine on the command line, and the 'fasta.faa' is in fasta
>>> format:
>>>
>>>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
>>> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAA 
>>> SV
>>> SPSMTVASSQ
>>> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIP 
>>> LA
>>> GTAPGAEGPA
>>> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGK 
>>> AF
>>> RRKEHLRRHR
>>> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVL 
>>> RH
>>> QRIHGRAAAS
>>> AQGAVAPGPDGGGPFPPWPLG
>>>
>>> it seems like i'm just one bloody step away from success. ^ ^*
>>> can't figure
>>> out the prob.
>>> thanks for your help.
>>>
>>>
>>> Torsten Seemann wrote:
>>>>
>>>> Dorjee,
>>>>
>>>>> thanks alot for your reply again. as per your suggestion (using  
>>>>> 'die
>>>>> "could
>>>>> not get seq" if not defined $queryin;'), i now get the following
>>>>> error
>>>>> message:
>>>>> Software error:
>>>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl
>>>>> line 50.
>>>>> i've attached the script. could you plz have a look at it and see
>>>>> where
>>>>> am i
>>>>> going wrong.
>>>>> cheers mate!
>>>>
>>>> This strongly suggests that your FASTA file is not actually in  
>>>> FASTA
>>>> format.
>>>> http://en.wikipedia.org/wiki/Fasta_format
>>>>
>>>> Does it work if you pass it to blastall on the command line?
>>>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/
>>>> database/nr
>>>>
>>>>> Saier Lab.
>>>>> 858-534-2457
>>>>
>>>> Are you working at UCSD?
>>>>
>>>> --Torsten
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>>> -- 
>>> View this message in context: http://www.nabble.com/blastall-
>>> problem-tf3527412.html#a9867402
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Miller Research Fellow
>> University of California, Berkeley
>> lab: 510.642.8441
>> http://pmb.berkeley.edu/~taylor/people/js.html
>> http://fungalgenomes.org/
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> -- 
> View this message in context: http://www.nabble.com/blastall- 
> problem-tf3527412.html#a9875685
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From MEC at stowers-institute.org  Fri Apr  6 16:34:37 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 6 Apr 2007 15:34:37 -0500
Subject: [Bioperl-l] Bio/DB/SeqFeature/Store/DBI/mysql.pm patched
Message-ID: <CED81D34E37D5043A1211565277A51E507E22BAF@exchkc02.stowers-institute.org>

Lincoln,

I just committed a patch to Bio/DB/SeqFeature/Store/DBI/mysql.pm which
avoids a potential problem which, unless fixed, can generate warnings
that look like this:

prepare_cached(SELECT f.id,f.object
  FROM feature as f, typelist AS tl
  WHERE (   tl.id=f.typeid
   AND   (tl.tag LIKE ?)
)
  
) statement handle DBI::st=HASH(0x16f61c0) still Active at
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line
1427
DBD::mysql::st fetchrow_array failed: fetch() without execute() at
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line
1416.

... as well as other aberrant downstream program behaviour.

I encountered what the DBI manpage suggests might happen: "The results
will certainly not be what you expect"

This can happen, for example, when you open an iterator using
Bio::DB::SeqFeature::Store->get_seq_stream, and then while iterating,
perform other queries against the store.  My understanding of the DBI
doc is that this should only occur if the 2nd iterator is for the same
sql statement identically parameterized as the 1st, but I have not
proven beyond a doubt that this is what Bio::DB::SeqFeature::Store is
doing the way I am using it.  Nonetheless, the patch fixes my pipeline.

Cheers,

Malcolm


From gdorjee at hotmail.com  Fri Apr  6 18:27:54 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 15:27:54 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
	<9875685.post@talk.nabble.com>
	<A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>
Message-ID: <9879110.post@talk.nabble.com>


I added the line: 
close(OUTPUT);
and now the following error comes up, where 'out.blast' is supposed to be the
blast result file, but it is not being created. 

Software error:
------------- EXCEPTION  -------------
MSG: Could not open /export/home/dorjee/result/out.blast: No such file or
directory
STACK Bio::Root::IO::_initialize_io /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:167
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:53

--------------------------------------


Jason Stajich-3 wrote:
> 
> Looks like you need to deal with buffering:
> 
> http://perl.plover.com/FAQs/Buffering.html
> 
> So you need to add this:
> close(OUTPUT);
> 
> Alternatively you can build a sequence object and pass that in to the  
> BLAST factory, then you don't have to mess around with creating  
> temporary files or run into this sort of problem.
> 
> -jason
> On Apr 6, 2007, at 10:39 AM, DeeGee wrote:
> 
>>
>> Following is the part of my script, which is in the 'htdocs'  
>> directory:
>>
>> ####### part of my script #############
>> #generate a new CGI object from the input to the CGI script
>> my $query=new CGI;
>>
>> open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");
>>
>> print STDOUT $query->header();
>> print STDOUT $query->start_html(-title=>"Response from blast",
>> -BGCOLOR=>"#FFFFFF");
>> print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";
>>
>> #gets the sequence from the html textarea with 'post' method
>> my $fasta_file=$query->param('sequence');
>> print OUTPUT $fasta_file;
>>
> close(OUTPUT);
>> #Local blast of the input sequence against nr database
>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
>> 'Fasta');
>> die "could not open fasta" if not defined $Seq_in;
>> my $queryin = $Seq_in->next_seq();
>> die "could not get seq" if not defined $queryin;
>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>  
>> 'blastp',
>>                                                  'database' =>
>> '/export/home/dorjee/database/nr',
>>                                                  _READMETHOD =>  
>> 'Blast'
>>                                                    );
>> $factory->outfile("result/out.blast");
>> my $blastreport = $factory->blastall($queryin);
>> .....
>>
>> Thank you.
>>
>>
>>
>> Jason Stajich-3 wrote:
>>>
>>> When/How are are you writing your sequences to this file result.faa?
>>> are you using seqIO or bioperl to write the sequence  to a file?
>>> I'm wondering if this is I/O buffering problem.
>>>
>>> On Apr 5, 2007, at 8:26 PM, DeeGee wrote:
>>>
>>>>
>>>> hi Torsten,
>>>> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
>>>> works
>>>> perfectly fine on the command line, and the 'fasta.faa' is in fasta
>>>> format:
>>>>
>>>>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
>>>> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAA 
>>>> SV
>>>> SPSMTVASSQ
>>>> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIP 
>>>> LA
>>>> GTAPGAEGPA
>>>> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGK 
>>>> AF
>>>> RRKEHLRRHR
>>>> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVL 
>>>> RH
>>>> QRIHGRAAAS
>>>> AQGAVAPGPDGGGPFPPWPLG
>>>>
>>>> it seems like i'm just one bloody step away from success. ^ ^*
>>>> can't figure
>>>> out the prob.
>>>> thanks for your help.
>>>>
>>>>
>>>> Torsten Seemann wrote:
>>>>>
>>>>> Dorjee,
>>>>>
>>>>>> thanks alot for your reply again. as per your suggestion (using  
>>>>>> 'die
>>>>>> "could
>>>>>> not get seq" if not defined $queryin;'), i now get the following
>>>>>> error
>>>>>> message:
>>>>>> Software error:
>>>>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl
>>>>>> line 50.
>>>>>> i've attached the script. could you plz have a look at it and see
>>>>>> where
>>>>>> am i
>>>>>> going wrong.
>>>>>> cheers mate!
>>>>>
>>>>> This strongly suggests that your FASTA file is not actually in  
>>>>> FASTA
>>>>> format.
>>>>> http://en.wikipedia.org/wiki/Fasta_format
>>>>>
>>>>> Does it work if you pass it to blastall on the command line?
>>>>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/
>>>>> database/nr
>>>>>
>>>>>> Saier Lab.
>>>>>> 858-534-2457
>>>>>
>>>>> Are you working at UCSD?
>>>>>
>>>>> --Torsten
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>
>>>> -- 
>>>> View this message in context: http://www.nabble.com/blastall-
>>>> problem-tf3527412.html#a9867402
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> Jason Stajich
>>> Miller Research Fellow
>>> University of California, Berkeley
>>> lab: 510.642.8441
>>> http://pmb.berkeley.edu/~taylor/people/js.html
>>> http://fungalgenomes.org/
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/blastall- 
>> problem-tf3527412.html#a9875685
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
> 
> 
>  
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9879110
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From gilbertd at cricket.bio.indiana.edu  Fri Apr  6 23:31:29 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Fri, 6 Apr 2007 22:31:29 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704070331.l373VTI22000@cricket.bio.indiana.edu>


Dear Bioperlers,

There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
files have fixed line widths, but that isn't a requirement of Fasta
format. The documentation notes this package requirement, but I was
bitten by this, and I'd guess not many people check their data (esp.
if from someone else) to see it meets this requirement.

Simple tools can easily produce fasta with ragged line formatting:
e.g. genome assemblers that paste together contig fasta with spacers
to make assemblies.

It would be nice if B:D:Fasta would check and die when it can't handle
this ragged input.  Here is a suggested addition:

  package Bio::DB::Fasta;

=head1 DESCRIPTION
  
  Entries may have any line length up to 65,536 characters, and
  different line lengths are allowed in the same file.  However, within
  a sequence entry, all lines must be the same length except for the
  last.  
+ An error will be thrown if this is not the case.

=cut

  use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want 
  
  sub calculate_offsets {
  
     my ($offset,$id,$linelength,$type,$firstline,$count,$termination_length,%offsets);
  +  my ($l3_len,$l2_len,$l_len)=(0,0,0);
  
         $self->_check_linelength($linelength);
  +      ($l3_len,$l2_len,$l_len)=(0,0,0);
       } else {
  +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
  +      if(DIE_ON_MISSMATCHED_LINES &&
  +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
  +         my $fap= substr($_,0,20)."..";
  +         $self->throw("Each line of the fasta entry must be the same length except the last.
  +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
  +         }
  
         $linelength ||= length($_);
  
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


From hlapp at gmx.net  Sat Apr  7 12:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 12:42:13 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <05D43C56-8B30-41C9-8C35-2CD77419DE7F@gmx.net>

Wouldn't it be easier (and more robust) to just reformat the file to  
meet the constant line width requirement? The code required to do  
that should be fewer lines than your addition below, I think.

For example, one could do a fast first-pass through the file simply  
checking that all sequence lines not followed by a description line  
or eof have the same length, stopping at the first line that fails  
the test. If unequal lengths, use Bio::SeqIO to read and write back  
out the fasta file, then continue as for well-formatted files.
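
The first pass described above could look roughly like the sketch below.
It is plain Perl and does not touch Bio::DB::Fasta itself; the function
name first_ragged_line and the toy input are assumptions for the
example. It returns 0 when every sequence line that is followed by
another sequence line has the same width within its entry, and otherwise
the number of the first line that breaks the rule, so the caller can
decide whether to rewrite the file (for example with Bio::SeqIO).

```perl
use strict;
use warnings;

# Hypothetical helper: scan a FASTA filehandle once and report the first
# sequence line whose width differs from the first line of its entry.
# The last line of each entry may have any length.
sub first_ragged_line {
    my ($fh) = @_;
    my ($width, $prev_len, $prev_lineno);
    my $lineno = 0;
    while (my $line = <$fh>) {
        $lineno++;
        $line =~ s/\r?\n\z//;              # strip the line terminator
        if ($line =~ /^>/) {
            # New entry: the previous line was the last of its entry,
            # so it is exempt from the check. Reset the state.
            ($width, $prev_len, $prev_lineno) = (undef, undef, undef);
            next;
        }
        if (defined $prev_len) {
            # The previous line is followed by more sequence, so it
            # must match the width set by the first line of the entry.
            $width = $prev_len unless defined $width;
            return $prev_lineno if $prev_len != $width;
        }
        ($prev_len, $prev_lineno) = (length $line, $lineno);
    }
    return 0;                              # stopped at eof, nothing ragged
}

# Toy input read through an in-memory filehandle.
my $fasta = ">contig1\nACGTACGT\nACGT\nACGTACGT\n";
open(my $fh, '<', \$fasta) or die "Cannot open in-memory file: $!";
my $bad = first_ragged_line($fh);
close($fh);
print $bad ? "ragged at line $bad\n" : "line widths are uniform\n";
```

Stopping at the first failing line keeps the check cheap for files that
turn out to be badly formatted early on.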

	-hilmar

On Apr 6, 2007, at 11:31 PM, Don Gilbert wrote:

>
> Dear Bioperlers,
>
> There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
> files have fixed line widths, but that isn't a requirement of Fasta
> format. The documentation notes this package requirement, but I was
> bitten by this, and I'd guess not many people check their data (esp.
> if from someone else) to see it meets this requirement.
>
> Simple tools can easily produce fasta with ragged line formatting:
> e.g. genome assemblers that paste together contig fasta with spacers
> to make assemblies.
>
> It would be nice if B:D:Fasta would check and die when it can't handle
> this ragged input.  Here is a suggested addition:
>
>   package Bio::DB::Fasta;
>
> =head1 DESCRIPTION
>
>   Entries may have any line length up to 65,536 characters, and
>   different line lengths are allowed in the same file.  However,  
> within
>   a sequence entry, all lines must be the same length except for the
>   last.
> + An error will be thrown if this is not the case.
>
> =cut
>
>   use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want
>
>   sub calculate_offsets {
>
>      my ($offset,$id,$linelength,$type,$firstline,$count, 
> $termination_length,%offsets);
>   +  my ($l3_len,$l2_len,$l_len)=(0,0,0);
>
>          $self->_check_linelength($linelength);
>   +      ($l3_len,$l2_len,$l_len)=(0,0,0);
>        } else {
>   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); #  
> need to check every line :(
>   +      if(DIE_ON_MISSMATCHED_LINES &&
>   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
>   +         my $fap= substr($_,0,20)."..";
>   +         $self->throw("Each line of the fasta entry must be the  
> same length except the last.
>   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
>   +         }
>
>          $linelength ||= length($_);
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr  7 17:13:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 17:13:24 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
References: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
Message-ID: <8177CF47-558F-4891-97B5-69F327EF8A4A@gmx.net>

What I was suggesting was that the indexer automatically do the  
reformatting, i.e., have it touch/change the input data if necessary  
(and obviously one would be able to turn this feature off when the  
correctness of the input formatting is known).

Are you suggesting that this automatic reformatting isn't possible?

	-hilmar

On Apr 7, 2007, at 1:11 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> I have added reformatting where appropriate (in code that installs the
> files for indexing by Bio::DB::Fasta).  What I'm suggesting is a patch
> to Bio::DB::Fasta to warn and die when the documented fixed width
> that Bio::DB::Fasta requires isn't met.  I.e., keep other folks from
> being bitten by this hard to identify requirement.  Then when they
> see that this indexer is failing on inappropriate inputs, they also  
> can reformat
> their Fasta to meet this requirement, and not continue to use the  
> software with
> bad results.  The operation of Bio::DB::Fasta is reading a sequence  
> stream
> and it doesn't touch/change the input data, so it would be hard to  
> patch it
> to re-format the input data.
>
> - Don
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr  7 21:00:51 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 21:00:51 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
References: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
Message-ID: <B8009E72-30C5-479B-B7B9-456E859B80CB@gmx.net>

Since you'd have to reformat it though, how would you do it then  
(presumably offline)?

	-hilmar

On Apr 7, 2007, at 8:06 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> Yes, basically automatic reformatting isn't possible. If you are
> indexing a large genome of fasta data, I'd not want a bioperl script
> to rewrite that data, or create a new version, automatically.
>
> - Don

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From gilbertd at cricket.bio.indiana.edu  Sat Apr  7 13:11:11 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 12:11:11 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704071711.l37HBB823983@cricket.bio.indiana.edu>


Hilmar,

I have added reformatting where appropriate (in code that installs the 
files for indexing by Bio::DB::Fasta).  What I'm suggesting is a patch
to Bio::DB::Fasta to warn and die when the documented fixed width
that Bio::DB::Fasta requires isn't met.  I.e., keep other folks from
being bitten by this hard to identify requirement.  Then when they
see that this indexer is failing on inappropriate inputs, they also can reformat 
their Fasta to meet this requirement, and not continue to use the software with
bad results.  The operation of Bio::DB::Fasta is reading a sequence stream
and it doesn't touch/change the input data, so it would be hard to patch it
to re-format the input data.

- Don

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


From gilbertd at cricket.bio.indiana.edu  Sat Apr  7 20:06:34 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 19:06:34 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>


Hilmar,

Yes, basically automatic reformatting isn't possible. If you are
indexing a large genome of fasta data, I'd not want a bioperl script
to rewrite that data, or create a new version, automatically.

- Don


From gdorjee at hotmail.com  Mon Apr  9 00:18:39 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sun, 8 Apr 2007 21:18:39 -0700 (PDT)
Subject: [Bioperl-l] parse blast report for the best evalue
Message-ID: <9898358.post@talk.nabble.com>


hi all, 
i'm trying to parse a blast report using Bio::SearchIO as follows, but since
this blast report is generated with many against many (database) fasta
sequences, there're many individual blast reports (one for each of the
sequence from the query file). i was wondering if there is a way to get only
the best hit (with best evalue) from each one of them.

##### part of my script ######
my $in = new Bio::SearchIO(-format => 'blast',  -file   => $blast_report);
while( my $result = $in->next_result ) {
        while( my $hit = $result->next_hit ) {
              ...........

thanks.


-- 
View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9898358
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From staffa at niehs.nih.gov  Mon Apr  9 11:43:19 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Mon, 09 Apr 2007 11:43:19 -0400
Subject: [Bioperl-l] Retrieve mRNA from Genome
Message-ID: <C23FD757.3FAB%staffa@niehs.nih.gov>

I have been retrieving sub-sequence from Genbank genomic records by use of
Bio::SeqIO
and ->get_SeqFeatures, ->start ->end ,
but now I'm looking for a quick way to extract CDS or mRNA from
a multi-segmented annotation, e.g.
     mRNA          
join(72458..72791,84573..84613,93279..94419,94481..94656,
                     94719..94992,95056..95350,95438..95553,95614..96056)

Is there such a method?
Please point me to appropriate documentation.


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: John D. Grovenstein (grovens1 at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From Kevin.M.Brown at asu.edu  Mon Apr  9 12:19:19 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 9 Apr 2007 09:19:19 -0700
Subject: [Bioperl-l] Retrieve mRNA from Genome
In-Reply-To: <C23FD757.3FAB%staffa@niehs.nih.gov>
References: <C23FD757.3FAB%staffa@niehs.nih.gov>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCAED7@EX02.asurite.ad.asu.edu>

I believe that is what the spliced_seq method is for

$feat->spliced_seq    # the "joined" sequence, when there are
                      # multiple sub-locations

http://www.bioperl.org/wiki/Bptutorial.pl 
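
A small self-contained sketch of how spliced_seq behaves on a
multi-segment feature. The toy sequence, its coordinates and the
variable names are assumptions for illustration; with a real GenBank
record the features would come from Bio::SeqIO and get_SeqFeatures
rather than being built by hand.

```perl
use strict;
use warnings;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Location::Split;
use Bio::Location::Simple;

# Toy "genomic" sequence (assumption for the example).
my $seq = Bio::Seq->new(-id => 'toy', -seq => 'AAACCCGGGTTTAAACCC');

# Equivalent of a "join(1..3,7..9)" location on the forward strand.
my $loc = Bio::Location::Split->new;
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 1, -end => 3, -strand => 1));
$loc->add_sub_Location(
    Bio::Location::Simple->new(-start => 7, -end => 9, -strand => 1));

my $feat = Bio::SeqFeature::Generic->new(-primary_tag => 'mRNA');
$feat->location($loc);

# Adding the feature to the sequence attaches the sequence to it,
# which spliced_seq needs in order to extract the sub-sequences.
$seq->add_SeqFeature($feat);

for my $f (grep { $_->primary_tag eq 'mRNA' } $seq->get_SeqFeatures) {
    # spliced_seq joins the sub-locations into one sequence object
    print $f->spliced_seq->seq, "\n";
}
```

The same grep on primary_tag should work for 'CDS' features.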

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Staffa, Nick (NIH/NIEHS)
> Sent: Monday, April 09, 2007 8:43 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Retrieve mRNA from Genome
> 
> I have been retrieving sub-sequence from Genbank genomic 
> records by use of Bio::SeqIO and ->get_SeqFeatures, ->start 
> ->end , but now I'm looking for a quick way to extract CDS or 
> mRNA from a multi-segmented annotation, e.g.
>      mRNA          
> join(72458..72791,84573..84613,93279..94419,94481..94656,
>                      
> 94719..94992,95056..95350,95438..95553,95614..96056)
> 
> Is there such a method?
> Please point me to appropriate documentation.


From cjfields at uiuc.edu  Mon Apr  9 12:50:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Apr 2007 11:50:05 -0500
Subject: [Bioperl-l] parse blast report for the best evalue
In-Reply-To: <9898358.post@talk.nabble.com>
References: <9898358.post@talk.nabble.com>
Message-ID: <C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>

You should probably use sort_hits() with a coderef that sorts by  
evalue to ensure that you retrieve the best evalue (significance()  
for hits) (see POD for Bio::Search::Result::ResultI).  You could then  
do something like:

my $hit;

unless ($result->no_hits_found) {
    # pass coderef to sort by evalue
    $result->sort_hits(\&sort_by_evalue);
    # retrieve first (best) hit
    $hit = $result->next_hit;
}

# do whatever you want with the best Hit

If you plan on retaining data from hits over a ton of different  
reports it may be best (memory-wise) to only retain the data you want  
for each hit instead of retaining the actual object.  For instance,  
if you only care about the description and evalue set up a simple  
data structure to house what you want by the query data instead of  
retaining all the extra stuff in the Hit object you don't need (all  
the HSP data, etc).
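
A sketch of the "simple data structure" idea, in plain Perl. The hashes
below are stand-ins for parsed hits and the values are made up for the
example; in a real script they would presumably be filled from
$result->query_name, $hit->name, $hit->description and
$hit->significance while looping over the report. Only the best
(lowest evalue) entry per query is kept, so no Hit objects need to stay
in memory.

```perl
use strict;
use warnings;

# Stand-in records, one per hit (toy values, assumption for the example).
my @hits = (
    { query => 'q1', name => 'hitA', description => 'kinase',      evalue => 1e-20 },
    { query => 'q1', name => 'hitB', description => 'transporter', evalue => 3e-45 },
    { query => 'q2', name => 'hitC', description => 'hydrolase',   evalue => 0.002 },
);

# query name => { name, description, evalue } of the best hit seen so far
my %best;
for my $h (@hits) {
    my $cur = $best{ $h->{query} };
    # keep the current entry unless the new hit has a lower evalue
    next if defined $cur && $cur->{evalue} <= $h->{evalue};
    $best{ $h->{query} } = {
        name        => $h->{name},
        description => $h->{description},
        evalue      => $h->{evalue},
    };
}

for my $query (sort keys %best) {
    printf "%s\t%s\t%s\t%g\n",
        $query, @{ $best{$query} }{qw(name description evalue)};
}
```

Tracking the minimum this way also avoids relying on the order in which
hits appear in the report.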

chris

On Apr 8, 2007, at 11:18 PM, DeeGee wrote:

>
> hi all,
> i'm trying to parse a blast report using Bio::SearchIO as follows,  
> but since
> this blast report is generated with many against many (database) fasta
> sequences, there're many individual blast reports (one for each of the
> sequence from the query file). i was wondering if there is a way to  
> get only
> the best hit (with best evalue) from each one of them.
>
> ##### part of my script ######
> my $in = new Bio::SearchIO(-format => 'blast',  -file   =>  
> $blast_report);
> while( my $result = $in->next_result ) {
>         while( my $hit = $result->next_hit ) {
>               ...........
>
> thanks.
>
>
> -- 
> View this message in context: http://www.nabble.com/parse-blast- 
> report-for-the-best-evalue-tf3545784.html#a9898358
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr  9 15:40:02 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 9 Apr 2007 12:40:02 -0700 (PDT)
Subject: [Bioperl-l] parse blast report for the best evalue
In-Reply-To: <C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>
References: <9898358.post@talk.nabble.com>
	<C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>
Message-ID: <9907757.post@talk.nabble.com>


thank you, Chris.
^ ^*

Chris Fields wrote:
> 
> You should probably use sort_hits() with a coderef that sorts by  
> evalue to ensure that you retrieve the best evalue (significance()  
> for hits) (see POD for Bio::Search::Result::ResultI).  You could then  
> do something like:
> 
> my $hit;
> 
> unless ($result->no_hits_found) {
>     # pass coderef to sort by evalue
>     $result->sort_hits(\&sort_by_evalue);
>     # retrieve first (best) hit
>     $hit = $result->next_hit;
> }
> 
> # do whatever you want with the best Hit
> 
> If you plan on retaining data from hits over a ton of different  
> reports it may be best (memory-wise) to only retain the data you want  
> for each hit instead of retaining the actual object.  For instance,  
> if you only care about the description and evalue set up a simple  
> data structure to house what you want by the query data instead of  
> retaining all the extra stuff in the Hit object you don't need (all  
> the HSP data, etc).
> 
> chris
> 
> On Apr 8, 2007, at 11:18 PM, DeeGee wrote:
> 
>>
>> hi all,
>> i'm trying to parse a blast report using Bio::SearchIO as follows,  
>> but since
>> this blast report is generated with many against many (database) fasta
>> sequences, there're many individual blast reports (one for each of the
>> sequence from the query file). i was wondering if there is a way to  
>> get only
>> the best hit (with best evalue) from each one of them.
>>
>> ##### part of my script ######
>> my $in = new Bio::SearchIO(-format => 'blast',  -file   =>  
>> $blast_report);
>> while( my $result = $in->next_result ) {
>>         while( my $hit = $result->next_hit ) {
>>               ...........
>>
>> thanks.
>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/parse-blast- 
>> report-for-the-best-evalue-tf3545784.html#a9898358
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9907757
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bosborne11 at verizon.net  Tue Apr 10 09:55:37 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 10 Apr 2007 09:55:37 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <C2410F99.DA34%bosborne11@verizon.net>

OK, applied.


On 4/6/07 11:31 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu> wrote:

>   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to
> check every line :(
>   +      if(DIE_ON_MISSMATCHED_LINES &&
>   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
>   +         my $fap= substr($_,0,20)."..";
>   +         $self->throw("Each line of the fasta entry must be the same length
> except the last.
>   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
>   +         }


From MEC at stowers-institute.org  Tue Apr 10 12:21:45 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 10 Apr 2007 11:21:45 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option
Message-ID: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>

Lincoln,

In `perldoc Bio::DB::SeqFeature::Store` I read:

"Caching requires the Tie::Cacher module to be installed. If the module
is not installed, then caching will silently be disabled."

I am wondering about the design motivation for silently disabling
caching when Tie::Cacher is not installed.  Perhaps at least emitting a
warning when -cache is requested and Tie::Cacher is not present is a
good idea?

I am writing code that depends upon caching (i.e. upon the equality of
in-memory objects).

Do you advise that I don't depend upon Tie::Cacher working?  I
understand that it will NOT work as hoped if the cache is too small for
my application.
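
To make the suggestion concrete, here is a minimal sketch of the kind of check I have in mind. It is plain Perl and not the Bio::DB::SeqFeature::Store code; the names module_available and caching_enabled are hypothetical, and the wording of the warning is only an example.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # Tie::Cacher -> Tie/Cacher.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Hypothetical helper: decide whether caching can be honoured, and warn
# instead of silently ignoring the request when Tie::Cacher is missing.
sub caching_enabled {
    my ($cache_requested) = @_;
    return 0 unless $cache_requested;
    return 1 if module_available('Tie::Cacher');
    warn "-cache was requested but Tie::Cacher is not installed; ",
         "caching is disabled\n";
    return 0;
}

print "caching on\n" if caching_enabled(1);
```

A caller that truly depends on caching could test the return value and die, rather than carry on with uncached objects.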

Thanks,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
 

From cjfields at uiuc.edu  Tue Apr 10 12:31:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 11:31:43 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
Message-ID: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>

At the moment we do not have a comprehensive list up on the wiki.  I  
have been slowly working (alphabetically!) to switch them over, so  
any help would be appreciated.

I have CC'd this to the main mail list for anyone else interested.

chris

On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:

> Hey guys,
>
> I noticed there's an open task regarding moving testing code to use
> Test::More etc and that Chris and Nathan are already on to it. Is
> there any kind of wiki page that you keep track of which modules you
> are already working on? I am new to this and want to contribute,
> having a fair amount of unit testing from work, but don't want to step
> over other people's work and avoid duplication as well.
> Any pointers where i could get started would be much appreciated :-)
>
> Thanks,
> Spiros
>
> ps. apologies if this is not the correct list to post this, just
> seemed the most intuitive choice.
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From spiros at lokku.com  Tue Apr 10 12:34:49 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Tue, 10 Apr 2007 17:34:49 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
Message-ID: <bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>

Okay, awesome, thank you for the info. I'll get started and see how it goes!

Spiros

On 4/10/07, Chris Fields <cjfields at uiuc.edu> wrote:
> At the moment we do not have a comprehensive list up on the wiki.  I
> have been slowly working (alphabetically!) to switch them over, so
> any help would be appreciated.
>
> I have CC'd this to the main mail list for anyone else interested.
>
> chris
>
> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>
> > Hey guys,
> >
> > I noticed there's an open task regarding moving testing code to use
> > Test::More etc and that Chris and Nathan are already on to it. Is
> > there any kind of wiki page that you keep track of which modules you
> > are already working on? I am new to this and want to contribute,
> > having a fair amount of unit testing from work, but don't want to step
> > over other people's work and avoid duplication as well.
> > Any pointers where i could get started would be much appreciated :-)
> >
> > Thanks,
> > Spiros
> >
> > ps. apologies if this is not the correct list to post this, just
> > seemed the most intuitive choice.
> > _______________________________________________
> > Bioperl-guts-l mailing list
> > Bioperl-guts-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Tue Apr 10 12:34:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 11:34:12 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>
Message-ID: <0D396A53-9911-4304-88FE-CCD6884A2699@uiuc.edu>


On Apr 10, 2007, at 11:21 AM, Cook, Malcolm wrote:

> Lincoln,
>
> In `perldoc Bio::DB::SeqFeature::Store` I read:
>
> "Caching requires the Tie::Cacher module to be installed. If the  
> module
> is not installed, then caching will silently be disabled."
>
> I am wondering about the design motivation for silently disabling
> caching when Tie::Cacher is not installed.  Perhaps at least  
> emitting a
> warning when -cache is requested and Tie::Cacher is not present is a
> good idea?

...

Maybe this should be added to the optional BioPerl dependencies?   
It's not listed in Build.PL in CVS...

chris


From cjfields at uiuc.edu  Tue Apr 10 13:22:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 12:22:33 -0500
Subject: [Bioperl-l] ] moving tests to use Test::More
In-Reply-To: <bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>
Message-ID: <DFAA7C75-BC52-4027-9816-5970404D1558@uiuc.edu>

When moving tests over be particularly careful of 'ok' tests which  
should be 'is'; a few older tests have display messages which make  
things tricky.  Use 'isa_ok', 'use_ok', 'require_ok', 'like', etc.  
where appropriate.

Also, we are not supporting TODO blocks at this time due to the  
upgrade needed for Test::Harness (which isn't necessary for BioPerl  
functionality).  Just use a skip block with a message if you run into  
something, like this (from RNA_SearchIO.t):

SKIP: {
     skip('Working on meta string building; TODO', 3);
     is($hsp->meta, 'blahblahblah', "HSP meta");
     # two more tests...
}
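
For anyone new to the conversion, here is a small self-contained sketch of the Test::More functions mentioned above. The values and the class name My::Hypothetical::Class are invented for illustration and are not taken from any BioPerl test.

```perl
use strict;
use warnings;
use Test::More tests => 5;

# Made-up values standing in for real parser output.
my $length = 42;
my $name   = 'gi|12345|test';
my $obj    = bless {}, 'My::Hypothetical::Class';

# With Test.pm, ok($got, $expected) compared two values. In Test::More the
# second argument to ok() is only a test name, so comparisons become is().
is($length, 42, 'length is as expected');
like($name, qr/^gi\|\d+/, 'name looks like a GI identifier');
isa_ok($obj, 'My::Hypothetical::Class');

# The number given to skip() must match the number of tests in the block.
SKIP: {
    skip('feature not implemented yet; TODO', 2);
    is($length, 0, 'never run');
    ok(0, 'never run either');
}
```

The skipped tests still count toward the plan, which is why the plan says 5 rather than 3.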

Thanks for helping out!

chris

On Apr 10, 2007, at 11:34 AM, Spiros Denaxas wrote:

> Okay, awesome, thank you for the info. I'll get started and see how  
> it goes!
>
> Spiros
...


From gopu_36 at yahoo.com  Tue Apr 10 03:42:26 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Tue, 10 Apr 2007 00:42:26 -0700 (PDT)
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole genome
Message-ID: <9915265.post@talk.nabble.com>


Hi,
I am a newbie trying out BioPerl for my research. I have the whole genome
sequence of a pathogen and am trying to break it into non-overlapping
1000 bp subsequences. For example, if the genome is 400000 bp long, I
should get 400 subsequences of 1000 bp each, non-overlapping but
contiguous. To be precise, the first substring would run from 1 to
1000 bp, the second from 1001 to 2000, and so on. Could anyone help me?
I tried the following code, but it gives me only the first substring and
none of the rest! I would appreciate it very much if someone could help me!
.........
.
.
my $start =1;
my $finish =100;
my $inseq  = Bio::SeqIO->new(-file => "$in_file");
while( my $seq = $inseq->next_seq ) {
	
	my $cleseq = $seq->seq();
	
	$seqlength = $seq->length();
	if ($finish<$seqlength){	
	print "The length of the sequence is $seqlength\n";	
	my $ordseq = $cleseq->subseq($start,$finish);
          push(@seq_array,$ordseq);
          $start=+100;
          $finish=+100;
          $counter++;
          next;          	             
       } 
}
-- 
View this message in context: http://www.nabble.com/extract-nonoverlapping-subsequences-from-a-whole-genome-tf3551560.html#a9915265
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bix at sendu.me.uk  Tue Apr 10 16:10:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 10 Apr 2007 21:10:35 +0100
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
 genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <461BEF3B.3080708@sendu.me.uk>

gopu_36 wrote:
> Hi,
> I am a newbie trying out BioPerl for my research. I have the whole genome
> sequence of a pathogen and am trying to break it into non-overlapping
> 1000 bp subsequences.
[snip]
> I tried the following code, but it gives me only the first substring and
> none of the rest! I would appreciate it very much if someone could help me!
[snip]
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	             
>        } 
> }

Unless I've misunderstood, there are a few problems here.

I'm guessing $in_file is a file containing the entire genome sequence as 
a single sequence. This means your while loop will only loop once. To do 
what you want you then need another loop that acts on the single $seq 
object you're going to get. You don't need $cleseq, and in fact your 
script ought to crash on the $cleseq->subseq line because $cleseq is a 
string which has no subseq() method. $seq->subseq is what you want.

I didn't look at the remaining code.
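
The inner loop described above can be sketched in plain Perl without any BioPerl objects. This is only an illustration: window_coords is a made-up name, and the sketch just produces the start/end coordinates. One thing worth noting in the quoted code is that `$start=+100` assigns +100 to $start rather than adding to it; the increment operator is `+=`.

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) that tile a sequence of
# $length residues with non-overlapping windows of $size residues.
sub window_coords {
    my ($length, $size) = @_;
    my @windows;
    for (my $start = 1; $start <= $length; $start += $size) {
        my $end = $start + $size - 1;
        $end = $length if $end > $length;   # clip the final window
        push @windows, [$start, $end];
    }
    return @windows;
}

for my $w (window_coords(2500, 1000)) {
    print "$w->[0]-$w->[1]\n";
}
```

Each pair can then be passed to $seq->subseq($start, $end) on the sequence object returned by next_seq.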


Hope that helps,
Sendu.


From cjfields at uiuc.edu  Tue Apr 10 16:22:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 15:22:15 -0500
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
	genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <88E9CC63-48FD-444B-877D-12BB1D944214@uiuc.edu>

There is a script in the BioPerl scripts directory which does this,  
with optional overlaps (split_seq.PLS).  It's in /scripts/seq.

chris

On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:

>
> Hi,
> I am a newbie trying out BioPerl for my research. I have the whole genome
> sequence of a pathogen and am trying to break it into non-overlapping
> 1000 bp subsequences. For example, if the genome is 400000 bp long, I
> should get 400 subsequences of 1000 bp each, non-overlapping but
> contiguous. To be precise, the first substring would run from 1 to
> 1000 bp, the second from 1001 to 2000, and so on. Could anyone help me?
> I tried the following code, but it gives me only the first substring and
> none of the rest! I would appreciate it very much if someone could help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	
>        }
> }
> -- 
> View this message in context: http://www.nabble.com/extract-nonoverlapping-subsequences-from-a-whole-genome-tf3551560.html#a9915265
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Apr 10 16:57:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 15:57:20 -0500
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
	genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <18529D36-C772-474A-9CE6-A29FA0C59ABA@uiuc.edu>

Okay, I was bored!  This is a little shorter than that script:

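use strict;
use warnings;
use Bio::SeqIO;

# The input FASTA file name is taken from the command line.
my $seqin = Bio::SeqIO->new(-format => 'fasta',
                             -file => shift);

my $seqout = Bio::SeqIO->new(-format => 'fasta',
                             -file => '>split.fas');

while( my $seq = $seqin->next_seq ) {
     my $seqlength = $seq->length();
     print STDERR "Length is $seqlength\n";
     next unless $seqlength > 0;    # nothing to split
     my $start = 1;
     # Clip the first window too, so sequences under 100 bp are still written.
     my $end = $seqlength < 100 ? $seqlength : 100;
     my $desc = $seq->description;
     $desc = '' unless defined $desc;
     CHUNK:
     while ($end <= $seqlength){
         # trunc() returns a new sequence object covering the window.
         my $ordseq = $seq->trunc($start,$end);
         $ordseq->description("$start-$end $desc");
         $seqout->write_seq($ordseq);
         last CHUNK if $end >= $seqlength;
         $start += 100;
         # The last window may be shorter than 100 bp.
         $end = ($end + 100 > $seqlength) ? $seqlength : $end + 100;
     }
}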

chris

On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:

>
> Hi,
> I am a newbie trying out BioPerl for my research. I have the whole genome
> sequence of a pathogen and am trying to break it into non-overlapping
> 1000 bp subsequences. For example, if the genome is 400000 bp long, I
> should get 400 subsequences of 1000 bp each, non-overlapping but
> contiguous. To be precise, the first substring would run from 1 to
> 1000 bp, the second from 1001 to 2000, and so on. Could anyone help me?
> I tried the following code, but it gives me only the first substring and
> none of the rest! I would appreciate it very much if someone could help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	
>        }
> }
> -- 
> View this message in context: http://www.nabble.com/extract-nonoverlapping-subsequences-from-a-whole-genome-tf3551560.html#a9915265
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Tue Apr 10 18:01:37 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 10 Apr 2007 18:01:37 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <C2410F99.DA34%bosborne11@verizon.net>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
	<C2410F99.DA34%bosborne11@verizon.net>
Message-ID: <6dce9a0b0704101501y15b96e20w89c4b9ef4abc1b48@mail.gmail.com>

I'm happy I didn't catch this thread until just now, but my preferred course
of action was to do exactly what Brian did and accept the patch.

Lincoln

On 4/10/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
> OK, applied.
>
>
> On 4/6/07 11:31 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu>
> wrote:
>
> >   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
> >   +      if(DIE_ON_MISSMATCHED_LINES &&
> >   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
> >   +         my $fap= substr($_,0,20)."..";
> >   +         $self->throw("Each line of the fasta entry must be the same length except the last.
> >   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
> >   +         }
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From heikki at sanbi.ac.za  Wed Apr 11 05:14:27 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 11 Apr 2007 11:14:27 +0200
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
Message-ID: <200704111114.27839.heikki@sanbi.ac.za>

What is going on here? Can anyone remember doing this?

	-Heikki 

Please can I ask what is the purpose of the line @pos = sort @pos; in
the select_noncont subroutine of SimpleAlign.pm?

In previous versions this line was not present, and I could use the
function to reorder the alignment. For example, in an alignment with 5
sequences I could put the second sequence last using
$aln->select_noncont(1,3,4,5,2). The sort function stops this, but even
if the idea is to sort numerically, this does not work: sort as written
compares strings and will put 10 before 2, so that
->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
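
The difference is easy to reproduce outside BioPerl. The following stand-alone snippet is only an illustration of Perl's default string sort against an explicit numeric sort; it is not code from SimpleAlign.pm.

```perl
use strict;
use warnings;

my @pos = (1 .. 10);

my @string_sorted  = sort @pos;                 # default: string comparison
my @numeric_sorted = sort { $a <=> $b } @pos;   # explicit numeric comparison

print "string:  @string_sorted\n";    # 1 10 2 3 4 5 6 7 8 9
print "numeric: @numeric_sorted\n";   # 1 2 3 4 5 6 7 8 9 10
```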

 
Many thanks

 
Anthony


From cjfields at uiuc.edu  Wed Apr 11 08:33:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 07:33:42 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za>
References: <200704111114.27839.heikki@sanbi.ac.za>
Message-ID: <F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>

Don't know when this was added.  Maybe we should make the sorting  
optional?  In other words, pass an optional 'nosort' string as the  
first arg, defaulting to numerical sort.

Either way the sort needs to be changed by the looks of it.  I'll  
verify the bug and commit today.

chris

On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote:

> What is going on here? Can anyone remember doing this?
>
> 	-Heikki
>
> Please can I ask what is the purpose of the line @pos = sort @pos; in
> the select_noncont subroutine of SimpleAlign.pm?
>
> In previous versions this line was not present, and I could use the
> function to reorder the alignment. For example, in an alignment with 5
> sequences I could put the second sequence last using
> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but even
> if the idea is to sort numerically, this does not work: sort as written
> compares strings and will put 10 before 2, so that
> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
>
>
>
> Many thanks
>
>
>
> Anthony
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lzlgboy at gmail.com  Wed Apr 11 08:48:30 2007
From: lzlgboy at gmail.com (kenzy ken)
Date: Wed, 11 Apr 2007 20:48:30 +0800
Subject: [Bioperl-l] How to Remove root node from a tree, ???
Message-ID: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>

Hi all:
    I wrote a script which uses the Bio::Tree module. I want to remove some
nodes from the tree, so I used the "$tree->remove_Node($node_object);" method.
It works OK, but when I try to remove the root node, a problem occurs. It
seems that this method cannot remove the root node, so if you have any idea
how to remove the root, it would be very much appreciated.

-- 
Chen,Kenian
===========================
School of Life Science, Sun Yat-Sen University
===========================
Xingang Xilu 135
Guangzhou, Guangdong 510275
P. R. China
===========================
Phone: (86) 20-84113677; (86) 20-34474683;
Fax: (86) 20-34022356
===========================
Email:lzlgboy at gmail.com;
chenkn at mail2.sysu.edu.cn


From cjfields at uiuc.edu  Wed Apr 11 09:13:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 08:13:40 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>
Message-ID: <9DE1A554-4F33-45D1-9043-732FEB86ECD5@uiuc.edu>

I confirmed this; it is now fixed in CVS.  I have also added the  
option to prevent sorting if needed:

$aln2 = $aln->select_noncont(6,7,8,9,10,1,2,3,4,5);

sorts numerically by default.

$aln2 = $aln->select_noncont('nosort',6,7,8,9,10,1,2,3,4,5);

prevents sorting.  I have added a few tests to SimpleAlign.t for  
these.  It doesn't change the default behavior so shouldn't break  
anything.
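
For illustration, the argument handling can be pictured with the stand-alone sketch below. This is not the code committed to SimpleAlign.pm; order_positions is a hypothetical name, and the sketch only shows how an optional leading 'nosort' string can switch off a numeric sort.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the position handling: sort numerically by
# default, or keep the caller's order when the first argument is 'nosort'.
sub order_positions {
    my @pos = @_;
    my $sort = 1;
    if (@pos && $pos[0] eq 'nosort') {
        shift @pos;      # drop the flag, keep the positions
        $sort = 0;
    }
    @pos = sort { $a <=> $b } @pos if $sort;
    return @pos;
}

print join(',', order_positions(6,7,8,9,10,1,2,3,4,5)), "\n";
print join(',', order_positions('nosort',6,7,8,9,10,1,2,3,4,5)), "\n";
```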

chris

On Apr 11, 2007, at 7:33 AM, Chris Fields wrote:

> Don't know when this was added.  Maybe we should make the sorting
> optional?  In other words, pass an optional 'nosort' string as the
> first arg, defaulting to numerical sort.
>
> Either way the sort needs to be changed by the looks of it.  I'll
> verify the bug and commit today.
>
> chris
>
> On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote:
>
>> What is going on here? Can anyone remember doing this?
>>
>> 	-Heikki
>>
>> Please can I ask what is the purpose of the line @pos = sort @pos; in
>> the select_noncont subroutine of SimpleAlign.pm?
>>
>> In previous versions this line was not present, and I could use the
>> function to reorder the alignment. For example, in an alignment with 5
>> sequences I could put the second sequence last using
>> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but even
>> if the idea is to sort numerically, this does not work: sort as written
>> compares strings and will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
>>
>>
>>
>> Many thanks
>>
>>
>>
>> Anthony
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 11 09:21:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 11 Apr 2007 14:21:25 +0100
Subject: [Bioperl-l] How to Remove root node from a tree, ???
In-Reply-To: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>
References: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>
Message-ID: <461CE0D5.9040001@sendu.me.uk>

kenzy ken wrote:
> Hi all:
>    I wrote a script which uses the Bio::Tree module. I want to remove some
> nodes from the tree, so I used the "$tree->remove_Node($node_object);" method.
> It works OK, but when I try to remove the root node, a problem occurs. It
> seems that this method cannot remove the root node, so if you have any idea
> how to remove the root, it would be very much appreciated.

You'll have to re-root the tree to some other node in the tree. See the 
reroot() method.

(I don't think Bio::Tree::Tree objects can be unrooted.)


From emeric.sevin at univ-rennes1.fr  Wed Apr 11 09:32:38 2007
From: emeric.sevin at univ-rennes1.fr (Emeric Sevin)
Date: Wed, 11 Apr 2007 15:32:38 +0200
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
Message-ID: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>

Hi everybody,

I'm sorry to bug you, but either I missed something so obvious that nobody
bothered to answer, or I'm being a little boycotted here...
A little help would be very much appreciated.

On 22 March 07, at 16:07, Emeric Sevin wrote:

> Hello,
>
> I am new to this community, and apologize if this subject has been 
> posted before.
>
> I want to print out only selected results from a multiple 
> blast-alignments results file. Problem is, the algorithm used is 
> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the actual 
> writing task yields "unclean" warnings. Although an output is actually
> written, the writer (Bio::SearchIO::Writer::TextResultWriter) seems to
> be disturbed by the fact rpsblast DBs are not labeled with
> "protein"/"nucleic"/"translated".
> Does anybody know of an easy fix for that bug, or of another way to
> work around it?
>
> Thank you very much
>
> Emeric SEVIN
> Université de Rennes 1
_______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu  Wed Apr 11 10:44:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 09:44:27 -0500
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
Message-ID: <D0E54B3C-A345-4A90-9571-25144622265D@uiuc.edu>

We could ignore this post... oh the irony!  ;>

It has nothing to do with ignoring you.  Read this:

http://en.wikipedia.org/wiki/Warnock's_Dilemma

Basically your question probably fell on deaf ears b/c no one has  
time to look into it and post a fix.  Realize that BioPerl is, for
the most part, a volunteer effort and we all have $jobs to worry
about.  If you want you are more than welcome to file a bug on this  
(if it isn't already filed), which is the best way to make sure  
something is done:

http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

chris


On Apr 11, 2007, at 8:32 AM, Emeric Sevin wrote:

> Hi everybody,
>
> I'm sorry to bug you, but either I missed something so obvious that nobody
> bothered to answer, or I'm being a little boycotted here...
> A little help would be very much appreciated.
>
> On 22 March 07, at 16:07, Emeric Sevin wrote:
>
>> Hello,
>>
>> I am new to this community, and apologize if this subject has been  
>> posted before.
>>
>> I want to print out only selected results from a multiple blast- 
>> alignments results file. Problem is, the algorithm used is  
>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the  
>> actual writing task yields "unclean" warnings. Although an output
>> is actually written, the writer
>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by
>> the fact rpsblast DBs are not labeled with
>> "protein"/"nucleic"/"translated".
>> Does anybody know of an easy fix for that bug, or of another way to
>> work around it?
>>
>> Thank you very much
>>
>> Emeric SEVIN
>> Université de Rennes 1
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Wed Apr 11 10:30:11 2007
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Wed, 11 Apr 2007 15:30:11 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
Message-ID: <461CF0F3.1010708@sheffield.ac.uk>

It should be easy enough to find those t/*.t files that have "use Test;"
or "require Test;". This should provide a list of files still needing to
be converted over to Test::More. As discussed previously, it may also be 
useful to use Test::Exception to test for situations where 
exceptions/warnings are thrown. If you add additional tests using this 
module, you should add the Test::Exception module to t/lib/
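
As a rough sketch of how that list could be produced, the snippet below scans files for lines that load Test.pm but not Test::More or other Test:: modules. The function name files_using_old_test is made up, and the pattern is deliberately simple, so a comment that happens to contain "use Test" would also be reported.

```perl
use strict;
use warnings;

# Return the files among @files that appear to load the old Test.pm.
# Matches "use Test" or "require Test" anywhere on a line, but not
# Test::More, Test::Exception and so on.
sub files_using_old_test {
    my @files = @_;
    my @hits;
    for my $file (@files) {
        my $fh;
        unless (open $fh, '<', $file) {
            warn "Cannot read $file: $!\n";
            next;
        }
        while (my $line = <$fh>) {
            if ($line =~ /\b(?:use|require)\s+Test\b(?!::)/) {
                push @hits, $file;
                last;    # one match per file is enough
            }
        }
        close $fh;
    }
    return @hits;
}

# Run from the top of a checkout to list the tests still to convert.
print "$_\n" for files_using_old_test(glob 't/*.t');
```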

Good luck, and feel free to mail the list with questions/comments etc.

Nath


Chris Fields wrote:
> At the moment we do not have a comprehensive list up on the wiki.  I  
> have been slowly working (alphabetically!) to switch them over, so  
> any help would be appreciated.
>
> I have CC'd this to the main mail list for anyone else interested.
>
> chris
>
> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>
>   
>> Hey guys,
>>
>> I noticed there's an open task regarding moving testing code to use
>> Test::More etc and that Chris and Nathan are already on to it. Is
>> there any kind of wiki page that you keep track of which modules you
>> are already working on? I am new to this and want to contribute,
>> having a fair amount of unit testing from work, but don't want to step
>> over other people's work and avoid duplication as well.
>> Any pointers where i could get started would be much appreciated :-)
>>
>> Thanks,
>> Spiros
>>
>> ps. apologies if this is not the correct list to post this, just
>> seemed the most intuitive choice.
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   


From spiros at lokku.com  Wed Apr 11 10:56:22 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Wed, 11 Apr 2007 15:56:22 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <461CF0F3.1010708@sheffield.ac.uk>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
Message-ID: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>

Yep! I have some rough stats at home; I will post them later on
tonight. Roughly, if I remember correctly, 50% of the tests were still
using Test, and all the others were using Test::More.

More to follow later on,
Spiros



From Kevin.M.Brown at asu.edu  Wed Apr 11 11:14:07 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 11 Apr 2007 08:14:07 -0700
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za>
References: <200704111114.27839.heikki@sanbi.ac.za>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>

> What is going on here? Can anyone remember doing this?
> 
> 	-Heikki 
> 
> Please can I ask what is the purpose of the line @pos = sort 
> @pos; in the select_noncont subroutine of SimpleAlign.pm.
> 
>  
> 
> In previous versions this line was not present and I could 
> use the function to reorder the alignment e.g in an alignment 
> with 5 sequences I could reorder it to put the second 
> sequence last using $aln->select_noncont(1,3,4,5,2). The sort 
> function stops this, but even if the idea is to sort 
> numerically this does not work since the sort function as is 
> will put 10 before 2, so that
> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .

Not sure why 10 would come before 2 since perl would interpret that list
as a series of integers even if they were entered as strings and do the
sort.


From spiros at lokku.com  Wed Apr 11 11:51:27 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Wed, 11 Apr 2007 16:51:27 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <bba689ec0704110851qb1aa272m5db4e01356f28e92@mail.gmail.com>

This looks like a case of cmp vs <=>, I think!

my @array = (1,10,2,3,4,5,6,7,8,9) ;
print join(",", @array), "\n";
my @sorted1 = sort(@array) ;
print join(",", @sorted1), "\n";
my @sorted2 = (sort { $a <=> $b } @array);
print join(",", @sorted2), "\n";

idaru:/tmp spiros$ perl koko.pl
1,10,2,3,4,5,6,7,8,9 # normal array
1,10,2,3,4,5,6,7,8,9 # sorted with sort
1,2,3,4,5,6,7,8,9,10 # sorted with <=>

Spiros




From ak at ebi.ac.uk  Wed Apr 11 11:58:52 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Wed, 11 Apr 2007 16:58:52 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <20070411155852.GC24537@ebi.ac.uk>

On Wed, Apr 11, 2007 at 08:14:07AM -0700, Kevin Brown wrote:
> > What is going on here? Can anyone remember doing this?
> > 
> > 	-Heikki 
> > 
> > Please can I ask what is the purpose of the line @pos = sort 
> > @pos; in the select_noncont subroutine of SimpleAlign.pm.
> > 
> >  
> > 
> > In previous versions this line was not present and I could 
> > use the function to reorder the alignment e.g in an alignment 
> > with 5 sequences I could reorder it to put the second 
> > sequence last using $aln->select_noncont(1,3,4,5,2). The sort 
> > function stops this, but even if the idea is to sort 
> > numerically this does not work since the sort function as is 
> > will put 10 before 2, so that
> > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> > the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
> 
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.

Really?

$ perl -e 'print join(" ", sort(1..20)), "\n"';
1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9


-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
-------------------*=<>=*-------------------


From mkiwala at watson.wustl.edu  Wed Apr 11 11:51:35 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Wed, 11 Apr 2007 10:51:35 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <461D0407.8050105@watson.wustl.edu>

Kevin Brown wrote:
>> What is going on here? Can anyone remember doing this?
>>
>> 	-Heikki 
>>
>> Please can I ask what is the purpose of the line @pos = sort 
>> @pos; in the select_noncont subroutine of SimpleAlign.pm.
>>
>>  
>>
>> In previous versions this line was not present and I could 
>> use the function to reorder the alignment e.g in an alignment 
>> with 5 sequences I could reorder it to put the second 
>> sequence last using $aln->select_noncont(1,3,4,5,2). The sort 
>> function stops this, but even if the idea is to sort 
>> numerically this does not work since the sort function as is 
>> will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>>     
>
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.
>
>   
Because, according to the documentation for Perl's sort function, 
sorting occurs "in standard string comparison order" unless the user 
specifies another comparison function to use.


From cjfields at uiuc.edu  Wed Apr 11 12:45:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 11:45:11 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
Message-ID: <FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>

We should probably place something on the wiki to prevent overlaps  
(i.e. make sure no two devs are working on the same tests).  I  
planned on working on the G's last night but got bogged down.

Spiros, if you haven't already, go ahead and create a list on a wiki  
page for tracking.  We can lay claim to them by tagging with our sigs  
and cross them off once complete.

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 11 12:09:54 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 11 Apr 2007 17:09:54 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <461D0852.9070802@sendu.me.uk>

Kevin Brown wrote:
>>  but even if the idea is to sort
>> numerically this does not work since the sort function as is 
>> will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
> 
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.

The default sort for sort() is { $a cmp $b } (standard string comparison 
order): 10 comes before 2.

The fix was to explicitly say sort { $a <=> $b } for a numeric sort.


From cjfields at uiuc.edu  Wed Apr 11 12:46:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 11:46:46 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <461D0407.8050105@watson.wustl.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
	<461D0407.8050105@watson.wustl.edu>
Message-ID: <7001A1A4-5CF4-4C70-8EFA-94AF0D16864C@uiuc.edu>

I have confirmed the bug and fixed this in CVS.  Michael's right; sort  
defaults to string comparison if no subroutine or sort block is  
specified.

perldoc -f sort:

sort SUBNAME LIST
sort BLOCK LIST
sort LIST
...
If SUBNAME or BLOCK is omitted, "sort"s in standard string com-
parison order.
...

chris



From heikki at sanbi.ac.za  Wed Apr 11 12:39:57 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 11 Apr 2007 18:39:57 +0200
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
Message-ID: <200704111839.58940.heikki@sanbi.ac.za>

A bit more than half are still using Test:

~/src/bioperl/core/t>  perl -lne 'print $1 if /use +(Test[^\sO;]*);/' *t | 
sort | uniq -c | sort -nr
    147 Test
     97 Test::More


Feel free to add scripts and functionality to the core/maintenance directory of 
bioperl-live if you want to keep track of things in modules and tests.

	-Heikki




-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From stewarta at nmrc.navy.mil  Wed Apr 11 14:40:18 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Wed, 11 Apr 2007 14:40:18 -0400
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
Message-ID: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>

First of all, mucho kudos to those who revamped this module.  It  
works really nicely.  I have a couple of thoughts..

* The .predict file from Glimmer provides frame and score information  
which could be parsed and included in the generated feature prediction

* It'd be nice to include the orfID somewhere on the feature  
prediction..  maybe the seqID ? (these could be post-processed into  
locus_tags for those using Glimmer as a preliminary annotation tool)

* Options to set the source and primary tags to something other than  
the defaults (i.e. Glimmer3.X and 'transcript').  This could always be  
done post-Bio::Tools::Glimmer, though, of course.

* This section..

         elsif (
                # Glimmer 2.X prediction
                (/^\s+(\d+)\s+      # gene num
                 (\d+)\s+(\d+)\s+   # start, end
                 \[([\+\-])\d{1}\s+ # strand
                 /ox ) ||
                # Glimmer 3.X prediction
                (/\w+(\d+)\s+       # orf (numeric portion)
                 (\d+)\s+(\d+)\s+   # start, end
                 ([\+\-])\d{1}\s+   # strand
                /ox)) {
	    my ($genenum,$start,$end,$strand) =
		( $1,$2,$3,$4 );

...isn't picking up more than the last digit in the orf-number.  Not  
sure if that's intentional.  A sample of the feature output using  
->gff_string shows up as ...

test-pseudocontig  Glimmer_3.X  transcript   1018     8  .  -  .  Group GenePrediction_1
test-pseudocontig  Glimmer_3.X  transcript   1134  1736  .  +  .  Group GenePrediction_2
test-pseudocontig  Glimmer_3.X  transcript   1832  2596  .  +  .  Group GenePrediction_4
test-pseudocontig  Glimmer_3.X  transcript   2710  3225  .  +  .  Group GenePrediction_5
test-pseudocontig  Glimmer_3.X  transcript   3246  4016  .  +  .  Group GenePrediction_6
test-pseudocontig  Glimmer_3.X  transcript   4177  5064  .  +  .  Group GenePrediction_7
test-pseudocontig  Glimmer_3.X  transcript   5083  5673  .  +  .  Group GenePrediction_8
test-pseudocontig  Glimmer_3.X  transcript   6001  7275  .  +  .  Group GenePrediction_9
test-pseudocontig  Glimmer_3.X  transcript   7530  8081  .  +  .  Group GenePrediction_0
test-pseudocontig  Glimmer_3.X  transcript   8785  8117  .  -  .  Group GenePrediction_1
test-pseudocontig  Glimmer_3.X  transcript   9423  8788  .  -  .  Group GenePrediction_2
test-pseudocontig  Glimmer_3.X  transcript  10088  9549  .  -  .  Group GenePrediction_3

...which was parsed originally from...

orf00001     1018        8  -2     2.95
orf00002     1134     1736  +3     2.91
orf00004     1832     2596  +2     2.93
orf00005     2710     3225  +1     2.90
orf00006     3246     4016  +3     2.93
orf00007     4177     5064  +1     2.94
orf00008     5083     5673  +1     2.91
orf00009     6001     7275  +1     2.96
orf00010     7530     8081  +3     2.58
orf00011     8785     8117  -2     2.92
orf00012     9423     8788  -1     2.81
orf00013    10088     9549  -3     2.90

* It'd also be nice if you could somehow set the string that is  
placed in front of the orf-number in the line...

                  '-tag'         => { 'Group' => "GenePrediction_$genenum"},

...seeing as how these tag/values can't seem to be changed manually  
anymore without getting into AnnotationCollection stuff, which is no  
longer a simple matter of changing a tag/value string.  (By the way,  
where can I find a list of AnnotationCollectionI compliant objects?)


Any thoughts on the suggestions?  (I don't mind taking a stab at  
incorporating them into the code.. I've never submitted anything to  
BioPerl before)


-Andrew


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From cjfields at uiuc.edu  Wed Apr 11 15:53:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 14:53:54 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
Message-ID: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>

I'm posting this to the mail list in case anyone has any ideas on  
what is going on...

I have noticed an odd (read: annoying) rash of spam on the wiki.   
Jason also ran some spam reversions, so maybe he can chime in.   
Essentially it looks like the responsible spambots 'correct' the wiki  
text and links, so that '+' is being removed and URI-encoded symbols  
in links are reverted to symbols.  Unfortunately the removal occurs  
in all text, so places where '+' is intended (for instance, raw text  
for showing example record formats) are also changed.  My guess is  
we'll need to block the IP address or add to the blacklist if possible.

Between Jason and me, we have blocked ~9 spambots and counting.   
Couldn't find anything via Google yet...

chris


From torsten.seemann at infotech.monash.edu.au  Wed Apr 11 20:33:02 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 12 Apr 2007 10:33:02 +1000
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
Message-ID: <a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>

Andrew,

>                 # Glimmer 3.X prediction
>                 (/\w+(\d+)\s+       # orf (numeric portion)
> ...isn't picking up more than the last digit in the orf-number.  Not
> sure if that's intentional.  A sample of the feature output using -
>  >gff_string shows up as ...

I think that regexp should be \w+?(\d+)

i.e. the \w+ should be non-greedy; otherwise it will swallow up all but
one of the following digits (as \d is a subset of \w).
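
To make the difference concrete, here is a small stand-alone sketch run against one line of the sample Glimmer 3 output from Andrew's message. The regexes are trimmed-down, single-line versions of the one in the module (no /x comments), so treat them as an illustration rather than the exact code in Bio::Tools::Glimmer.

```perl
use strict;
use warnings;

my $line = 'orf00010     7530     8081  +3     2.58';

# Greedy: \w+ grabs as many word characters as it can (digits included),
# then backtracks just far enough to leave a single digit for (\d+).
my ($greedy) = $line =~ /\w+(\d+)\s+(\d+)\s+(\d+)\s+([+-])\d\s+/;

# Non-greedy: \w+? grabs as few characters as it can, so (\d+) captures
# the whole run of digits that follows the "orf" prefix.
my ($lazy) = $line =~ /\w+?(\d+)\s+(\d+)\s+(\d+)\s+([+-])\d\s+/;

print "greedy:     $greedy\n";   # prints "greedy:     0"
print "non-greedy: $lazy\n";     # prints "non-greedy: 00010"
```

The greedy result matches the GenePrediction_0 seen for orf00010 in the GFF sample. Note that the non-greedy capture keeps the leading zeros; whether those should be stripped is a separate decision.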

I've CC:ed this to Mark Johnson who made the recent changes to this module.

Thanks for your feedback,

--Torsten Seemann


From spiros at lokku.com  Wed Apr 11 21:08:47 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 02:08:47 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
Message-ID: <bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>

Good idea Chris. Just got back home so will probably do it tomorrow
morning or so.

Spiros



From Kevin.M.Brown at asu.edu  Thu Apr 12 11:24:15 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 12 Apr 2007 08:24:15 -0700
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <461D0407.8050105@watson.wustl.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
	<461D0407.8050105@watson.wustl.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>

> >> What is going on here? Can anyone remember doing this?

> >> Please can I ask what is the purpose of the line @pos = 
> sort @pos; in 
> >> the select_noncont subroutine of SimpleAlign.pm.
> >>
> >>  
> >>
> >> In previous versions this line was not present and I could use the 
> >> function to reorder the alignment e.g in an alignment with 5 
> >> sequences I could reorder it to put the second sequence last using 
> >> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but 
> >> even if the idea is to sort numerically this does not work 
> since the 
> >> sort function as is will put 10 before 2, so that
> >> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the 
> sequences in
> >> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
> >>     
> >
> > Not sure why 10 would come before 2 since perl would interpret that 
> > list as a series of integers even if they were entered as 
> strings and 
> > do the sort.
> >
> >   
> Because, according to the documentation for Perl's sort 
> function, sorting occurs "in standard string comparison 
> order" unless the user specifies another comparison function to use.

OK, guess I never realized that since I've used just "sort @array" and
gotten things back how I expected them to be.


From bix at sendu.me.uk  Thu Apr 12 11:58:53 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 12 Apr 2007 16:58:53 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>	<461D0407.8050105@watson.wustl.edu>
	<1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>
Message-ID: <461E573D.1060906@sendu.me.uk>

Kevin Brown wrote:
>> Because, according to the documentation for Perl's sort 
>> function, sorting occurs "in standard string comparison 
>> order" unless the user specifies another comparison function to use.
> 
> OK, guess I never realized that since I've used just "sort @array" and
> gotten things back how I expected them to be.

If you were sorting numbers, getting the order wrong either didn't 
matter or you didn't notice the problem. Not realizing sort won't do 
what you expect in this case is a common source of bugs.

It might be worth it for you (and anyone else) to go through your old 
code to make sure you haven't been bitten.
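
A minimal sketch of the difference, for anyone checking old code. The
list below is made up for illustration; nothing here is taken from
SimpleAlign.pm itself, it only shows the default sort next to an
explicit numeric comparison with <=>:

```perl
use strict;
use warnings;

my @pos = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

# Default sort compares elements as strings, so "10" lands before "2".
my @as_strings = sort @pos;

# An explicit numeric comparison gives the order most people expect.
my @as_numbers = sort { $a <=> $b } @pos;

print "string order:  @as_strings\n";   # 1 10 2 3 4 5 6 7 8 9
print "numeric order: @as_numbers\n";   # 1 2 3 4 5 6 7 8 9 10
```

Lists that only contain single-digit numbers come out the same either
way, which is probably why a plain "sort @array" can look right for a
long time.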


From johnsonm at gmail.com  Thu Apr 12 13:26:33 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 12:26:33 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
Message-ID: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>

    I'd call that a buggy regexp.  Sounds like a good (but minimal)
fix.  Torsten, I don't have cvs write access, I think you do, can you
fix that up?  Andrew, can you file that as a bug:

http://bugzilla.bioperl.org/

    Everything else sounds like enhancements.  I'm not necessarily
opposed, but a little discussion is probably in order before putting
any tickets in for any of that.  Also, I'm not sure when I'll be able
to spare some time to work on the module.  It was easy to justify
spending time from my day job getting the module up to where it is now,
as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
quite well for my purposes.  Again, I'm not opposed to further
enhancements, but if I'm going to work on any of them, they'll have to
fit into everything else I'm doing and it could be a while.  However,
there's no reason somebody else can't do what I did.  Discuss the
changes here, work out a plan, implement it, send along the diff(s)
attached to a bug in bugzilla.  Next thing you know, your changes are
in cvs.  8)

On 4/11/07, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:
> Andrew,
>
> >                 # Glimmer 3.X prediction
> >                 (/\w+(\d+)\s+       # orf (numeric portion)
> > ...isn't picking up more than the last digit in the orf-number.  Not
> > sure if that's intentional.  A sample of the feature output using -
> >  >gff_string shows up as ...
>
> I think that regexp should be \w+?(\d+)
>
> ie. the \w+ should be non-greedy, otherwise it will swallow up all but
> one of the following \d+ (as \d is a subset of \w)
>
> I've CC:ed this to Mark Johnson who made the recent changes to this module.
>
> Thanks for your feedback,
>
> --Torsten Seemann
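
The greedy versus non-greedy behaviour described above can be checked
with a small sketch. The input line is a made-up glimmer3-style
prediction line, and the two patterns are cut down to just the orf
portion; this is not the actual code from Bio::Tools::Glimmer:

```perl
use strict;
use warnings;

# Hypothetical prediction line; only the leading orf id matters here.
my $line = "orf00012     1234     5678  +1     9.87";

# Greedy \w+ swallows "orf0001" and leaves a single digit for (\d+).
my ($greedy) = $line =~ /\w+(\d+)\s+/;

# Non-greedy \w+? stops as soon as (\d+) can match the whole number.
my ($lazy) = $line =~ /\w+?(\d+)\s+/;

print "greedy capture: $greedy\n";   # 2
print "lazy capture:   $lazy\n";     # 00012
```

One caveat with the non-greedy form: \w+? still has to consume at
least one character, so an id made up only of digits would lose its
first digit.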


From cjfields at uiuc.edu  Thu Apr 12 14:11:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 13:11:33 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
Message-ID: <7314C1CD-8AD5-4400-A495-6C8124833D0D@uiuc.edu>

Agreed; anyone can suggest code enhancements and bug fixes and submit  
patches for these:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

You'll see a long list of unimplemented enhancement requests in  
Bugzilla.  These are the ones where no patch is given; you'll find  
that very few are willing to go through the effort to work on them  
unless there is something in it for them!  Enhancement requests that  
come with patches and tests tend to get committed fairly rapidly  
(sometimes within hours).

chris

On Apr 12, 2007, at 12:26 PM, Mark Johnson wrote:

>     I'd call that a buggy regexp.  Sounds like a good (but minimal)
> fix.  Torsten, I don't have cvs write access, I think you do, can you
> fix that up?  Andrew, can you file that as a bug:
>
> http://bugzilla.bioperl.org/
>
>     Everything else sounds like enhancements.  I'm not necessarily
> opposed, but a little discussion is probably in order before putting
> any tickets in for any of that.  Also, I'm not sure when I'll be able
> to spare some time to work on the module.  It was easy to justify
> spending time from my day job getting the module up to where it is now,
> as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
> quite well for my purposes.  Again, I'm not opposed to further
> enhancements, but if I'm going to work on any of them, they'll have to
> fit into everything else I'm doing and it could be a while.  However,
> there's no reason somebody else can't do what I did.  Discuss the
> changes here, work out a plan, implement it, send along the diff(s)
> attached to a bug in bugzilla.  Next thing you know, your changes are
> in cvs.  8)
>
> On 4/11/07, Torsten Seemann  
> <torsten.seemann at infotech.monash.edu.au> wrote:
>> Andrew,
>>
>>>                 # Glimmer 3.X prediction
>>>                 (/\w+(\d+)\s+       # orf (numeric portion)
>>> ...isn't picking up more than the last digit in the orf-number.  Not
>>> sure if that's intentional.  A sample of the feature output using -
>>>> gff_string shows up as ...
>>
>> I think that regexp should be \w+?(\d+)
>>
>> ie. the \w+ should be non-greedy, otherwise it will swallow up all  
>> but
>> one of the following \d+ (as \d is a subset of \w)
>>
>> I've CC:ed this to Mark Johnson who made the recent changes to  
>> this module.
>>
>> Thanks for your feedback,
>>
>> --Torsten Seemann


From stewarta at nmrc.navy.mil  Thu Apr 12 14:35:00 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 12 Apr 2007 14:35:00 -0400
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
Message-ID: <DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>

I'm willing to do the coding and testing, I'm just not familiar with  
the submission process yet (I see there's a HOWTO now, nice).   Let's  
discuss first.

So to reiterate, I'm suggesting that the module also parse out the  
frame and score information from Glimmer output.  I take back my  
suggestion of overriding the source / primary tags through the module  
as this can easily be done post-parser.  Other annotations can also  
be edited post-parser easily enough.

Reasons for:  Parsing everything out of the output and letting the  
user determine what's useful or not.

Reasons against:  Extra information may not be relevant to the format  
of the generated feature type?


-Andrew


On Apr 12, 2007, at 1:26 PM, Mark Johnson wrote:

>    I'd call that a buggy regexp.  Sounds like a good (but minimal)
> fix.  Torsten, I don't have cvs write access, I think you do, can you
> fix that up?  Andrew, can you file that as a bug:
>
> http://bugzilla.bioperl.org/
>
>    Everything else sounds like enhancements.  I'm not necessarily
> opposed, but a little discussion is probably in order before putting
> any tickets in for any of that.  Also, I'm not sure when I'll be able
> to spare some time to work on the module.  It was easy to justify
> spending time from my day job getting the module up to where it is now,
> as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
> quite well for my purposes.  Again, I'm not opposed to further
> enhancements, but if I'm going to work on any of them, they'll have to
> fit into everything else I'm doing and it could be a while.  However,
> there's no reason somebody else can't do what I did.  Discuss the
> changes here, work out a plan, implement it, send along the diff(s)
> attached to a bug in bugzilla.  Next thing you know, your changes are
> in cvs.  8)
>
> On 4/11/07, Torsten Seemann  
> <torsten.seemann at infotech.monash.edu.au> wrote:
>> Andrew,
>>
>> >                 # Glimmer 3.X prediction
>> >                 (/\w+(\d+)\s+       # orf (numeric portion)
>> > ...isn't picking up more than the last digit in the orf-number.   
>> Not
>> > sure if that's intentional.  A sample of the feature output using -
>> >  >gff_string shows up as ...
>>
>> I think that regexp should be \w+?(\d+)
>>
>> ie. the \w+ should be non-greedy, otherwise it will swallow up all  
>> but
>> one of the following \d+ (as \d is a subset of \w)
>>
>> I've CC:ed this to Mark Johnson who made the recent changes to  
>> this module.
>>
>> Thanks for your feedback,
>>
>> --Torsten Seemann


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From johnsonm at gmail.com  Thu Apr 12 15:11:18 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 14:11:18 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
	<DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>
Message-ID: <ebf5eb170704121211s19062ac8hb9b510d440fcfe44@mail.gmail.com>

> So to reiterate, I'm suggesting that the module also parse out the frame and
> score information from Glimmer output.  I take back my suggestion of
> overriding the source / primary tags through the module as this can easily
> be done post-parser.  Other annotations can also be edited post-parser
> easily enough.

The reason the source tags are what they are for my addition(s) is
that the output from glimmer2/glimmer3 does not include a version
string.  You can figure out the major version from the output
formatting, but that's about it.  Also, being my first significant
contribution, I wasn't out to break new ground.  I did what some of
the other gene predictors seem to do, and what the existing code
already did.  Maybe there should be a method to pass in the exact
version if you know it.  Further than that, I think the Glimmer module
should stay consistent with what the other gene predictors do.  No
reason, though, that they couldn't *all* be enhanced similarly, if you
want to be able to further control the source tag.  8)

Part of the reason I didn't parse out the frame / score info for
either glimmer2 or glimmer3 was that I didn't need it.  The other part
being that my regexp kung-fu is nothing special.  This sounds like a
no-brainer to me.  Extend the regexps to capture it and tag it (and
the tests).
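
As a rough sketch of what "extend the regexps" could look like,
assuming a glimmer3 prediction line laid out as id, start, end, frame,
score (the sample line and the hash keys are made up for illustration,
and this is not the pattern currently in the module):

```perl
use strict;
use warnings;

# Hypothetical glimmer3-style prediction line: id, start, end, frame, score.
my $line = "orf00012     1234     5678  +1     9.87";

my %prediction;
if ($line =~ /^\w+?(\d+)\s+   # orf, numeric portion only
              (\d+)\s+        # start coordinate
              (\d+)\s+        # end coordinate
              ([+-]\d)\s+     # reading frame, such as +1 or -3
              (\S+)           # raw score
             /x) {
    # Hash slice: assign all five captures in one go.
    @prediction{qw(orf start end frame score)} = ($1, $2, $3, $4, $5);
}

print join(", ", map { "$_=$prediction{$_}" } qw(orf start end frame score)), "\n";
```

In the module itself the frame and score would presumably end up as
tags on the generated feature rather than in a plain hash.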

As far as the ORFs go, I guess you could just use
Bio::SeqFeature::Generic to represent them.  I haven't been keeping
track of the relevant feature/annotation interfaces, but maybe there
should be some kind of relation between the ORFs and predictions?

The glimmer3 detail file is a little trickier.  The least disruptive
thing to do, interface-wise, might be to specify that as a separate
input via an argument to the constructor.  Then you've got *two* input
files, and are going to have to override the automagic stuff that
expects one input file and takes care of it all.

As far as process, I just got on the list and started pestering
people, and they haven't thrown me off yet.  8)  I'm afraid that
you're going to find that while people are happy to discuss
implementation details, when it comes time to fire up the editor,
you're usually on your own, if it's an enhancement.

I'd love to work on Bioperl more, but so far, it's only been for what
I need for my job.


From spiros at lokku.com  Thu Apr 12 15:16:39 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 20:16:39 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
	<bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
Message-ID: <bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>

Hey guys,

I have added a page, as per Chris's nice suggestion, for keeping track
of what's going on with the migration:
http://www.bioperl.org/wiki/TestMoreProgress
There's also a link to this page from the project priority list.
However, adding our signature for each module seems tedious, in my
humble opinion. May I suggest we just split up the list into
'starting letter' sections and each of us does their part?
I volunteer to work on all tests starting with the letter R down to
the bottom of the list.

Let me know if this makes sense or not. I will work on
removing/flagging all the files that have already been migrated on
that list as well.
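
For flagging what is left, something along these lines might do. It
is only a sketch: the helper name uses_old_test is made up, and it
assumes the scripts live in t/*.t relative to the current directory:

```perl
use strict;
use warnings;

# True if the script text loads the old Test module. The (?!::) check
# keeps Test::More, Test::Exception and friends from matching.
sub uses_old_test {
    my ($text) = @_;
    return $text =~ /\b(?:use|require)\s+Test\b(?!::)/ ? 1 : 0;
}

for my $file (glob "t/*.t") {
    open my $fh, '<', $file or do { warn "cannot read $file: $!\n"; next };
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    $text = '' unless defined $text;
    print "$file\n" if uses_old_test($text);
}
```

It will also flag a "use Test;" that only appears in a comment, so the
output is a list of candidates to look at rather than a final answer.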

-spiros

On 4/12/07, Spiros Denaxas <spiros at lokku.com> wrote:
> Good idea Chris. Just got back home so will probably do it tomorrow
> morning or so.
>
> Spiros
>
> On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
> > We should probably place something on the wiki to prevent overlaps
> > (i.e. make sure no two devs are working on the same tests).  I
> > planned on working on the G's last night but got bogged down.
> >
> > Spiros, if you haven't already go ahead and create a list on a wiki
> > page for tracking.  We can lay claim to them by tagging with our sigs
> > and cross them off once complete.
> >
> > chris
> >
> > On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
> >
> > > Yep! I have some rough stats I have at home, I will post them later on
> > > tonight. Roughly, if i remember correctly, 50% of the tests were still
> > > using Test, all the others were using Test::More.
> > >
> > > More to follow later on,
> > > Spiros
> > >
> > > On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> > >> It should be easy enough to find those t/*.t files that have "use
> > >> Test;"
> > >> or "require Test;" This should provide a list of files still
> > >> needing to
> > >> be converted over to Test::More. As discussed previously, it may
> > >> also be
> > >> useful to use Test::Exception to test for situations where
> > >> exceptions/warnings are thrown. If you add additional tests using
> > >> this
> > >> module, you should add the Test::Exception module to t/lib/
> > >>
> > >> Good luck, and feel free to mail the list with questions/comments
> > >> etc.
> > >>
> > >> Nath
> > >>
> > >>
> > >> Chris Fields wrote:
> > >> > At the moment we do not have a comprehensive list up on the
> > >> wiki.  I
> > >> > have been slowly working (alphabetically!) to switch them over, so
> > >> > any help would be appreciated.
> > >> >
> > >> > I have CC'd this to the main mail list for anyone else interested.
> > >> >
> > >> > chris
> > >> >
> > >> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> > >> >
> > >> >
> > >> >> Hey guys,
> > >> >>
> > >> >> I noticed there's an open task regarding moving testing code to
> > >> use
> > >> >> Test::More etc and that Chris and Nathan are already on to it. Is
> > >> >> there any kind of wiki page that you keep track of which
> > >> modules you
> > >> >> are already working on? I am new to this and want to contribute,
> > >> >> having a fair amount of unit testing from work, but don't want
> > >> to step
> > >> >> over other people's work and avoid duplication as well.
> > >> >> Any pointers where i could get started would be much
> > >> appreciated :-)
> > >> >>
> > >> >> Thanks,
> > >> >> Spiros
> > >> >>
> > >> >> ps. apologies if this is not the correct list to post this, just
> > >> >> seemed the most intuitive choice.
> > >> >> _______________________________________________
> > >> >> Bioperl-guts-l mailing list
> > >> >> Bioperl-guts-l at lists.open-bio.org
> > >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
> > >> >>
> > >> >
> > >> > Christopher Fields
> > >> > Postdoctoral Researcher
> > >> > Lab of Dr. Robert Switzer
> > >> > Dept of Biochemistry
> > >> > University of Illinois Urbana-Champaign
> > >> >
> > >> >
> > >> >
> > >> > _______________________________________________
> > >> > Bioperl-l mailing list
> > >> > Bioperl-l at lists.open-bio.org
> > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >> >
> > >>
> > >>
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> >
>


From marian.thieme at lycos.de  Wed Apr 11 12:02:14 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Apr 2007 16:02:14 +0000
Subject: [Bioperl-l] Affys ReseqChip
Message-ID: <188661178017404@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070411/bc2eb3aa/attachment-0002.html>

From johnsonm at gmail.com  Thu Apr 12 15:35:35 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 14:35:35 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
Message-ID: <ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>

Looks like MediaWiki has some built in functionality:

    http://meta.wikimedia.org/wiki/Anti-spam_Features
    http://www.mediawiki.org/wiki/Extension:ConfirmEdit

I'm not sure I'd call what they're doing spam, more like vandalism,
but either way, I don't see the point (though I only looked at a
couple examples via Recent Changes).

If they're indeed bots, maybe it's time to enable Captchas? Depending
on who they are and what their goals are, that may get rid of them
completely or just slow them down.

On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
> I'm posting this to the mail list in case anyone has any ideas on
> what is going on...
>
> I have noticed an odd (read: annoying) rash of spam on the wiki.
> Jason also ran some spam reversions, so maybe he can chime in.
> Essentially it looks like the responsible spambots 'correct' the wiki
> text and links, so that '+' is being removed and URI-encoded symbols
> in links are reverted to symbols.  Unfortunately the removal occurs
> in all text, so places where '+' is intended (for instance, raw text
> for showing example record formats) are also changed.  My guess is
> we'll need to block the IP address or add to the blacklist if possible.
>
> Between Jason and I we have blocked ~9 spambots and counting.
> Couldn't find anything via Google yet...
>
> chris


From cjfields at uiuc.edu  Thu Apr 12 15:44:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 14:44:28 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
Message-ID: <BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>


On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:

> Looks like MediaWiki has some built in functionality:
>
>    http://meta.wikimedia.org/wiki/Anti-spam_Features
>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
>
> I'm not sure I'd call what they're doing spam, more like vandalism,
> but either way, I don't see the point (though I only looked at a
> couple examples via Recent Changes).
>
> If they're indeed bots, maybe it's time to enable Captchas? Depending
> on who they are and what their goals are, that may get rid of them
> completely or just slow them down.

Already done; Mauricio installed ConfirmEdit yesterday after a bit of  
off-list discussion (thanks again Mauricio!).

If you create a new account you'll encounter a simple captcha (it  
isn't configured for each edit yet).  We may implement confirmations  
per edit or install picture captchas at a later point, depending on  
how well the current system works.

We may start granting anyone interested in maintaining the wiki sysop  
privs which makes handling spam easier.  If so we'll probably  
announce something along those lines here first.

chris


From cjfields at uiuc.edu  Thu Apr 12 15:48:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 14:48:41 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
	<bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
	<bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>
Message-ID: <3B4500DD-CAB6-4FD6-ABF9-A0160981F7E3@uiuc.edu>

Sounds good!  I'll finish up the P's (halfway through now...) and  
move on to other things; got plenty to do, believe me!

Appreciate all the help, Spiros!

chris

On Apr 12, 2007, at 2:16 PM, Spiros Denaxas wrote:

> Hey guys,
>
> I have added a link as per Chris's nice suggestion for keeping track
> on whats going on regarding the migration:
> http://www.bioperl.org/wiki/TestMoreProgress
> There's also a link to this page from the project priority list.
> However, adding our signature for each module etc , in my humble
> opinion, seems tedious. May i suggest we just split up the list in
> 'starting letter sections' and each one does his part.
> I volunteer to work on all tests starting with the letter R down to
> the bottom of the list.
>
> Let me know if this makes sense or not. I will work on
> removing/flagging all the files that have already been migrated on
> that list as well.
>
> -spiros
>
> On 4/12/07, Spiros Denaxas <spiros at lokku.com> wrote:
>> Good idea Chris. Just got back home so will probably do it tomorrow
>> morning or so.
>>
>> Spiros
>>
>> On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>> We should probably place something on the wiki to prevent overlaps
>>> (i.e. make sure no two devs are working on the same tests).  I
>>> planned on working on the G's last night but got bogged down.
>>>
>>> Spiros, if you haven't already go ahead and create a list on a wiki
>>> page for tracking.  We can lay claim to them by tagging with our  
>>> sigs
>>> and cross them off once complete.
>>>
>>> chris
>>>
>>> On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
>>>
>>>> Yep! I have some rough stats I have at home, I will post them  
>>>> later on
>>>> tonight. Roughly, if i remember correctly, 50% of the tests were  
>>>> still
>>>> using Test, all the others were using Test::More.
>>>>
>>>> More to follow later on,
>>>> Spiros
>>>>
>>>> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
>>>>> It should be easy enough to find those t/*.t files that have "use
>>>>> Test;"
>>>>> or "require Test;" This should provide a list of files still
>>>>> needing to
>>>>> be converted over to Test::More. As discussed previously, it may
>>>>> also be
>>>>> useful to use Test::Exception to test for situations where
>>>>> exceptions/warnings are thrown. If you add additional tests using
>>>>> this
>>>>> module, you should add the Test::Exception module to t/lib/
>>>>>
>>>>> Good luck, and feel free to mail the list with questions/comments
>>>>> etc.
>>>>>
>>>>> Nath
>>>>>
>>>>>
>>>>> Chris Fields wrote:
>>>>>> At the moment we do not have a comprehensive list up on the
>>>>> wiki.  I
>>>>>> have been slowly working (alphabetically!) to switch them  
>>>>>> over, so
>>>>>> any help would be appreciated.
>>>>>>
>>>>>> I have CC'd this to the main mail list for anyone else  
>>>>>> interested.
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>>>>>>
>>>>>>
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> I noticed there's an open task regarding moving testing code to
>>>>> use
>>>>>>> Test::More etc and that Chris and Nathan are already on to  
>>>>>>> it. Is
>>>>>>> there any kind of wiki page that you keep track of which
>>>>> modules you
>>>>>>> are already working on? I am new to this and want to contribute,
>>>>>>> having a fair amount of unit testing from work, but don't want
>>>>> to step
>>>>>>> over other people's work and avoid duplication as well.
>>>>>>> Any pointers where i could get started would be much
>>>>> appreciated :-)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Spiros
>>>>>>>
>>>>>>> ps. apologies if this is not the correct list to post this, just
>>>>>>> seemed the most intuitive choice.
>>>>>>> _______________________________________________
>>>>>>> Bioperl-guts-l mailing list
>>>>>>> Bioperl-guts-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>>>>>>>
>>>>>>
>>>>>> Christopher Fields
>>>>>> Postdoctoral Researcher
>>>>>> Lab of Dr. Robert Switzer
>>>>>> Dept of Biochemistry
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>
>>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From spiros at lokku.com  Thu Apr 12 16:19:18 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 21:19:18 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
Message-ID: <bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>

Nice idea, I saw it a bit earlier. However, is there any chance of
implementing white lists so that regular and/or trusted users can skip
it each time we add something to the wiki?

Spiros

On 4/12/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:
>
> > Looks like MediaWiki has some built in functionality:
> >
> >    http://meta.wikimedia.org/wiki/Anti-spam_Features
> >    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
> >
> > I'm not sure I'd call what they're doing spam, more like vandalism,
> > but either way, I don't see the point (though I only looked at a
> > couple examples via Recent Changes).
> >
> > If they're indeed bots, maybe it's time to enable Captchas? Depending
> > on who they are and what their goals are, that may get rid of them
> > completely or just slow them down.
>
> Already done; Mauricio installed ConfirmEdit yesterday after a bit of
> off-list discussion (thanks again Mauricio!).
>
> If you create a new account you'll encounter a simple captcha (it
> isn't configured for each edit yet).  We may implement confirmations
> per edit or install picture captchas at a later point, dep. on how
> well the current system works.
>
> We may start granting anyone interested in maintaining the wiki sysop
> privs which makes handling spam easier.  If so we'll probably
> announce something along those lines here first.
>
> chris
>


From Jonathan_Epstein at nih.gov  Thu Apr 12 16:22:40 2007
From: Jonathan_Epstein at nih.gov (Jonathan Epstein)
Date: Thu, 12 Apr 2007 16:22:40 -0400
Subject: [Bioperl-l] Affys ReseqChip
In-Reply-To: <188661178017404@lycos-europe.com>
References: <188661178017404@lycos-europe.com>
Message-ID: <6.2.3.4.2.20070412161407.04a38b60@mail.nih.gov>

This sounds great to me.

Resequencing in general (whether by Affy or by other technology such as Solexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handling this.  But I suggest that you proceed full-speed-ahead, and we can sort this out in the future.

Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.

Jonathan


At 12:02 PM 4/11/2007, Marian Thieme wrote:
>Hi,
>
>I am working on a piece of software which is aimed at analysing the output of Affymetrix DNA Resequencing Arrays (in particular Mitochip V2). The main goal of the software is to account for the redundant fragments. The software is able to align the redundant fragments to the entire sequence and, in particular, to call bases which aren't called by the entire sequence and to detect insertions/deletions, depending on the design of the redundant fragments.
>
>I would be glad to distribute the software to the bioperl package or otherwise, if anybody is interested I can give the code and/or further develop some features.
>
>Marian

Jonathan Epstein                                Jonathan_Epstein at nih.gov
Head, Unit on Biologic Computation              (301)402-4563
Office of the Scientific Director               Bldg 31, Room 2A47
Nat. Inst. of Child Health & Human Development  31 Center Drive
National Institutes of Health                   Bethesda, MD 20892  


From spiros at lokku.com  Thu Apr 12 17:35:43 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 22:35:43 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EA4FA.8010504@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
	<461EA4FA.8010504@campus.iztacala.unam.mx>
Message-ID: <bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>

Mauricio, thanks for your response. I actually edited a page several
times today and I got the captcha. More specifically, it was displayed
because "the page I edited contained external links", which is true
since I included a {{CPAN}} link.

Spiros

On 4/12/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> wrote:
> The chance of having white lists exists but as far as I tested last
> night, the captcha is working only at the Create Account pages, not at
> the time of applying changes to wiki content (I tested as a regular user
> and not as a wiki admin).
>
> The idea at this moment is only to block automated methods for account
> creation (bots). Registered users who haven't been blocked and/or have
> confirmed their email wouldn't be bothered while adding/changing wiki
> content.
>
> Regards,
> Mauricio.
>
> Spiros Denaxas wrote:
> > Nice idea, i saw it a bit before. However, any chance of implementing
> > white lists with regular and/or trusted users to skip it each time we
> > add something to the wiki ?
> >
> > Spiros
> >
> > On 4/12/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:
> >>
> >>> Looks like MediaWiki has some built in functionality:
> >>>
> >>>    http://meta.wikimedia.org/wiki/Anti-spam_Features
> >>>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
> >>>
> >>> I'm not sure I'd call what they're doing spam, more like vandalism,
> >>> but either way, I don't see the point (though I only looked at a
> >>> couple examples via Recent Changes).
> >>>
> >>> If they're indeed bots, maybe it's time to enable Captchas? Depending
> >>> on who they are and what their goals are, that may get rid of them
> >>> completely or just slow them down.
> >> Already done; Mauricio installed ConfirmEdit yesterday after a bit of
> >> off-list discussion (thanks again Mauricio!).
> >>
> >> If you create a new account you'll encounter a simple captcha (it
> >> isn't configured for each edit yet).  We may implement confirmations
> >> per edit or install picture captchas at a later point, dep. on how
> >> well the current system works.
> >>
> >> We may start granting anyone interested in maintaining the wiki sysop
> >> privs which makes handling spam easier.  If so we'll probably
> >> announce something along those lines here first.
> >>
> >> chris
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
>


From arareko at campus.iztacala.unam.mx  Thu Apr 12 17:30:34 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 12 Apr 2007 16:30:34 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
Message-ID: <461EA4FA.8010504@campus.iztacala.unam.mx>

The chance of having white lists exists but as far as I tested last 
night, the captcha is working only at the Create Account pages, not at 
the time of applying changes to wiki content (I tested as a regular user 
and not as a wiki admin).

The idea at this moment is only to block automated methods for account 
creation (bots). Registered users who haven't been blocked and/or have 
confirmed their email wouldn't be bothered while adding/changing wiki 
content.

Regards,
Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From arareko at campus.iztacala.unam.mx  Thu Apr 12 17:53:51 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 12 Apr 2007 16:53:51 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>	
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>	
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
Message-ID: <461EAA6F.1090805@campus.iztacala.unam.mx>

I've reconfigured the extension to display captchas exclusively for 
account creation and disabled it when adding URLs in pages. Don't know 
why this didn't happen to me while testing last night...

Please try it again to see if the change works. Thanks for pointing 
this out Spiros :)

Mauricio.

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From spiros at lokku.com  Thu Apr 12 18:11:46 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 23:11:46 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
	<461EAA6F.1090805@campus.iztacala.unam.mx>
Message-ID: <bba689ec0704121511y135f0da0j26d520a11dd3ffa1@mail.gmail.com>

You're welcome Mauricio. It's all cool now, works without the captcha
for internal edits. Thanks for changing it over :-)

-spiros


From cjfields at uiuc.edu  Thu Apr 12 18:02:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 17:02:51 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>	
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>	
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
	<461EAA6F.1090805@campus.iztacala.unam.mx>
Message-ID: <E1139262-84C3-4282-8E9D-643BF91A3656@uiuc.edu>

You disabled yourself as sysop last night, IIRC.  Don't know; could  
be what Spiros suggested, eg. adding external links trips it.

chris

On Apr 12, 2007, at 4:53 PM, Mauricio Herrera Cuadra wrote:

> I've reconfigured the extension to display captchas exclusively for  
> account creation and disabled it when adding URLs in pages. Don't  
> know why this didn't happened to me while testing last night...
>
> Please try do it again to see if the change works. Thanks for  
> pointing this out Spiros :)
>
> Mauricio.
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Apr 13 04:30:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 13 Apr 2007 09:30:50 +0100
Subject: [Bioperl-l] GenericHit->start/end needs tiled hsps?
Message-ID: <461F3FBA.2010101@sendu.me.uk>

Hi all,

I want to double-check my thinking regarding 
Bio::Search::Hit::GenericHit->start() and end(). Right now the docs 
claim that hsps of the hit object must be tiled before the answer can be 
produced. The code is implemented in that way 
(Bio::Search::SearchUtils::tile_hsps($self)).

Yet as far as I can see, all you need to do is loop through all hsps and 
pick out the smallest start and largest end respectively in terms of 
subject and query.
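
A minimal sketch of the simple loop described above. It assumes only that the hit object provides hsps() and that each HSP provides start() and end() taking 'query' or 'hit', as in the Bio::Search::HSP::HSPI interface. The name hit_extent is made up for illustration and is not part of BioPerl:

```perl
use strict;
use warnings;

# Return (start, end) spanning all HSPs of a hit for one sequence type
# ('query' or 'hit') without tiling: track the smallest start and the
# largest end seen. hit_extent is a hypothetical helper name.
sub hit_extent {
    my ($hit, $seq_type) = @_;
    my ($min_start, $max_end);
    for my $hsp ($hit->hsps) {
        my $s = $hsp->start($seq_type);
        my $e = $hsp->end($seq_type);
        $min_start = $s if !defined $min_start || $s < $min_start;
        $max_end   = $e if !defined $max_end   || $e > $max_end;
    }
    return ($min_start, $max_end);    # both undef if the hit has no HSPs
}
```

This is a single pass over the HSPs, so the cost grows linearly with their number. It says nothing about which HSPs form the best contig; that would still need tiling.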

This comes up because I have a blast report where a single hit contains 
over 80000 hsps and the tiling takes over an hour (I gave up on it, 
don't know how long it really takes). The simple loop through hsps takes 
seconds or less.

Now in this situation the answer isn't especially useful (to me). An 
alternative way of fixing the problem would be to re-write the tiling 
algorithm (again) to somehow make it hundreds of times faster, then 
provide some way in start() and end() for the user to request the start 
and end of the best contig, or other contig of choice. Easier said than 
done though!


What do people think?


From marian.thieme at lycos.de  Fri Apr 13 06:12:51 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Fri, 13 Apr 2007 10:12:51 +0000
Subject: [Bioperl-l] Affys ReseqChip
Message-ID: <18866117804894@lycos-europe.com>

Hi,

To give a better understanding of the matter and to help assess the approach, I will briefly present
1.) the problem and 2.) my approach.


1.)
given: fragments (strings of a certain length) with a description of their location within some reference sequence. For instance:

- redundant fragment: acgtnna--gcta (deletion: pos12, pos13)
- start position: 5
- end position: 17
- and some suitable reference sequence

Fragments are assumed to map 1:1 onto the reference sequence and can contain gaps and n's; the latter indicate that the base wasn't determined, perhaps because of failed hybridization or something similar.
Thus we don't need to detect insertions/deletions ourselves: it is enough to parse an array design file (a description of all insertions and deletions in each redundant fragment) and, according to that description, insert gaps in the reference sequence and in the fragments where required.
So from my point of view, in the case of the Affy MitoChip v2 we only need to process the description file rather than calculate an alignment via a dynamic programming matrix.


2.)
My current approach consists of the following 5 steps:

1.) read the reference sequence and the redundant fragments in via a SeqIO object.

2.) calculate a hash with all insertions, defined by length and position, and
3.) insert the longest insertion at each position into the appropriate fragments and into the reference sequence. Hence insert as many gaps as given by

length(max_insertion(position_x))-length(insertion(fragment_y, position_x))

into each fragment/reference sequence.
(This is done by iterating over each sequence in the SeqIO and inserting gaps according to the insertion hash) and

4.) create a SimpleAlign object with LocatableSeq objects

5.) Afterwards we can do some statistical analysis and calculate a consensus base for each column in the SimpleAlign. (I use a Statistics module from CPAN.)
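
Steps 2.) and 3.) above can be sketched in plain Perl as follows. This is only an illustration under assumptions: the data layout (a hash mapping sequence id to { position => inserted bases }) and the name gaps_to_insert are invented here, not taken from the actual software:

```perl
use strict;
use warnings;

# $insertions maps sequence id => { position => inserted bases }.
# Returns a hash ref: sequence id => { position => number of gaps to add }.
# (Layout and function name are assumptions for illustration only.)
sub gaps_to_insert {
    my ($insertions) = @_;

    # Step 2: length of the longest insertion seen at each position.
    my %max_len;
    for my $by_pos (values %$insertions) {
        for my $pos (keys %$by_pos) {
            my $len = length $by_pos->{$pos};
            $max_len{$pos} = $len if $len > ($max_len{$pos} // 0);
        }
    }

    # Step 3: every sequence gets
    #   length(max_insertion(pos)) - length(insertion(seq, pos))
    # gaps at each position that carries an insertion somewhere.
    my %gaps;
    for my $id (keys %$insertions) {
        for my $pos (keys %max_len) {
            my $own = length($insertions->{$id}{$pos} // '');
            $gaps{$id}{$pos} = $max_len{$pos} - $own;
        }
    }
    return \%gaps;
}
```

A sequence without any insertion at a position (such as the reference) simply gets the full length of the longest insertion as gaps there.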

Unfortunately I didn't manage to find a method that gives me the set of bases (column) for a given position in the alignment (did I overlook something? Is SimpleAlign not appropriate?), so I iterate over each position (base) of the reference sequence and over each fragment which covers that particular position.
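
One possible way to collect a column, sketched under assumptions: it relies only on the alignment object offering each_seq() and on each sequence returning its gapped string from seq(), as Bio::SimpleAlign and Bio::LocatableSeq do. The name column_counts is invented for illustration, and rows shorter than the requested column are skipped rather than treated as an error:

```perl
use strict;
use warnings;

# Tally the characters found in one alignment column.
# $col is a 1-based column index in alignment coordinates.
# column_counts is a hypothetical helper name, not a BioPerl method.
sub column_counts {
    my ($aln, $col) = @_;
    my %count;
    for my $seq ($aln->each_seq) {
        my $str = $seq->seq;              # gapped sequence string
        next if $col > length $str;       # row does not reach this column
        $count{ lc substr($str, $col - 1, 1) }++;
    }
    return \%count;                       # e.g. { a => 3, n => 1, '-' => 1 }
}
```

The returned counts could then be fed into whatever statistics are used to call a consensus base for that column.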


Marian


Jonathan Epstein wrote:

> This sounds great to me.
>
> Resequencing in general (whether by Affy or by other technology such as Celexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handing this.  But I suggest that you proceed full-speed-ahead, and we can sort this out in the future.
>
> Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.
>
> Jonathan


From thiago.venancio at gmail.com  Fri Apr 13 15:05:12 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 13 Apr 2007 16:05:12 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
Message-ID: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>

Hi all.

What is the best way to extract the coding region from a nucleotide sequence
based on BLASTX or TBLASTX comparisons?

Thanks in advance.

Thiago
-- 
"The way to get started is to quit talking and begin doing."
      Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


From jason at bioperl.org  Fri Apr 13 16:05:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:05:42 -0700
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
Message-ID: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>

Depends on how far away the query protein is, but I don't trust BLAST  
for the actual alignment.  Find the boundaries, add a little slop,  
and refine the alignment of protein to genome with a good alignment  
program designed for this, like genewise or exonerate or even FASTX/Y.
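
The "boundaries plus a little slop" step might look like the sketch below. The helper name padded_region and the choice of slop are assumptions for illustration; coordinates are 1-based and inclusive, as BioPerl reports them:

```perl
use strict;
use warnings;

# Widen a (start, end) region by $slop bases on each side and clamp it
# to the sequence, i.e. to 1 .. $seq_length.
# padded_region is a hypothetical helper name.
sub padded_region {
    my ($start, $end, $slop, $seq_length) = @_;
    my $from = $start - $slop;
    $from = 1 if $from < 1;
    my $to = $end + $slop;
    $to = $seq_length if $to > $seq_length;
    return ($from, $to);
}
```

The clamped coordinates could then be passed to $seq->subseq($from, $to) to cut out the excerpt that is handed to the refinement program.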

-jason
On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:

> Hi all.
>
> What is the best way to extract coding region from a nucleotide  
> sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From jason at bioperl.org  Fri Apr 13 16:13:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:13:07 -0700
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
Message-ID: <7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>

I think it just needs an edit to the code in to_string which checks  
for the type of algorithm.  You'd need to add to the if/elsif cascade  
something for the RPSBLAST type that codes the query and target dbs  
and the query and target sequence types properly.  This would be very  
trivial to code in; have you tried adding this to see if it works?

If you submit a bug with an example report we'd be able to make  
appropriate changes faster.

-jason
On Apr 11, 2007, at 6:32 AM, Emeric Sevin wrote:

> Hi everybody,
>
> I'm sorry to bug, but either I missed something so obvious nobody  
> bothered to answer, or I'm being a little boycotted here...
> A little help would be very much appreciated
>
> On 22 March 07, at 16:07, Emeric Sevin wrote:
>
>> Hello,
>>
>> I am new to this community, and apologize if this subject has been  
>> posted before.
>>
>> I want to print out only selected results from a multiple
>> blast-alignments results file. Problem is, the algorithm used is
>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the
>> actual writing task yields "unclean" warnings. Although an output
>> is actually written, the writer
>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by
>> the fact that rpsblast DBs are not labeled with
>> "protein"/"nucleic"/"translated".
>> Does anybody know of an easy fix to that bug, or of another way to  
>> come around it?
>>
>> Thank you very much
>>
>> Emeric SEVIN
>> Université de Rennes 1

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From thiago.venancio at gmail.com  Fri Apr 13 16:20:32 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 13 Apr 2007 17:20:32 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
Message-ID: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>

Thanks Jason.

I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
comparisons and want to extract some translated coding regions for further
multiple alignment and phylogenetic analysis.

Best.

Thiago


From jason at bioperl.org  Fri Apr 13 16:47:50 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:47:50 -0700
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
	<44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
Message-ID: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>

Hi -

There are some tools that do this for you -- I've listed a few from a  
google search or from what I remember reading.  It would be great if  
you (and others!) were willing to contribute a little info on what  
works for you to the wiki.  A little HOWTO would be cool - here or on  
openwetware.org.

Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
EST-PAC:  doi: http://dx.doi.org/10.1186/1751-0473-1-2

Ewan Birney's estwise, part of the wise package, can also help if you  
have a likely protein from BLAST you want to align to the EST -  
estwise can handle frameshifts, but can be too slow for some people.   
Exonerate's protein2dna model may also work here, but I haven't tried  
it.

-jason

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From gopu_36 at yahoo.com  Fri Apr 13 12:48:48 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Fri, 13 Apr 2007 09:48:48 -0700 (PDT)
Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score
Message-ID: <9982570.post@talk.nabble.com>


Can anyone help me collect the value of the second best hit score
(i.e. raw_score) from BLAST results which contain multiple queries? I have
used a SearchIO object to parse my BLAST report. I am only interested in the
second best hit/raw_score and not the first hit!

Thanks in advance!




From jason at bioperl.org  Sat Apr 14 13:53:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 14 Apr 2007 10:53:42 -0700
Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score
In-Reply-To: <9982570.post@talk.nabble.com>
References: <9982570.post@talk.nabble.com>
Message-ID: <67974DCD-B1F9-4286-86A4-5E4C4DBA3914@bioperl.org>

Try reading the HOWTO.

http://bioperl.org/wiki/HOWTO:SearchIO
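
For what it is worth, here is a minimal sketch of the idea, under a few
assumptions: the report is plain-text BLAST output, hits come back in report
order (best first), and the helper name second_hit_scores is made up for
this example. Queries with fewer than two hits are simply skipped.

```perl
use strict;
use warnings;

# Collect [query name, hit name, raw score] for the second hit of every
# result in a report. $searchio is anything with a next_result method,
# normally a Bio::SearchIO object.
sub second_hit_scores {
    my ($searchio) = @_;
    my @rows;
    while ( my $result = $searchio->next_result ) {
        my @hits = $result->hits;    # assumed to be in report order
        next if @hits < 2;           # this query has no second hit
        my $second = $hits[1];
        push @rows,
          [ $result->query_name, $second->name, $second->raw_score ];
    }
    return @rows;
}

# Only load BioPerl when a report file is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -format => 'blast', -file => $ARGV[0] );
    print join( "\t", @$_ ), "\n" for second_hit_scores($in);
}
```

Run it as "perl second_hit.pl report.bls" (both names are placeholders) to
get one tab-separated line per query.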

On Apr 13, 2007, at 9:48 AM, gopu_36 wrote:

>
> Can anyone help me collect the second-best hit score (i.e. the raw_score)
> from BLAST results that contain multiple queries? I have used a SearchIO
> object to parse my BLAST report. I am only interested in the second-best
> hit's raw_score, not the first hit's!
>
> Thanks in advance!
>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-parse-blast-result-to-get-2nd-best-hit-score-tf3572717.html#a9982570
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From gdorjee at hotmail.com  Sat Apr 14 17:39:50 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sat, 14 Apr 2007 14:39:50 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
Message-ID: <9997343.post@talk.nabble.com>


Hi all,
Can anyone please tell me why the following script fails, and how I can fix
it? It gives me an error like this:
waiting... 5 units of time
Can't call method "database_name" on an undefined value at
test1_remote_swissblast.pl line 41, <GEN4> line 31.
Cheers!

use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], 
                              -format => 'fasta');
my $query = $Seq_in->next_seq(); 

my $factory = Bio::Tools::Run::RemoteBlast->new(
                                '-prog'  => 'blastp',
                                '-data' => 'swissprot',
                                 _READMETHOD => "Blast"
                         );
my $blast_report = $factory->submit_blast($query);
my $max_number = 100;
my $trial = 0;


while ( my @rids = $factory->each_rid ) {

    print STDERR "\nSorry, maximum number of retries $max_number exceeded\n"
if $trial >= $max_number;
    last if $trial >= $max_number;
    $trial++;

    print STDERR "waiting... ".(5*$trial)." units of time\n" ;

    # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                # retrieve_blast returns -1 on error
                $factory->remove_rid($rid);
            }
            # retrieve_blast returns 0 on 'job not finished'
            sleep 5*$trial;
        } else {

            #---- Blast done ----
            $factory->remove_rid($rid);
            my $result = $rc->next_result;
            print "database: ", $result->database_name(), "\n";
            while( my $hit = $result->next_hit ) {
                print "hit name is: ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "score is: ", $hsp->score, "\n";
                }
            }
        }
    }
} 
-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a9997343
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From dmessina at wustl.edu  Sun Apr 15 12:02:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 15 Apr 2007 11:02:51 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <9997343.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
Message-ID: <ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>

Hi DeeGee,

Your script worked fine for me. Perhaps the problem is in your input  
fasta file?

Dave

% perl test.pl AAC12660.fa
waiting... 5 units of time
waiting... 10 units of time
waiting... 15 units of time
database: Non-redundant SwissProt sequences
hit name is: sp|Q15750|TAB1_HUMAN
score is: 2413
hit name is: sp|Q8CF89|TAB1_MOUSE
score is: 2352
hit name is: sp|P49444|PP2C_PARTE
score is: 159
hit name is: sp|Q6ING9|PP2CK_XENLA
[...etc...]


From spiros at lokku.com  Sun Apr 15 12:12:05 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Sun, 15 Apr 2007 17:12:05 +0100
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
Message-ID: <bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>

Yep, it must be in the input file. The

$result->database_name()

method gets called on $result, the result object.

The error you get,

Can't call method "database_name" on an undefined value at
test1_remote_swissblast.pl line 41, <GEN4> line 31.

means the result object is not defined, so the method call fails since
there is no data to operate on.
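
One way to make the script fail more gracefully is to check the result
before calling anything on it. A minimal sketch, assuming $report is the
object retrieve_blast returned on success; the helper name
database_name_or_warn is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch the first result from a parsed report and
# return its database name, or return nothing (with a warning) when no
# result could be parsed.
sub database_name_or_warn {
    my ( $report, $rid ) = @_;
    my $result = $report->next_result;
    unless ( defined $result ) {
        warn "no result parsed for RID $rid; inspect the raw report\n";
        return;
    }
    return $result->database_name;
}
```

In the posted loop that would replace the direct call, e.g.
my $db = database_name_or_warn($rc, $rid); followed by a print only if $db
is defined. It does not explain why the report is empty, but it turns the
crash into a message that says which RID came back without a result.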

Spiros

On 4/15/07, David Messina <dmessina at wustl.edu> wrote:
> Hi DeeGee,
>
> Your script worked fine for me. Perhaps the problem is in your input
> fasta file?
>
> Dave
>
> % perl test.pl AAC12660.fa
> waiting... 5 units of time
> waiting... 10 units of time
> waiting... 15 units of time
> database: Non-redundant SwissProt sequences
> hit name is: sp|Q15750|TAB1_HUMAN
> score is: 2413
> hit name is: sp|Q8CF89|TAB1_MOUSE
> score is: 2352
> hit name is: sp|P49444|PP2C_PARTE
> score is: 159
> hit name is: sp|Q6ING9|PP2CK_XENLA
> [...etc...]
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From dr.hogart at gmail.com  Sun Apr 15 12:13:29 2007
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Sun, 15 Apr 2007 20:13:29 +0400
Subject: [Bioperl-l] error with blast parsing by searchIO
Message-ID: <op.tqt10r17avnppr@hogart.img.ras.ru>

Hello all,

A script (parsing a blastn report) that previously worked is telling me
today:

------------- EXCEPTION  -------------
MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
: No such file or directory
STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
STACK toplevel parse-te-lib2.pl:3

--------------------------------------

What does it mean??

ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8


From cjfields at uiuc.edu  Sun Apr 15 13:40:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 15 Apr 2007 12:40:24 -0500
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqt10r17avnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
Message-ID: <460926E6-0EEA-45D9-838E-70706062857C@uiuc.edu>

You have to update to bioperl 1.5.2 or CVS.  BLAST parsing is broken  
for recent BLAST versions (> v.2.2, I believe).

chris

On Apr 15, 2007, at 11:13 AM, sergei ryazansky wrote:

> Hello all,
>
> A script (parsing a blastn report) that previously worked is telling me
> today:
>
> ------------- EXCEPTION  -------------
> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
> : No such file or directory
> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
> STACK toplevel parse-te-lib2.pl:3
>
> --------------------------------------
>
> What does it mean??
>
> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Sun Apr 15 14:24:56 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 15 Apr 2007 11:24:56 -0700
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqt10r17avnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
Message-ID: <C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>

It looks like something is broken in your script as to how you are  
passing it a filename - it is trying to open a file called "BLASTN  
2.2.13 [Nov-27-2005]".
Did you already open the file, and are you perhaps passing data from the
first line of the file to SearchIO?
Sending the relevant part of your script to the list will help us  
diagnose the problem better.
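
To illustrate the distinction: SearchIO takes either a file name (-file) or
an already opened filehandle (-fh), never a line of the file's content. A
rough sketch follows; searchio_input_args is a made-up helper, not part of
BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: build the input arguments for Bio::SearchIO->new.
# A reference (e.g. a lexical filehandle from open) is passed as -fh;
# anything else must be the name of a readable file and is passed as -file.
sub searchio_input_args {
    my ($input) = @_;
    die "No input given\n" unless defined $input;
    return ( -fh => $input ) if ref $input;
    die "Not a readable file: '$input'\n" unless -r $input;
    return ( -file => $input );
}

# Only load BioPerl when a report file is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new( -format => 'blast',
                                 searchio_input_args( $ARGV[0] ) );
    while ( my $result = $in->next_result ) {
        print $result->query_name, "\n";
    }
}
```

With a check like this, a string such as "BLASTN 2.2.13 [Nov-27-2005]" is
rejected in your own code with a clear message instead of deep inside
Bio::Root::IO.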

-jason
On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote:

> Hello all,
>
> A script (parsing a blastn report) that previously worked is telling me
> today:
>
> ------------- EXCEPTION  -------------
> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
> : No such file or directory
> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
> STACK toplevel parse-te-lib2.pl:3
>
> --------------------------------------
>
> What does it mean??
>
> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From gdorjee at hotmail.com  Sun Apr 15 20:40:22 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sun, 15 Apr 2007 17:40:22 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
Message-ID: <10008507.post@talk.nabble.com>


hi guys,
thanks for your replies, but i still don't understand why it doesn't work.
my input fasta sequence looks fine. here, take a look,

>gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
AQGAVAPGPDGGGPFPPWPLG

is it possible that the script is not able to read RemoteBlast.pm? but the
thing is, i can run the standalone blast on the command line, although i've
never been able to run the same with the cgi module (by getting the input
from an html textarea). i don't understand. i've been trying to get the
standalone running for a while now, and i also mentioned it in my previous
postings... but all in vain. i haven't got over it yet.
any help or an example would be much appreciated.


Spiros Denaxas wrote:
> 
> Yep, it must be in the input file. The
> 
> $result->database_name()
> 
> function gets called on $result the result object.
> 
> The error you get,
> 
> Can't call method "database_name" on an undefined value at
> test1_remote_swissblast.pl line 41, <GEN4> line 31.
> 
> means the result object is not defined thus the function fails since
> there are no data to operate on.
> 
> Spiros
> 
> On 4/15/07, David Messina <dmessina at wustl.edu> wrote:
>> Hi DeeGee,
>>
>> Your script worked fine for me. Perhaps the problem is in your input
>> fasta file?
>>
>> Dave
>>
>> % perl test.pl AAC12660.fa
>> waiting... 5 units of time
>> waiting... 10 units of time
>> waiting... 15 units of time
>> database: Non-redundant SwissProt sequences
>> hit name is: sp|Q15750|TAB1_HUMAN
>> score is: 2413
>> hit name is: sp|Q8CF89|TAB1_MOUSE
>> score is: 2352
>> hit name is: sp|P49444|PP2C_PARTE
>> score is: 159
>> hit name is: sp|Q6ING9|PP2CK_XENLA
>> [...etc...]
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10008507
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From dmessina at wustl.edu  Sun Apr 15 22:43:06 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 15 Apr 2007 21:43:06 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10008507.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
Message-ID: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>

You're right, it's not the input sequence. I just tried it with your  
script and it worked.


> is it possible that the script is not able to read
> RemoteBlast.pm?

I think the program wouldn't compile if that were the case, and your  
error message would be about not finding RemoteBlast.pm rather than  
the one you got.


> but the thing is, i can run the standalone blast on the
> command line, although i've never been able to run the same with
> the cgi module
> (by getting the input from an html textarea). i don't understand.

This result really suggests that perl and Bioperl are not the issue.  
I'm not saying the following to give you the brushoff, but given the  
numerous ways in which web-based apps can fail and in which  
webservers can be installed, it might be best for you to find someone  
at your institution who can sit down with you and work through it.

Dave


From cjfields at uiuc.edu  Sun Apr 15 23:51:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 15 Apr 2007 22:51:05 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
Message-ID: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>

This sounds similar to an issue that popped up a few weeks ago,
related to URLAPI changes for remote BLAST access.  That was fixed on
NCBI's end, but I also added a fix to RemoteBlast in CVS that works as
well.

That said, my guess is the same as Dave's: there are connectivity
issues.  What happens when you set the RemoteBlast factory to a
verbosity of 1?  This will spill out debugging output from the
repeated queries to the NCBI server (so if there are problems they'll
show up there).

...
my $factory = Bio::Tools::Run::RemoteBlast->new(
                                 '-prog'  => 'blastp',
                                 '-data' => 'swissprot',
                                  _READMETHOD => "Blast",
                                  -verbose => 1    # debugging output
                          );
...

If you see the BLAST report but get the same error try using the  
RemoteBlast in CVS to see if it fixes the problem.

chris


On Apr 15, 2007, at 9:43 PM, David Messina wrote:

> You're right, it's not the input sequence. I just tried it with your
> script and it worked.
>
>
>> is it possible that the script is not able to read
>> RemoteBlast.pm?
>
> I think the program wouldn't compile if that were the case, and your
> error message would be about not finding RemoteBlast.pm rather than
> the one you got.
>
>
>> but the thing is, i can run the standalone blast on the
>> command line, although i've never been able to run the same with
>> the cgi module
>> (by getting the input from an html textarea). i don't understand.
>
> This result really suggests that perl and Bioperl are not the issue.
> I'm not saying the following to give you the brushoff, but given the
> numerous ways in which web-based apps can fail and in which
> webservers can be installed, it might be best for you to find someone
> at your institution who can sit down with you and work through it.
>
> Dave
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dr.hogart at gmail.com  Mon Apr 16 03:03:46 2007
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Mon, 16 Apr 2007 11:03:46 +0400
Subject: [Bioperl-l] error with blast parsing by searchIO
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
	<C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>
Message-ID: <op.tqu68kvzavnppr@hogart.img.ras.ru>

The problem was resolved by giving the direct path
(-file=>'d\...\input.txt') to the input file in my script.
I think that Chris is right and I should update my bioperl to version 1.5.
By the way, bioperl-1.5 is not accessible via ppm. Where can I download it
for WinXP?

On Sun, 15 Apr 2007 22:24:56 +0400, Jason Stajich <jason at bioperl.org>  
wrote:

> It looks like something is broken in your script as to how you are
> passing it a filename - it is trying to open a file called "BLASTN
> 2.2.13 [Nov-27-2005]".
> did you already open the file and are you passing data from the first
> line of the file to SearchIO perhaps?
> Sending the relevant part of your script to the list will help us
> diagnose the problem better.
>
> -jason
> On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote:
>
>> Hello all,
>>
>> A script (parsing a blastn report) that previously worked is telling me
>> today:
>>
>> ------------- EXCEPTION  -------------
>> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
>> : No such file or directory
>> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
>> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
>> STACK toplevel parse-te-lib2.pl:3
>>
>> --------------------------------------
>>
>> What does it mean??
>>
>> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>




From bix at sendu.me.uk  Mon Apr 16 04:34:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 16 Apr 2007 09:34:56 +0100
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqu68kvzavnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>	<C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>
	<op.tqu68kvzavnppr@hogart.img.ras.ru>
Message-ID: <46233530.1010109@sendu.me.uk>

sergei ryazansky wrote:
> The problem was resolved by giving the direct path
> (-file=>'d\...\input.txt') to the input file in my script.
> I think that Chris is right and I should update my bioperl to version 1.5.
> By the way, bioperl-1.5 is not accessible via ppm. Where can I download it
> for WinXP?

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 10:36:33 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 22:36:33 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>


Dear all,
 
Given a GO id, is there a way to extract all
the related gene names from that id with Perl?

Does anybody have experience with that?
I've looked through the GO modules on CPAN, but can't seem
to find any tool that facilitates that search.

I look forward very much to your advice.
 
--
Edward WIJAYA
SINGAPORE

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From spiros at lokku.com  Mon Apr 16 11:10:49 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:10:49 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>

Hi Edward,

What organism are you interested in? I have some code from my PhD
based on the Saccharomyces cerevisiae genome. It basically uses the SGD
flat files and a local MySQL instance of GO. It might be worth turning
into modules if people are interested in it, although it is pretty
organism-oriented and the lack of abstraction might introduce a number
of problems.

Spiros

On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>
> Dear all,
>
> Given a GO id, is there a way to extract all
> the related gene names from that id with Perl?
>
> Does anybody have experience with that?
> I've looked through the GO modules on CPAN, but can't seem
> to find any tool that facilitates that search.
>
> I look forward very much to your advice.
>
> --
> Edward WIJAYA
> SINGAPORE
>
> ------------ Institute For Infocomm Research - Disclaimer -------------
> This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
> --------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 11:14:09 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:14:09 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>


Hi Spiros,
 
Thanks for your reply. I am interested in applying it to
all the organisms related to that particular GO ID.
 
Do you have a CPAN module for that?
--
Edward WIJAYA
SINGAPORE

________________________________

From: s.denaxas at gmail.com on behalf of Spiros Denaxas
Sent: Mon 4/16/2007 11:10 PM
To: Wijaya Edward
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


Hi Edward,

What organism are you interested in? I have some code from my PhD
based on the Saccharomyces cerevisiae genome. It basically uses the SGD
flat files and a local MySQL instance of GO. It might be worth turning
into modules if people are interested in it, although it is pretty
organism-oriented and the lack of abstraction might introduce a number
of problems.

Spiros

On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>
> Dear all,
>
> Given a GO id, is there a way to extract all
> the related gene names from that id with Perl?
>
> Does anybody have experience with that?
> I've looked through the GO modules on CPAN, but can't seem
> to find any tool that facilitates that search.
>
> I look forward very much to your advice.
>
> --
> Edward WIJAYA
> SINGAPORE
>
> ------------ Institute For Infocomm Research - Disclaimer -------------
> This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
> --------------------------------------------------------
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>




From dmessina at wustl.edu  Mon Apr 16 11:21:01 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:21:01 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>

I use BioMART for this kind of thing. If you need to do this for more  
than a couple of GO terms, BioMART has a Perl API you can use to  
connect to their data.

http://www.biomart.org/

http://www.biomart.org/install-overview.html

Dave


From spiros at lokku.com  Mon Apr 16 11:21:40 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:21:40 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID: <bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>

Nope, I don't have a CPAN module for it, and to be honest, I don't
think I will release one for it until I actually finish my PhD. The
code is really scruffy in some parts, lacks documentation and might
not work under all setups. My plan is to take some time afterwards to
clean it up and release a proper version of it to the public.

What you are talking about however, if I understand correctly, is a
much much bigger project. Different genome databases have different
formats and a potential module must take them all into consideration.
Then there is the issue of the different evidence codes GO annotators
use across genomes, and which of them you consider to be of higher or
lower quality.

If you happen to stumble upon such a module, please share it, it would
be very interesting !

spiros

On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>
> Hi Spiros,
>
> Thanks for your reply. I am interested to apply it for
> all the kind of organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
> >
> > Dear all,
> >
> > Given a GO id, is there a way to extract all
> > the related gene names from that id with Perl?
> >
> > Anybody has experience with that?
> > I've looked through GO module in CPAN, but can't seem
> > to find any tool that facilitated that search.
> >
> > Look forward very much for your advice.
> >
> > --
> > Edward WIJAYA
> > SINGAPORE
> >
> >
>
>
>
>


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 11:33:27 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:33:27 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>


Hi David, 
 
There seems to be no biomart-perl module in CPAN.
 
I tried their CVS:
cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login
 
But it requires a password. Can you suggest another way to get this module?
 
--
Edward WIJAYA

________________________________

From: David Messina [mailto:dmessina at wustl.edu]
Sent: Mon 4/16/2007 11:21 PM
To: Wijaya Edward
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


I use BioMART for this kind of thing. If you need to do this for more 
than a couple of GO terms, BioMART has a Perl API you can use to 
connect to their data.

http://www.biomart.org/

http://www.biomart.org/install-overview.html

Dave




From Kevin.M.Brown at asu.edu  Mon Apr 16 11:44:28 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 16 Apr 2007 08:44:28 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net><BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
	<3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <1A4207F8295607498283FE9E93B775B4030A4914@EX02.asurite.ad.asu.edu>

Did you follow the directions listed at the page below?

http://www.biomart.org/install-overview.html 


> There seems to be no biomart-perl module in CPAN.
>  
> I tried their cvs:
> cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login
>  
> But require password. Can suggest if there is another way to 
> get this module?


From dmessina at wustl.edu  Mon Apr 16 11:44:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:44:26 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
	<3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <2D698B2E-49B9-411E-B1FA-C12F4A235EB2@wustl.edu>

The password you need to enter when asked is CVSUSER.

Dave


From sdavis2 at mail.nih.gov  Mon Apr 16 11:55:14 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 16 Apr 2007 11:55:14 -0400
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
	<bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>
Message-ID: <200704161155.14567.sdavis2@mail.nih.gov>


> > On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
> > > Dear all,
> > >
> > > Given a GO id, is there a way to extract all
> > > the related gene names from that id with Perl?

This is a pretty simple problem if you have the data in a useable format.  The 
data that you need are available here:

ftp://ftp.ncbi.nih.gov/gene/DATA

The README file gives details, but the files in this directory are all 
tab-delimited text.  Download the gene2go.gz file, which contains a mapping 
from Entrez Gene ID to GO accession.  Then, download the gene_info.gz file, 
which contains the information about the Entrez Gene ID, including 
description, gene symbol, etc.  If you need to link to other data, you can of 
course download the respective files from NCBI.  You can either load the data 
into a SQL database of some type for general queries, or you can simply read 
them into perl directly (with appropriate data structures) to do your mapping.  
Since they are tab-delimited text, I would choose the database route and then 
use SQL and DBI to do the queries you like.
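
For the "read them into perl directly" route, here is a minimal sketch. It 
assumes gene2go starts with the columns tax_id, GeneID, GO_ID and gene_info 
starts with tax_id, GeneID, Symbol, and that GO accessions are written like 
GO:0009220; please check the README for the current layout. The sub names 
are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed column layout (verify against the README in the FTP directory):
#   gene2go:   tax_id, GeneID, GO_ID, ...
#   gene_info: tax_id, GeneID, Symbol, ...

# Collect the Entrez Gene IDs annotated directly to one GO accession.
sub gene_ids_for_go {
    my ($fh, $go_id) = @_;
    my %ids;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;              # skip the header line
        chomp $line;
        my (undef, $gene_id, $go) = split /\t/, $line;
        next unless defined $go && $go eq $go_id;
        $ids{$gene_id} = 1;
    }
    return \%ids;
}

# Look up the gene symbol for each wanted Gene ID.
sub symbols_for_ids {
    my ($fh, $wanted) = @_;
    my %symbol;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my (undef, $gene_id, $sym) = split /\t/, $line;
        next unless defined $gene_id;
        $symbol{$gene_id} = $sym if $wanted->{$gene_id};
    }
    return \%symbol;
}

# Usage: script.pl gene2go gene_info GO:0009220
# (both files downloaded and gunzipped beforehand)
if (@ARGV == 3) {
    my ($gene2go, $gene_info, $go_id) = @ARGV;
    open my $g2g, '<', $gene2go or die "Cannot open $gene2go: $!";
    my $ids = gene_ids_for_go($g2g, $go_id);
    close $g2g;
    open my $gi, '<', $gene_info or die "Cannot open $gene_info: $!";
    my $symbols = symbols_for_ids($gi, $ids);
    close $gi;
    print "$_\t$symbols->{$_}\n" for sort { $a <=> $b } keys %$symbols;
}
```

This only finds genes annotated directly to the given accession, and it 
covers all organisms in the file unless you also filter on tax_id.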

Sean


From cjfields at uiuc.edu  Mon Apr 16 12:25:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 11:25:42 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>

You can limit EntrezGene searches by Gene Ontology ID using the [Gene  
Ontology] field in queries.  The following query:

'9220[Gene Ontology]'

will give 120 gene IDs.  You can get the same list using the still- 
under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm  
still working on this):

my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                        -db => 'gene',
                                        -term => '9220[Gene Ontology]',
                                        -retmax => 300);
$esearch->get_response;
my @ids = $esearch->get_ids;
print join "\n", @ids;

In my opinion, Sean's idea of using SQL is probably better if you  
have tons of searches to do.

chris

On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:

>
> Dear all,
>
> Given a GO id, is there a way to extract all
> the related gene names from that id with Perl?
>
> Anybody has experience with that?
> I've looked through GO module in CPAN, but can't seem
> to find any tool that facilitated that search.
>
> Look forward very much for your advice.
>
> --
> Edward WIJAYA
> SINGAPORE
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 16 14:34:25 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 13:34:25 -0500
Subject: [Bioperl-l] Bio::Matrix::PSM::ProtPsm
Message-ID: <CA820306-7480-478D-BD3E-A0F094943065@uiuc.edu>

I was going through tests converting to Test::More and found this  
module is largely unimplemented (relevant tests are in t/ProtPsm.t in  
CVS).  It was written by James Thompson a few years ago and the  
module docs seem to indicate some uncertainty on what this class is  
meant to accomplish.  Does anyone know the status of this code?

chris


From cjm at fruitfly.org  Mon Apr 16 14:49:23 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:49:23 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID: <AAF82F3A-3C75-4D51-AFD4-FDE358391A03@fruitfly.org>


Download:
http://search.cpan.org/~cmungall/go-db-perl

or do:

cpan GO::AppHandle

The API call you want is here:
http://search.cpan.org/~cmungall/go-db-perl/GO/ 
AppHandle.pm#get_deep_products

Here is an example snippet:

   use GO::AppHandle;
   my $apph=GO::AppHandle->connect(@ARGV);
   my $go_acc = shift @ARGV;
   my $gps = $apph->get_deep_products({term=>{acc=>$go_acc}});
   foreach my $gp (@$gps) {
     printf "%s %s\n", $gp->xref->acc, $gp->symbol;
   }

You will need to download the GO Database.

Cheers
Chris

On Apr 16, 2007, at 8:14 AM, Wijaya Edward wrote:

>
> Hi Spiros,
>
> Thanks for your reply. I am interested to apply it for
> all the kind of organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO)  
> with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Anybody has experience with that?
>> I've looked through GO module in CPAN, but can't seem
>> to find any tool that facilitated that search.
>>
>> Look forward very much for your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>>
>
>
>
>


From gdorjee at hotmail.com  Mon Apr 16 15:10:01 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:10:01 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022463.post@talk.nabble.com>


Hi Chris,
thanks for your reply. I set the RemoteBlast factory to a verbosity of 1,
and I get the same error message. I'm new to all of this, so could you please
tell me how I can get the RemoteBlast in CVS that you suggested?

cheers!!!
 

Chris Fields wrote:
> 
> This sounds like a similar issue that popped up a few weeks ago  
> related to URLAPI changes for remote BLAST access.  That was fixed on  
> NCBI's end but I also added a fix to RemoteBlast in CVS that works as  
> well.
> 
> Saying that, my guess is the same as Dave's, that there are  
> connectivity issues.  What happens when you set the RemoteBlast  
> factory to a verbosity of 1?  This will spill out debugging output  
> from the repeated queries to the NCBI server (so if there are  
> problems they'll show up there).
> 
> ...
> my $factory = Bio::Tools::Run::RemoteBlast->new(
>                                  '-prog'  => 'blastp',
>                                  '-data' => 'swissprot',
>                                   _READMETHOD => "Blast",
>                                   -verbose => 1    # debugging output
>                           );
> ...
> 
> If you see the BLAST report but get the same error try using the  
> RemoteBlast in CVS to see if it fixes the problem.
> 
> chris
> 
> 
> On Apr 15, 2007, at 9:43 PM, David Messina wrote:
> 
>> You're right, it's not the input sequence. I just tried it with your
>> script and it worked.
>>
>>
>>> is it possible that the script is not being about to read the
>>> RemoteBlast.pm?
>>
>> I think the program wouldn't compile if that were the case, and your
>> error message would be about not finding RemoteBlast.pm rather than
>> the one you got.
>>
>>
>>> but the thing is, i can run the standalone blast on the
>>> command line, although i've never been able the run the same with
>>> cgi module
>>> (by gettting the input from an html textarea). i don't understand.
>>
>> This result really suggests that perl and Bioperl are not the issue.
>> I'm not saying the following to give you the brushoff, but given the
>> numerous ways in which web-based apps can fail and in which
>> webservers can be installed, it might be best for you to find someone
>> at your institution who can sit down with you and work through it.
>>
>> Dave
>>
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10022463
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From gdorjee at hotmail.com  Mon Apr 16 15:11:18 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:11:18 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
Message-ID: <10022464.post@talk.nabble.com>


Thank you, David.


David Messina-2 wrote:
> 
> You're right, it's not the input sequence. I just tried it with your  
> script and it worked.
> 
> 
>> is it possible that the script is not being about to read the
>> RemoteBlast.pm?
> 
> I think the program wouldn't compile if that were the case, and your  
> error message would be about not finding RemoteBlast.pm rather than  
> the one you got.
> 
> 
>> but the thing is, i can run the standalone blast on the
>> command line, although i've never been able the run the same with  
>> cgi module
>> (by gettting the input from an html textarea). i don't understand.
> 
> This result really suggests that perl and Bioperl are not the issue.  
> I'm not saying the following to give you the brushoff, but given the  
> numerous ways in which web-based apps can fail and in which  
> webservers can be installed, it might be best for you to find someone  
> at your institution who can sit down with you and work through it.
> 
> Dave
> 
> 
> 



From cjm at fruitfly.org  Mon Apr 16 14:41:59 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:41:59 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
Message-ID: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>


Unless the Entrez interface has changed since I last looked, the  
query below for "pyrimidine ribonucleotide biosynthetic process" will  
NOT perform the transitive closure over the graph; this means genes  
and gene products annotated to GO:0009174 "pyrimidine ribonucleoside  
monophosphate biosynthetic process", for example, will not be returned.

On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:

> You can limit EntrezGene searches by Gene Ontology ID using the [Gene
> Ontology] field in queries.  The following query:
>
> '9220[Gene Ontology]'
>
> will give 120 gene IDs.  You can get the same list using the still-
> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
> still working on this):
>
> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                         -db => 'gene',
>                                         -term => '9220[Gene  
> Ontology]',
>                                         -retmax => 300);
> $esearch->get_response;
> my @ids = $esearch->get_ids;
> print join "\n", @ids;
>
> In my opinion, Sean's idea of using SQL is probably better if you
> have tons of searches to do.
>
> chris
>
> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Anybody has experience with that?
>> I've looked through GO module in CPAN, but can't seem
>> to find any tool that facilitated that search.
>>
>> Look forward very much for your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Mon Apr 16 15:25:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:25:14 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
	<50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
Message-ID: <3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>

You are correct; it explains why the list is only 120 genes.  The  
only way (currently) to include genes annotated to descendant terms  
would be to perform the closure locally somehow (maybe via go-perl or  
similar).
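
To make "perform the closure locally" concrete, here is a rough sketch in 
plain Perl. It assumes you have already loaded the ontology into a 
parent => [children] hash by some other means (go-perl or a parsed OBO 
file, for instance). Only the GO:0009220 to GO:0009174 relationship comes 
from this thread; the TOY:* terms and the sub name are invented for 
illustration.

```perl
use strict;
use warnings;

# Return the term itself plus every descendant reachable through the
# parent => [children] map (a breadth-first walk with a visited set).
sub descendants_of {
    my ($children_of, $root) = @_;
    my %seen  = ($root => 1);
    my @queue = ($root);
    while (defined(my $term = shift @queue)) {
        for my $child (@{ $children_of->{$term} || [] }) {
            next if $seen{$child}++;    # already visited via another path
            push @queue, $child;
        }
    }
    my @terms = sort keys %seen;
    return @terms;
}

# Toy graph: the TOY:* accessions are hypothetical placeholders.
my %children_of = (
    'GO:0009220'  => ['GO:0009174', 'TOY:0000001'],
    'GO:0009174'  => ['TOY:0000002'],
    'TOY:0000001' => ['TOY:0000002'],
);

print "$_\n" for descendants_of(\%children_of, 'GO:0009220');
```

Each term in the resulting list would then be searched separately and the 
gene IDs merged, which is what the closure buys you over a single query.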

chris

On Apr 16, 2007, at 1:41 PM, Chris Mungall wrote:

>
> Unless the Entrez interface has changed since I last looked, the  
> query below for "pyrimidine ribonucleotide biosynthetic process"  
> will NOT perform the transitive closure over the graph; this means  
> genes and gene products annotated to GO:0009174 "pyrimidine  
> ribonucleoside monophosphate biosynthetic process", for example
>
> On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:
>
>> You can limit EntrezGene searches by Gene Ontology ID using the [Gene
>> Ontology] field in queries.  The following query:
>>
>> '9220[Gene Ontology]'
>>
>> will give 120 gene IDs.  You can get the same list using the still-
>> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
>> still working on this):
>>
>> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>                                         -db => 'gene',
>>                                         -term => '9220[Gene  
>> Ontology]',
>>                                         -retmax => 300);
>> $esearch->get_response;
>> my @ids = $esearch->get_ids;
>> print join "\n", @ids;
>>
>> In my opinion, Sean's idea of using SQL is probably better if you
>> have tons of searches to do.
>>
>> chris
>>
>> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>>
>>>
>>> Dear all,
>>>
>>> Given a GO id, is there a way to extract all
>>> the related gene names from that id with Perl?
>>>
>>> Anybody has experience with that?
>>> I've looked through GO module in CPAN, but can't seem
>>> to find any tool that facilitated that search.
>>>
>>> Look forward very much for your advice.
>>>
>>> --
>>> Edward WIJAYA
>>> SINGAPORE
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr 16 15:27:32 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:27:32 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022661.post@talk.nabble.com>


Hi Chris,
sorry to bother you again, but could you please check the following script to
see what's wrong? I've been getting errors like:

Content-type: text/html
Software error:
------------- EXCEPTION  -------------
MSG:   (0) not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastall
/usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:532
STACK toplevel /usr/local/apache2/htdocs/rmtest.pl:46
--------------------------------------

#### the script ######
#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::SeqIO;
use Bio::SearchIO;
use Bio::DB::GenPept; 
use Bio::Tools::Run::StandAloneBlast;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

my $cgi = new CGI;

print $cgi->header,
$cgi->start_html(-title=>'A StandAloneBlast Test'),
$cgi->h1('Blast Result'),
$cgi->start_form,
"Enter or paste an amino-acid sequence? ",
$cgi->p,
$cgi->textarea(-name=>'name', rows=>10, -columns=>60),
$cgi->p,
$cgi->submit,
$cgi->end_form,
$cgi->hr;

open(OUTPUT,">result/query.faa");

if ($cgi->param()) {
        my $seq = $cgi->param('name');
        print OUTPUT $seq;

my @params = ('program'=>'blastp', 'database' =>
'/export/home/dorjee/database/nrpart', 'outfile' => 'result/blast.out',
_READMETHOD => 'Blast');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

# Blast a sequence against a database:
my $str = Bio::SeqIO->new(-file => "result/query.faa", '-format' => 'Fasta'
);
my $input = $str->next_seq();
my $blast_report = $factory->blastall($input);
}
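
One possible reading of the exception above, offered as a guess rather 
than a diagnosis: the OUTPUT handle is never closed, so result/query.faa 
may still be empty when Bio::SeqIO reads it, and pasted text without a 
">" header line is not a FASTA record anyway. In either case next_seq 
would return nothing. A sketch of writing the query file more defensively, 
using core Perl only (the sub name and the default ID "query" are made up):

```perl
use strict;
use warnings;

# Write raw textarea input as a FASTA record and close the handle so the
# data is on disk before anything else reads the file.
sub write_query_fasta {
    my ($path, $raw, $id) = @_;
    $raw = '' unless defined $raw;
    $id  = 'query' unless defined $id && length $id;
    $raw =~ s/\r//g;                    # browsers send CRLF line endings
    my $fasta = $raw;
    if ($fasta !~ /^\s*>/) {            # no header supplied: add one
        $fasta =~ s/\s+//g;             # join the residues into one line
        $fasta = ">$id\n$fasta";
    }
    $fasta =~ s/\s+\z//;                # trim trailing whitespace
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "$fasta\n";
    close $out or die "Cannot close $path: $!";
    return $path;
}
```

The file can then be read with Bio::SeqIO as in the script; checking that 
next_seq returned a defined object before calling blastall would make a 
failure at this step easier to spot.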


Chris Fields wrote:
> 
> This sounds like a similar issue that popped up a few weeks ago  
> related to URLAPI changes for remote BLAST access.  That was fixed on  
> NCBI's end but I also added a fix to RemoteBlast in CVS that works as  
> well.
> 
> Saying that, my guess is the same as Dave's, that there are  
> connectivity issues.  What happens when you set the RemoteBlast  
> factory to a verbosity of 1?  This will spill out debugging output  
> from the repeated queries to the NCBI server (so if there are  
> problems they'll show up there).
> 
> ...
> my $factory = Bio::Tools::Run::RemoteBlast->new(
>                                  '-prog'  => 'blastp',
>                                  '-data' => 'swissprot',
>                                   _READMETHOD => "Blast",
>                                   -verbose => 1    # debugging output
>                           );
> ...
> 
> If you see the BLAST report but get the same error try using the  
> RemoteBlast in CVS to see if it fixes the problem.
> 
> chris
> 
> 
> On Apr 15, 2007, at 9:43 PM, David Messina wrote:
> 
>> You're right, it's not the input sequence. I just tried it with your
>> script and it worked.
>>
>>
>>> is it possible that the script is not being about to read the
>>> RemoteBlast.pm?
>>
>> I think the program wouldn't compile if that were the case, and your
>> error message would be about not finding RemoteBlast.pm rather than
>> the one you got.
>>
>>
>>> but the thing is, i can run the standalone blast on the
>>> command line, although i've never been able the run the same with
>>> cgi module
>>> (by gettting the input from an html textarea). i don't understand.
>>
>> This result really suggests that perl and Bioperl are not the issue.
>> I'm not saying the following to give you the brushoff, but given the
>> numerous ways in which web-based apps can fail and in which
>> webservers can be installed, it might be best for you to find someone
>> at your institution who can sit down with you and work through it.
>>
>> Dave
>>
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 



From cjfields at uiuc.edu  Mon Apr 16 15:37:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:37:58 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10022463.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
Message-ID: <5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>

The 'verbose' setting doesn't change the way the BLAST query is sent,  
it just sends the raw output from the repeated attempts to retrieve  
the report (using the RID) to STDERR.  The error you saw won't be  
fixed by doing so.

What I was interested in was the raw HTML output dumped to the  
screen.  If it is querying the NCBI server it should dump stuff that  
includes something like this:

...
<HTML>
<p></p>
<!--
QBlastInfoBegin
         Status=WAITING
QBlastInfoEnd
--><p></p>
<SCRIPT LANGUAGE="JavaScript"><!--
...

which indicates you have a request in the BLAST queue.  If you aren't  
seeing anything then the problem is likely network-related on your  
end, so getting the latest RemoteBlast won't help.  Do any other  
BioPerl modules requiring network access work (Bio::DB::GenBank, for  
instance)?  If not it could be a proxy issue...
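
If it helps, here is a minimal sketch (plain Perl, no BioPerl needed) of
checking a captured page for that status block.  The sub name
qblast_status and the idea of having the raw page in a string are my
assumptions for illustration; RemoteBlast does its own parsing
internally and this is not its code.

```perl
use strict;
use warnings;

# Return the Status value found between QBlastInfoBegin and QBlastInfoEnd,
# or undef when the page carries no such block.
# (qblast_status is a hypothetical helper, not part of BioPerl.)
sub qblast_status {
    my ($html) = @_;
    return unless defined $html;
    if ($html =~ /QBlastInfoBegin\s*Status=(\w+)\s*QBlastInfoEnd/s) {
        return $1;
    }
    return;
}

# Sample page fragment copied from the output shown above.
my $page = <<'HTML';
<HTML>
<p></p>
<!--
QBlastInfoBegin
         Status=WAITING
QBlastInfoEnd
--><p></p>
HTML

my $status = qblast_status($page);
print defined $status ? "Status: $status\n" : "No QBlastInfo block found\n";
```

If nothing like this ever shows up in the verbose output, the request
most likely never reached the server.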

Just in case, here's the browsable CVS location for RemoteBlast:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl

Click on the download link and save over your local version.

chris

On Apr 16, 2007, at 2:10 PM, DeeGee wrote:

>
> hi Chris,
> thanks for your reply. i set the RemoteBlast factory to a verbosity  
> of 1,
> and i get the same error message. i'm new to all these. so, could  
> you plz
> tell me how can i do the RemoteBlast in CVS that you've suggested.
>
> cheers!!!


From gdorjee at hotmail.com  Mon Apr 16 16:42:37 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 13:42:37 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
Message-ID: <10024333.post@talk.nabble.com>


hi 
i tried the following code just to check the network, and it worked fine
except for the SwissProt part, for which i got the error message instead of
the sequence:

------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq
/usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
STACK toplevel bbbbb.pl:21
--------------------------------------

#### check #####
#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::DB::SwissProt;
use Bio::DB::GenPept;
use Bio::SeqIO;

my $genpeptdb = new Bio::DB::GenPept();
my $genbankdb = new Bio::DB::GenBank();
my $swissdb = new Bio::DB::SwissProt();

my $seqio = new Bio::SeqIO(-format => 'fasta',
                           -fh     => \*STDOUT);

my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
$seqio->write_seq($protseq);

my $seq = $genbankdb->get_Seq_by_acc('AF303112');
$seqio->write_seq($seq);

$protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
$seqio->write_seq($protseq);

thanks a lot.


Chris Fields wrote:
> 
> The 'verbose' setting doesn't change the way the BLAST query is sent,  
> it just sends the raw output from the repeated attempts to retrieve  
> the report (using the RID) to STDERR.  The error you saw won't be  
> fixed by doing so.
> 
> What I was interested in was the raw HTML output dumped to the  
> screen.  If it is querying the NCBI server it should dump stuff that  
> includes something like this:
> 
> ...
> <HTML>
> <p></p>
> <!--
> QBlastInfoBegin
>          Status=WAITING
> QBlastInfoEnd
> --><p></p>
> <SCRIPT LANGUAGE="JavaScript"><!--
> ...
> 
> which indicates you have a request in the BLAST queue.  If you aren't  
> seeing anything then the problem is likely network-related on your  
> end, so getting the latest RemoteBlast won't help.  Do any other  
> BioPerl modules requiring network access work (Bio::DB::GenBank, for  
> instance)?  If not it could be a proxy issue...
> 
> Just in case, here's the browsable CVS location for RemoteBlast:
> 
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl
> 
> Click on the download link and save over your local version.
> 
> chris
> 
> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
> 
>>
>> hi Chris,
>> thanks for your reply. i set the RemoteBlast factory to a verbosity  
>> of 1,
>> and i get the same error message. i'm new to all these. so, could  
>> you plz
>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>
>> cheers!!!
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10024333
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Apr 16 18:24:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 17:24:11 -0500
Subject: [Bioperl-l] HOWTO:Writing BioPerl Tests
Message-ID: <547A30CD-6BAA-4C08-A935-9975634691B2@uiuc.edu>

I have posted a quickie HOWTO on writing up BioPerl tests using  
Test::More.  If anyone wants to add to it feel free (make sure to  
credit yourself in the authors section).

http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

There is space in there if we decide to add more modules for  
enhancing tests (I think Nathan suggested Test::Exception or similar).

chris


From cjfields at uiuc.edu  Mon Apr 16 19:24:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 18:24:32 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10024333.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
	<10024333.post@talk.nabble.com>
Message-ID: <54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>

What version of bioperl are you using?  I get an error but it is b/c  
the ID doesn't exist.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc KPYK_ECOLI does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /Users/cjfields/src/bioperl-live/Bio/DB/WebDBSeqI.pm:181
STACK: genpept.pl:21
-----------------------------------------------------------

The actual accession is 'KPYK1_ECOLI'.

chris

On Apr 16, 2007, at 3:42 PM, DeeGee wrote:

>
> hi
> i tried the following code just to check the network, and it worked  
> fine
> except for the SwissProt part, for which i got the error message  
> instead of
> the sequence:
>
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq
> /usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
> STACK toplevel bbbbb.pl:21
> --------------------------------------
>
> #### check #####
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenBank;
> use Bio::DB::SwissProt;
> use Bio::DB::GenPept;
> use Bio::SeqIO;
>
> my $genpeptdb = new Bio::DB::GenPept();
> my $genbankdb = new Bio::DB::GenBank();
> my $swissdb = new Bio::DB::SwissProt();
>
> my $seqio = new Bio::SeqIO(-format => 'fasta',
>                            -fh     => \*STDOUT);
>
> my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
> $seqio->write_seq($protseq);
>
> my $seq = $genbankdb->get_Seq_by_acc('AF303112');
> $seqio->write_seq($seq);
>
> $protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
> $seqio->write_seq($protseq);
>
> thanks a lot.
>
>
> Chris Fields wrote:
>>
>> The 'verbose' setting doesn't change the way the BLAST query is sent,
>> it just sends the raw output from the repeated attempts to retrieve
>> the report (using the RID) to STDERR.  The error you saw won't be
>> fixed by doing so.
>>
>> What I was interested in was the raw HTML output dumped to the
>> screen.  If it is querying the NCBI server it should dump stuff that
>> includes something like this:
>>
>> ...
>> <HTML>
>> <p></p>
>> <!--
>> QBlastInfoBegin
>>          Status=WAITING
>> QBlastInfoEnd
>> --><p></p>
>> <SCRIPT LANGUAGE="JavaScript"><!--
>> ...
>>
>> which indicates you have a request in the BLAST queue.  If you aren't
>> seeing anything then the problem is likely network-related on your
>> end, so getting the latest RemoteBlast won't help.  Do any other
>> BioPerl modules requiring network access work (Bio::DB::GenBank, for
>> instance)?  If not it could be a proxy issue...
>>
>> Just in case, here's the browsable CVS location for RemoteBlast:
>>
>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl
>>
>> Click on the download link and save over your local version.
>>
>> chris
>>
>> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
>>
>>>
>>> hi Chris,
>>> thanks for your reply. i set the RemoteBlast factory to a verbosity
>>> of 1,
>>> and i get the same error message. i'm new to all these. so, could
>>> you plz
>>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>>
>>> cheers!!!
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10024333
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjm at fruitfly.org  Mon Apr 16 20:59:46 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 17:59:46 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
	<50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
	<3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>
Message-ID: <9612F3E7-F239-49C1-A5BE-E10FF2FC2063@fruitfly.org>


You could perform the closure locally and then iterate over the
individual IDs or construct a big disjunctive query to Entrez -
either way it's not so efficient, especially for less specific nodes
(distributed queries with ontologies are an interesting challenge).
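
For what it's worth, a rough sketch of the first option in plain Perl:
walk a parent-to-child map to collect a term plus everything beneath
it, then join the IDs into one OR query using the [Gene Ontology] field
mentioned earlier in the thread.  The map here is a toy.  Only the
9220/9174 pair comes from this thread; the other two IDs are made-up
placeholders, and a real closure would be built from the ontology file
(go-perl or similar).  The sub names are mine, not from any module.

```perl
use strict;
use warnings;

# Toy parent => [children] map.  ASSUMPTION: only the 0009220 -> 0009174
# edge reflects this thread; the 99999xx IDs are hypothetical placeholders.
my %children = (
    '0009220' => [ '0009174', '9999991' ],
    '0009174' => [ '9999992' ],
);

# Breadth-first walk returning the root plus all of its descendants.
sub closure {
    my ($root, $children) = @_;
    my %seen  = ($root => 1);
    my @queue = ($root);
    while (defined(my $id = shift @queue)) {
        for my $kid (@{ $children->{$id} || [] }) {
            next if $seen{$kid}++;    # skip terms reached by another path
            push @queue, $kid;
        }
    }
    return sort keys %seen;
}

# Join the IDs into one disjunctive Entrez query string.
sub entrez_go_query {
    my @ids = @_;
    return join ' OR ', map { sprintf '%d[Gene Ontology]', $_ } @ids;
}

print entrez_go_query(closure('0009220', \%children)), "\n";
```

For a term near the root the query string gets very long, which is the
inefficiency I mean.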

Soon you'll be able to do the same query over the GO Database / AmiGO
using a REST API.

On Apr 16, 2007, at 12:25 PM, Chris Fields wrote:

> You are correct; it explains why the list is only 120 genes.  The
> only way (currently) to do so would be to perform the closure locally
> somehow (maybe via go-perl or similar).
>
> chris
>
> On Apr 16, 2007, at 1:41 PM, Chris Mungall wrote:
>
>>
>> Unless the Entrez interface has changed since I last looked, the
>> query below for "pyrimidine ribonucleotide biosynthetic process"
>> will NOT perform the transitive closure over the graph; this means
>> genes and gene products annotated to GO:0009174 "pyrimidine
>> ribonucleoside monophosphate biosynthetic process", for example,
>> will not be returned.
>> On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:
>>
>>> You can limit EntrezGene searches by Gene Ontology ID using the  
>>> [Gene
>>> Ontology] field in queries.  The following query:
>>>
>>> '9220[Gene Ontology]'
>>>
>>> will give 120 gene IDs.  You can get the same list using the still-
>>> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
>>> still working on this):
>>>
>>> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>>                                         -db => 'gene',
>>>                                         -term => '9220[Gene Ontology]',
>>>                                         -retmax => 300);
>>> $esearch->get_response;
>>> my @ids = $esearch->get_ids;
>>> print join "\n", @ids;
>>>
>>> In my opinion, Sean's idea of using SQL is probably better if you
>>> have tons of searches to do.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>>>
>>>>
>>>> Dear all,
>>>>
>>>> Given a GO id, is there a way to extract all
>>>> the related gene names from that id with Perl?
>>>>
>>>> Anybody has experience with that?
>>>> I've looked through GO module in CPAN, but can't seem
>>>> to find any tool that facilitates that search.
>>>>
>>>> Look forward very much for your advice.
>>>>
>>>> --
>>>> Edward WIJAYA
>>>> SINGAPORE
>>>>
>>>> ------------ Institute For Infocomm Research - Disclaimer
>>>> -------------
>>>> This email is confidential and may be privileged.  If you are not
>>>> the intended recipient, please delete it and notify us immediately.
>>>> Please do not copy or use it for any purpose, or disclose its
>>>> contents to any other person. Thank you.
>>>> --------------------------------------------------------
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 23:51:18 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 17 Apr 2007 11:51:18 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu><50A1CCF2-4650-4F87-8386-DB0
	E87292023@fruitfly.org><3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu><9612
	F3E7-F239-49C1-A5BE-E10FF2FC2063@fruitfly.org>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061686@mailbe01.teak.local.net>


Thanks so much for all the suggestions.
They were really helpful to me.
 
--
Edward WIJAYA

________________________________

From: Chris Mungall [mailto:cjm at fruitfly.org]
Sent: Tue 4/17/2007 8:59 AM
To: Chris Fields
Cc: bioperl-l at lists.open-bio.org; Wijaya Edward
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


You could perform the closure locally and then iterate over the
individual IDs or construct a big disjunctive query to Entrez -
either way it's not so efficient, especially for less specific nodes
(distributed queries with ontologies are an interesting challenge).

Soon you'll be able to do the same query over the GO Database / AmiGO 
using a REST API

On Apr 16, 2007, at 12:25 PM, Chris Fields wrote:

> You are correct; it explains why the list is only 120 genes.  The
> only way (currently) to do so would be to perform the closure locally
> somehow (maybe via go-perl or similar).
>
> chris
>
> On Apr 16, 2007, at 1:41 PM, Chris Mungall wrote:
>
>>
>> Unless the Entrez interface has changed since I last looked, the
>> query below for "pyrimidine ribonucleotide biosynthetic process"
>> will NOT perform the transitive closure over the graph; this means
>> genes and gene products annotated to GO:0009174 "pyrimidine
>> ribonucleoside monophosphate biosynthetic process", for example,
>> will not be returned.
>>
>> On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:
>>
>>> You can limit EntrezGene searches by Gene Ontology ID using the 
>>> [Gene
>>> Ontology] field in queries.  The following query:
>>>
>>> '9220[Gene Ontology]'
>>>
>>> will give 120 gene IDs.  You can get the same list using the still-
>>> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
>>> still working on this):
>>>
>>> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>>                                         -db => 'gene',
>>>                                         -term => '9220[Gene Ontology]',
>>>                                         -retmax => 300);
>>> $esearch->get_response;
>>> my @ids = $esearch->get_ids;
>>> print join "\n", @ids;
>>>
>>> In my opinion, Sean's idea of using SQL is probably better if you
>>> have tons of searches to do.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>>>
>>>>
>>>> Dear all,
>>>>
>>>> Given a GO id, is there a way to extract all
>>>> the related gene names from that id with Perl?
>>>>
>>>> Anybody has experience with that?
>>>> I've looked through GO module in CPAN, but can't seem
>>>> to find any tool that facilitates that search.
>>>>
>>>> Look forward very much for your advice.
>>>>
>>>> --
>>>> Edward WIJAYA
>>>> SINGAPORE
>>>>
>>>> ------------ Institute For Infocomm Research - Disclaimer
>>>> -------------
>>>> This email is confidential and may be privileged.  If you are not
>>>> the intended recipient, please delete it and notify us immediately.
>>>> Please do not copy or use it for any purpose, or disclose its
>>>> contents to any other person. Thank you.
>>>> --------------------------------------------------------
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>




From hlapp at gmx.net  Tue Apr 17 00:00:55 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 17 Apr 2007 00:00:55 -0400
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>

Hi Leighton, please see below:

On Apr 16, 2007, at 11:55 AM, Leighton Pritchard wrote:

> Hi,
>
> I've been trying to upload the GO into a clean BioSQL (MySQL, 1.4.1)
> schema using the BioPerl bp_load_ontology.pl script, with the OBOv1.0,
> OBOv1.2, and the most recent flatfiles from
> http://www.geneontology.org/GO.downloads.ontology.shtml - none of my
> attempts have been successful.  The errors below are from a Linux
> installation, but the same errors are thrown on OS X, too.  I am using
> the most recent versions of BioPerl and bioperl-db, installed via  
> CPAN:
>
> [lpritc@lplinuxdev sequence_data]$ perl -MBio::Root::Version -e 'print
> $Bio::Root::Version::VERSION,"\n"'
> 1.005002102
>
> and bioperl-db 1.5.2.
>
> I have attached the traceback below (running with --safe throws a  
> number
> of equivalent errors),

Using --safe will throw the same errors, but will continue loading.  
I.e., you'd lose the one term, but keep everything else.

I do realize that especially for a graph losing an internal node can  
be quite detrimental.

> [...]
> ########
>
> [lpritc@lplinuxdev sequence_data]$ bp_load_ontology.pl --host
> localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass
> ******** --format obo ~/Downloads/gene_ontology_edit.obo
> Loading ontology gene_ontology:
>         ... terms
>         ... relationships
>         Done with gene_ontology.
> Loading ontology biological_process:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------

This would point to a problem with the BioPerl OBO parser. According to
the message, both the database name and the accession of the db_xref
for the term are - surely erroneously - empty. Apparently the parser
fails to parse out the database and accession for this db_xref of term
GO:0018901.

If you can edit the obo file, you can try deleting the db_xref(s) for  
that term that look odd (or delete all if you don't need them).

I'd have to debug the obo parser to see exactly where it's going  
wrong in parsing.

> Could not store term GO:0018901, name '2,4-dichlorophenoxyacetic acid
> metabolic process':
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> [...]
> [lpritc@lplinuxdev sequence_data]$ bp_load_ontology.pl --host
> localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass
> ******** --format goflat --fmtargs ~/Downloads/GO.defs

Note that the argument for --fmtargs here should read
"-defs_file,/path/to/Downloads/GO.defs". (Note that within the quotes  
there is no tilde expansion.)
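
If you end up driving the loader from a wrapper script, a small sketch
of building that argument by hand might look like the following.  The
two sub names are made up for illustration; only the
"-defs_file,/path" form comes from the note above, and the expansion
relies on $ENV{HOME} being set.

```perl
use strict;
use warnings;

# Expand a leading "~" by hand, since the shell will not do it inside quotes.
# (expand_tilde is a hypothetical helper; it only handles the "~/..." form.)
sub expand_tilde {
    my ($path) = @_;
    $path =~ s{^~(?=/|$)}{$ENV{HOME}};
    return $path;
}

# Build the value to pass to --fmtargs for the goflat format.
sub defs_fmtargs {
    my ($defs_path) = @_;
    return '-defs_file,' . expand_tilde($defs_path);
}

print defs_fmtargs('~/Downloads/GO.defs'), "\n";
```

The printed string is what would go, quoted, after --fmtargs on the
command line.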

> ~/Downloads/function.ontology
> Loading ontology Gene Ontology:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("MetaCyc","2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN","0","")  
> FKs
> ()
> Duplicate entry '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX- 
> MetaCyc-0' for
> key 2
> ---------------------------------------------------

This is one of the things that make you love MySQL (am I correct in
inferring that you're using MySQL?). The width of the
dbxref.accession column (which receives the second value in
parentheses) is 40 chars. The apparently pre-existing value
("2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0") is 50 chars,
which when loaded should have resulted in an exception. Instead,
MySQL simply and silently truncates it to 40 chars, which makes it
identical to the first 40 chars of
"2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN" (which is 41 chars in
length).
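
To make the effect concrete, here is a small plain-Perl sketch of what
silent truncation to 40 chars does to a uniqueness check.  It does not
touch the database at all; the hash merely stands in for the unique
key, and the second, longer accession is a hypothetical value invented
only so that two strings share the same first 40 chars.

```perl
use strict;
use warnings;

my $width = 40;    # width of dbxref.accession as described above

# Single quotes keep the backslashes exactly as they appear in the warning.
my $accession = '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN';

# ASSUMPTION: hypothetical second accession, invented to share the prefix.
my $longer = $accession . '-VARIANT';

my %stored;    # truncated key => original value, standing in for the unique key
for my $value ($accession, $longer) {
    my $key = substr($value, 0, $width);    # what silent truncation would keep
    if (exists $stored{$key}) {
        print "Duplicate entry '$key': '$value' collides with '$stored{$key}'\n";
    }
    else {
        $stored{$key} = $value;
    }
}

printf "length before truncation: %d\n", length $accession;
```

With a wider column both values would be kept apart.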

It may be necessary to widen dbxref.accession here, for example to
80 chars. Let me know if you need help with the DDL command to do
this.

Let me know how far this gets you.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lubapardo at gmail.com  Tue Apr 17 05:16:04 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 17 Apr 2007 10:16:04 +0100
Subject: [Bioperl-l] CVS AND PAML
Message-ID: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>

Dear all,
I have two questions.
1.) I am trying to download some modules from Bioperl-run via CVS but I
cannot log in.

$ cvs -d :pserver:cvs@code.open-bio.org:/home/repository/bioperl login

The error I get is: time out, failed to connect to the server. I have
no trouble downloading other files, and I installed the bioperl modules
via CPAN and that works.

2) The second question I have is that I am using the PAML:CODEML
module to do phylogenetic analysis.

I have used the example provided in the HOWTO:PAML (also given as
example: pairwise_ka_ks.PL). The program does not crash but it returns
an empty object. I think the problem is in the last part of the
script because I manage to get the sequences and also the alignment,
but I cannot get any ka, ks values. I am not sure whether there is a
bug in the last part of the script.

Does anyone have an idea?

Thank you very much

Luba Pardo

$kaks_factory->alignment($dna_aln);

# run the KaKs analysis
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();
# this gives us a mapping from the PAML order of sequences back to
# the input order (since names get truncated)
my @pos = map <http://www.perldoc.com/perl5.6/pod/func/map.html> {
    my $c= 1;
    foreach my $s ( @each ) {
      last if( $s->display_id eq $_->display_id );
      $c++;
    }
    $c;
   } @otus;

print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT join
<http://www.perldoc.com/perl5.6/pod/func/join.html>("\t", qw
<http://www.perldoc.com/perl5.6/pod/func/qw.html>(SEQ1 SEQ2 Ka Ks
Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
for( my $i = 0; $i < (scalar
<http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus -1) ;
$i++) {
  for( my $j = $i+1; $j < (scalar
<http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus); $j++ ) {
    my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
    my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
    print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT
join <http://www.perldoc.com/perl5.6/pod/func/join.html>("\t",
$otus[$i]->display_id,
                         $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
                         $MLmatrix->[$i]->[$j]->{'dS'},
                         $MLmatrix->[$i]->[$j]->{'omega'},
                         sprintf
<http://www.perldoc.com/perl5.6/pod/func/sprintf.html>("%.2f",$sub_aa_aln->percentage_identity),
                         sprintf
<http://www.perldoc.com/perl5.6/pod/func/sprintf.html>("%.2f",$sub_dna_aln->percentage_identity),
                         ), "\n";
  }
}


From avilella at gmail.com  Tue Apr 17 05:25:40 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 17 Apr 2007 10:25:40 +0100
Subject: [Bioperl-l] CVS AND PAML
In-Reply-To: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
References: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
Message-ID: <358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>

hmmm, there are some perldoc links around your code snippet. can you post
the code again? what is the input data you are trying this with?

On 4/17/07, Luba Pardo <lubapardo at gmail.com> wrote:
>
> Dear all,
> I have two questions.
> 1.) I am trying to download some modules from Bioperl-run via CVS but I
> cannot log in.
>
> $ cvs -d :pserver:cvs@code.open-bio.org:/home/repository/bioperl login
>
> The error I get is: time out, failed to connect to the server. I have
> no trouble to download other files and I installed bioperl modules via
> CPAN and it works.
>
> 2) The second question I have is that I am using the PAML:CODEML
> module to do phylogenetic analysis.
>
> I have used the example provided in the HOWTO:PAML (also given as
> example: pairwise_ka_ks.PL). The program does not crash but it returns
> an empty object. I think the problem is in the last part of the
> script because I manage to get the sequences and also the alignment,
> but I cannot get any ka, ks values. I am not sure whether there is a
> bug in the last part of the script.
>
> Does anyone have an idea?
>
> Thank you very much
>
> Luba Pardo
>
> $kaks_factory->alignment($dna_aln);
>
> # run the KaKs analysis
> my ($rc,$parser) = $kaks_factory->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
>
> my @otus = $result->get_seqs();
> # this gives us a mapping from the PAML order of sequences back to
> # the input order (since names get truncated)
> my @pos = map <http://www.perldoc.com/perl5.6/pod/func/map.html> {
>     my $c= 1;
>     foreach my $s ( @each ) {
>       last if( $s->display_id eq $_->display_id );
>       $c++;
>     }
>     $c;
>    } @otus;
>
> print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT join
> <http://www.perldoc.com/perl5.6/pod/func/join.html>("\t", qw
> <http://www.perldoc.com/perl5.6/pod/func/qw.html>(SEQ1 SEQ2 Ka Ks
> Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
> for( my $i = 0; $i < (scalar
> <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus -1) ;
> $i++) {
>   for( my $j = $i+1; $j < (scalar
> <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus); $j++ ) {
>     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>     print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT
> join <http://www.perldoc.com/perl5.6/pod/func/join.html>("\t",
> $otus[$i]->display_id,
>
> $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
>                          $MLmatrix->[$i]->[$j]->{'dS'},
>                          $MLmatrix->[$i]->[$j]->{'omega'},
>                          sprintf
> <http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> >("%.2f",$sub_aa_aln->percentage_identity),
>                          sprintf
> <http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> >("%.2f",$sub_dna_aln->percentage_identity),
>                          ), "\n";
>   }
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From IoannisKirmitzoglou at gmail.com  Tue Apr 17 09:05:37 2007
From: IoannisKirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Tue, 17 Apr 2007 06:05:37 -0700 (PDT)
Subject: [Bioperl-l] Parsing FASTA m10 output
Message-ID: <10034698.post@talk.nabble.com>


I apologize if this question has already been answered, but my search came up
with no relevant results.
I am new to the FASTA program and after reading the fasta3x.doc I decided to
run it using the m10 output. The reason for making this choice was

Quote from fasta3x.doc:  
     -m 10 is a new, parseable format for use with other
     programs.... 


I ran FASTA in batch mode and waited about 3-4 days for the results.
My problem is that today, when I started writing a perl script to parse the
output, I realized that SearchIO doesn't support the m10 format.
Seems like I should have been more careful...
Before I start coding a module that will be able to parse the output (or
re-run FASTA with the -m9 -d0 switches, which will take 4 more days), I would
be really thankful if any of you could tell me of any other way to parse those
files.
Thanks in advance...

Ioannis Kirmitzoglou, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus

-- 
View this message in context: http://www.nabble.com/Parsing-FASTA-m10-output-tf3590568.html#a10034698
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From ewijaya at i2r.a-star.edu.sg  Tue Apr 17 09:10:00 2007
From: ewijaya at i2r.a-star.edu.sg (Edward WIJAYA)
Date: Tue, 17 Apr 2007 21:10:00 +0800
Subject: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
Message-ID: <462473B7.4070905@i2r.a-star.edu.sg>


Dear all,

How do you usually construct a graph of TFBS (binding site) positions
within their sequences? I was thinking of building something like this kind of
visualization tool:

http://research.i2r.a-star.edu.sg/Dragon/Motif_Search/cgi-bin/tmp/29740M1.html

or

http://wingless.cs.washington.edu:8080/assessment/servlet?filenameID=submission/SPACE.D9F26D506DE90E9A0A0010BB6BCCAEF3&pageType=visualizationForm&action=Visualize+It

Is there a BioPerl module to do that?

--
Edward




From lubapardo at gmail.com  Tue Apr 17 10:01:57 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 17 Apr 2007 15:01:57 +0100
Subject: [Bioperl-l] CVS AND PAML
In-Reply-To: <358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>
References: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
	<358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>
Message-ID: <58ff33550704170701p1207ad51r271b0aff235bfd05@mail.gmail.com>

Hi,
Sorry. Below is the code. The part that does not work is the part
that uses the codeml module.
Thanks
Luba

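# for projecting alignments from protein to R/DNA space
use Bio::Align::Utilities qw(aa_to_dna_aln);
# for input of the sequence data
use Bio::SeqIO;
use Bio::AlignIO;
# wrappers for the external programs used below (from bioperl-run)
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::Tools::Run::Phylo::PAML::Codeml;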

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new;
my $seqdata = shift || 'cds.fa';

my $seqio = new Bio::SeqIO(-file   => $seqdata,
                           -format => 'fasta');
my %seqs;
my @prots;
# process each sequence
while ( my $seq = $seqio->next_seq ) {
    $seqs{$seq->display_id} = $seq;
    # translate them into protein
    my $protein = $seq->translate();
    my $pseq = $protein->seq();
    if( $pseq =~ /\*/ &&
        $pseq !~ /\*$/ ) {
          warn("provided a CDS sequence with a stop codon, PAML will choke!");
          exit(0);
    }
    # Tcoffee can't handle '*' even if it is trailing
    $pseq =~ s/\*//g;
    $protein->seq($pseq);
    push @prots, $protein;
}

if( @prots < 2 ) {
    warn("Need at least 2 CDS sequences to proceed");
    exit(0);
}

open(OUT, ">align_output.txt") || die("cannot open align_output.txt for writing: $!");
# Align the sequences with clustalw
my $aa_aln = $aln_factory->align(\@prots);
# project the protein alignment back to CDS coordinates
my $dna_aln = aa_to_dna_aln($aa_aln, \%seqs);

my @each = $dna_aln->each_seq();

my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
                   ( -params => { 'runmode' => -2,
                                  'seqtype' => 1,
                                } );

# set the alignment object
$kaks_factory->alignment($dna_aln);

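# run the KaKs analysis
my ($rc,$parser) = $kaks_factory->run();
# stop early if codeml produced nothing to parse
die("codeml did not return a parser\n") unless defined $parser;
my $result = $parser->next_result;
die("codeml output contained no result\n") unless defined $result;
my $MLmatrix = $result->get_MLmatrix();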

my @otus = $result->get_seqs();
# this gives us a mapping from the PAML order of sequences back to
# the input order (since names get truncated)
my @pos = map {
    my $c= 1;
    foreach my $s ( @each ) {
      last if( $s->display_id eq $_->display_id );
      $c++;
    }
    $c;
   } @otus;

print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
                        CDNA_PERCENTID)),"\n";
for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
  for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
    my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
    my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
    print OUT join("\t", $otus[$i]->display_id,
                         $otus[$j]->display_id,
                         $MLmatrix->[$i]->[$j]->{'dN'},
                         $MLmatrix->[$i]->[$j]->{'dS'},
                         $MLmatrix->[$i]->[$j]->{'omega'},
                         sprintf("%.2f",$sub_aa_aln->percentage_identity),
                         sprintf("%.2f",$sub_dna_aln->percentage_identity),
                         ), "\n";
  }
}

On 17/04/07, Albert Vilella <avilella at gmail.com> wrote:
>
> hmmm, there are some perldoc links around your code snippet. can you post
> the code again? what is the input data you are trying this with?
>
> On 4/17/07, Luba Pardo <lubapardo at gmail.com> wrote:
>
> > Dear all,
> > I have two questions.
> > 1.) I am trying to download some modules from Bioperl-run via CVS but I
> > can
> > not login.
> >
> > $ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl login.
> >
> > The error I get is: time out, failed to connect to the server. I have
> > no trouble to download other files and I installed bioperl modules via
> > CPAN and it works.
> >
> > 2) The second question I have is that I am using the PAML:CODEML
> > module to do phylogenetic analysis.
> >
> > I have used the example provided in the HOWTO:PAML (also given as
> > example: pairwise_ka_ks.PL). The program does not crash but it returns
> > and empty object. I think the problem is in the last part of the
> > script because I manage to get sequences and also the alignment, but I
> > can not get any ka, ks value. I am not sure whether there is a bug in
> > the last part of the script.
> >
> > Does anyone have an idea?
> >
> > Thank you very much
> >
> > Luba Pardo
> >
> > $kaks_factory->alignment($dna_aln);
> >
> > # run the KaKs analysis
> > my ($rc,$parser) = $kaks_factory->run();
> > my $result = $parser->next_result;
> > my $MLmatrix = $result->get_MLmatrix();
> >
> > my @otus = $result->get_seqs();
> > # this gives us a mapping from the PAML order of sequences back to
> > # the input order (since names get truncated)
> > my @pos = map <http://www.perldoc.com/perl5.6/pod/func/map.html> {
> >     my $c= 1;
> >     foreach my $s ( @each ) {
> >       last if( $s->display_id eq $_->display_id );
> >       $c++;
> >     }
> >     $c;
> >    } @otus;
> >
> > print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT join
> > < http://www.perldoc.com/perl5.6/pod/func/join.html>("\t", qw
> > <http://www.perldoc.com/perl5.6/pod/func/qw.html>(SEQ1 SEQ2 Ka Ks
> > Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
> > for( my $i = 0; $i < (scalar
> > <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus -1) ;
> > $i++) {
> >   for( my $j = $i+1; $j < (scalar
> > <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus); $j++ ) {
> >     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
> >     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
> >     print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT
> > join < http://www.perldoc.com/perl5.6/pod/func/join.html>("\t",
> > $otus[$i]->display_id,
> >
> > $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
> >                          $MLmatrix->[$i]->[$j]->{'dS'},
> >                          $MLmatrix->[$i]->[$j]->{'omega'},
> >                          sprintf
> > < http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> > >("%.2f",$sub_aa_aln->percentage_identity),
> >                          sprintf
> > < http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> > >("%.2f",$sub_dna_aln->percentage_identity),
> >                          ), "\n";
> >   }
> > }
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>


From alexl at users.sourceforge.net  Tue Apr 17 09:54:13 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Tue, 17 Apr 2007 06:54:13 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu> (Chris Fields's
	message of "Fri\, 30 Mar 2007 23\:39\:15 -0500")
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
Message-ID: <5h4pnff6nu.fsf@delpy.biol.berkeley.edu>

On Mar 30, 2007, at 11:02 PM, Allen Day wrote:

[...]

>> If we're in agreement that the primary data sets and
>> libraries/applications for producing derivative data should not be
>> present in Fedora Extras, then it follows that the Bioperl classes
>> for manipulating these primary and derivative data should also not
>> be present in Fedora Extras as they are of little use without data
>> to manipulate.

Chris Fields wrote:

CF> I respectfully disagree.  BioPerl, to me, is a toolkit which helps
CF> accomplish certain tasks.  As with any toolkit, not all parts are
CF> required to do what one needs.  A good number of end-users use
CF> BioPerl for remote database queries
CF> (Bio::DB::GenBank/Taxonomy/etc), remote BLAST, seq analysis,
CF> alignment analysis, phylogenetic tree manipulation, etc, none of
CF> which require outside apps be installed.  For many a remote db is
CF> their primary source of data; not everybody sets up BioPerl for
CF> accessing local db records, running programs, etc (just the smart
CF> ones!).  As for outside apps, the docs are pretty explicit where
CF> certain outside resources (libxml2, expat, libgd) are needed for
CF> functionality.

CF> When we package up a new release we generally have ActiveState PPM
CF> archives available for Win32 users who want an easy way to install
CF> BioPerl.  I wouldn't have a problem if ActiveState wanted to post
CF> these to their repository.  Why would allowing someone to do the
CF> same for fedora extras be any different?

Hi all,

Given that there seems to be a reasonable consensus (including list
discussion here as well as in private e-mail) from bioperl folks that
including bioperl in Fedora is OK, I'm going ahead and building
bioperl for Fedora >= 6 (it's currently in the development branch).  I
thought about the issue carefully and this seems to make sense for
several reasons:

1. Biopackages.net isn't currently building packages for Fedora Core 6
   and later (as Allen said, that may happen later when more build
   resources come online).  I won't build perl-bioperl for FC-5 or
   earlier to make sure that the Fedora package doesn't disrupt
   installations with the biopackages.net version.

2. Currently I've only run the base bioperl (live) package through
   the reviewing gauntlet, but I plan to add the bioperl-run package
   as well.  Even though the bioperl-run package is intended to use
   third party packages (e.g. Clustal etc.) which may not be
   distributed with Fedora, it appears that the bioperl-run package
   contains code that can download those packages directly (albeit
   outside the RPM package system).  And some of the external tools
   could be packaged in Fedora because they have open-source licenses
   (e.g. Wise2, EMBOSS, NCBI toolkit etc.)

   Furthermore it appears the biopackages.net version of that package
   doesn't actually have "Requires:" that would automatically install
   the third-party tools that are run via bioperl (e.g. Clustal) in
   any case, so when biopackages starts building for >FC-6 the Fedora
   perl-bioperl* packages can function as a drop-in replacement
   without disturbing other biopackages dependencies such as genome
   databases.

3. Third-party packages that can't be included directly in Fedora
   (such as Clustal) that can be used by bioperl-run could still be
   added via third-party repos like biopackages.net, in the same way
   that the multimedia packages gstreamer and gstreamer-plugins-good
   live in Fedora, but gstreamer-plugins-bad containing patent
   encumbered MP3 codecs will live in Livna.

Cheers,
Alex


From cjfields at uiuc.edu  Tue Apr 17 10:35:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 09:35:10 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
Message-ID: <0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>

On Apr 17, 2007, at 8:54 AM, Alex Lancaster wrote:

> Hi all,
>
> Given that there seems to be a reasonable consensus (including list
> discussion here as well as in private e-mail) from bioperl folks that
> including bioperl in Fedora is OK, I'm going ahead and building
> bioperl for Fedora >= 6 (it's currently in the development branch).  I
> thought about the issue carefully and this seems to makes sense for
> several reasons:
>
> ...
> 2. Currently I've only run the the base bioperl (live) package through
>    the reviewing gauntlet, but I plan to add the bioperl-run package
>    as well.  Even though the bioperl-run package is intended to use
>    third party packages (e.g. Clustal etc.) which may not be
>    distributed with Fedora, it appears that the bioperl-run package
>    contains code that can download those packages directly (albeit
>    outside the RPM package system).  And some of the external tools
>    could be packaged in Fedora because they have open-source licenses
>    (e.g. Wise2, EMBOSS, NCBI toolkit etc.)
...

Do you mean the bioperl core modules instead of "bioperl-live"?  We  
use the term "bioperl-live" to designate code updated regularly via  
CVS, which can be buggy depending on when it's retrieved.

I'm not sure how others feel about this, but it's probably best to  
stick with either the latest official releases (v 1.5.2 at this time)  
or even GBrowse-sponsored interim releases (which fix GBrowse-related  
bugs and normally pass tests).

chris


From hlapp at gmx.net  Tue Apr 17 11:09:45 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 17 Apr 2007 11:09:45 -0400
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>


On Apr 17, 2007, at 9:35 AM, Leighton Pritchard wrote:

> Hi Hilmar,
>
> Thanks for the very quick response.  Apologies for the long reply, but I
> thought it might be useful if anyone else happens across the same
> problems that I did.

Thanks for reporting all these.

> [...]
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------
> Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate
> lactonase activity':
> [...]
> I tracked this down to an apparently poor formatting of the GO.defs file
> (note that the first and third definition_lines appear to be two halves
> of the same entry):
>
> term: 2-pyrone-4,6-dicarboxylate lactonase activity
> goid: GO:0047554
> definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O
> = 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
> definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN

I wonder whether this is the line that throws the parser off. It  
looks like the database part of the reference is missing - bad.

> definition_reference: EC:3.1.1.57
> definition_reference: MetaCyc:2-PYRONE-4
>
> I found 43 similar errors for other GOIDs, and it appears to result from
> the occurrence of the string "\," in a dbxref - mostly MetaCyc entries,
> but also some UM-BBD_pathwayID entries.

I'm not sure - although the string "\," might indeed trip up the
parser, I would have to investigate to confirm. Could it be a
coincidence with definition_references that lack the database part
before the colon?

>
> These errors appear to have followed through into the generation of the
> OBO format files in each case, e.g.:
>
> def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O =
> 4-carboxy-2-hydroxyhexa-2,4-dienedioate."
> [:6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]

Again, the first db_xref lacks the database in front of the colon. I  
can also see why "\," will trip up the parser in this format.

>
> and so is something for the GO guys to fix, I guess.

The lack of a database for certain xrefs surely is. If the escaped  
comma does throw off the BioPerl parser then that part is for BioPerl  
to fix. It does seem to extract the parts correctly, if the error  
message is any indication, though you may argue that it should remove  
the escaping backslashes (and I'd certainly agree with that).
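
To make the escaped-comma point concrete, here is a minimal sketch in plain
Perl (this is not the actual BioPerl parser code) that splits a dbxref list
on unescaped commas only, removes the escaping backslashes, and warns about
entries that lack a database before the colon. The sub name split_dbxrefs is
hypothetical, and the escaped MetaCyc entry is an assumed reconstruction of
what the GO:0047554 reference originally looked like.

```perl
use strict;
use warnings;

# Hypothetical helper: split a dbxref list on commas that are not
# preceded by a backslash, then unescape and split each db:accession.
sub split_dbxrefs {
    my ($list) = @_;
    my @xrefs;
    for my $item ( split /(?<!\\),\s*/, $list ) {
        $item =~ s/\\,/,/g;                       # drop the escaping backslash
        my ( $db, $acc ) = split /:/, $item, 2;   # split on the first colon only
        push @xrefs, { db => $db, acc => $acc };
        warn "xref '$item' has no database part\n"
            unless defined $db && length $db;
    }
    return @xrefs;
}

# Assumed reconstruction of the GO:0047554 reference list
my $list = 'EC:3.1.1.57, MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN';
my @xrefs = split_dbxrefs($list);
print "$_->{db}\t$_->{acc}\n" for @xrefs;
```

Splitting naively on every comma is what would yield the two halves
"MetaCyc:2-PYRONE-4" and ":6-DICARBOXYLATE-LACTONASE-RXN" seen in the file.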

>
>
> Another error is thrown after fixing the above, though (with the same
> command as before):
>
> Loading ontology Gene Ontology:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were
> ("GO:0006905","vesicle transport","OBSOLETE (was not defined before
> being made obsolete).","X","") FKs (1)
> Duplicate entry 'vesicle transport-1-X' for key 3
> ---------------------------------------------------
> Could not store term GO:0006905, name 'vesicle transport':
> [...]
> There are duplicate terms, identical in the term table except for GOID:
> GO:0006905 and GO:0005480.  They are both "vesicle transport", and
> obsoleted:

That violates the uniqueness constraint, and this sounds more like a  
bug in the GO file. I'm also not sure what motivated them to create  
the same term multiple times only to obsolete it immediately.
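
One way to spot such clashes before loading is to scan the defs file for
term names that appear under more than one GO ID. Below is a minimal sketch
in plain Perl, assuming the "term:" / "goid:" line layout of the GO.defs
excerpts quoted in this thread. The sub name ids_by_term_name is
hypothetical and the sample records are abbreviated for illustration. Note
that it compares names only, whereas the constraint that failed also
involves the ontology and the obsolete flag.

```perl
use strict;
use warnings;

# Hypothetical helper: map each term name to the GO IDs that use it,
# reading "term:" and "goid:" lines from a GO.defs-style filehandle.
sub ids_by_term_name {
    my ($fh) = @_;
    my ( %ids, $name );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^term:\s*(.*?)\s*$/ ) {
            $name = $1;
        }
        elsif ( defined $name && $line =~ /^goid:\s*(\S+)/ ) {
            push @{ $ids{$name} }, $1;
            undef $name;
        }
    }
    return \%ids;
}

# Abbreviated sample records (illustration only)
my $sample = <<'END';
term: vesicle transport
goid: GO:0006905

term: vesicle transport
goid: GO:0005480

term: SREBP-mediated signaling pathway
goid: GO:0032933
END

open( my $fh, '<', \$sample ) or die "cannot read sample: $!";
my $ids = ids_by_term_name($fh);

# Report only the names that are shared by several GO IDs
for my $name ( sort keys %$ids ) {
    next unless @{ $ids->{$name} } > 1;
    print "$name: @{ $ids->{$name} }\n";
}
```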

> [...]
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("PMID","","0","") FKs ()
> Column 'accession' cannot be null
> ---------------------------------------------------
> Could not store term GO:0032933, name 'SREBP-mediated signaling
> pathway':
> [...]
> with the offending entry being
>
> term: SREBP-mediated signaling pathway
> goid: GO:0032933
> definition: A series of molecular signals from the endoplasmic reticulum
> to the nucleus generated as a consequence of altered levels of one or
> more lipids, and resulting in the activation of transcription by SREBP.
> definition_reference: GOC:mah
> definition_reference: PMID:0
>
> I commented out the definition_reference for PMID:0, which seemed to fix
> matters.

Right, it seems to be a bogus reference.

>
> The process.ontology and component.ontology files then went into the
> database without a hitch.  Thanks again for your help,

Fantastic you got it all loaded!

Note that you also have the --computetc switch which will compute the  
transitive closure for you automatically.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From alexl at users.sourceforge.net  Tue Apr 17 11:13:30 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Tue, 17 Apr 2007 08:13:30 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu> (Chris Fields's
	message of "Tue\, 17 Apr 2007 09\:35\:10 -0500")
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
	<0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>
Message-ID: <nwy7krdof9.fsf@delpy.biol.berkeley.edu>

>>>>> "CF" == Chris Fields  writes:

[...]

CF> Do you mean the bioperl core modules instead of "bioperl-live"?
CF> We use the term "bioperl-live" to designate code updated regularly
CF> via CVS, which can be buggy depending on when it's retrieved.

Yes, I am referring to the core package, called perl-bioperl in the
Fedora naming scheme.

CF> I'm not sure how others feel about this, but it's probably best to
CF> stick with either the latest official releases (v 1.5.2 at this
CF> time) or even GBrowse-sponsored interim releases (which fix
CF> GBrowse-related bugs and normally pass tests).

Yes I am sticking to the latest official release 1.5.2_102.  The
package is here:

http://download.fedora.redhat.com/pub/fedora/linux/extras/development/i386/repoview/perl-bioperl.html

and installable via yum (on the development branch) using:

$ yum install perl-bioperl

The FC-6 package will be available soon.

Alex


From cjfields at uiuc.edu  Tue Apr 17 12:18:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:18:19 -0500
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
	<1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>

On Apr 17, 2007, at 11:05 AM, Leighton Pritchard wrote:
...
>
>>> and so is something for the GO guys to fix, I guess.
>>
>> The lack of a database for certain xrefs surely is. If the escaped
>> comma does throw off the BioPerl parser then that part is for BioPerl
>> to fix.
>
> I think the problems are now all in the data I downloaded from
> http://www.geneontology.org/GO.downloads.shtml - I believe the BioPerl
> parser to be innocent of these charges ;)  I've submitted the issue at
> the GO site, and with any luck they'll handle it quite soon (if it is in
> fact their problem).
>
>> Note that you also have the --computetc switch which will compute the
>> transitive closure for you automatically.
>
> :D Excellent!  Thanks for the pointer, and again for your efforts,
>
> L.
...

If you do find anything that is BioSQL- or Bioperl-related then file  
a bug report so we can track it.  I agree with Hilmar that it's  
likely the parser is partly to blame.

http://bugzilla.open-bio.org/

We really appreciate the work you're putting into this!

chris


From cjfields at uiuc.edu  Tue Apr 17 12:19:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:19:02 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <nwy7krdof9.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
	<0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>
	<nwy7krdof9.fsf@delpy.biol.berkeley.edu>
Message-ID: <3963AFE3-68B6-43F0-8A20-82A575CA8806@uiuc.edu>


On Apr 17, 2007, at 10:13 AM, Alex Lancaster wrote:

>
> [...]
>
> CF> Do you mean the bioperl core modules instead of "bioperl-live"?
> CF> We use the term "bioperl-live" to designate code updated regularly
> CF> via CVS, which can be buggy depending on when it's retrieved.
>
> Yes, I am referring to the core package.  Called perl-bioperl in the
> Fedora naming scheme.
>
> CF> I'm not sure how others feel about this, but it's probably best to
> CF> stick with either the latest official releases (v 1.5.2 at this
> CF> time) or even GBrowse-sponsored interim releases (which fix
> CF> GBrowse-related bugs and normally pass tests).
>
> Yes I am sticking to the latest official release 1.5.2_102.  The
> package is here:
>
> http://download.fedora.redhat.com/pub/fedora/linux/extras/development/i386/repoview/perl-bioperl.html
>
> and installable via yum (on the development branch) using:
>
> $ yum install perl-bioperl
>
> The FC-6 package will be available soon.
>
> Alex

Sounds good.  Thanks Alex!

chris


From ioanniskirmitzoglou at gmail.com  Tue Apr 17 12:21:36 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Tue, 17 Apr 2007 19:21:36 +0300
Subject: [Bioperl-l]  Parsing FASTA m10 output
In-Reply-To: <b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
Message-ID: <b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>

Thanks for the prompt reply...
Seems like I will have to "quit talking and begin doing"
I will post the code here in case someone else finds himself in the same
situation...
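
A minimal starting sketch in plain Perl, assuming the -m 10 layout as
described in fasta3x.doc: each hit starts with a line beginning ">>",
followed by "; key: value" attribute lines, and then two ">"-prefixed blocks
for the query and library sequences. The sub name parse_m10 and the hash
layout are hypothetical (nothing here is part of BioPerl), and the sample
text is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): collect the "; key: value"
# attributes of each hit in FASTA -m 10 output. Only the hit-level
# attributes (between ">>id" and the first ">seq" block) are kept.
sub parse_m10 {
    my ($fh) = @_;
    my ( @hits, $hit, $in_hit_header );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>>>/ ) {            # query header or end marker
            ( $hit, $in_hit_header ) = ( undef, 0 );
        }
        elsif ( $line =~ /^>>(\S+)/ ) {     # start of a new hit
            $hit = { id => $1 };
            push @hits, $hit;
            $in_hit_header = 1;
        }
        elsif ( $line =~ /^>/ ) {           # sequence block: stop collecting
            $in_hit_header = 0;
        }
        elsif ( $in_hit_header && $line =~ /^;\s*(\S+):\s*(.*?)\s*$/ ) {
            $hit->{$1} = $2;
        }
    }
    return @hits;
}

# Made-up sample in the assumed layout
my $sample = <<'END';
>>>query1, 120 aa vs library
; pg_name: fasta34
>>hitA some description
; fa_opt: 250
; fa_expect: 1.2e-10
>query1 ..
; sq_len: 120
MKV
>hitA ..
; sq_len: 130
MKI
>>hitB other description
; fa_opt: 90
; fa_expect: 0.5
>>><<<
END

open( my $fh, '<', \$sample ) or die "cannot read sample: $!";
my @hits = parse_m10($fh);
print join( "\t", $_->{id}, $_->{fa_opt}, $_->{fa_expect} ), "\n" for @hits;
```

The alignment coordinates and sequences in the ">" blocks are skipped here;
collecting them would need a second level of hashes per hit.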

-- 
Ioannis Kirmitzoglou, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


On 17/04/07, Thiago Venancio < thiago.venancio at gmail.com> wrote:
> I am parsing FASTA outputs these days.
>
> The m 10 format is a recent implementation, not so popular yet. So, I have
> first tested the Bio::SearchIO against a default output and everything is
> fine.
>
> I think future releases of Bio::SearchIO will deal with the m10 output. By
> now, you can run all again or code a little bit to parse what you want
> (not a hard task).
>
> T.
>
>
> On 4/17/07, Ioannis Kirmitzoglou < IoannisKirmitzoglou at gmail.com> wrote:
> >
> > I apologize if this question has already been answered but my search
> > came up with no relevant results.
> > I am new to the FASTA program and after reading the fasta3x.doc I
> > decided to run it using the m10 output. The reason for doing such a
> > choice was
> >
> > Quote from fasta3x.doc:
> >      -m 10 is a new, parseable format for use with other
> >      programs....
> >
> >
> > I ran FASTA in batch mode and waited about 3-4 days for the results.
> > My problem is that today, when i started writing a perl script to parse
> > the output I realized that SearchIO doesn't supports m10 format.
> > Seems like I had to be more careful...
> > Before starting coding a module that will be able to parse the output
> > (or re-running FASTA with -m9 -d0 switches which will take 4 more days)
> > I would be really thankful if any of you knows of any other way to parse
> > those files?
> > Thanks in advance...
> >
> > Ioannis Kirmitzoglou, MSc
> > PhD. Student,
> > Bioinformatics Research Laboratory
> > Department of Biological Sciences
> > University of Cyprus
> >
> > --
> > View this message in context:
> > http://www.nabble.com/Parsing-FASTA-m10-output-tf3590568.html#a10034698
> > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>
>
> --
> "The way to get started is to quit talking and begin doing."
>       Walt Disney
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================


From cjfields at uiuc.edu  Tue Apr 17 12:49:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:49:53 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
Message-ID: <639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>

You can post here or add it to Bugzilla as an enhancement request if  
the code is particularly long.

chris

On Apr 17, 2007, at 11:21 AM, Ioannis Kirmitzoglou wrote:

> Thanks for the prompt reply...
> Seems like I will have to "quit talking and begin doing"
> I will post the code here in case someone else finds himself in the same
> situation...
>
> -- 
> Ioannis Kirmitzoglou, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
>
>
> On 17/04/07, Thiago Venancio < thiago.venancio at gmail.com> wrote:
>> I am parsing FASTA outputs these days.
>>
>> The m 10 format is a recent implementation, not so popular yet. So, I
>> have first tested the Bio::SearchIO against a default output and
>> everything is fine.
>>
>> I think future releases of Bio::SearchIO will deal with the m10 output.
>> By now, you can run all again or code a little bit to parse what you
>> want (not a hard task).
>>
>> T.
>>
>>
>> On 4/17/07, Ioannis Kirmitzoglou < IoannisKirmitzoglou at gmail.com>
>> wrote:
>>>
>>> I apologize if this question has already been answered but my search
>>> came up with no relevant results.
>>> I am new to the FASTA program and after reading the fasta3x.doc I
>>> decided to run it using the m10 output. The reason for doing such a
>>> choice was
>>>
>>> Quote from fasta3x.doc:
>>>      -m 10 is a new, parseable format for use with other
>>>      programs....
>>>
>>>
>>> I ran FASTA in batch mode and waited about 3-4 days for the results.
>>> My problem is that today, when i started writing a perl script to
>>> parse the output I realized that SearchIO doesn't supports m10 format.
>>> Seems like I had to be more careful...
>>> Before starting coding a module that will be able to parse the output
>>> (or re-running FASTA with -m9 -d0 switches which will take 4 more
>>> days) I would be really thankful if any of you knows of any other way
>>> to parse those files?
>>> Thanks in advance...
>>>
>>> Ioannis Kirmitzoglou, MSc
>>> PhD. Student,
>>> Bioinformatics Research Laboratory
>>> Department of Biological Sciences
>>> University of Cyprus
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/Parsing-FASTA-m10-output-tf3590568.html#a10034698
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>> --
>> "The way to get started is to quit talking and begin doing."
>>       Walt Disney
>>
>> ========================
>> Thiago Motta Venancio, MSc
>> PhD student in Bioinformatics
>> University of Sao Paulo
>> ========================
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lpritc at scri.ac.uk  Tue Apr 17 09:35:44 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 14:35:44 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
Message-ID: <1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>

Hi Hilmar, 

Thanks for the very quick response.  Apologies for the long reply, but I
thought it might be useful if anyone else happens across the same
problems that I did.

On Tue, 2007-04-17 at 00:00 -0400, Hilmar Lapp wrote:
> Apparently the parser fails to parse out database and accession for
> this db_xref of term GO:0018901.
> 
> If you can edit the obo file, you can try deleting the db_xref(s) for  
> that term that look odd (or delete all if you don't need them).

You're spot on - see further down for details...

> Note that the argument for --fmtargs here should read
> "-defs_file,/path/to/Downloads/GO.defs". (Note that within the quotes  
> there is no tilde expansion.)

D'oh!  Thanks for the note - my bad, there.

> This is one of the things why you've got to love MySQL (and am I correct
> in inferring that you're using MySQL?).

The 'choice' was forced upon me ;)

> It may be necessary to widen the length of dbname.accession here, for  
> example to 80 chars? Let me know if you need help with the DDL  
> command to do this.

I've fixed that now (and added it to my local biosqldb-mysql.sql
schema), but with a clean BioSQL schema and using:

[lpritc at lplinuxdev sql]$ bp_load_ontology.pl --host localhost --dbname
biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass ********
--format goflat --fmtargs
"-defs_file,/home/lpritc/Downloads/GO.defs" /home/lpritc/Downloads/function.ontology 

I was still getting errors with the GO flatfile:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
were ("","","0","") FKs ()
Column 'dbname' cannot be null
---------------------------------------------------
Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate
lactonase activity':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be
found by unique key
STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK:
Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK:
Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0x88497a4)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x897f074)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x8d64ad8)', '-throw',
'CODE(0x851abc8)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

I tracked this down to what looks like a malformed entry in the GO.defs
file (note that the first and third definition_reference lines appear to
be two halves of the same entry):

term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554
definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O
= 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
definition_reference: EC:3.1.1.57
definition_reference: MetaCyc:2-PYRONE-4

I found 43 similar errors for other GOIDs, and they appear to result from
the occurrence of the string "\," in a dbxref - mostly MetaCyc entries,
but also some UM-BBD_pathwayID entries.

These errors appear to have followed through into the generation of the
OBO format files in each case, e.g.:

def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O =
4-carboxy-2-hydroxyhexa-2,4-dienedioate." [:6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]

and so is something for the GO guys to fix, I guess.
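
For anyone who wants to find these entries before loading, here is a
minimal sketch of a scan over GO.defs. It assumes the flat layout shown
above (a "goid:" line followed by "definition_reference:" lines); the sub
name find_suspect_refs is made up for illustration and is not part of
BioPerl. It flags references with an empty database or accession part,
and also an accession of "0", only because a PMID:0 entry triggered a
failure later in this same load.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [goid, reference] pairs for every definition_reference whose
# database part (before the first colon) or accession part looks unusable.
# Hypothetical helper, not a BioPerl function.
sub find_suspect_refs {
    my ($fh) = @_;
    my $goid = '';
    my @suspect;
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^goid:\s*(\S+)/) {
            $goid = $1;
        }
        elsif ($line =~ /^definition_reference:\s*(.*?)\s*$/) {
            my $ref = $1;
            my ($db, $acc) = split /:/, $ref, 2;
            # No colon at all, empty database, empty accession, or "0".
            push @suspect, [$goid, $ref]
                if !defined $acc || $db eq '' || $acc eq '' || $acc eq '0';
        }
    }
    return @suspect;
}

# Command-line use: perl scan_defs.pl /path/to/GO.defs
if (@ARGV && -r $ARGV[0]) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    printf "%s\t%s\n", @$_ for find_suspect_refs($fh);
    close $fh;
}
```

Each output line is a GO ID and the offending reference, tab-separated,
which makes it easy to check the hits by eye before editing anything.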


Another error is thrown after fixing the above, though (with the same
command as before):

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were
("GO:0006905","vesicle transport","OBSOLETE (was not defined before
being made obsolete).","X","") FKs (1)
Duplicate entry 'vesicle transport-1-X' for key 3
---------------------------------------------------
Could not store term GO:0006905, name 'vesicle transport':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be
found by unique key
STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK:
Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0xbcac418)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x957805c)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x995db20)', '-throw',
'CODE(0x9113bd0)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

There are duplicate terms, identical in the term table except for GOID:
GO:0006905 and GO:0005480.  They are both "vesicle transport", and
obsoleted:

term: vesicle transport
goid: GO:0005480
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because it represents a biological
process and not a molecular function. To update annotations, use the
biological process term 'vesicle-mediated transport ; GO:0016192'.

term: vesicle transport
goid: GO:0006905
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because the meaning of the term is
ambiguous. To update annotations, consider the biological process term
'vesicle-mediated transport ; GO:0016192'.
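
A quick way to list such clashes up front is to group the GO IDs in
GO.defs by term name. The sketch below assumes each entry has a "term:"
line followed by its "goid:" line, as in the two entries above; the sub
name duplicate_term_names is invented for this example. It ignores which
ontology a term belongs to and whether it is obsolete, so it may report
names that would not actually collide in the database.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a hash of term name => [GO IDs] for names carried by more than
# one GO ID. Hypothetical helper, not a BioPerl function.
sub duplicate_term_names {
    my ($fh) = @_;
    my (%ids_for, $name);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^term:\s*(.*?)\s*$/) {
            $name = $1;
        }
        elsif (defined $name && $line =~ /^goid:\s*(\S+)/) {
            push @{ $ids_for{$name} }, $1;
            undef $name;    # wait for the next "term:" line
        }
    }
    my %dup;
    for my $n (keys %ids_for) {
        $dup{$n} = $ids_for{$n} if @{ $ids_for{$n} } > 1;
    }
    return %dup;
}

# Command-line use: perl dup_terms.pl /path/to/GO.defs
if (@ARGV && -r $ARGV[0]) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!";
    my %dup = duplicate_term_names($fh);
    close $fh;
    print "$_: @{ $dup{$_} }\n" for sort keys %dup;
}
```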

I used the --noobsolete flag to avoid this error - reasoning that since
I'm populating the database for the first time, ignoring the obsolete
terms won't hurt - but finally this error was thrown:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
were ("PMID","","0","") FKs ()
Column 'accession' cannot be null
---------------------------------------------------
Could not store term GO:0032933, name 'SREBP-mediated signaling
pathway':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be
found by unique key
STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK:
Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK:
Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0xbe18f14)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x99bbf2c)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x9da0ad8)', '-throw',
'CODE(0x9556bb4)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

with the offending entry being 

term: SREBP-mediated signaling pathway
goid: GO:0032933
definition: A series of molecular signals from the endoplasmic reticulum
to the nucleus generated as a consequence of altered levels of one or
more lipids, and resulting in the activation of transcription by SREBP.
definition_reference: GOC:mah
definition_reference: PMID:0

I commented out the definition_reference for PMID:0, which seemed to fix
matters.

The process.ontology and component.ontology files then went into the
database without a hitch.  Thanks again for your help,

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views 
expressed by the sender are not necessarily the views of SCRI and its 
subsidiaries.  This email and any files transmitted with it are confidential 
to the intended recipient at the e-mail address to which it has been 
addressed.  It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this 
confidentiality and you must not use, disclose, copy, print or rely on this 
e-mail in any way. Please notify postmaster at scri.ac.uk quoting the 
name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are 
present in this email, neither the Institute nor the sender accepts any 
responsibility for any viruses, and it is your responsibility to scan the email 
and the attachments (if any).


From lpritc at scri.ac.uk  Tue Apr 17 12:05:16 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 17:05:16 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
Message-ID: <1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>

Hello again,

On Tue, 2007-04-17 at 11:09 -0400, Hilmar Lapp wrote:
> Thanks for reporting all these.

No problem at all.

> On Apr 17, 2007, at 9:35 AM, Leighton Pritchard wrote:
> > term: 2-pyrone-4,6-dicarboxylate lactonase activity
[...]
> > definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
> 
> I wonder whether this is the line that throws the parser off. It  
> looks like the database part of the reference is missing - bad.

> > definition_reference: MetaCyc:2-PYRONE-4

I don't think the parser is to blame, here.  Note that if you join the
definition_reference strings from the GO.defs file, you get:

MetaCyc:2-PYRONE-4:6-DICARBOXYLATE-LACTONASE-RXN

Then if you replace the second colon with "\," you get what should (I
think) actually be the MetaCyc entry:

MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN

> > I found 43 similar errors for other GOIDs, and it appears to result
> > from the occurrence of the string "\," in a dbxref - mostly MetaCyc
> > entries, but also some UM-BBD_pathwayID entries.
> 
> I'm not sure - although the string "\," might indeed trip up the  
> parser, would have to investigate to confirm. Could it be a  
> coincidence with definition_references that lack the database part  
> before the colon?

Inspecting the troublesome entries by eye consistently turns up the same
problem as above: the GO.defs entry for the term is malformed.  The entry
should have a definition_reference field naming a MetaCyc entry that
matches the term field.  That reference should contain an escaped comma,
but the string ends at the point where we expect the comma.  The text
that would follow the escaped comma appears instead as the first
definition_reference.

This observation also extends to cases where there should be two
occurrences of "\," in the MetaCyc field, e.g.:

term: 2,3-dihydroxyindole 2,3-dioxygenase activity
goid: GO:0047528
definition: Catalysis of the reaction: 2,3-dihydroxyindole + O2 =
anthranilate + CO2.
definition_reference: :3-DIHYDROXYINDOLE-2
definition_reference: :3-DIOXYGENASE-RXN
definition_reference: EC:1.13.11.2
definition_reference: MetaCyc:2

It then appears as though the GO flatfiles were used automatically to
generate the OBO format files, and propagated the same error into the
square brackets in each case.
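
The rejoining described above can be sketched in a few lines of Perl.
This is a heuristic under stated assumptions, not a confirmed fix: it
assumes the fragments (references starting with ":") belong to the one
MetaCyc or UM-BBD_pathwayID reference in the same entry, and that the
fragments still appear in their original order. The sub name
rejoin_split_refs is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Given the definition_reference values of one term, glue fragments that
# lack a database part back onto the single "head" reference, separated
# by an escaped comma. Entries that are ambiguous are returned unchanged.
# Hypothetical helper, not a BioPerl function.
sub rejoin_split_refs {
    my ($refs, $head_dbs) = @_;
    $head_dbs ||= [qw(MetaCyc UM-BBD_pathwayID)];
    my %is_head_db = map { ($_ => 1) } @$head_dbs;

    my @tails = grep { /^:/ } @$refs;
    return @$refs unless @tails;

    my @heads = grep { /^([^:]+):/ && $is_head_db{$1} } @$refs;
    return @$refs unless @heads == 1;    # cannot tell where tails belong

    my $joined = join '\\,', $heads[0], map { substr($_, 1) } @tails;
    return ((grep { !/^:/ && $_ ne $heads[0] } @$refs), $joined);
}

# The two-comma example from this message:
print "$_\n" for rejoin_split_refs([
    ':3-DIHYDROXYINDOLE-2', ':3-DIOXYGENASE-RXN',
    'EC:1.13.11.2', 'MetaCyc:2',
]);
# prints:
#   EC:1.13.11.2
#   MetaCyc:2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN
```

Because the head is chosen by database name, an entry with two MetaCyc
references is left alone rather than guessed at.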

> > and so is something for the GO guys to fix, I guess.
> 
> The lack of a database for certain xrefs surely is. If the escaped  
> comma does throw off the BioPerl parser then that part is for BioPerl  
> to fix. 

I think the problems are now all in the data I downloaded from
http://www.geneontology.org/GO.downloads.shtml - I believe the BioPerl
parser to be innocent of these charges ;)  I've submitted the issue at
the GO site, and with any luck they'll handle it quite soon (if it is in
fact their problem).

> Note that you also have the --computetc switch which will compute the  
> transitive closure for you automatically.

:D Excellent!  Thanks for the pointer, and again for your efforts,

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C


From stefan.kirov at bms.com  Tue Apr 17 11:09:30 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 17 Apr 2007 11:09:30 -0400
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph with
	Perl]
Message-ID: <4624E32A.6010704@bms.com>

I forgot to send this to the list....
Stefan
-------------- next part --------------
An embedded message was scrubbed...
From: Stefan Kirov <stefan.kirov at bms.com>
Subject: Re: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
Date: Tue, 17 Apr 2007 10:30:11 -0400
Size: 2262
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070417/cc49d62a/attachment-0002.mht>

From lpritc at scri.ac.uk  Tue Apr 17 12:55:38 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 17:55:38 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
	<1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
	<146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>
Message-ID: <1176828938.988.133.camel@lplinuxdev.scri.sari.ac.uk>

Hi Chris,

On Tue, 2007-04-17 at 11:18 -0500, Chris Fields wrote:
> If you do find anything that is BioSQL- or Bioperl-related then file  
> a bug report so we can track it.  I agree with Hilmar that it's  
> likely the parser is partly to blame.
> 
> http://bugzilla.open-bio.org/

I've submitted a bug report, mostly replicating my first post in this
thread.  I added links to the appropriate point in the list archives so
that the rest of the discussion can be considered, too.

> We really appreciate the work you're putting into this!

Thanks - I'm just grateful that the Bio* repertoire is there at all so
that my problems are relatively minor (as opposed to attempting to
replicate the functionality independently).

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C


From lstein at cshl.edu  Tue Apr 17 13:47:25 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 17 Apr 2007 13:47:25 -0400
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <C2340DDA.D83F%bosborne11@verizon.net>
References: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<C2340DDA.D83F%bosborne11@verizon.net>
Message-ID: <6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>

Hi,

I've been updating the WIKI in anticipation of a new GBrowse release and
have added a "stub" for the biopackages.net install. Since I don't use yum
(I've been running Slackware for ages and have recently started working with
Ubuntu) I'm not sure I got the details right. Could someone check?


        http://www.gmod.org/wiki/index.php/GBrowse_RPM_HOWTO

Also, I think some verbiage on how to use yum to install MySQL and Apache
would be great, since it will be consistent with the Ubuntu install page.

Thanks,

Lincoln

On 3/31/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
> Allen et al.,
>
> What happened to the "GMOD" package or packages? I've had some
> conversations in the past few months with you-all suggesting that a
> GMOD package, or packages, would be useful.
>
> Brian O.
>
>
>
>
> On 3/30/07 8:30 PM, "Allen Day" <allenday at gmail.com> wrote:
>
> > Hi Alex,
> >
> > You've aptly noted that there are several classes of packages being
> > discussed here, and that they should not be treated equally.  From my
> > point of view and of specific relevance to the Bioperl community we
> > have at least:
> >
> > 1) "regular" CPAN dependencies and their occasional C/C++/Fortran
> > dependencies.  These should all be in Fedora Extras, as they are of
> > general utility.  Biopackages.net currently hosts about 200 packages
> > (.spec files, specifically) that are like this.  Maybe 80 of these are
> > needed for Bioperl.
> >
> > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan,
> > etc.  From what I've seen, these typically have strange/custom
> > licenses that may not be valid for some users.  BLAT has a dual
> > licensing scheme for academic and non-academic licensees, for
> > instance.  These packages are not of general utility.  For these two
> > reasons, my stance is that they should not be included in Fedora
> > Extras.
> >
> > 3) Bioperl packages.  Several subsets here.  The Bioperl-run libraries
> > depend directly on type (2) packages, so aren't appropriate to include
> > in Fedora Extras.  Bioperl-live is not really that useful without type
> > (2) packages.  It is also sensible to keep all of the Bioperl-*
> > packages in the same repository.  For these reasons, my stance is that
> > they should not be included in Fedora Extras.
> >
> > 4) Bioinformatics / Comp. Bio. data sets.  These don't have licensing
> > problems, but they tend to be large.  Usually in the 10E7 - 10E10 byte
> > range.  RPM cannot even generate correct metadata for some of them
> > if the files are too large (overflow problems).  Probably
> > not appropriate to put in Fedora Extras because they are too large and
> > not generally useful.
> >
> > 5) Bioinformatics-specific System databases / daemons.  These
> > high-level packages depend on types (2), (3), and (4), and so are not
> > appropriate to put into Fedora Extras.  An example is a BLAT daemon,
> > which relies on the BLAT server, as well as NIB-formatted genome
> > sequence files.
> >
> > That said, there are a lot of type (1) packages in the Biopackages.net
> > repository.  If you're interested in migrating the spec files from our
> > repository to the Fedora project it would save us (the Biopackages.net
> > maintainers) a ton of build and maintenance time, so please feel free
> > to take them, just let us know.  If we can reach some agreement on
> > where the bioinformatics-specific packages should be maintained/built
> > we may be able to work together on these as well.
> >
> > -Allen
> >
> >
> > On 3/30/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> >>>>>>> "AD" == Allen Day  writes:
> >>
> >> AD> Hi Alex, The Biopackages.net project is still active, we are
> >> AD> regularly adding packages to it, mostly R packages lately.  Most
> >> AD> of the systems we use are running CentOS at this point, which is
> >> AD> why you have not seen support for FC6 yet.  There is nothing
> >> AD> preventing building FC6 packages aside from lack of time to set up
> >> AD> the FC6 build farm nodes.
> >>
> >> Hi Allen and others,
> >>
> >> Great news to hear that Biopackages.net is still active!  I would like
> >> to help out if possible.  I don't believe in "FUD" either... ;)
> >>
> >> AD> If you're interested in packaging BioPerl or other
> >> AD> bioinformatics-related software, please join the Biopackages
> >> AD> project on SourceForge.  We object to the Fedora Extras FUD
> >> AD> tactics used to discourage people from using 3rd party
> >> AD> repositories, and suspect they may not want to host some of our
> >> AD> data packages, such as the >2GB genome packages.  Biopackages
> >> AD> project is likely to partially merge with RPMForge.  We are
> >> AD> already discussing with them how best to do it.
> >>
> >> The packages that I created which are currently available in Fedora
> >> Packages are Perl dependencies which, as I said are useful for
> >> packages outside the bioinformatics purview.  I do have a (base)
> >> bioperl package in review, but it is not yet released.
> >>
> >> As for third-party repos, I don't object to them at all, and for some
> >> kinds of projects they are indeed appropriate (e.g. for non-free
> >> stuff like Livna or Freshrpms).  I do have practical concerns about
> >> repository mixing: it needs to be handled carefully, but I think that
> >> co-operation between Fedora and third-party repos can make it work.
> >>
> >> For example, one practical concern is that as of the
> >> soon-to-be-released Fedora 7, Core+Extras will be merged, so there
> >> will be no distinction at the repository-level between formerly Extras
> >> packages and formerly Core packages (as of now there are only "Fedora
> >> Packages"), which means that it will not be possible for third-party
> >> repos to limit their dependencies to just those in a former base set
> >> (i.e. excluding Extras).
> >>
> >> I agree that a few years ago (circa 2003-2004) there was concern about
> >> the way some third party repositories were treated somewhat badly by
> >> the (then) Fedora Extras (with some people going so far as to say that
> >> third-party repos were bad in principle and should always be ignored
> >> which I disagree with too).  But it seems to me that culture has
> >> shifted since, with some notable packagers such as Matthias Saou (of
> >> Freshrpms) and Axel Thimm (of Atrpms) now contributing packages to
> >> Fedora itself.  The process of contributing has also become much
> >> simpler and reviews are conducted speedily and efficiently, I had
> >> packages in the repository in a matter of a few days from initial
> >> submission.  Freshrpms itself now enables and depends on the (old)
> >> Extras.
> >>
> >> The real question for me, then is what packages it makes sense to go
> >> in Fedora, and what packages go in third party repositories.  It seems
> >> to me that in the case of Perl packages which could be dependencies
> >> for other packages not specific to the third-party repo in question,
> >> it makes sense for them to go into Fedora itself, so I think I will
> >> continue to package them.  This lessens the load on the third-party
> >> repo, while making them available for all other third-party repos.
> >> (This is the approach that Freshrpms seems to be taking; Matthias has
> >> contributed most packages back to Fedora now, other than the non-free
> >> ones.)
> >>
> >> At the other end of the spectrum are packages like you mention, genome
> >> packages, which may be of concern because of their size and/or highly
> >> specialised nature, and, as you say, may make sense to go in a
> >> third-party repo like Biopackages.net.  Also packages which can't be
> >> packaged by Fedora for legal reasons like Clustal could/should go in
> >> Biopackages.net.
> >>
> >> In the middle are packages like bioperl itself which are potentially
> >> useful to perhaps a wider group of people than the genome packages but
> >> may not necessarily be dependencies for other packages.  I lean
> >> towards making them part of Fedora so that they will be available out
> >> of the box on the planned "Everything" DVD ISO, but I welcome a
> >> discussion on this.
> >>
> >> As I said, I'm glad to hear that Biopackages.net is alive and well and
> >> I welcome a discussion on how upstream Fedora can usefully interact
> >> with Biopackages.net (I guess perhaps on the Biopackages.net list).
> >>
> >> Regards,
> >> Alex
> >>
> >> PS.  As the upstream author, if you could clarify the license on
> >> perl-SVG-Graph, on CPAN (or on the mailing list) that would be great.
> >> --
> >> Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From alexl at users.sourceforge.net  Wed Apr 18 04:50:51 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 18 Apr 2007 01:50:51 -0700
Subject: [Bioperl-l] bioperl-run and Bio::Root::AccessorMaker
Message-ID: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>

In packaging bioperl-run for Fedora, I think I stumbled across a bug
in the bioperl-run package.  It appears from this edit:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl

that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
bioperl-run 1.5.2_100 still contains modules that use this module:

$ cd bioperl-run-1.5.2_100
$ grep -r AccessorMaker  *
Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar
class min_version)]);
Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
('$'=>[qw(input_file output_file)]);

This causes the automatic Perl dependency generator for RPM to add
Bio::Root::AccessorMaker as a Requires, which means RPM will refuse to
install perl-bioperl-run because it's looking for the module that has
now been removed from core bioperl:

$ sudo rpm -Uvh --test
/home/alex/rpmbuild/RPMS/noarch/perl-bioperl-run-1.5.2_100-1.noarch.rpm 
error: Failed dependencies:
        perl(Bio::Root::AccessorMaker) is needed by
        perl-bioperl-run-1.5.2_100-1.noarch

Are the SDI and JavaRunner modules being actively developed?  What's
the best course of action for these modules?  Should I just exclude
them from the package for now, since they won't work even if you
tell RPM to ignore the dependency warning?

Alex


From shameer at ncbs.res.in  Wed Apr 18 06:16:07 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 18 Apr 2007 15:46:07 +0530 (IST)
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph
 with Perl]
In-Reply-To: <4624E32A.6010704@bms.com>
References: <4624E32A.6010704@bms.com>
Message-ID: <36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>

Hi,

I am also interested in using the Bio::Graphics modules for dynamic image
display. I have a question: I tried all the sample programs explained on
this page: http://stein.cshl.org/genome_informatics/BioGraphics/index.html.
Is it possible to generate a png/jpg/gif image file from this module by
altering the same programs? Currently they use the display option. I know
this can be done using GD or Image::Magick in Perl, but is there a quick
way to accomplish it in BioPerl?

Thanks,


> Missed to send this to the list....
> Stefan


-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From sdavis2 at mail.nih.gov  Wed Apr 18 07:18:48 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 18 Apr 2007 07:18:48 -0400
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph
	with Perl]
In-Reply-To: <36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
References: <4624E32A.6010704@bms.com>
	<36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
Message-ID: <200704180718.48811.sdavis2@mail.nih.gov>

On Wednesday 18 April 2007 06:16, Shameer Khadar wrote:
> Hi,
>
> I am also interested to use the Bio::Graphics modules from dynamic image
> display. I have a doubt,  I tried all the sample programs explained in
> this page http://stein.cshl.org/genome_informatics/BioGraphics/index.html.
> Is it possible to generate a png/jpg/gif image from this module by
> altering the same program. Currently its using diplay option. 

You just need to print $panel->png to a file.
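
A minimal sketch of that, assuming you already have a Bio::Graphics::Panel
object in $panel as in the tutorial scripts. The helper name write_image and
the placeholder bytes are made up for illustration; the only real points are
opening a file and calling binmode before printing the image data.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper (not part of Bio::Graphics): write raw image bytes
# to a file on disk.
sub write_image {
    my ($path, $bytes) = @_;
    open my $fh, '>', $path or die "Cannot write $path: $!";
    binmode $fh;    # image data is binary; avoid newline translation
    print {$fh} $bytes;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

# Stand-in data so the sketch runs without Bio::Graphics installed:
# these 8 bytes are just the PNG file signature, not a real image.
my $dir  = tempdir(CLEANUP => 1);
my $file = write_image("$dir/demo.png", "\x89PNG\r\n\x1a\n");
print "wrote ", (-s $file), " bytes to $file\n";

# With a real panel the call would be:
#   write_image('features.png', $panel->png);
```

In a CGI script you would print the same bytes to STDOUT (again after
binmode) following an image/png content-type header instead of writing a file.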

Sean


From bix at sendu.me.uk  Wed Apr 18 07:48:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 12:48:27 +0100
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
Message-ID: <4626058B.8090801@sendu.me.uk>

Alex Lancaster wrote:
> In packaging bioperl-run for Fedora, I think I stumbled across a bug
> in the bioperl-run package.  It appears from this edit:
> 
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
> 
> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
> bioperl-run 1.5.2_100 still contains modules that use this module:
> 
> $ cd bioperl-run-1.5.2_100
> $ grep -r AccessorMaker  *
> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar
> class min_version)]);
> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
> ('$'=>[qw(input_file output_file)]);

It looks like I've implemented a similar idea to AccessorMaker and 
AbstractRunner in Bio::Root::Root->_set_from_args() and 
Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses 
AbstractRunner I propose deprecating it immediately.
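
For anyone unfamiliar with the idea, here is a plain-Perl sketch of what
"generate accessors from a list of names" means. It is only an illustration:
the class name My::Runner is hypothetical, and this is not the code of
AccessorMaker, _set_from_args() or _setparams().

```perl
use strict;
use warnings;

{
    package My::Runner;    # hypothetical class, for illustration only

    sub new { my $class = shift; return bless {}, $class }

    # Install one combined getter/setter per name into this package.
    for my $name (qw(input_file output_file)) {
        no strict 'refs';
        *{ __PACKAGE__ . "::$name" } = sub {
            my $self = shift;
            $self->{$name} = shift if @_;
            return $self->{$name};
        };
    }
}

my $runner = My::Runner->new;
$runner->input_file('in.fa');
print "input file: ", $runner->input_file, "\n";
```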

Forester::SDI and JavaRunner have no tests which is why we didn't notice 
the problem. Since they've been out of use for a number of years now I 
also propose their immediate deprecation. Alternatively, it may not be 
too difficult to just update them to use _set_from_args and _setparams, 
but I've nothing to test against (and JavaRunner is self-described as 
"probably incomplete").


I can remove the modules from cvs and create bioperl-run-1.5.2_101, 
resolving the packaging issue. I plan on doing precisely this within the 
next seven days unless someone puts a hand up to stop me.


[BCC: author, Juguang Xiao]


From cjfields at uiuc.edu  Wed Apr 18 08:43:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 07:43:45 -0500
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <4626058B.8090801@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>


On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:

> It looks like I've implemented a similar idea to AccessorMaker and
> AbstractRunner in Bio::Root::Root->_set_from_args() and
> Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
> AbstractRunner I propose deprecating it immediately.

JavaRunner is-a AbstractRunner, but what you propose below takes care  
of that.

> Forester::SDI and JavaRunner have no tests which is why we didn't  
> notice
> the problem. Since they've been out of use for a number of years now I
> also propose their immediate deprecation. Alternatively, it may not be
> too difficult to just update them to use _set_from_args and  
> _setparams,
> but I've nothing to test against (and JavaRunner is self-described as
> "probably incomplete").
>
>
> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
> resolving the packaging issue. I plan on doing precisely this  
> within the
> next seven days unless someone puts a hand up to stop me.
>
>
> [BCC: author, Juguang Xiao]

I suppose you could just remove the modules from the branch for now,  
but (as you point out) the code appears largely incomplete, so might  
as well deprecate the entire lot.  The code will be in the 'attic'  
once removed if anyone's really interested in it.

You've forwarded the author and the mail list so let's see what the  
response is (if any)...

chris


From cjfields at uiuc.edu  Wed Apr 18 11:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 10:30:45 -0500
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <462634DB.2040701@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
Message-ID: <143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>


On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>>> resolving the packaging issue. I plan on doing precisely this  
>>> within the
>>> next seven days unless someone puts a hand up to stop me.
>>>
>>> [BCC: author, Juguang Xiao]
> [snip]
>> You've forwarded the author and the mail list so let's see what  
>> the response is (if any)...
>
> Unfortunately the mail was undeliverable, and I have no other  
> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few  
> more days for other responses on the list.
>
> I never made a branch for bioperl-run 1.5.2, so they'd be removed  
> from HEAD.

It might be a good idea to repost this using the module names  
affected in the subject, just in case, though the last post he made  
on the mail list was ~3 years ago using the same email:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/match=xiao

He may be MIA.

chris


From bix at sendu.me.uk  Wed Apr 18 11:10:19 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 16:10:19 +0100
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
Message-ID: <462634DB.2040701@sendu.me.uk>

Chris Fields wrote:
> 
> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
> 
>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>> resolving the packaging issue. I plan on doing precisely this within the
>> next seven days unless someone puts a hand up to stop me.
>>
>> [BCC: author, Juguang Xiao]
[snip]
> You've forwarded the author and the mail list so let's see what the 
> response is (if any)...

Unfortunately the mail was undeliverable, and I have no other address 
for Juguang (I tried juguang at tll.org.sg). I'll wait a few more days for 
other responses on the list.

I never made a branch for bioperl-run 1.5.2, so they'd be removed from HEAD.


From hlapp at gmx.net  Wed Apr 18 11:59:52 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Apr 2007 11:59:52 -0400
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
	<143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
Message-ID: <EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>

There is a Juguang Xiao at juguang.swf at gmail.com. Not sure he's  
the same, but sounds like it's a geek at least. (google and you'll  
see; has anyone here ever heard about neko??)

	-hilmar

On Apr 18, 2007, at 11:30 AM, Chris Fields wrote:

>
> On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>>>> resolving the packaging issue. I plan on doing precisely this
>>>> within the
>>>> next seven days unless someone puts a hand up to stop me.
>>>>
>>>> [BCC: author, Juguang Xiao]
>> [snip]
>>> You've forwarded the author and the mail list so let's see what
>>> the response is (if any)...
>>
>> Unfortunately the mail was undeliverable, and I have no other
>> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few
>> more days for other responses on the list.
>>
>> I never made a branch for bioperl-run 1.5.2, so they'd be removed
>> from HEAD.
>
> It might be a good idea to repost this using the module names
> affected in the subject, just in case, though the last post he made
> on the mail list was ~3 years ago using the same email:
>
> http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/
> match=xiao
>
> He may be MIA.
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Apr 18 12:00:49 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Apr 2007 12:00:49 -0400
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <4626058B.8090801@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <9159C9DF-41BC-46AA-8511-763AD9B7A3D0@gmx.net>

sounds good to me - the less cruft the better. -hilmar
On Apr 18, 2007, at 7:48 AM, Sendu Bala wrote:

> Alex Lancaster wrote:
>> In packaging bioperl-run for Fedora, I think I stumbled across a bug
>> in the bioperl-run package.  It appears from this edit:
>>
>> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/ 
>> Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
>>
>> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
>> bioperl-run 1.5.2_100 still contains modules that use this module:
>>
>> $ cd bioperl-run-1.5.2_100
>> $ grep -r AccessorMaker  *
>> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
>> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw 
>> (jar
>> class min_version)]);
>> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
>> ('$'=>[qw(input_file output_file)]);
>
> It looks like I've implemented a similar idea to AccessorMaker and
> AbstractRunner in Bio::Root::Root->_set_from_args() and
> Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
> AbstractRunner I propose deprecating it immediately.
>
> Forester::SDI and JavaRunner have no tests which is why we didn't  
> notice
> the problem. Since they've been out of use for a number of years now I
> also propose their immediate deprecation. Alternatively, it may not be
> too difficult to just update them to use _set_from_args and  
> _setparams,
> but I've nothing to test against (and JavaRunner is self-described as
> "probably incomplete").
>
>
> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
> resolving the packaging issue. I plan on doing precisely this  
> within the
> next seven days unless someone puts a hand up to stop me.
>
>
> [BCC: author, Juguang Xiao]
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Apr 18 12:25:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 11:25:54 -0500
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
	<143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
	<EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>
Message-ID: <E0195EBD-731D-4915-91AD-7FFE1FA9F608@uiuc.edu>

My guess is that Hilmar's is the most current, as posts were made from it  
this year.  I found another email: juguang at fugu-sg.org.  Looks like he  
added some stuff to Ensembl a while back (sorry about the long URL).

http://www.ensembl.org/info/software/Pdoc/ensembl/modules/Bio/EnsEMBL/Utils/Converter/ens_bio_featurePair_raw.html

chris

On Apr 18, 2007, at 10:59 AM, Hilmar Lapp wrote:

> There is a Juguang Xiao at juguang.swf at gmail.com. Not sure he's
> the same, but sounds like it's a geek at least. (google and you'll
> see; has anyone here ever heard about neko??)
>
> 	-hilmar
>
> On Apr 18, 2007, at 11:30 AM, Chris Fields wrote:
>
>>
>> On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:
>>
>>> Chris Fields wrote:
>>>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>>>> I can remove the modules from cvs and create bioperl- 
>>>>> run-1.5.2_101,
>>>>> resolving the packaging issue. I plan on doing precisely this
>>>>> within the
>>>>> next seven days unless someone puts a hand up to stop me.
>>>>>
>>>>> [BCC: author, Juguang Xiao]
>>> [snip]
>>>> You've forwarded the author and the mail list so let's see what
>>>> the response is (if any)...
>>>
>>> Unfortunately the mail was undeliverable, and I have no other
>>> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few
>>> more days for other responses on the list.
>>>
>>> I never made a branch for bioperl-run 1.5.2, so they'd be removed
>>> from HEAD.
>>
>> It might be a good idea to repost this using the module names
>> affected in the subject, just in case, though the last post he made
>> on the mail list was ~3 years ago using the same email:
>>
>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/
>> match=xiao
>>
>> He may be MIA.
>>
>> chris
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 18 12:37:55 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 17:37:55 +0100
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
Message-ID: <46264963.9020306@sendu.me.uk>

Hi all,

t/DB.t is currently failing tests 40 and 41:

ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
                                          '-ids' => [qw(J00522 AF303112 2981014)],
                                          -verbose => 1);

cmp_ok $query->count, '>', 0;

You can see that 
http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100 
gives no results, where presumably it used to give 3. Querying on the 3 
IDs individually works fine. So... what changed and how do we get around it?


From cjfields at uiuc.edu  Wed Apr 18 13:05:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 12:05:12 -0500
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <46264963.9020306@sendu.me.uk>
References: <46264963.9020306@sendu.me.uk>
Message-ID: <6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>

I can verify on this end.  Not sure why, but the same accessions are  
used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)  
with success.

chris

On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:

> Hi all,
>
> t/DB.t is currently failing tests 40 and 41:
>
> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>                                           '-ids' => [qw(J00522  
> AF303112
> 2981014)],
>                                           -verbose => 1);
>
> cmp_ok $query->count, '>', 0;
>
> You can see that
> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi? 
> db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522% 
> 2CAF303112%2C2981014&retmax=100
> gives no results, where presumably it used to give 3. querying on  
> the 3
> ids individually works fine. So... what changed and how do we get  
> around it?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Apr 18 14:07:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 13:07:22 -0500
Subject: [Bioperl-l] Skipping/Failing tests
Message-ID: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>

To the BioPerl community at large,

I have noticed a problem with some BioPerl tests when converting to  
Test::More.  When using the following:

     while ($seq = $seqin->next_seq) {
         my $acc = $seq->accession;
         ok exists $result{ $acc };
         is $seq->length, $result{ $acc };
         delete $result{$acc};
     }

if $seq is undef, the loop body never runs and the number of tests run  
falls short of the plan by 2 for every expected iteration of the loop.  
Two serious problems:

1) No specific failures are seen until the end of the test suite when  
the test plan doesn't match the number of tests (which could be  
several hundred lines away from the actual failure).
2) Worse, if one were lazy enough to not track the actual number of  
tests (heh, not that that would happen) they could inadvertently change  
the test plan to match the final number of tests.

There are several ways to work around this, such as using a counter  
to track the number of iterations and check to make sure they pass:

     $ct = 0;
     while ($seq = $seqin->next_seq) {
         $ct++;
         my $acc = $seq->accession;
         ok exists $result{ $acc };
         is $seq->length, $result{ $acc };
         delete $result{$acc};
     }
     is($ct, 3);

Here, if $ct is 0 you'll get an error.  However, the test count will  
still be off at the end (the test plan will be off by 6 tests).

My opinion is that we should try to match the plan, as a single fail  
doesn't reflect the severity of the bug (i.e. it should fail each  
test per iteration, as expected).  Skipping to match is an option as  
well (one I've used) but again doesn't reflect the severity of the  
problem in my opinion.  The flip side is that some consider any  
failed test significant, so there is no reason to try matching the  
tests up.
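
As a rough sketch of the "fail each test per iteration" idea: whatever is
left in %result after the loop was never returned by the stream, so both
per-record tests are failed explicitly for each leftover. The Mock::* classes
and the lengths are made up for illustration and only stand in for a real
Bio::SeqIO stream.

```perl
use strict;
use warnings;
use Test::More tests => 7;    # 2 per expected record, plus the count check

# Hypothetical stand-ins for a sequence object and a sequence stream.
{
    package Mock::Seq;
    sub new { my ($class, %args) = @_; my $self = {%args}; return bless $self, $class }
    sub accession { return $_[0]{accession} }
    sub length    { return $_[0]{length} }
}
{
    package Mock::Stream;
    sub new { my ($class, @seqs) = @_; return bless { seqs => [@seqs] }, $class }
    sub next_seq { return shift @{ $_[0]{seqs} } }
}

# Expected accession => length (the lengths are invented for this sketch).
my %result = ('J00522' => 100, 'AF303112' => 200, '2981014' => 300);

my @seqs;
for my $acc (sort keys %result) {
    push @seqs, Mock::Seq->new(accession => $acc, length => $result{$acc});
}
my $seqin = Mock::Stream->new(@seqs);

my $ct = 0;
while (my $seq = $seqin->next_seq) {
    $ct++;
    my $acc = $seq->accession;
    ok(exists $result{$acc}, "$acc was expected");
    is($seq->length, $result{$acc}, "$acc has the expected length");
    delete $result{$acc};
}
is($ct, 3, 'stream returned three records');

# Pad the run with explicit failures so the count still matches the plan
# and each missing record is reported where the problem actually is.
for my $acc (sort keys %result) {
    fail("$acc was returned by the stream");
    fail("$acc has the expected length");
}
```

With an empty stream this reports seven failures against a plan of seven,
rather than one failure plus a plan mismatch at the very end.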

What I would like to do is hammer out something we can add to the  
Writing Tests HOWTO which addresses some ways to deal with the above  
for those who want to contribute code and tests to BioPerl.  I'm  
looking for some (any) additional opinions on the matter (or, if you  
have the initiative, adding some ideas to the HOWTO itself).

http://www.bioperl.org/wiki/Special:Recentchanges

Thanks!

chris


From ki.baik at roche.com  Wed Apr 18 14:32:35 2007
From: ki.baik at roche.com (Baik, Ki)
Date: Wed, 18 Apr 2007 11:32:35 -0700
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
References: <46264963.9020306@sendu.me.uk>
	<6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
Message-ID: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>

I have had similar problems in which a couple of accession numbers out
of a series were not retrieved, yet they do exist in ncbi.

Ki Baik

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Wednesday, April 18, 2007 10:05 AM
To: Sendu Bala
Cc: bioperl-l
Subject: Re: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures

I can verify on this end.  Not sure why, but the same accessions are  
used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)  
with success.

chris

On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:

> Hi all,
>
> t/DB.t is currently failing tests 40 and 41:
>
> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>                                           '-ids' => [qw(J00522  
> AF303112
> 2981014)],
>                                           -verbose => 1);
>
> cmp_ok $query->count, '>', 0;
>
> You can see that
> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi? 
> db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522% 
> 2CAF303112%2C2981014&retmax=100
> gives no results, where presumably it used to give 3. querying on  
> the 3
> ids individually works fine. So... what changed and how do we get  
> around it?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From arareko at campus.iztacala.unam.mx  Wed Apr 18 15:12:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 18 Apr 2007 14:12:29 -0500
Subject: [Bioperl-l] Skipping/Failing tests
In-Reply-To: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>
References: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>
Message-ID: <46266D9D.1050703@campus.iztacala.unam.mx>

Hey Chris,

I don't know if this helps those working on the test suite, but there's 
a recently cooked-up recipe for keeping track of the number of tests (thus 
helping to update the test plan accordingly):

http://www.perl.com/pub/a/2007/04/12/lightning-four.html?page=3

My quick .2 cents :)

Cheers,
Mauricio.

Chris Fields wrote:
> To the BioPerl community at large,
> 
> I have noticed a problem with some BioPerl tests when converting to  
> Test::More.  When using the following:
> 
>      while ($seq = $seqin->next_seq) {
>          my $acc = $seq->accession;
>          ok exists $result{ $acc };
>          is $seq->length, $result{ $acc };
>          delete $result{$acc};
>      }
> 
> if $seq is undef then the test plan is off by a factor of 2 for every  
> iteration of the loop.  Two serious problems:
> 
> 1) No specific failures are seen until the end of the test suite when  
> the test plan doesn't match the number of tests (which could be  
> several hundred lines away from the actual failure).
> 2) Worse, if one were lazy enough to not track the actual number of  
> tests (heh, not that would happen) they could inadvertently change  
> the test plan to match the final number of tests.
> 
> There are several ways to work around this, such as using a counter  
> to track the number of iterations and check to make sure they pass:
> 
>      $ct = 0;
>      while ($seq = $seqin->next_seq) {
>          $ct++;
>          my $acc = $seq->accession;
>          ok exists $result{ $acc };
>          is $seq->length, $result{ $acc };
>          delete $result{$acc};
>      }
>      is($ct, 3);
> 
> Here, if $ct is 0 you'll get an error.  However, the test count will  
> still be off at the end (the test plan will be off by 6 tests).
> 
> My opinion is that we should try to match the plan, as a single fail  
> doesn't reflect the severity of the bug (i.e. it should fail each  
> test per iteration, as expected).  Skipping to match is an option as  
> well (one I've used) but again doesn't reflect the severity of the  
> problem in my opinion.  The flip side is that some consider any  
> failed test significant, so there is no reason to try matching the  
> tests up.
> 
> What I would like to do is hammer out something we can add to the  
> Writing Tests HOWTO which addresses some ways to deal with the above  
> for those who want to contribute code and tests to BioPerl.  I'm  
> looking for some (any) additional opinions on the matter (or, if you  
> have the initiative, adding some ideas to the HOWTO itself).
> 
> http://www.bioperl.org/wiki/Special:Recentchanges
> 
> Thanks!
> 
> chris
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Wed Apr 18 15:41:56 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 14:41:56 -0500
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
References: <46264963.9020306@sendu.me.uk>
	<6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
	<6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <208DCD0F-6A0B-4054-A1C7-D599D32AC344@uiuc.edu>

The problem appears to be with eutils.  Using bare accession numbers  
no longer works with esearch (which Bio::DB::Query::GenBank uses).   
Using them via efetch still works, which explains why  
Bio::DB::GenBank passes tests using the same accession/GI mix.

NCBI has added an extra field descriptor specifically for accessions  
in esearch, which means any queries with accessions must look like  
the following (the last is a GI):

'J00522[accession] OR AF303112[accession] OR 2981014'

'J00522[accession] | AF303112[accession] | 2981014' also works.

We could separate them into two groups based on presence of letters  
and set up the query that way, or we can define exactly what kind of  
ID is acceptable for passing to ids() (GI or accession), or have ids()  
be GI and have a new method for accessions (or vice versa).  
Thoughts?
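
To make the first option concrete, here is a small sketch of splitting on
the presence of letters. The sub name build_esearch_term is hypothetical and
nothing like it exists in Bio::DB::Query::GenBank; it just builds the term
string shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: tag anything containing a letter as an accession
# and leave all-digit IDs (GIs) bare, then OR the pieces together.
sub build_esearch_term {
    my @ids = @_;
    my @terms;
    for my $id (@ids) {
        push @terms, $id =~ /[A-Za-z]/ ? $id . '[accession]' : $id;
    }
    return join ' OR ', @terms;
}

my $term = build_esearch_term(qw(J00522 AF303112 2981014));
print "$term\n";
# prints: J00522[accession] OR AF303112[accession] OR 2981014
```

The resulting string could then be handed over as a normal query term
(e.g. via -query) rather than through -ids, assuming we go that route.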

chris

On Apr 18, 2007, at 1:32 PM, Baik, Ki wrote:

> I have had similar problems in which a couple of accession numbers out
> of a series were not retrieved, yet they do exist in ncbi.
>
> Ki Baik
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris  
> Fields
> Sent: Wednesday, April 18, 2007 10:05 AM
> To: Sendu Bala
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
>
> I can verify on this end.  Not sure why, but the same accessions are
> used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)
> with success.
>
> chris
>
> On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:
>
>> Hi all,
>>
>> t/DB.t is currently failing tests 40 and 41:
>>
>> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>>                                           '-ids' => [qw(J00522
>> AF303112
>> 2981014)],
>>                                           -verbose => 1);
>>
>> cmp_ok $query->count, '>', 0;
>>
>> You can see that
>> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?
>> db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%
>> 2CAF303112%2C2981014&retmax=100
>> gives no results, where presumably it used to give 3. querying on
>> the 3
>> ids individually works fine. So... what changed and how do we get
>> around it?
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From boconnor at ucla.edu  Wed Apr 18 15:00:32 2007
From: boconnor at ucla.edu (Brian O'Connor)
Date: Wed, 18 Apr 2007 12:00:32 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>
References: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>	
	<C2340DDA.D83F%bosborne11@verizon.net>
	<6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>
Message-ID: <46266AD0.3070209@ucla.edu>

Hey Lincoln,

This looks good, but the configuration step is about to change for 
Biopackages.  I'm writing config RPMs today so the end user can just 
install the config RPM for their distro and won't have to manually 
change the yum.conf file.  It will also install the biopackages GPG key 
so we can support signed packages.  I'll update the wiki when these 
config RPMs are available.

--Brian

Lincoln Stein wrote:

> Hi,
>
> I've been updating the WIKI in anticipation of a new GBrowse release 
> and have added a "stub" for the biopackages.net install. Since I don't 
> use yum (I've been running Slackware for ages and have recently started 
> working with Ubuntu) I'm not sure I got the details right. Could someone 
> check?
>
>
>         http://www.gmod.org/wiki/index.php/GBrowse_RPM_HOWTO
>
> Also, I think some verbiage on how to use yum to install MySQL and 
> Apache would be great, since it will be consistent with the Ubuntu 
> install page.
>
> Thanks,
>
> Lincoln
>
> On 3/31/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
>     Allen et al.,
>
>     What happened to the "GMOD" package or packages? I've had some
>     conversations in the past few months with you-all suggesting that a
>     GMOD package, or packages, would be useful.
>
>     Brian O.
>
>     On 3/30/07 8:30 PM, "Allen Day" <allenday at gmail.com> wrote:
>
>     > Hi Alex,
>     >
>     > You've aptly noted that there are several classes of packages being
>     > discussed here, and that they should not be treated
>     equally.  From my
>     > point of view and of specific relevance to the Bioperl community we
>     > have at least:
>     >
>     > 1) "regular" CPAN dependencies and their occassional C/C++/Fortran
>     > dependencies.  These should all be in Fedora Extras, as they are of
>     > general utility.  Biopackages.net <http://Biopackages.net>
>     currently hosts about 200 packages
>     > (.spec files, specifically) that are like this.  Maybe 80 of
>     these are
>     > needed for Bioperl.
>     >
>     > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan,
>     > etc.  From what I've seen, these typically have strange/custom
>     > licenses that may not be valid for some users.  BLAT has a dual
>     > licensing scheme for academic and non-academic licensees, for
>     > instance.  These packages are not of general utility.  For these two
>     > reasons, my stance is that they should not be included in Fedora
>     > Extras.
>     >
>     > 3) Bioperl packages.  Several subsets here.  The Bioperl-run libraries
>     > depend directly on type (2) packages, so aren't appropriate to include
>     > in Fedora Extras.  Bioperl-live is not really that useful without type
>     > (2) packages.  It is also sensible to keep all of the Bioperl-*
>     > packages in the same repository.  For these reasons, my stance is that
>     > they should not be included in Fedora Extras.
>     >
>     > 4) Bioinformatics / Comp. Bio. data sets.  These don't have licensing
>     > problems, but they tend to be large.  Usually in the 10E7 - 10E10 byte
>     > range.  RPM cannot even generate correct metadata for some of them
>     > if the files are too large (overflow problems).  Probably
>     > not appropriate to put in Fedora Extras because they are too large and
>     > not generally useful.
>     >
>     > 5) Bioinformatics-specific System databases / daemons.  These
>     > high-level packages depend on types (2), (3), and (4), and so are not
>     > appropriate to put into Fedora Extras.  An example is a BLAT daemon,
>     > which relies on the BLAT server, as well as NIB-formatted genome
>     > sequence files.
>     >
>     > That said, there are a lot of type (1) packages in the Biopackages.net
>     > repository.  If you're interested in migrating the spec files from our
>     > repository to the Fedora project it would save us (the Biopackages.net
>     > maintainers) a ton of build and maintenance time, so please feel free
>     > to take them, just let us know.  If we can reach some agreement on
>     > where the bioinformatics-specific packages should be maintained/built
>     > we may be able to work together on these as well.
>     >
>     > -Allen
>     >
>     >
>     > On 3/30/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
>     >>>>>>> "AD" == Allen Day  writes:
>     >>
>     >> AD> Hi Alex, The Biopackages.net project is still active, we are
>     >> AD> regularly adding packages to it, mostly R packages lately.  Most
>     >> AD> of the systems we use are running CentOS at this point, which is
>     >> AD> why you have not seen support for FC6 yet.  There is nothing
>     >> AD> preventing building FC6 packages aside from lack of time to set up
>     >> AD> the FC6 build farm nodes.
>     >>
>     >> Hi Allen and others,
>     >>
>     >> Great news to hear that Biopackages.net is still active!  I would like
>     >> to help out if possible.  I don't believe in "FUD" either... ;)
>     >>
>     >> AD> If you're interested in packaging BioPerl or other
>     >> AD> bioinformatics-related software, please join the Biopackages
>     >> AD> project on SourceForge.  We object to the Fedora Extras FUD
>     >> AD> tactics used to discourage people from using 3rd party
>     >> AD> repositories, and suspect they may not want to host some of our
>     >> AD> data packages, such as the >2GB genome packages.  Biopackages
>     >> AD> project is likely to partially merge with RPMForge.  We are
>     >> AD> already discussing with them how best to do it.
>     >>
>     >> The packages that I created which are currently available in Fedora
>     >> Packages are Perl dependencies which, as I said, are useful for
>     >> packages outside the bioinformatics purview.  I do have a (base)
>     >> bioperl package in review, but it is not yet released.
>     >>
>     >> As for third-party repos, I don't object to them at all, and for some
>     >> kinds of projects they are indeed appropriate (e.g. for non-free
>     >> stuff like Livna or Freshrpms).  I do have practical concerns
>     >> about repository mixing: it needs to be handled carefully, but
>     >> co-operation between Fedora and third-party repos can make it work.
>     >>
>     >> For example, one practical concern is that as of the
>     >> soon-to-be-released Fedora 7, Core+Extras will be merged, so there
>     >> will be no distinction at the repository-level between formerly Extras
>     >> packages and formerly Core packages (as of now there are only "Fedora
>     >> Packages"), which means that it will not be possible for third-party
>     >> repos to limit their dependencies to just those in a former base set
>     >> (i.e. excluding Extras).
>     >>
>     >> I agree that a few years ago (circa 2003-2004) there was concern about
>     >> the way some third-party repositories were treated somewhat badly by
>     >> the (then) Fedora Extras (with some people going so far as to say that
>     >> third-party repos were bad in principle and should always be ignored,
>     >> which I disagree with too).  But it seems to me that culture has
>     >> shifted since, with some notable packagers such as Matthias Saou (of
>     >> Freshrpms) and Axel Thimm (of Atrpms) now contributing packages to
>     >> Fedora itself.  The process of contributing has also become much
>     >> simpler and reviews are conducted speedily and efficiently; I had
>     >> packages in the repository in a matter of a few days from initial
>     >> submission.  Freshrpms itself now enables and depends on the (old)
>     >> Extras.
>     >>
>     >> The real question for me, then, is what packages it makes sense to go
>     >> in Fedora, and what packages go in third-party repositories.  It seems
>     >> to me that in the case of Perl packages which could be dependencies
>     >> for other packages not specific to the third-party repo in question,
>     >> it makes sense for them to go into Fedora itself, so I think I will
>     >> continue to package them.  This lessens the load on the third-party
>     >> repo, while making them available for all other third-party repos.
>     >> (This is the approach that Freshrpms seems to be taking; Matthias has
>     >> contributed most packages back to Fedora now other than the non-free
>     >> ones).
>     >>
>     >> At the other end of the spectrum are packages like those you mention,
>     >> genome packages, which may be of concern because of their size and/or
>     >> highly specialised nature, and, as you say, may make sense to go in a
>     >> third-party repo like Biopackages.net.  Also packages which can't be
>     >> packaged by Fedora for legal reasons like Clustal could/should go in
>     >> Biopackages.net.
>     >>
>     >> In the middle are packages like bioperl itself which are potentially
>     >> useful to perhaps a wider group of people than the genome packages but
>     >> may not necessarily be dependencies for other packages.  I lean
>     >> towards making them part of Fedora so that they will be available
>     >> out of the box on the planned "Everything" DVD ISO, but I welcome a
>     >> discussion on this.
>     >>
>     >> As I said, I'm glad to hear that Biopackages.net is alive and well and
>     >> I welcome a discussion on how upstream Fedora can usefully interact
>     >> with Biopackages.net (I guess perhaps on the Biopackages.net list).
>     >>
>     >> Regards,
>     >> Alex
>     >>
>     >> PS.  As the upstream author, if you could clarify the license on
>     >> perl-SVG-Graph, on CPAN (or on the mailing list), that would be great.
>     >> --
>     >> Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona
>     >>
>     >>
>     >>
>     >> _______________________________________________
>     >> Bioperl-l mailing list
>     >> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>     >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>     >>
>
>
>
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu <mailto:michelse at cshl.edu> 


From alexl at users.sourceforge.net  Wed Apr 18 21:17:34 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 18 Apr 2007 18:17:34 -0700
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <4626058B.8090801@sendu.me.uk> (Sendu Bala's message of "Wed\,
	18 Apr 2007 12\:48\:27 +0100")
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <e43b2x6u35.fsf@delpy.biol.berkeley.edu>

>>>>> "SB" == Sendu Bala  writes:

[...]

SB> I can remove the modules from cvs and create
SB> bioperl-run-1.5.2_101, resolving the packaging issue. I plan on
SB> doing precisely this within the next seven days unless someone
SB> puts a hand up to stop me.

In the meantime, until bioperl-run-1.5.2_101 is available, is it safe
just to remove these four .pm files during the packaging so they
don't get installed?  It looks like these four files are
self-contained and are only required/used by each other:

$ grep -r AccessorMaker *
Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar class min_version)]);
Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(input_file output_file)]);

$ grep -r AbstractRunner *
Tools/Run/JavaRunner.pm:use Bio::Tools::Run::AbstractRunner;
Tools/Run/JavaRunner.pm:our @ISA=qw(Bio::Tools::Run::AbstractRunner);
Tools/Run/AbstractRunner.pm:package Bio::Tools::Run::AbstractRunner;
Tools/Run/AbstractRunner.pm:Bio::Tools::Run::AbstractRunner

$ grep -r JavaRunner *
Tools/Run/Phylo/Forester/SDI.pm:use Bio::Tools::Run::JavaRunner;
Tools/Run/Phylo/Forester/SDI.pm:our @ISA=qw(Bio::Tools::Run::JavaRunner);
Tools/Run/JavaRunner.pm:package Bio::Tools::Run::JavaRunner;
Tools/Run/JavaRunner.pm: Usage   : $runner = Bio::Tools::Run::JavaRunner->new(-jar => $jar)
Tools/Run/JavaRunner.pm: Function: Builds a new Bio::Tools::Run::JavaRunner object
Tools/Run/JavaRunner.pm: Returns : Bio::Tools::Run::JavaRunner
Tools/Run/JavaRunner.pm:Bio::Tools::Run::JavaRunner - run java programs
Tools/Run/JavaRunner.pm:   my $runner = Bio::Tools::Run::JavaRunner->new(-jar => $jar);

$ grep -r Forester *
Tools/Run/Phylo/Forester/SDI.pm:package Bio::Tools::Run::Phylo::Forester::SDI;
Tools/Run/Phylo/Forester/SDI.pm:Bio::Tools::Run::Phylo::Forester::SDI
Tools/Run/Phylo/Forester/SDI.pm:    my $runner = Bio::Tools::Run::Phylo::Forester::SDI->new();
Tools/Run/Phylo/Forester/SDI.pm:This wrapper is for SDI in Forester package. 
Tools/Run/Phylo/Forester/SDI.pm:For more details on Forester, please see 

Alex


From sac at bioperl.org  Thu Apr 19 01:14:02 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 18 Apr 2007 22:14:02 -0700
Subject: [Bioperl-l] GenericHit->start/end needs tiled hsps?
In-Reply-To: <461F3FBA.2010101@sendu.me.uk>
References: <461F3FBA.2010101@sendu.me.uk>
Message-ID: <8f200b4c0704182214j77a4accy72f71b2061764d5b@mail.gmail.com>

Sendu,

Your thinking here seems correct and in fact agrees with the documentation
for those methods:

start():  If there is more than one HSP, the lowest start
           value of all HSPs is returned.

end():  If there is more than one HSP, the largest end
          value of all HSPs is returned.

It would be fine with me to change the implementation in GenericHit as you
suggest and to not tile the HSPs. Tiling is only necessary for data that is
summed across the region covered by all HSPs, as is done by these methods:
matches(), gaps(), frac_* and percent_*.
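
As a rough illustration of the untiled approach, here is a minimal sketch.
It uses plain hashes with made-up coordinates as stand-ins for HSP objects
(an assumption for the sake of a self-contained example); it is not the
GenericHit implementation.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical HSP coordinates on one sequence; in real code these would
# come from the hit's HSP objects rather than a hard-coded list.
my @hsps = (
    { start => 120, end => 180 },
    { start =>  35, end =>  90 },
    { start => 400, end => 455 },
);

# One pass over the HSPs, no tiling: lowest start and largest end.
my $hit_start = min( map { $_->{start} } @hsps );
my $hit_end   = max( map { $_->{end} }   @hsps );

print "start=$hit_start end=$hit_end\n";   # start=35 end=455
```

The cost is linear in the number of HSPs, which is why it stays fast even
for a hit with tens of thousands of them.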

Steve

On 4/13/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Hi all,
>
> I want to double-check my thinking regarding
> Bio::Search::Hit::GenericHit->start() and end(). Right now the docs
> claim that hsps of the hit object must be tiled before the answer can be
> produced. The code is implemented in that way
> (Bio::Search::SearchUtils::tile_hsps($self)).
>
> Yet as far as I can see, all you need to do is loop through all hsps and
> pick out the smallest start and largest end respectively in terms of
> subject and query.
>
> This comes up because I have a blast report where a single hit contains
> over 80000 hsps and the tiling takes over an hour (I gave up on it,
> don't know how long it really takes). The simple loop through hsps takes
> seconds or less.
>
> Now in this situation the answer isn't especially useful (to me). An
> alternative way of fixing the problem would be to re-write the tiling
> algorithm (again) to somehow make it hundreds of times faster, then
> provide some way in start() and end() for the user to request the start
> and end of the best contig, or other contig of choice. Easier said than
> done though!
>
>
> What do people think?


From bix at sendu.me.uk  Thu Apr 19 06:52:45 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 11:52:45 +0100
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <e43b2x6u35.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>	<4626058B.8090801@sendu.me.uk>
	<e43b2x6u35.fsf@delpy.biol.berkeley.edu>
Message-ID: <462749FD.3080603@sendu.me.uk>

Alex Lancaster wrote:
>>>>>> "SB" == Sendu Bala  writes:
> 
> [...]
> 
> SB> I can remove the modules from cvs and create
> SB> bioperl-run-1.5.2_101, resolving the packaging issue. I plan on
> SB> doing precisely this within the next seven days unless someone
> SB> puts a hand up to stop me.
> 
> In the meantime, until bioperl-run-1.5.2_101 is available, is it safe
> just to remove these four .pm files during the packaging so they
> don't get installed?

Sure, go ahead with that.


From bix at sendu.me.uk  Thu Apr 19 06:51:53 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 11:51:53 +0100
Subject: [Bioperl-l] To be deprecated: Bio::Tools::Run::AbstractRunner,
 Bio::Tools::Run::Phylo::Forester::SDI and
 Bio::Tools::Run::JavaRunner
In-Reply-To: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
Message-ID: <462749C9.1040503@sendu.me.uk>

[repost under new subject to make sure it is seen by those it may concern]

[BCC: Juguang Xiao at a variety of possible email addresses]


Alex Lancaster wrote:
> In packaging bioperl-run for Fedora, I think I stumbled across a bug
> in the bioperl-run package.  It appears from this edit:
> 
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
> 
> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
> bioperl-run 1.5.2_100 still contains modules that use this module:
> 
> $ cd bioperl-run-1.5.2_100
> $ grep -r AccessorMaker  *
> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar
> class min_version)]);
> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
> ('$'=>[qw(input_file output_file)]);

It looks like I've implemented a similar idea to AccessorMaker and
AbstractRunner in Bio::Root::Root->_set_from_args() and
Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
AbstractRunner I propose deprecating it immediately.
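
For anyone unfamiliar with the idea, here is a minimal sketch of what an
accessor-generating helper does. It is a generic illustration only, not the
actual code of Bio::Root::AccessorMaker, _set_from_args() or _setparams();
the package name My::Runner is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::Runner;   # hypothetical class, not part of bioperl-run

# Generate simple get/set accessors for the named fields.
for my $field (qw(input_file output_file)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub {
        my $self = shift;
        $self->{$field} = shift if @_;
        return $self->{$field};
    };
}

sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    # Accept -input_file => ... style arguments and route each one to the
    # accessor of the same name, ignoring anything unknown.
    while (my ($key, $value) = each %args) {
        (my $method = $key) =~ s/^-//;
        $self->$method($value) if $self->can($method);
    }
    return $self;
}

package main;

my $runner = My::Runner->new(-input_file => 'in.fa', -output_file => 'out.txt');
print $runner->input_file, " -> ", $runner->output_file, "\n";
```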

Forester::SDI and JavaRunner have no tests which is why we didn't notice
the problem. Since they've been out of use for a number of years now I
also propose their immediate deprecation. Alternatively, it may not be
too difficult to just update them to use _set_from_args and _setparams,
but I've nothing to test against (and JavaRunner is self-described as
"probably incomplete").


I can remove the modules from cvs and create bioperl-run-1.5.2_101,
resolving the packaging issue. I plan on doing precisely this within the
next seven days unless someone puts a hand up to stop me.


From bix at sendu.me.uk  Thu Apr 19 08:17:19 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 13:17:19 +0100
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
In-Reply-To: <200704050059.l350xNF07452@cricket.bio.indiana.edu>
References: <200704050059.l350xNF07452@cricket.bio.indiana.edu>
Message-ID: <46275DCF.6030103@sendu.me.uk>

Don Gilbert wrote:
> Dear Bioperl list,
> 
> There is a small bug in what I think is the current Bio::Tools::GFF.pm,
> that blocks output of Target attributes (in gff3 at least).  See a patch
> here
> 
> http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF

The patch was applied by Brian but is currently generating this warning:

./Build test --test_files t/GbrowseGFF.t --verbose
t/GbrowseGFF....1..5
ok 1 - use Bio::SearchIO;
ok 2 - use Bio::SearchIO::Writer::GbrowseGFF;
ok 3 - use Bio::Root::IO;
ok 4
Use of uninitialized value in sprintf at /.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
[the same warning is printed 14 times in total]
ok 5
ok
All tests successful.

Can this patch be looked at again and rolled back if the problem can't
be fixed?


Cheers,
Sendu.


From sm8 at sanger.ac.uk  Thu Apr 19 07:49:30 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 12:49:30 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>

Is there an existing method for copying a Bio::Tree::Tree object by
value?

All the best,
Stephen


From bix at sendu.me.uk  Thu Apr 19 08:43:44 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 13:43:44 +0100
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
Message-ID: <46276400.2020207@sendu.me.uk>

Stephen Montgomery wrote:
>  is there an existing method for copying a Bio::Tree::Tree object by
> value?

What do you mean? Describe in a little more detail what you're trying to do.


From sm8 at sanger.ac.uk  Thu Apr 19 09:13:44 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 14:13:44 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>

my $tree_copy = $tree;  # copies a Bio::Tree::Tree object by reference

as an example, a method like
my $tree_copy = $tree->clone;  # copies by value (this method doesn't exist)
or
my $tree_copy = Storable::dclone($tree);
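
To make the difference concrete, here is a minimal sketch using a plain
nested hash as a stand-in for a tree (an assumption for illustration; it is
not a real Bio::Tree::Tree object, and the sketch says nothing about whether
dclone is safe on BioPerl trees).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(dclone);

# Hypothetical stand-in for a tree: a root node with two children.
my $tree = {
    id       => 'root',
    children => [
        { id => 'A', branch_length => 0.1, children => [] },
        { id => 'B', branch_length => 0.2, children => [] },
    ],
};

my $by_ref   = $tree;           # second handle on the same structure
my $by_value = dclone($tree);   # independent deep copy

# Rename a leaf through the original handle.
$tree->{children}[0]{id} = 'A_renamed';

print "by reference: $by_ref->{children}[0]{id}\n";     # sees the change
print "by value:     $by_value->{children}[0]{id}\n";   # still 'A'
```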

Cheers,
Stephen



From jason at bioperl.org  Thu Apr 19 09:19:05 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 19 Apr 2007 06:19:05 -0700
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
Message-ID: <35813ADC-6597-46FC-8FB8-C70AA3541BEC@bioperl.org>

I don't think so. Worst case, you serialize to/from TreeIO and get a
new one, but the _internal_id of the nodes will necessarily be
different (and new).

-jason
On Apr 19, 2007, at 4:49 AM, Stephen Montgomery wrote:

>  is there an existing method for copying a Bio::Tree::Tree object by
> value?
>
> All the best,
> Stephen

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From bix at sendu.me.uk  Thu Apr 19 09:24:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 14:24:41 +0100
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>
Message-ID: <46276D99.2060108@sendu.me.uk>

Stephen Montgomery wrote:
> my $tree_copy = $tree;  #copies by reference a Bio::Tree::Tree object
> 
> as an example, a method like
> my $tree_copy = $tree->clone; #copies by value (this method doesn't
> exist) or
> my $tree_copy = Storable::dclone($tree); 

Right, sorry for being a little slow on the uptake. As a matter of fact
I recently added _clone() to Bio::Tree::TreeFunctionsI which does a
"safe tree clone that doesn't seg fault". It's undocumented and I thought
it would only be needed by simplify_to_leaves_string(), but I guess I can
document it and make it public (i.e. remove the underscore from the name)
if this might be popular.

Oh, it's also not that well tested, so proceed with caution and provide 
feedback if you can.


Cheers,
Sendu.


From ewijaya at gmail.com  Thu Apr 19 09:27:45 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Thu, 19 Apr 2007 21:27:45 +0800
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
	Enable Connector
Message-ID: <3521d3670704190627u6aba98b1nc3892833b6a77c1c@mail.gmail.com>

Dear expert,

My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
is created with the script (down below).

How can I modify the script such that:

1. The arrow track is represented in negative form.
    I.e. instead of 1 to 300, we use -300 to 0.

I tried this, but it did not work:

my $flen = Bio::SeqFeature::Generic->new(
        -start => -300,
        -end => 0, );

And how can I make these numbers appear
at every grid point (not just at two, as I have now)?


2. How can I enable the connector with grid just like
   I had in the first panel? (as you can see, my script
   has connector added, but still doesn't show).

All in all, I am trying to mimic this figure:
http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg

And here is my script:

__BEGIN__
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
use List::Compare;
use List::Util qw(max);

my %nofseq = ( 0 => 300, 1 => 300, 2 => 300, 3 => 300, 4 => 300, 5 => 300 );
my @seqid = keys %nofseq;
my @lenlist = values %nofseq;
my $maxlen = max (@lenlist);
#print Dumper \@seqid ;

my $panel = Bio::Graphics::Panel->new(
    -length    => 300,
    -width     => 500,
    -pad_left  => 70,
    -pad_right => 70,
    -key_style => 'left',
    -connector => 'solid',
);

my $flen = Bio::SeqFeature::Generic->new(
        -start => 1,   # tried -300
        -end => 300, # and 0, but failed.
);

    my $track1 = $panel->add_track(
        $flen,
        -glyph   => 'arrow',
        -tick    => 2,
        -fgcolor => 'black',
        -double  => 1,
    );


my %nlist;

while ( <DATA> ) {
    chomp;
    next if /^\#/;
    my ($sqi,$pos,$str,$progname) = split /\,/;
    my $start = $pos + $nofseq{$sqi};
    my $end = $start + length($str) + 1;
    push @{$nlist{$sqi}}, $start." ".$end." ".$progname;
}

# Check which sequence has no motifs;
my @bssi = keys %nlist;

my $lc = List::Compare->new(\@seqid, \@bssi);
my @comp = $lc->get_unique;


foreach my $comp ( @comp  ) {
    push @{$nlist{$comp}}, '0'." ".'0'." "."NONE";

}

my %prog_color = ( "WEEDER" => 3000, "MEME" => 200, "NONE" => 0 );

foreach my $seqid ( sort keys %nlist ) {


    my $track = $panel->add_track(
        -glyph     => 'graded_segments',
        -key       => "SEQ ". $seqid,
        -connector => "dashed",
        -label     => 1,
        -bgcolor   => 'blue',
        -bump      =>  +1,
        -height    =>  8,
        -min_score => 0,
        -max_score => 5000
    );


    foreach my $range ( @{$nlist{$seqid}} ) {

        my ($st,$en,$progname) = split(" ", $range);
        my $dname = " ";
        if ( $st != 0 and $en !=0  ) {
           $dname = "Seq ". $seqid;
        }

        my $score;
        if ( $progname eq "WEEDER" ) {
            $score = $prog_color{$progname};

        }
        elsif ($progname eq "MEME" ) {
            $score = $prog_color{$progname};
        }

        my $feature = Bio::SeqFeature::Generic->new(
            -display_name => $dname,
            -start        => $st,
            -end          => $en,
            -score        => $score
        );

        $track->add_feature($feature);

    }

}

print $panel->png;

# The DATA is simply a list of strings and their locations in their
# respective sequences.
# The figure is just a plot of them.
__DATA__
# sequence number,pos,binding sites,program
4,-63,AGCTTTCTCT,MEME
0,-22,AACTTTGTAC,WEEDER
1,-13,AAGTTTCTCT,WEEDER
5,-228,ACCTTTGCCA,MEME
5,-121,AAGTTTGTCT,WEEDER
5,-88,AAGTTTTTCC,SPACE
3,-148,AACTTAGTCA,MEME
0,-184,AACTTTGTCT,MEME
__END__


Thanks and hope to hear from you again.

--
Regards,
Edward WIJAYA


From sm8 at sanger.ac.uk  Thu Apr 19 09:33:18 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 14:33:18 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B038675FB@exchsrv2.internal.sanger.ac.uk>

Thanks Sendu!  That is perfect.
Cheers
Stephen



From ewijaya at i2r.a-star.edu.sg  Thu Apr 19 09:59:05 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Thu, 19 Apr 2007 21:59:05 +0800
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
	Enable Connector
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>


Dear expert,

My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
is created with the script (down below).

How can I modify the script such that:

1. The arrow track is represented in negative form.
   I.e. instead of 1 to 300, we use -300 to 0.

I tried this, but it did not work:

my $flen = Bio::SeqFeature::Generic->new(
       -start => -300,
       -end => 0, );

And how can I make these numbers appear
at every grid point (not just at two, as I have now)?


2. How can I enable the connector with grid just like
  I had in the first panel? (as you can see, my script
  has connector added, but still doesn't show).

All in all, I am trying to mimic this figure:
http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg

And here is my script:

__BEGIN__
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
use List::Compare;
use List::Util qw(max);

my %nofseq = ( 0 => 300, 1 => 300, 2 => 300, 3 => 300, 4 => 300, 5 => 300 );
my @seqid = keys %nofseq;
my @lenlist = values %nofseq;
my $maxlen = max (@lenlist);
#print Dumper \@seqid ;

my $panel = Bio::Graphics::Panel->new(
   -length    => 300,
   -width     => 500,
   -pad_left  => 70,
   -pad_right => 70,
   -key_style => 'left',
   -connector => 'solid',
);

my $flen = Bio::SeqFeature::Generic->new(
       -start => 1,   # tried -300
       -end => 300, # and 0, but failed.
);

   my $track1 = $panel->add_track(
       $flen,
       -glyph   => 'arrow',
       -tick    => 2,
       -fgcolor => 'black',
       -double  => 1,
   );


my %nlist;

while ( <DATA> ) {
   chomp;
   next if /^\#/;
   my ($sqi,$pos,$str,$progname) = split /\,/;
   my $start = $pos + $nofseq{$sqi};
   my $end = $start + length($str) + 1;
   push @{$nlist{$sqi}}, $start." ".$end." ".$progname;
}

# Check which sequence has no motifs;
my @bssi = keys %nlist;

my $lc = List::Compare->new(\@seqid, \@bssi);
my @comp = $lc->get_unique;


foreach my $comp ( @comp  ) {
   push @{$nlist{$comp}}, '0'." ".'0'." "."NONE";

}

my %prog_color = ( "WEEDER" => 3000, "MEME" => 200, "NONE" => 0 );

foreach my $seqid ( sort keys %nlist ) {


   my $track = $panel->add_track(
       -glyph     => 'graded_segments',
       -key       => "SEQ ". $seqid,
       -connector => "dashed",
       -label     => 1,
       -bgcolor   => 'blue',
       -bump      =>  +1,
       -height    =>  8,
       -min_score => 0,
       -max_score => 5000
   );


   foreach my $range ( @{$nlist{$seqid}} ) {

       my ($st,$en,$progname) = split(" ", $range);
       my $dname = " ";
       if ( $st != 0 and $en !=0  ) {
          $dname = "Seq ". $seqid;
       }

       my $score;
       if ( $progname eq "WEEDER" ) {
           $score = $prog_color{$progname};

       }
       elsif ($progname eq "MEME" ) {
           $score = $prog_color{$progname};
       }

       my $feature = Bio::SeqFeature::Generic->new(
           -display_name => $dname,
           -start        => $st,
           -end          => $en,
           -score        => $score
       );

       $track->add_feature($feature);

   }

}

print $panel->png;

# The DATA is simply a list of strings and their locations in their
# respective sequences.
# The figure is just a plot of them.
__DATA__
# sequence number,pos,binding sites,program
4,-63,AGCTTTCTCT,MEME
0,-22,AACTTTGTAC,WEEDER
1,-13,AAGTTTCTCT,WEEDER
5,-228,ACCTTTGCCA,MEME
5,-121,AAGTTTGTCT,WEEDER
5,-88,AAGTTTTTCC,SPACE
3,-148,AACTTAGTCA,MEME
0,-184,AACTTTGTCT,MEME
__END__


Thanks and hope to hear from you again.

--
Regards,

Edward WIJAYA

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From ioanniskirmitzoglou at gmail.com  Thu Apr 19 10:06:06 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Thu, 19 Apr 2007 17:06:06 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
Message-ID: <b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>

I reported it as a bug in Bugzilla, but because of Bugzilla problems I
was not able to attach my code or sample m10 files.
Nevertheless, here is the code that converts FASTA -m 10 output to BLAST
-m 8 tabular output, which the vast majority of software can parse.

<----------- CODE BEGINS HERE ------------------->

#!/usr/bin/perl -w

=head1 NAME

fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular output

=head1 SYNOPSIS

 fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...

=head1 DESCRIPTION

Command line options:
  --header                -- boolean flag to print column header
  -o/--out                -- optional outputfile to write data,
                             otherwise will write to STDOUT
  -h/--help               -- show this documentation

Not technically a SearchIO script, as this doesn't use any Bioperl
components, but it is useful and fast.  The output is tabular output
with the standard NCBI -m8 columns.

 queryname
 hit name
 percent identity
 alignment length
 number mismatches
 number gaps
 query start  (if on rev-strand start > end)
 query end
 hit start (if on rev-strand start > end)
 hit end
 evalue
 bit score

Additionally 5 more columns are provided
 percent similar
 query length
 hit length
 query gaps
 hit gaps

=head1 AUTHOR - Ioannis Kirmitzoglou

Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org

=head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou

Headers as well as portions of code were taken
from fastam9_to_table.pl by Jason Stajich

=head1 DISCLAIMER

Copyright (c) <2007> <Ioannis Kirmitzoglou>

Permission to use, copy, modify, merge, publish and distribute
this software and its documentation, with or without modification,
for any purpose, and without fee or royalty to the copyright holder(s)
is hereby granted with no restrictions and/or prerequisites.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

=cut

use strict;
use Getopt::Long;

my %data=();

my $outfile=''; my $header='';
GetOptions(
    'header'              => \$header,
    'o|out|outfile:s'     => \$outfile,
    'h|help'              => sub { exec('perldoc',$0); exit; }
       );

my $outfh;
if( $outfile ) {
    open($outfh, '>', $outfile) || die("$outfile: $!");
} else {
    $outfh = \*STDOUT;
}


$/="\n>>>";

my @fields = qw(qname hname percid alen mmcount gapcount
        qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);

print $outfh "#", uc(join("", map { sprintf("%-10s", $_) } @fields)), "\n"
    if $header;

while (<>) {

        chomp;
        if ($_=~/^>/ || $_=~/^\#/) {next;}
        my @hits = split(/\d+>>/, $_);
        @hits= split("\n>>", $hits[0]);

        my $hit = shift @hits;

        ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));

        foreach my $align (@hits) {

            my @details= split ("\n>", $align);
           my $detail = shift @details;
            ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
            $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
            $data{'bits'}=$1;
            $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
            $data{'evalue'}=$1;

            my $term = quotemeta("; sw_score:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'score'}=$1;

            $term = quotemeta("; sw_ident:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'percid'}=$1;

            $term = quotemeta("; sw_sim:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'percsim'}=$1;

            $term = quotemeta("; sw_overlap:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'alen'}=$1;

            $detail = shift @details;

            $term = quotemeta("; al_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'qstart'}=$1;

            $term = quotemeta("; al_stop:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'qend'}=$1;

            $term = quotemeta("; al_display_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term.+\s\-*([\-\w\s]+)/g;

            $data{'qgap'}=($1 =~ tr/\-//);

            $detail = shift @details;

            $term = quotemeta("; sq_len:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hlen'}=$1;

            $term = quotemeta("; al_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hstart'}=$1;

            $term = quotemeta("; al_stop:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hend'}=$1;

            $term = quotemeta("; al_display_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
            $data{'hgap'}=($1 =~ tr/-//);
            $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
            $data{'mmcount'} = $data{'alen'}
                - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );

            # Report identity and similarity as percentages.
            for ( $data{'percid'}, $data{'percsim'} ) {
                $_ = sprintf("%.2f", $_ * 100);
            }

            print $outfh join( "\t", map { $data{$_} } @fields ), "\n";
        }

}

<----------------- CODE ENDS HERE ---------------------->
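
For readers who want to see the extraction idea in isolation, here is a
minimal sketch of how one "; key: value" line is pulled out of a block and
how the mismatch column is derived. The helper names m10_value and
mismatch_count are hypothetical (they do not appear in the script above),
and the two sample lines are made up for illustration, not taken from a
real FASTA run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the value of a "; key: value" line
# from one block of FASTA -m 10 output, or undef if the key is absent.
sub m10_value {
    my ($block, $key) = @_;
    my ($value) = $block =~ /^;\s\Q$key\E:\s+(\S+)/m;
    return $value;
}

# Mismatches = alignment length - (identical positions + gaps),
# the same arithmetic the script uses for its mmcount column.
sub mismatch_count {
    my ($fraction_identical, $alignment_length, $gap_count) = @_;
    return $alignment_length
        - ( int($fraction_identical * $alignment_length) + $gap_count );
}

# Made-up lines in the "; key: value" layout, for illustration only.
my $block = join "\n",
    '; sw_ident: 0.500',
    '; sw_overlap: 40',
    '';

my $ident   = m10_value($block, 'sw_ident');
my $overlap = m10_value($block, 'sw_overlap');
printf "ident=%.2f%% overlap=%d mismatches=%d\n",
    $ident * 100, $overlap, mismatch_count($ident, $overlap, 3);
```

Returning undef for a missing key, rather than reusing a stale $1 from an
earlier match, makes it easier to notice when a field is absent.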

-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


From gilbertd at cricket.bio.indiana.edu  Thu Apr 19 13:38:05 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Thu, 19 Apr 2007 12:38:05 -0500 (EST)
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
Message-ID: <200704191738.l3JHc5s10658@cricket.bio.indiana.edu>


I'm not sure what kind of test data would have bad Target strings,
but this should clear up those warnings -- insert the '+' line:

  sub _gff3_string:
    for my $tag ( @all_tags ) {
       ##dgg.patch.was# next if $tag eq 'Target';
      if ($tag eq 'Target'
         and ! $origfeat->isa('Bio::SeqFeature::FeaturePair'))
       {  
       my($target_id, $b,$e,$strand)= $feat->get_tag_values($tag); 
+       next unless(defined($e) && defined($b) && $target_id);
       ($b,$e)= ($e,$b) if(defined $strand && $strand<0);
       $target_id =~ s/([\t\n\r%&\=;,])/sprintf("%%%X",ord($1))/ge;    
       push @groups, sprintf("Target=%s %d %d", $target_id,$b,$e);
       next;
       }
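
The same logic can be tried outside the module with a small stand-alone
sketch. The function name target_attribute is hypothetical and not part of
Bio::Tools::GFF; it only mirrors the guard, the strand swap and the
escaping shown in the patch above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::Tools::GFF): build the GFF3
# "Target=" attribute from the values of a Target tag, or return
# nothing when the tag is incomplete.
sub target_attribute {
    my ($target_id, $start, $end, $strand) = @_;

    # Same guard as the '+' line in the patch: skip incomplete Target tags.
    return unless defined($end) && defined($start) && $target_id;

    # Swap the coordinates for minus-strand targets, as the patch does.
    ($start, $end) = ($end, $start) if defined $strand && $strand < 0;

    # Percent-encode the same characters that the patch escapes.
    $target_id =~ s/([\t\n\r%&\=;,])/sprintf("%%%X", ord($1))/ge;

    return sprintf("Target=%s %d %d", $target_id, $start, $end);
}

# A complete tag is formatted; an incomplete one is silently skipped.
for my $values ( [ 'EST;1', 10, 20, 1 ], [ 'EST2', 10 ] ) {
    my $attribute = target_attribute(@$values);
    print defined $attribute ? "$attribute\n" : "skipped incomplete Target\n";
}
```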

-- Don


From stefan.kirov at bms.com  Thu Apr 19 14:01:28 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 19 Apr 2007 14:01:28 -0400
Subject: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
In-Reply-To: <4626E1A3.4070405@i2r.a-star.edu.sg>
References: <462473B7.4070905@i2r.a-star.edu.sg> <4624D9F3.5050805@bms.com>
	<4626E1A3.4070405@i2r.a-star.edu.sg>
Message-ID: <4627AE78.200@bms.com>

I will see if I can post it or perhaps commit something to the bp 
scripts. In any case it won't be before Monday; I have deadlines to meet.
Stefan
Edward WIJAYA wrote:
>
> Hi Stefan,
>> I believe you can use Bio::Graphics for this. I have done so in the 
>> past and I find it quite straightforward.
> Do you still have that sample script? I don't find it simple to do.
> I was thinking of doing something like this:
>
> http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg 
>
>
> Appreciate if you can share it with us.
>
> -- 
> Edward
>
>
>>
>>
>> Edward WIJAYA wrote:
>>> Dear all,
>>>
>>> How do you usually construct a graph for TFBS (binding sites) position
>>> within their sequences? I was thinking to build something like this
>>> kind of visualization tool:
>>>
>>> http://research.i2r.a-star.edu.sg/Dragon/Motif_Search/cgi-bin/tmp/29740M1.html 
>>>
>>>
>>> or
>>>
>>> http://wingless.cs.washington.edu:8080/assessment/servlet?filenameID=submission/SPACE.D9F26D506DE90E9A0A0010BB6BCCAEF3&pageType=visualizationForm&action=Visualize+It 
>>>
>>>
>>> Is there a BioPerl module to do that?
>>>
>>> -- 
>>> Edward
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From shameer at ncbs.res.in  Fri Apr 20 07:45:23 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Fri, 20 Apr 2007 17:15:23 +0530 (IST)
Subject: [Bioperl-l] Protparam using BioPerl
In-Reply-To: <200704180718.48811.sdavis2@mail.nih.gov>
References: <4624E32A.6010704@bms.com>
	<36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
	<200704180718.48811.sdavis2@mail.nih.gov>
Message-ID: <45682.192.168.1.1.1177069523.squirrel@mail.ncbs.res.in>

Hi,

I would like to know whether BioPerl has a wrapper for ProtParam from
ExPASy.
I need to calculate the instability index using the Guruprasad et al. 1990 values
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2075190&dopt=Abstract)
for 100 sequences. I did some googling but did not find any useful
information.

Thanks,
-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From basu at pharm.sunysb.edu  Fri Apr 20 12:37:57 2007
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Fri, 20 Apr 2007 12:37:57 -0400
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
 Enable Connector
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
Message-ID: <4628EC65.7070505@pharm.sunysb.edu>

Hi,

Wijaya Edward wrote:
> Dear expert,
> 
> My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
> is created with the script (down below).
> 
> How can I modify the script such that:
> 
> 1. The arrow track is represented in negative form.
>    I.e. instead of 1 to 300, we use -300 to 0.
> 
> I tried this, but won't do:
> 
> my $flen = Bio::SeqFeature::Generic->new(
>        -start => -300,
>        -end => 0, );

It works if you pass the SeqFeature object to the -segment option of
Bio::Graphics::Panel:

  my $panel = Bio::Graphics::Panel->new(
     -length    => 300,
     -width     => 500,
     -pad_left  => 70,
     -pad_right => 70,
     -key_style => 'left',
     -connector => 'solid',
     -segment   => $flen,
  );

For more, see this earlier posting:
http://article.gmane.org/gmane.comp.lang.perl.bio.general/1721/match=negative+seqfeature
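
A follow-on sketch, assuming the panel's segment runs from -300 to 0: the
positions in your DATA section are already relative to the upstream end,
so the "+ $nofseq{$sqi}" shift in your script should no longer be needed.
The helper name motif_range and the inclusive end coordinate are my own
assumptions, not part of Bio::Graphics.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn one "seq,pos,site,program" record into a
# range in upstream-relative coordinates.
sub motif_range {
    my ($line) = @_;
    my ($seq, $pos, $site, $program) = split /,/, $line;
    my $start = $pos;                         # no shift into 1..300
    my $end   = $start + length($site) - 1;   # assumption: inclusive end
    return ($seq, $start, $end, $program);
}

my ($seq, $start, $end, $program) = motif_range('4,-63,AGCTTTCTCT,MEME');
print "SEQ $seq: $start..$end ($program)\n";
```

The resulting start and end could then go straight into the -start and
-end arguments of Bio::SeqFeature::Generic->new.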

-siddhartha



From bosborne11 at verizon.net  Fri Apr 20 15:47:30 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 20 Apr 2007 15:47:30 -0400
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
In-Reply-To: <200704191738.l3JHc5s10658@cricket.bio.indiana.edu>
Message-ID: <C24E9112.DD2B%bosborne11@verizon.net>

Applied.


On 4/19/07 1:38 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu> wrote:

> 
> I'm not sure what kind of test data would have bad Target strings,
> but this should clear up those warnings -- insert the '+' line:
> 
>   sub _gff3_string:
>     for my $tag ( @all_tags ) {
>        ##dgg.patch.was# next if $tag eq 'Target';
>       if ($tag eq 'Target'
>          and ! $origfeat->isa('Bio::SeqFeature::FeaturePair'))
>        {  
>        my($target_id, $b,$e,$strand)= $feat->get_tag_values($tag);
> +       next unless(defined($e) && defined($b) && $target_id);
>        ($b,$e)= ($e,$b) if(defined $strand && $strand<0);
>        $target_id =~ s/([\t\n\r%&\=;,])/sprintf("%%%X",ord($1))/ge;
>        push @groups, sprintf("Target=%s %d %d", $target_id,$b,$e);
>        next;
>        }
> 
> -- Don


From ewijaya at i2r.a-star.edu.sg  Sat Apr 21 10:44:08 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 21 Apr 2007 22:44:08 +0800
Subject: [Bioperl-l] Getting Gene Sequences with Bioperl
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06168D@mailbe01.teak.local.net>


Hi all,
 
Is there a BioPerl module that allows us to extract
gene sequences given a list of gene names (gene symbols)?
 
In particular, we would pass a window size for the sequence
and get back upstream, downstream, or ORF sequences for that list of genes.
We may also want to restrict the search to one specific organism or use all organisms.
 
Is there also a freely downloadable gene database that a
BioPerl module supports for that task?
 
Thanks and hope to hear from you again.
 
--
Edward WIJAYA
SINGAPORE



From hlapp at gmx.net  Sat Apr 21 13:14:10 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 21 Apr 2007 13:14:10 -0400
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
Message-ID: <19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>

I haven't kept track of this - did this go anywhere? Do we not have  
an -m10 fasta output parser in SearchIO? (I.e., my first thought  
would be that that would be the desired solution; am I misled in this?)

	-hilmar

On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:

> I have reported it as a bug on the bugzilla but due to bugzilla  
> problems I
> was not able to attach my code and/or sample m10 files.
> Nevertheless here is the code that converts an m10 fasta output to  
> an m8
> BLAST output which is parseable by the vast majority of software.

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Sat Apr 21 13:44:00 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 21 Apr 2007 10:44:00 -0700
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
Message-ID: <E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>

We don't have one yet. This is a new format introduced in the most  
recent release of FASTA.  Hopefully someone can make some time to add  
some code to SearchIO::fasta for it.

I do find that when I need a fast FASTA-to-tab converter, the  
simple script (fastam9_to_table) is more efficient than the SearchIO  
framework, so Ioannis is making a parallel one for the new m10  
output.  I think having both is useful.

-jason
On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:

> I haven't kept track of this - did this go anywhere? Do we not have
> an -m10 fasta output parser in SearchIO? (I.e., my first thought
> would be that that would be the desired solution; am I misled in  
> this?)
>
> 	-hilmar
>
> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
>
>> I have reported it as a bug on the bugzilla but due to bugzilla
>> problems I
>> was not able to attach my code and/or sample m10 files.
>> Nevertheless here is the code that converts an m10 fasta output to
>> an m8
>> BLAST output which is parseable by the vast majority of software.
>>
>> <----------- CODE BEGINS HERE ------------------->
>>
>> #!/usr/bin/perl -w
>>
>> =head1 NAME
>>
>> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular
>> output
>>
>> =head1 SYNOPSIS
>>
>>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>>
>> =head1 DESCRIPTION
>>
>> Command line options:
>>   --header                -- boolean flag to print column header
>>   -o/--out                -- optional outputfile to write data,
>>                              otherwise will write to STDOUT
>>   -h/--help               -- show this documentation
>>
>> Not technically a SearchIO script as this doesn't use any Bioperl
>> components but is a useful and fast.  The output is tabular output
>> with the standard NCBI -m8 columns.
>>
>>  queryname
>>  hit name
>>  percent identity
>>  alignment length
>>  number mismatches
>>  number gaps
>>  query start  (if on rev-strand start > end)
>>  query end
>>  hit start (if on rev-strand start > end)
>>  hit end
>>  evalue
>>  bit score
>>
>> Additionally 4 more columns are provided
>>  percent similar
>>  query length
>>  hit length
>>  query gaps
>>  hit gaps
>>
>> =head1 AUTHOR - Ioannis Kirmitzoglou
>>
>> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>>
>> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>>
>> Headers as well as portions of code were taken
>>> from fastam9_to_table.pl by Jason Stajich
>>
>> =head1 DISCLAIMER
>>
>> Copyright (c) <2007> <Ioannis Kirmitzolgou>
>>
>> Permission to use, copy, modify, merge, publish and distribute
>> this software and its documentation, with or without modification,
>> for any purpose, and without fee or royalty to the copyright holder 
>> (s)
>> is hereby granted with no restictions and/or prerequisites.
>>
>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND  
>> NONINFRINGEMENT.
>> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
>> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
>> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
>> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>>
>> =cut
>>
>> use strict;
>> use Getopt::Long;
>>
>> my %data=();
>>
>> my $outfile=''; my $header='';
>> GetOptions(
>>     'header'              => \$header,
>>     'o|out|outfile:s'     => \$outfile,
>>     'h|help'              => sub { exec('perldoc',$0); exit; }
>>        );
>>
>> my $outfh;
>> if( $outfile ) {
>>     open($outfh, ">$outfile") || die("$outfile: $!");
>> } else {
>>     $outfh = \*STDOUT;
>> }
>>
>>
>> $/="\n>>>";
>>
>> my @fields = qw(qname hname percid alen mmcount gapcount
>>         qstart qend hstart hend evalue bits percsim qlen hlen qgap
>> hgap);
>>
>> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),
>> "\n" if
>> $header;
>>
>> while (<>) {
>>
>>         chomp;
>>         if ($_=~/^>/ || $_=~/^\#/) {next;}
>>         my @hits = split(/\d+>>/, $_);
>>         @hits= split("\n>>", $hits[0]);
>>
>>         my $hit = shift @hits;
>>
>>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
>>
>>         foreach my $align (@hits) {
>>
>>             my @details= split ("\n>", $align);
>>            my $detail = shift @details;
>>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>>             $data{'bits'}=$1;
>>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>>             $data{'evalue'}=$1;
>>
>>             my $term = quotemeta("; sw_score");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'score'}=$1;
>>
>>             $term = quotemeta("; sw_ident:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'percid'}=$1;
>>
>>             $term = quotemeta("; sw_sim:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'percsim'}=$1;
>>
>>             $term = quotemeta("; sw_overlap:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'alen'}=$1;
>>
>>             $detail = shift @details;
>>
>>             $term = quotemeta("; al_start:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'qstart'}=$1;
>>
>>             $term = quotemeta("; al_stop:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'qend'}=$1;
>>
>>             $term = quotemeta("; al_display_start:");
>>             $term =~ s/\\ /\\s/;
>>             my $lakis ='';
>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>>
>>             $data{'qgap'}=($1 =~ tr/\-//);
>>
>>             $detail = shift @details;
>>
>>             $term = quotemeta("; sq_len:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'hlen'}=$1;
>>
>>             $term = quotemeta("; al_start:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'hstart'}=$1;
>>
>>             $term = quotemeta("; al_stop:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term\s+(\S+)/;
>>             $data{'hend'}=$1;
>>
>>             $term = quotemeta("; al_display_start:");
>>             $term =~ s/\\ /\\s/;
>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>>             $data{'hgap'}=($1 =~ tr/-//);
>>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>>             $data{'mmcount'} = $data{'alen'}
>>                 - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );
>>
>> for ( $data{'percid'}, $data{'percsim'} ) {
>>     $_ = sprintf("%.2f",$_*100);
>> }
>>
>>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
>>         }
>>
>> }
>>
>> <----------------- CODE ENDS HERE ---------------------->
>>
>> -- 
>>
>> *Ioannis Kirmitzoglou*, MSc
>> PhD. Student,
>> Bioinformatics Research Laboratory
>> Department of Biological Sciences
>> University of Cyprus
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From akozik at atgc.org  Sat Apr 21 13:40:47 2007
From: akozik at atgc.org (Alexander Kozik)
Date: Sat, 21 Apr 2007 10:40:47 -0700
Subject: [Bioperl-l] ncbi blast -V T option
Message-ID: <462A4C9F.8010902@atgc.org>

Hi all,

There have been many postings about problems parsing stand-alone (local)
NCBI Blast output from version 2.2.15 or later. Recently, I (re?)-discovered
that the Blast option '-V T' fixes the problem with the old parsers I have.
Option '-V T' generates detailed statistics after _each_ query sequence
in the Blast output, like:
... ... ...
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 17,650,109
Number of Sequences: 26534
Number of extensions: 430364
Number of successful extensions: 1496
Number of sequences better than 1.0e-020: 1
Number of HSP's better than  0.0 without gapping: 1400
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 1495
length of database: 11,047,616
effective HSP length: 96
effective length of database: 8,500,352
effective search space used: 1275052800
frameshift window, decay const: 40,  0.1
... ... ...

Option '-V F' (default) will generate statistics at the end of batch 
Blast output summarizing all query hits together.
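
A minimal sketch of how one might check which layout a report has,
assuming each query section of the text report starts with a line
beginning "Query= " and that the statistics block contains the
"effective search space used:" line shown above. The sub name
queries_with_stats is made up for illustration; this is not part of
BLAST or BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a multi-query BLAST text report into per-query chunks and return
# one flag per query: 1 if that chunk carries its own statistics block,
# 0 otherwise. Assumes query sections start with "Query= ".
sub queries_with_stats {
    my ($report) = @_;
    my @chunks = split /^(?=Query= )/m, $report;
    # Drop any program header that precedes the first query.
    shift @chunks if @chunks && $chunks[0] !~ /^Query= /;
    return map { /^effective search space used:/m ? 1 : 0 } @chunks;
}

# Usage: perl check_stats.pl blast_report.txt
if (@ARGV) {
    my $report = do { local $/; join '', <> };    # slurp all input files
    my @flags  = queries_with_stats($report);
    my $with   = grep { $_ } @flags;
    printf "%d of %d query sections have their own statistics\n",
        $with, scalar @flags;
}
```

With '-V T' every query section should be flagged; with the default
'-V F' only the last one should be, since the statistics come once at
the end.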

Did I miss something from previous postings?
Sorry if this was already discussed.

-Alex

-- 
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/


From gdorjee at hotmail.com  Sat Apr 21 15:14:05 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sat, 21 Apr 2007 12:14:05 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
	<10024333.post@talk.nabble.com>
	<54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>
Message-ID: <10120148.post@talk.nabble.com>


hi
how do i check whether i've installed bioperl properly on my system? i
think i installed the bioperl-1.5.2_101 version, but i can't say for sure.
although i can use some of the modules, like Bio::SearchIO, i can't seem
to get the remote blast working for some reason.
is this something to do with the bioperl installation? i'm using perl v5.6.1
built for sun4-solaris-64int.
i tried to install the same bioperl version on my Linux machine, which has
perl v5.8.5 built for i386-linux-thread-multi, and it seems to give me the
same problem with the remote blast.
your help would be much appreciated.
thanks
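
One way to answer the "is it installed, and which version" question is
to ask Perl whether each module loads and what version it reports. The
sketch below is only that, a sketch: module_version is a made-up helper
name, and the three module names are the ones mentioned in this thread
(Bio::Root::Version is assumed to be present in the installed release).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the version string of a loadable module, 'unknown' if it loads
# but declares no version, or undef if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    # Accept only plain package names so the string eval below is safe.
    return unless $module =~ /^\w+(?:::\w+)*$/;
    return unless eval "require $module; 1";
    my $version = eval { $module->VERSION };
    return defined $version ? "$version" : 'unknown';
}

# Modules mentioned in this thread.
for my $module (qw(Bio::Root::Version Bio::SearchIO Bio::Tools::Run::RemoteBlast)) {
    my $version = module_version($module);
    printf "%-32s %s\n", $module,
        defined $version ? "loaded (version $version)" : 'could not be loaded';
}
```

A module that "could not be loaded" is not necessarily missing; it may
also fail because one of its own prerequisites is not installed.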


Chris Fields wrote:
> 
> What version of bioperl are you using?  I get an error but it is b/c  
> the ID doesn't exist.
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: acc KPYK_ECOLI does not exist
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/ 
> Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /Users/cjfields/src/bioperl- 
> live/Bio/DB/WebDBSeqI.pm:181
> STACK: genpept.pl:21
> -----------------------------------------------------------
> 
> The actual accession is 'KPYK1_ECOLI'.
> 
> chris
> 
> On Apr 16, 2007, at 3:42 PM, DeeGee wrote:
> 
>>
>> hi
>> i tried the following code just to check the network, and it worked  
>> fine
>> except for the SwissProt part, for which i got the error message  
>> instead of
>> the sequence:
>>
>> ------------- EXCEPTION  -------------
>> MSG: swissprot stream with no ID. Not swissprot in my book
>> STACK Bio::SeqIO::swiss::next_seq
>> /usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
>> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
>> /usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
>> STACK toplevel bbbbb.pl:21
>> --------------------------------------
>>
>> #### check #####
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::GenBank;
>> use Bio::DB::SwissProt;
>> use Bio::DB::GenPept;
>> use Bio::SeqIO;
>>
>> my $genpeptdb = new Bio::DB::GenPept();
>> my $genbankdb = new Bio::DB::GenBank();
>> my $swissdb = new Bio::DB::SwissProt();
>>
>> my $seqio = new Bio::SeqIO(-format => 'fasta',
>>                            -fh     => \*STDOUT);
>>
>> my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
>> $seqio->write_seq($protseq);
>>
>> my $seq = $genbankdb->get_Seq_by_acc('AF303112');
>> $seqio->write_seq($seq);
>>
>> $protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
>> $seqio->write_seq($protseq);
>>
>> thanks a lot.
>>
>>
>> Chris Fields wrote:
>>>
>>> The 'verbose' setting doesn't change the way the BLAST query is sent,
>>> it just sends the raw output from the repeated attempts to retrieve
>>> the report (using the RID) to STDERR.  The error you saw won't be
>>> fixed by doing so.
>>>
>>> What I was interested in was the raw HTML output dumped to the
>>> screen.  If it is querying the NCBI server it should dump stuff that
>>> includes something like this:
>>>
>>> ...
>>> <HTML>
>>> <p></p>
>>> <!--
>>> QBlastInfoBegin
>>>          Status=WAITING
>>> QBlastInfoEnd
>>> --><p></p>
>>> <SCRIPT LANGUAGE="JavaScript"><!--
>>> ...
>>>
>>> which indicates you have a request in the BLAST queue.  If you aren't
>>> seeing anything then the problem is likely network-related on your
>>> end, so getting the latest RemoteBlast won't help.  Do any other
>>> BioPerl modules requiring network access work (Bio::DB::GenBank, for
>>> instance)?  If not it could be a proxy issue...
>>>
>>> Just in case, here's the browsable CVS location for RemoteBlast:
>>>
>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
>>> Tools/Run/RemoteBlast.pm?cvsroot=bioperl
>>>
>>> Click on the download link and save over your local version.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
>>>
>>>>
>>>> hi Chris,
>>>> thanks for your reply. i set the RemoteBlast factory to a verbosity
>>>> of 1,
>>>> and i get the same error message. i'm new to all these. so, could
>>>> you plz
>>>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>>>
>>>> cheers!!!
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/error-while- 
>> remote-blast-against-swissprot-db-tf3577674.html#a10024333
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10120148
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Sat Apr 21 16:09:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 21 Apr 2007 15:09:48 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
Message-ID: <A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>

Ioannis's fastam10_to_table script is available in the bugzilla  
enhancement request (as an attachment) if anyone's interested:

http://bugzilla.open-bio.org/show_bug.cgi?id=2278

I haven't had a chance to really look into m10 output yet but it  
looks easy enough to parse; may not be hard to get something SearchIO- 
based up and running.
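
As a rough illustration of why it looks easy: judging only from the
regular expressions in Ioannis's script, the per-alignment details are
lines of the form "; key: value". A minimal sketch that collects them
into a hash, instead of one regex per field, might look like this.
parse_m10_block is a made-up name and the sample values are invented;
this is not SearchIO code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect the "; key: value" lines of one m10 block into a hash reference.
# The first occurrence of a key wins, so pass one block at a time (for
# example the query part or the hit part of an alignment).
sub parse_m10_block {
    my ($block) = @_;
    my %field;
    while ( $block =~ /^;\s+(\w+):\s+(\S+)/mg ) {
        $field{$1} = $2 unless exists $field{$1};
    }
    return \%field;
}

# Made-up sample values, for illustration only.
my $sample = join "\n",
    '; sw_score: 120',
    '; sw_ident: 0.750',
    '; sw_overlap: 40',
    '';

my $field = parse_m10_block($sample);
print join( "\t", map { "$_=$field->{$_}" } sort keys %$field ), "\n";
```

Keeping the first occurrence of each key matters because keys such as
al_start appear once for the query and once for the hit.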

chris

On Apr 21, 2007, at 12:44 PM, Jason Stajich wrote:

> We don't have one yet. This is a new format introduced in the most
> recent release of FASTA.  Hopefully someone can make some time to add
> some code to SearchIO::fasta for it.
>
> I do find that when I need a fast FASTA-to-TAB converter, the
> simple script (fastam9_to_table) is more efficient than the SearchIO
> framework, so Ioannis is making a parallel one for the new m10
> output.  So I think having both is useful.
>
> -jason
> On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
>
>> I haven't kept track of this - did this go anywhere? Do we not have
>> an -m10 fasta output parser in SearchIO? (I.e., my first thought
>> would be that that would be the desired solution; am I misled in
>> this?)
>>
>> 	-hilmar
>>
>> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
>>
>>> I have reported it as a bug on the bugzilla but due to bugzilla
>>> problems I
>>> was not able to attach my code and/or sample m10 files.
>>> Nevertheless here is the code that converts an m10 fasta output to
>>> an m8
>>> BLAST output which is parseable by the vast majority of software.
>>>
>>> <----------- CODE BEGINS HERE ------------------->
>>>
>>> #!/usr/bin/perl -w
>>>
>>> =head1 NAME
>>>
>>> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular
>>> output
>>>
>>> =head1 SYNOPSIS
>>>
>>>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>>>
>>> =head1 DESCRIPTION
>>>
>>> Command line options:
>>>   --header                -- boolean flag to print column header
>>>   -o/--out                -- optional outputfile to write data,
>>>                              otherwise will write to STDOUT
>>>   -h/--help               -- show this documentation
>>>
>>> Not technically a SearchIO script, as this doesn't use any Bioperl
>>> components, but it is useful and fast.  The output is tabular, with
>>> the standard NCBI -m8 columns.
>>>
>>>  queryname
>>>  hit name
>>>  percent identity
>>>  alignment length
>>>  number mismatches
>>>  number gaps
>>>  query start  (if on rev-strand start > end)
>>>  query end
>>>  hit start (if on rev-strand start > end)
>>>  hit end
>>>  evalue
>>>  bit score
>>>
>>> Additionally 5 more columns are provided
>>>  percent similar
>>>  query length
>>>  hit length
>>>  query gaps
>>>  hit gaps
>>>
>>> =head1 AUTHOR - Ioannis Kirmitzoglou
>>>
>>> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>>>
>>> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>>>
>>> Headers as well as portions of code were taken
>>> from fastam9_to_table.pl by Jason Stajich
>>>
>>> =head1 DISCLAIMER
>>>
>>> Copyright (c) <2007> <Ioannis Kirmitzoglou>
>>>
>>> Permission to use, copy, modify, merge, publish and distribute
>>> this software and its documentation, with or without modification,
>>> for any purpose, and without fee or royalty to the copyright holder(s)
>>> is hereby granted with no restrictions and/or prerequisites.
>>>
>>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>>> NONINFRINGEMENT.
>>> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
>>> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
>>> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
>>> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>>>
>>> =cut
>>>
>>> use strict;
>>> use Getopt::Long;
>>>
>>> my %data=();
>>>
>>> my $outfile=''; my $header='';
>>> GetOptions(
>>>     'header'              => \$header,
>>>     'o|out|outfile:s'     => \$outfile,
>>>     'h|help'              => sub { exec('perldoc',$0); exit; }
>>>        );
>>>
>>> my $outfh;
>>> if( $outfile ) {
>>>     open($outfh, ">$outfile") || die("$outfile: $!");
>>> } else {
>>>     $outfh = \*STDOUT;
>>> }
>>>
>>>
>>> $/="\n>>>";
>>>
>>> my @fields = qw(qname hname percid alen mmcount gapcount
>>>         qstart qend hstart hend evalue bits percsim qlen hlen qgap
>>> hgap);
>>>
>>> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),
>>> "\n" if
>>> $header;
>>>
>>> while (<>) {
>>>
>>>         chomp;
>>>         if ($_=~/^>/ || $_=~/^\#/) {next;}
>>>         my @hits = split(/\d+>>/, $_);
>>>         @hits= split("\n>>", $hits[0]);
>>>
>>>         my $hit = shift @hits;
>>>
>>>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
>>>
>>>         foreach my $align (@hits) {
>>>
>>>             my @details= split ("\n>", $align);
>>>            my $detail = shift @details;
>>>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>>>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>>>             $data{'bits'}=$1;
>>>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>>>             $data{'evalue'}=$1;
>>>
>>>             my $term = quotemeta("; sw_score");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'score'}=$1;
>>>
>>>             $term = quotemeta("; sw_ident:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'percid'}=$1;
>>>
>>>             $term = quotemeta("; sw_sim:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'percsim'}=$1;
>>>
>>>             $term = quotemeta("; sw_overlap:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'alen'}=$1;
>>>
>>>             $detail = shift @details;
>>>
>>>             $term = quotemeta("; al_start:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'qstart'}=$1;
>>>
>>>             $term = quotemeta("; al_stop:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'qend'}=$1;
>>>
>>>             $term = quotemeta("; al_display_start:");
>>>             $term =~ s/\\ /\\s/;
>>>             my $lakis ='';
>>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>>>
>>>             $data{'qgap'}=($1 =~ tr/\-//);
>>>
>>>             $detail = shift @details;
>>>
>>>             $term = quotemeta("; sq_len:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'hlen'}=$1;
>>>
>>>             $term = quotemeta("; al_start:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'hstart'}=$1;
>>>
>>>             $term = quotemeta("; al_stop:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'hend'}=$1;
>>>
>>>             $term = quotemeta("; al_display_start:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>>>             $data{'hgap'}=($1 =~ tr/-//);
>>>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>>>             $data{'mmcount'} = $data{'alen'}
>>>                 - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );
>>>
>>> for ( $data{'percid'}, $data{'percsim'} ) {
>>>     $_ = sprintf("%.2f",$_*100);
>>> }
>>>
>>>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
>>>         }
>>>
>>> }
>>>
>>> <----------------- CODE ENDS HERE ---------------------->
>>>
>>> -- 
>>>
>>> *Ioannis Kirmitzoglou*, MSc
>>> PhD. Student,
>>> Bioinformatics Research Laboratory
>>> Department of Biological Sciences
>>> University of Cyprus
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at i2r.a-star.edu.sg  Sun Apr 22 07:59:28 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sun, 22 Apr 2007 19:59:28 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	
	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com><3ACF03E372996
	C4EACD542EA8A05E66A061684@mailbe01.teak.local.net><AAF82F3A-3C75-4D51-AFD4-
	FDE358391A03@fruitfly.org>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061690@mailbe01.teak.local.net>


Hi Chris,
 
I've downloaded the GO Database.
Which of these should we load into our MySQL database
so that it can be used for the GO::AppHandle task below?
 
-rw-rw-r--   1 ewijaya ewijaya 1.6G Apr  9 12:23 go_200704-assocdb-data
-rw-rw-r--   1 ewijaya ewijaya 483M Apr  9 12:23 go_200704-assocdb.rdf-xml
-rw-rw-r--   1 ewijaya ewijaya 3.2K Apr  9 12:23 go_200704-assocdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  7 00:41 go_200704-assocdb-tables
-rw-rw-r--   1 ewijaya ewijaya 3.3K Apr  9 12:23 go_200704-obo-xml.dtd
-rw-rw-r--   1 ewijaya ewijaya 4.5K Apr  9 12:23 go_200704-rdf.dtd
-rw-rw-r--   1 ewijaya ewijaya  29K Apr  9 12:23 go_200704-schema-mysql.sql
-rw-rw-r--   1 ewijaya ewijaya 3.1G Apr  9 12:25 go_200704-seqdb-data
-rw-rw-r--   1 ewijaya ewijaya  93M Apr  9 12:26 go_200704-seqdb.fasta
-rw-rw-r--   1 ewijaya ewijaya 3.2K Apr  9 12:25 go_200704-seqdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  8 05:38 go_200704-seqdb-tables
-rw-rw-r--   1 ewijaya ewijaya  51M Apr  9 12:26 go_200704-termdb-data
-rw-rw-r--   1 ewijaya ewijaya  18M Apr  9 12:26 go_200704-termdb.obo-xml
-rw-rw-r--   1 ewijaya ewijaya  39M Apr  9 12:26 go_200704-termdb.owl
-rw-rw-r--   1 ewijaya ewijaya  29M Apr  9 12:26 go_200704-termdb.rdf-xml
-rw-rw-r--   1 ewijaya ewijaya  749 Apr  9 12:26 go_200704-termdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  2 00:31 go_200704-termdb-tables
drwxrwxr-x  22 ewijaya ewijaya 4.0K Apr  1 23:35 go_200704-utilities-src

Or is there a way to load all of them into the MySQL database automatically?
Thanks, and I hope to hear from you again.
 
--
Edward
 

________________________________

From: Chris Mungall [mailto:cjm at fruitfly.org]
Sent: Tue 4/17/2007 2:49 AM
To: Wijaya Edward
Cc: spiros at lokku.com; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


Download:
http://search.cpan.org/~cmungall/go-db-perl

or do:

cpan GO::AppHandle

The API call you want is here:
http://search.cpan.org/~cmungall/go-db-perl/GO/
AppHandle.pm#get_deep_products

Here is an example snippet:

   use GO::AppHandle;
   my $apph=GO::AppHandle->connect(@ARGV);
   my $go_acc = shift @ARGV;
   my $gps = $apph->get_deep_products({term=>{acc=>$go_acc}});
   foreach my $gp (@$gps) {
     printf "%s %s\n", $gp->xref->acc, $gp->symbol;
   }

You will need to download the GO Database.

Cheers
Chris

On Apr 16, 2007, at 8:14 AM, Wijaya Edward wrote:

>
> Hi Spiros,
>
> Thanks for your reply. I am interested to apply it for
> all the kind of organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) 
> with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Anybody has experience with that?
>> I've looked through GO module in CPAN, but can't seem
>> to find any tool that facilitated that searc
>>
>> Look forward very much for your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>> ------------ Institute For Infocomm Research - Disclaimer 
>> -------------
>> This email is confidential and may be privileged.  If you are not 
>> the intended recipient, please delete it and notify us 
>> immediately. Please do not copy or use it for any purpose, or 
>> disclose its contents to any other person. Thank you.
>> --------------------------------------------------------
>>
>
>


------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From ioanniskirmitzoglou at gmail.com  Sun Apr 22 13:11:35 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Sun, 22 Apr 2007 20:11:35 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
Message-ID: <b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>

I agree with Jason. Both scripts (fastam9_to_table and fastam10_to_table)
are way faster and easier to use than SearchIO. Still, there are a lot
of cases where SearchIO support for m10 would be useful (e.g. when trying to
represent the alignment graphically).
Nevertheless, I do think that FASTA needs an output format similar to BLAST's
m8, which is really compact. Although I haven't tried it yet, I believe
that both scripts can read piped input, so one easy and rather fast way to
produce tabular output from FASTA would be to pipe its output directly to one
of the scripts.
-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


On 21/04/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> Ioannis's fastam10_to_table script is available in the bugzilla
> enhancement request (as an attachment) if anyone's interested:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2278
>
> I haven't had a chance to really look into m10 output yet but it
> looks easy enough to parse; may not be hard to get something SearchIO-
> based up and running.
>
> chris
>
> On Apr 21, 2007, at 12:44 PM, Jason Stajich wrote:
>
> > We don't have one yet. This is a new format introduced in the most
> > recent release of FASTA.  Hopefully someone can make some time to add
> > some code to SearchIO::fasta for it.
> >
> > I do find that when I need a fast FASTA-to-TAB converter, the
> > simple script (fastam9_to_table) is more efficient than the SearchIO
> > framework, so Ioannis is making a parallel one for the new m10
> > output.  So I think having both is useful.
> >
> > -jason
> > On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
> >
> >> I haven't kept track of this - did this go anywhere? Do we not have
> >> an -m10 fasta output parser in SearchIO? (I.e., my first thought
> >> would be that that would be the desired solution; am I misled in
> >> this?)
> >>
> >>      -hilmar
> >>
> >> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
> >>
> >>> I have reported it as a bug on the bugzilla but due to bugzilla
> >>> problems I
> >>> was not able to attach my code and/or sample m10 files.
> >>> Nevertheless here is the code that converts an m10 fasta output to
> >>> an m8
> >>> BLAST output which is parseable by the vast majority of software.
> >>>
> >>> <----------- CODE BEGINS HERE ------------------->
> >>>
> >>> #!/usr/bin/perl -w
> >>>
> >>> =head1 NAME
> >>>
> >>> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular
> >>> output
> >>>
> >>> =head1 SYNOPSIS
> >>>
> >>>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
> >>>
> >>> =head1 DESCRIPTION
> >>>
> >>> Command line options:
> >>>   --header                -- boolean flag to print column header
> >>>   -o/--out                -- optional outputfile to write data,
> >>>                              otherwise will write to STDOUT
> >>>   -h/--help               -- show this documentation
> >>>
> >>> Not technically a SearchIO script, as this doesn't use any Bioperl
> >>> components, but it is useful and fast.  The output is tabular, with
> >>> the standard NCBI -m8 columns.
> >>>
> >>>  queryname
> >>>  hit name
> >>>  percent identity
> >>>  alignment length
> >>>  number mismatches
> >>>  number gaps
> >>>  query start  (if on rev-strand start > end)
> >>>  query end
> >>>  hit start (if on rev-strand start > end)
> >>>  hit end
> >>>  evalue
> >>>  bit score
> >>>
> >>> Additionally 5 more columns are provided
> >>>  percent similar
> >>>  query length
> >>>  hit length
> >>>  query gaps
> >>>  hit gaps
> >>>
> >>> =head1 AUTHOR - Ioannis Kirmitzoglou
> >>>
> >>> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
> >>>
> >>> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
> >>>
> >>> Headers as well as portions of code were taken
> >>> from fastam9_to_table.pl by Jason Stajich
> >>>
> >>> =head1 DISCLAIMER
> >>>
> >>> Copyright (c) <2007> <Ioannis Kirmitzoglou>
> >>>
> >>> Permission to use, copy, modify, merge, publish and distribute
> >>> this software and its documentation, with or without modification,
> >>> for any purpose, and without fee or royalty to the copyright holder(s)
> >>> is hereby granted with no restrictions and/or prerequisites.
> >>>
> >>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >>> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> >>> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> >>> NONINFRINGEMENT.
> >>> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> >>> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> >>> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> >>> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> >>>
> >>> =cut
> >>>
> >>> use strict;
> >>> use Getopt::Long;
> >>>
> >>> my %data=();
> >>>
> >>> my $outfile=''; my $header='';
> >>> GetOptions(
> >>>     'header'              => \$header,
> >>>     'o|out|outfile:s'     => \$outfile,
> >>>     'h|help'              => sub { exec('perldoc',$0); exit; }
> >>>        );
> >>>
> >>> my $outfh;
> >>> if( $outfile ) {
> >>>     open($outfh, '>', $outfile) || die("$outfile: $!");
> >>> } else {
> >>>     $outfh = \*STDOUT;
> >>> }
> >>>
> >>>
> >>> $/="\n>>>";
> >>>
> >>> my @fields = qw(qname hname percid alen mmcount gapcount
> >>>         qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);
> >>>
> >>> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),"\n"
> >>>     if $header;
> >>>
> >>> while (<>) {
> >>>
> >>>         chomp;
> >>>         if ($_=~/^>/ || $_=~/^\#/) {next;}
> >>>         my @hits = split(/\d+>>/, $_);
> >>>         @hits= split("\n>>", $hits[0]);
> >>>
> >>>         my $hit = shift @hits;
> >>>
> >>>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
> >>>
> >>>         foreach my $align (@hits) {
> >>>
> >>>             my @details= split ("\n>", $align);
> >>>            my $detail = shift @details;
> >>>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
> >>>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
> >>>             $data{'bits'}=$1;
> >>>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
> >>>             $data{'evalue'}=$1;
> >>>
> >>>             my $term = quotemeta("; sw_score");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'score'}=$1;
> >>>
> >>>             $term = quotemeta("; sw_ident:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'percid'}=$1;
> >>>
> >>>             $term = quotemeta("; sw_sim:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'percsim'}=$1;
> >>>
> >>>             $term = quotemeta("; sw_overlap:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'alen'}=$1;
> >>>
> >>>             $detail = shift @details;
> >>>
> >>>             $term = quotemeta("; al_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'qstart'}=$1;
> >>>
> >>>             $term = quotemeta("; al_stop:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'qend'}=$1;
> >>>
> >>>             $term = quotemeta("; al_display_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             my $lakis ='';
> >>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
> >>>
> >>>             $data{'qgap'}=($1 =~ tr/\-//);
> >>>
> >>>             $detail = shift @details;
> >>>
> >>>             $term = quotemeta("; sq_len:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'hlen'}=$1;
> >>>
> >>>             $term = quotemeta("; al_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'hstart'}=$1;
> >>>
> >>>             $term = quotemeta("; al_stop:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'hend'}=$1;
> >>>
> >>>             $term = quotemeta("; al_display_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
> >>>             $data{'hgap'}=($1 =~ tr/-//);
> >>>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
> >>>             $data{'mmcount'} = $data{'alen'}
> >>>                 - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );
> >>>
> >>>             for ( $data{'percid'}, $data{'percsim'} ) {
> >>>                 $_ = sprintf("%.2f",$_*100);
> >>>             }
> >>>
> >>>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
> >>>         }
> >>>
> >>> }
> >>>
> >>> <----------------- CODE ENDS HERE ---------------------->
> >>>
> >>> --
> >>>
> >>> *Ioannis Kirmitzoglou*, MSc
> >>> PhD. Student,
> >>> Bioinformatics Research Laboratory
> >>> Department of Biological Sciences
> >>> University of Cyprus
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >
> > --
> > Jason Stajich
> > jason at bioperl.org
> > http://jason.open-bio.org/
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>


From jason at bioperl.org  Sun Apr 22 16:24:23 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Apr 2007 13:24:23 -0700
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
Message-ID: <69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>

I do think that m9 is pretty compact if you don't need to see the
alignment and just want the pairwise statistics; it is analogous to
the BLAST m8/m9 format.  I typically just use that plus fastam9_to_table
for input to MCL and other systems that can process tabular formats.

I cleaned up a few things in SearchIO::fasta but have not been able  
to see whether we can auto-detect m10 format and insert the necessary  
code just yet.

-jason
On Apr 22, 2007, at 10:11 AM, Ioannis Kirmitzoglou wrote:

> I agree with Jason. Both scripts (fastam9_to_table and fastam10_to_table)
> are way faster and easier to use than SearchIO. Still, there are a lot
> of cases where SearchIO support for m10 would be useful (e.g. when trying
> to represent the alignment graphically).
> Nevertheless, I do think that FASTA needs an output similar to the BLAST
> m8 one, which is really compact. Although I haven't tried it yet, I
> believe that both scripts can read from a pipe, so one easy and rather
> fast way to produce tabular output from FASTA would be to pipe its output
> directly to one of the scripts.
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
>
>
>
> On 21/04/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>> Ioannis's fastam10_to_table script is available in the bugzilla
>> enhancement request (as an attachment) if anyone's interested:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2278
>>
>> I haven't had a chance to really look into m10 output yet but it
>> looks easy enough to parse; it may not be hard to get something
>> SearchIO-based up and running.
>>
>> chris
>>
>> On Apr 21, 2007, at 12:44 PM, Jason Stajich wrote:
>>
>> > We don't have one yet. This is a new format introduced in the most
>> > recent release of FASTA.  Hopefully someone can make some time to add
>> > some code to SearchIO::fasta for it.
>> >
>> > I do find that when I need a fast FASTA to TAB converter, the
>> > simple script (fastam9_to_table) is more efficient than the SearchIO
>> > framework, so Ioannis is making a parallel one for the new m10
>> > output.  So I think having both is useful.
>> >
>> > -jason
>> > On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
>> >
>> >> I haven't kept track of this - did this go anywhere? Do we not have
>> >> an -m10 fasta output parser in SearchIO? (I.e., my first thought
>> >> would be that that would be the desired solution; am I misled in
>> >> this?)
>> >>
>> >>      -hilmar
>> >>
>> >> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
>> >>
>> >>> I have reported it as a bug on the bugzilla but due to bugzilla
>> >>> problems I was not able to attach my code and/or sample m10 files.
>> >>> Nevertheless here is the code that converts an m10 fasta output to
>> >>> an m8 BLAST output which is parseable by the vast majority of software.
>> >>>
>> >>> [...]
>> >>>
>> >>> --
>> >>>
>> >>> *Ioannis Kirmitzoglou*, MSc
>> >>> PhD. Student,
>> >>> Bioinformatics Research Laboratory
>> >>> Department of Biological Sciences
>> >>> University of Cyprus
>> >>
>> >> --
>> >> ===========================================================
>> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> >> ===========================================================
>> >
>> > --
>> > Jason Stajich
>> > jason at bioperl.org
>> > http://jason.open-bio.org/
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From ioanniskirmitzoglou at gmail.com  Mon Apr 23 05:45:53 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Mon, 23 Apr 2007 12:45:53 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
References: <10034698.post@talk.nabble.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
	<69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
Message-ID: <b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>

I don't know about older versions, but the latest version of FASTA starts its
output with a line similar to one of these:
# fasta34.exe -m9 -d0 -Q test.faa test.faa OR
# fasta34.exe -m10 -Q test.faa test.faa

This very first line is also the only one in the output that starts with
'#'.
Isn't this an easy way to determine the output type?
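
A minimal sketch of that check, assuming the echo line always looks like the
two examples above. The helper name guess_fasta_m_format is hypothetical and
is not an existing BioPerl function; it only inspects the first line of a
report.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not an existing BioPerl function.
# Given the first line of a FASTA report, return:
#   the number after -m (e.g. 9 or 10) when the command-line echo has one,
#   0 when the echo line is present but carries no -m option,
#   undef when the line is not a '#' echo line at all.
sub guess_fasta_m_format {
    my ($first_line) = @_;
    return unless defined $first_line && $first_line =~ /^\#/;
    # Accept both "-m10" and "-m 10".
    return $1 if $first_line =~ /\s-m\s*(\d+)/;
    return 0;
}

my @first_lines = (
    '# fasta34.exe -m9 -d0 -Q test.faa test.faa',
    '# fasta34.exe -m10 -Q test.faa test.faa',
    ' FASTA searches a protein or DNA sequence data bank',
);

for my $line (@first_lines) {
    my $fmt = guess_fasta_m_format($line);
    printf "%-52s => %s\n", $line, defined $fmt ? "-m $fmt" : 'no echo line';
}
```

A report without the echo line yields undef, so a caller would still need
some other rule for output from builds that do not print it.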


-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


From cjfields at uiuc.edu  Mon Apr 23 08:46:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Apr 2007 07:46:40 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
	<69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
	<b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>
Message-ID: <333A7BEF-71E3-4E15-B2EC-384AEBAA13B7@uiuc.edu>

That's true, but older versions of fasta don't do this.  For  
instance, the example files in the bioperl distribution in t/data  
(HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this  
line.

 From the fasta changelog:

-------------------------------------------------------------
 >>Nov 14-22, 2002  CVS fa34t20b6

Include compile-time define (-DPGM_DOC) that causes all the fasta
programs to provide the same command line echo that is provided by the
PVM and MPI parallel programs.  Thus, if you run the program:

     fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12

the first lines of output from FASTA will be:

     # fasta34_t -q gtt1_drome.aa /slib/swissprot
      FASTA searches a protein or DNA sequence data bank
      version 3.4t20 Nov 10, 2002
     Please cite:
      W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

This has been turned on by default in most FASTA Makefiles.
-------------------------------------------------------------

We could support only newer fasta output (newer than the above
version), since there have been several bug fixes and changes to
output; not sure how everyone else feels about this.

chris

On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:

> I don't know about older versions, but the latest version of FASTA
> starts its output with a line similar to one of these:
> # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
> # fasta34.exe -m10 -Q test.faa test.faa
>
> This very first line is also the only one in the output that starts
> with '#'.
> Isn't this an easy way to determine the output type?
>
>
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 23 09:49:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Apr 2007 08:49:45 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>
References: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>
Message-ID: <12707EA8-F245-4AE7-BFD1-EE861F431F3D@uiuc.edu>

Aaron,

I find -m 10 defined way back in fasta2 notes:

--------------------------------------------------------------
Changes with 2.0x4  (January, 1996)

The major change in with 2.0x4 is the ability to get a parseable
output from FASTA/TFASTA/SSEARCH.  This can be done using output
option -m 10.  ...
--------------------------------------------------------------

It goes on to define it in more detail (which is nice to have
around!).  It's possible it wasn't implemented until recently for
fasta3, but I find references to it in the various fasta3 notes going
back to at least 2001, so maybe it wasn't compiled by default
until recently?  The extra '#' line was added in 2002 to all output
as far as I can tell.

We could just have SearchIO::fasta fall back to default parsing if
'#' isn't present.  The default format and m10 are sufficiently
different that we probably want to separate m10 parsing into
its own parser subroutine so we don't screw with the default parsing
too much.
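
Roughly what I have in mind, as a sketch only: parse_default and parse_m10
are placeholder names standing in for the real parsing code, and none of
this is what SearchIO::fasta currently does. The first line is read once and
handed on to whichever parser is chosen, so nothing is lost.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Placeholder parsers: stand-ins for the real default and m10 parsing code.
# Each receives the filehandle plus the first line, which was already read.
sub parse_default { my ($fh, $first) = @_; return 'default' }
sub parse_m10     { my ($fh, $first) = @_; return 'm10' }

# Read the first line and dispatch: m10 only when a '#' echo line
# containing -m10 (or -m 10) is present, default parsing otherwise.
sub parse_fasta_report {
    my ($fh) = @_;
    my $first  = <$fh>;
    my $is_m10 = defined $first && $first =~ /^\#.*\s-m\s*10\b/;
    my $parser = $is_m10 ? \&parse_m10 : \&parse_default;
    return $parser->($fh, $first);
}

# In-memory filehandles keep the demonstration self-contained.
my @reports = (
    "# fasta34.exe -m10 -Q test.faa test.faa\n",
    " FASTA searches a protein or DNA sequence data bank\n",
);

for my $report (@reports) {
    open(my $fh, '<', \$report) or die "cannot open in-memory report: $!";
    print parse_fasta_report($fh), "\n";
    close $fh;
}
```

Keeping the m10 code behind its own subroutine means files without the '#'
line go through the default path unchanged.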

chris

On Apr 23, 2007, at 8:29 AM, aaron.j.mackey at gsk.com wrote:

> Since -m10 is newer than PGM_DOC, you should be fine to use the first
> line as a detection for m10, when that first line exists (when it does
> not, the format cannot be m10, unless someone has re-compiled FASTA with
> an undefined PGM_DOC).
>
> -Aaron

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From aaron.j.mackey at gsk.com  Mon Apr 23 09:29:39 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 23 Apr 2007 09:29:39 -0400
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <333A7BEF-71E3-4E15-B2EC-384AEBAA13B7@uiuc.edu>
Message-ID: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>

Since -m10 is newer than PGM_DOC, you should be fine to use the first line 
as a detection for m10, when that first line exists (when it does not, the 
format cannot be m10, unless someone has re-compiled FASTA with an 
undefined PGM_DOC).

-Aaron



From bix at sendu.me.uk  Tue Apr 24 06:21:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Apr 2007 11:21:29 +0100
Subject: [Bioperl-l] WrapperBase / StandAloneBlast executable() method
	confusion
Message-ID: <462DDA29.4090104@sendu.me.uk>

Hi,

I'm a little unsure of the intent for executable() in wrapper modules. 
The WrapperBase version of the method and the StandAloneBlast version 
have the same POD but different implementations.

WrapperBase takes as a first arg an 'exe' which it will blindly trust is 
the path to a working executable. (That doesn't seem sensible already.) 
It is only capable of storing one such path.

If no arg is supplied it uses program_path() (which uses program_name()) 
to find the executable. Failing that it does a further direct test on 
program_name() to see if it's executable.


StandAloneBlast takes as a first arg merely the name of your exe and 
also (undocumented) the path to the corresponding executable (which is 
tested to see if it is really executable). It can store executable paths 
for multiple different exenames (corresponding better with the docs for 
the first arg: "name of executable to set path to").

If no second arg is supplied it does something similar to WrapperBase, 
except that it uses the first arg exename (or a default if that wasn't 
supplied) in place of program_name().


I'm trying to generalize this so StandAloneBlast can just use the 
WrapperBase version (and so other wrappers can then store executable 
paths for different sub-programs). Any suggestions for a good way of 
melding these two together whilst somehow retaining backward compatibility?
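
One possible shape for a merged method, as a sketch only. The class
My::WrapperSketch and its internals are hypothetical and are not the current
WrapperBase or StandAloneBlast code. It assumes the legacy one-arg form can
be recognised by testing whether the argument is an executable file; that is
ambiguous if a file with the program's name happens to sit in the current
directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package My::WrapperSketch;    # hypothetical class, not Bio::Tools::Run::WrapperBase

sub new {
    my ($class, %args) = @_;
    return bless { program_name => $args{program_name}, exe => {} }, $class;
}

# executable()              -> stored path for the default program name
# executable($name)         -> stored path for $name
# executable($name, $path)  -> check $path, then store it under $name
# executable($path)         -> legacy one-arg form, stored under the default name
sub executable {
    my ($self, $first, $path) = @_;
    my $name = $self->{program_name};

    if (defined $path) {
        $name = $first if defined $first;            # two-arg call
    }
    elsif (defined $first) {
        if (-f $first && -x _) { $path = $first }    # legacy: argument is a path
        else                   { $name = $first }    # argument is a program name
    }

    if (defined $path) {
        die "'$path' is not an executable file\n" unless -f $path && -x _;
        $self->{exe}{$name} = $path;
    }
    return $self->{exe}{$name};
}

package main;

# $^X (the running perl binary) is used here as a path that is normally executable.
my $wrapper = My::WrapperSketch->new(program_name => 'blastall');
$wrapper->executable($^X);                # legacy one-arg form
$wrapper->executable('formatdb', $^X);    # per-program form
print "$_ => ", $wrapper->executable($_), "\n" for qw(blastall formatdb);
```

Unlike the current WrapperBase behaviour, a supplied path is always tested
before it is stored, which may itself be a backward-compatibility concern.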


From cjfields at uiuc.edu  Tue Apr 24 08:55:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 24 Apr 2007 07:55:43 -0500
Subject: [Bioperl-l] WrapperBase / StandAloneBlast executable() method
	confusion
In-Reply-To: <462DDA29.4090104@sendu.me.uk>
References: <462DDA29.4090104@sendu.me.uk>
Message-ID: <8F1427D6-8654-461E-B9AA-E51CC3A20318@uiuc.edu>

I'm not sure, but you might want to bring Torsten in on this as he  
took over maintaining StandAloneBlast.  Much of the confusion may  
stem from the independent evolution of StandAloneBlast and WrapperBase.

Also, (a bit unrelated), there were plans for unifying the  
Bio::Tools::Run BLAST modules described here:

http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

There seemed to be a general consensus at the time on the need to  
refactor the StandAloneBlast and RemoteBlast code, so maybe the best  
place to start is StandAloneBlast (the others could be added in from  
there).  We could just deprecate use of the older modules at some  
point in favor of the new scheme.

chris

On Apr 24, 2007, at 5:21 AM, Sendu Bala wrote:

> Hi,
>
> I'm a little unsure of the intent for executable() in wrapper modules.
> The WrapperBase version of the method and the StandAloneBlast version
> have the same POD but different implementations.
>
> WrapperBase takes as a first arg an 'exe' which it will blindly trust is
> the path to a working executable. (That doesn't seem sensible already.)
> It is only capable of storing one such path.
>
> If no arg is supplied it uses program_path() (which uses program_name())
> to find the executable. Failing that it does a further direct test on
> program_name() to see if it's executable.
>
>
> StandAloneBlast takes as a first arg merely the name of your exe and
> also (undocumented) the path to the corresponding executable (which is
> tested to see if it really is executable). It can store executable paths
> for multiple different exenames (corresponding better with the docs for
> the first arg: "name of executable to set path to").
>
> If no second arg is supplied it does something similar to WrapperBase,
> except that it uses the first arg exename (or a default if that wasn't
> supplied) in place of program_name().
>
>
> I'm trying to generalize this so StandAloneBlast can just use the
> WrapperBase version (and so other wrappers can then store executable
> paths for different sub-programs). Any suggestions for a good way of
> melding these two together whilst somehow retaining backward compatibility?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From avilella at gmail.com  Tue Apr 24 12:10:19 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 24 Apr 2007 17:10:19 +0100
Subject: [Bioperl-l] lack of markers for some genotypes in some
	Bio::PopGen::Statistics methods
Message-ID: <358f4d650704240910u4c90864cqd6c4e38ecedef4c5@mail.gmail.com>

Hi,

I have some genotype data where some individuals don't have a given marker
in the population.

This means that some methods in Bio::PopGen::Statistics will fail when
trying to get them, so I've added a couple of "next unless (defined($sth));"
guards to work around this. But I am not sure whether this breaks any
assumptions made when the methods were implemented.

Anyone able to check this?

Thanks,

    Albert.

avilella@magneto:~$ diff -u /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm.modif /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm
--- /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm.modif  2007-04-24 15:05:51.000000000 +0100
+++ /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm  2007-04-22 16:03:24.000000000 +0100
@@ -546,7 +546,6 @@
        # separate genotypes into 'chromosomes'
        for my $marker_name( @marker_names ) {
           my ($genotype) = $ind->get_Genotypes(-marker => $marker_name);
-           next unless defined($genotype); #FIXME -- is this correct?
           my $i =0;
           for my $allele ( $genotype->get_Alleles ) {
               push @{$chromosomes[$i]},

avilella@magneto:~$ diff -u /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm.modif /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm
--- /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm.modif  2007-04-24 15:04:51.000000000 +0100
+++ /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm  2007-04-22 16:03:24.000000000 +0100
@@ -656,8 +656,6 @@
                return 0;
            }
            foreach my $m ( @marker_names ) {
-              my $genotype = $ind->get_Genotypes($m);
-              next unless defined($genotype); #FIXME -- is this correct?
                foreach my $allele (map { $_->get_Alleles}
                               $ind->get_Genotypes($m) ) {
                    $data{$m}->{$allele}++;
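
The guard in the diffs above can be looked at in isolation. The sketch below uses plain hashes in place of Bio::PopGen::Individual objects, so the data layout and the sub name allele_counts are illustrative assumptions rather than the Bio::PopGen API. It only shows that skipping an untyped individual keeps the allele tally from failing.

```perl
use strict;
use warnings;

# Hypothetical stand-in for individuals: id => { marker => [alleles] }.
my %individuals = (
    ind1 => { m1 => [qw(A T)], m2 => [qw(G G)] },
    ind2 => { m1 => [qw(A A)] },    # no genotype recorded for m2
);

# Tally alleles per marker, skipping individuals that lack a marker.
sub allele_counts {
    my ($inds, @marker_names) = @_;
    my %data;
    for my $id (sort keys %$inds) {
        for my $m (@marker_names) {
            my $genotype = $inds->{$id}{$m};
            next unless defined $genotype;    # individual lacks this marker
            $data{$m}{$_}++ for @$genotype;
        }
    }
    return \%data;
}

my $counts = allele_counts(\%individuals, qw(m1 m2));
for my $m (sort keys %$counts) {
    my @pairs = map { "$_=$counts->{$m}{$_}" } sort keys %{ $counts->{$m} };
    print "$m: ", join(', ', @pairs), "\n";
}
```

One thing such a guard does change is the effective sample size per marker, so any statistic that divides by the number of individuals would presumably need to count only the individuals typed for that marker.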


From MEC at stowers-institute.org  Thu Apr 26 12:48:45 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 26 Apr 2007 11:48:45 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
Message-ID: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>

Lincoln, et al,

I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
from a Bio::DB::SeqFeature::Store that were initially created with
-segments (i.e. whose location was discontiguous) does not display any
attributes in column 9 other than "Name".

What do you think of the following patch to Bio::Graphics::FeatureBase,
whose effect is to "contrive to return (duplicated) common group values"
(which otherwise get lost when "collapsing" "homogenous" parent/child
features)?

Another approach would be to copy the attributes from the parent to the
children when the -segments are first created.

Another approach would be to use Bio::SeqFeature::Generic as the db's
-seqfeature_class and save with -location being a Bio::Location::Split,
but this was fraught with other problems.

Any other suggestions?  Do you want me to commit this patch?

Cheers,

Malcolm
 
Patch follows:


Index: FeatureBase.pm
===================================================================
RCS file:
/home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
retrieving revision 1.29
diff -c -r1.29 FeatureBase.pm
*** FeatureBase.pm	16 Apr 2007 19:55:33 -0000	1.29
--- FeatureBase.pm	26 Apr 2007 16:30:23 -0000
***************
*** 581,587 ****
      foreach (@children) { 
        s/Parent=/ID=/g; 
      } # replace Parent tag with ID
!     return join "\n",@children;
    }
  
    return join("\n",$p,@children);
--- 581,589 ----
      foreach (@children) { 
        s/Parent=/ID=/g; 
      } # replace Parent tag with ID
!     #return join "\n",@children;
!     # Instead of above, additionally, contrive to return (duplicated) common group values
!     return(join("$group\n",@children) . $group);
    }
  
    return join("\n",$p,@children);
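
To make the effect of the changed return line concrete, here is a small standalone sketch. The child lines and the group string are made up, and since $group is defined outside the hunk shown, its exact form here (a string of attributes beginning with ";") is an assumption.

```perl
use strict;
use warnings;

# Made-up child lines and group string, for illustration only.
my @children = (
    "chr1\ttest\tmatch\t100\t200\t.\t+\t.\tID=f1",
    "chr1\ttest\tmatch\t300\t400\t.\t+\t.\tID=f1",
);
my $group = ";Note=example";

# Same expression as the patched return: joining on "$group\n" and then
# appending $group once more puts the group string on every line.
sub with_group {
    my ($grp, @lines) = @_;
    return join("$grp\n", @lines) . $grp;
}

print with_group($group, @children), "\n";
```

With an empty child list the expression returns just the group string on its own, which a caller might want to guard against.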


From emeric.sevin at univ-rennes1.fr  Thu Apr 26 04:48:37 2007
From: emeric.sevin at univ-rennes1.fr (Emeric Sevin)
Date: Thu, 26 Apr 2007 10:48:37 +0200
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
	<7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>
Message-ID: <4ef54906af35b3cbf231303285527055@univ-rennes1.fr>

hi! sorry for the delay, took a little vacation ;-)

indeed I don't see any trouble in coding a supplementary test, I'm just 
not at all familiar with the patch release/bioperl package update process 
and would prefer to leave that to you. To that end, I'll take care of 
posting that bug report in the coming hours!
Thank you very much
Emeric
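
For reference, the kind of supplementary test being discussed might look roughly like the sketch below. This is not the TextResultWriter code: seq_types_for is a hypothetical helper, only the BLAST family is covered, and both the RPS-BLAST name pattern and its protein/protein labels are assumptions (rpsblast can also be run with a translated nucleotide query).

```perl
use strict;
use warnings;

# Hypothetical helper: map an algorithm name to (query type, database type).
# Returns an empty list for algorithms it does not know.
sub seq_types_for {
    my ($alg) = @_;
    $alg = defined $alg ? uc($alg) : '';

    # The RPS-BLAST test comes first so that it is not caught by a
    # later, more general BLAST pattern.
    if    ($alg =~ /RPS-?BLAST/) { return ('protein',    'protein') }
    elsif ($alg =~ /TBLASTX/)    { return ('translated', 'translated') }
    elsif ($alg =~ /TBLASTN/)    { return ('protein',    'translated') }
    elsif ($alg =~ /BLASTX/)     { return ('translated', 'protein') }
    elsif ($alg =~ /BLASTN/)     { return ('nucleotide', 'nucleotide') }
    elsif ($alg =~ /BLASTP/)     { return ('protein',    'protein') }
    return;
}

for my $alg (qw(BLASTP TBLASTN RPSBLAST)) {
    my ($qtype, $dbtype) = seq_types_for($alg);
    print "$alg: query=$qtype db=$dbtype\n";
}
```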

On 13 Apr 07, at 22:13, Jason Stajich wrote:

> I think it just needs an edit to the code in to_string, which checks
> for the type of algorithm.  You'd need to add to the if/elsif cascade
> a branch for the RPSBLAST type that sets the query and
> target dbs and query and target sequence types properly.  This would
> be very trivial to code in; have you tried adding this to see if it
> works?
>
> If you submit a bug with an example report we'd be able to make
> appropriate changes faster.
>
> -jason
> On Apr 11, 2007, at 6:32 AM, Emeric Sevin wrote:
>
>> Hi everybody,
>>
>> I'm sorry to bug, but either I missed something so obvious nobody
>> bothered to answer, or I'm being a little boycotted here...
>> A little help would be very much appreciated
>>
>> On 22 Mar 07, at 16:07, Emeric Sevin wrote:
>>
>>> Hello,
>>>
>>> I am new to this community, and apologize if this subject has been
>>> posted before.
>>>
>>> I want to print out only selected results from a multiple
>>> blast-alignments results file. Problem is, the algorithm used is
>>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the
>>> actual writing task yields "unclean" warnings. Although an output
>>> is actually written, the writer
>>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by
>>> the fact rpsblast DBs are not labeled with
>>> "protein"/"nucleic"/"translated".
>>> Does anybody know of an easy fix for that bug, or of another way to
>>> get around it?
>>>
>>> Thank you very much
>>>
>>> Emeric SEVIN
>>> Université de Rennes 1
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>


From budd at embl-heidelberg.de  Thu Apr 26 06:18:11 2007
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Thu, 26 Apr 2007 12:18:11 +0200 (CEST)
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
Message-ID: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>

Hi Bioperlers,

I'm trying to parse a FASTA search output file (see attached .out file) 
using Bioperl 1.4. My Bioperl installation has otherwise been working 
fine; however, I currently get the following error when running a simple 
script that attempts to access results from this outfile via Bioperl.

Is this a problem with the parser?
Or have I run FASTA incorrectly, creating output that isn't covered by the 
parser?

Any suggestions on how to deal with this would be much appreciated.

Best wishes,

Aidan

Script:

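#!/usr/bin/perl -w
use strict;
use Bio::SearchIO;

# Open the FASTA search report named on the command line.
my $fasta_report = Bio::SearchIO->new('-format' => 'fasta',
                                      '-file'   => $ARGV[0]);

# Fetching the first result is what triggers the exception below.
my $result = $fasta_report->next_result();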

Errors:

Use of uninitialized value in concatenation (.) or string at 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/GenericHSP.pm 
line 231, <GEN3> line 47.

------------- EXCEPTION  -------------
MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm 
FASTP -score 62.4 -hit_frame 0 -hsp_length 180 -hit_seq  -hit_length 0 
-query_length 128 -query_frame 0 -swscore 122 -rank 1 -query_seq 
GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS--PALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD-LYCHKSD 
-homology_seq                              
MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKRAKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RKCSL...LL.SVNL.K.ADHE.A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTMEPATLSPKSMR 
-hit_name YFL031W -bits 19.4 -query_name CREB1_MONKEY -evalue 1.1 (qs='
STACK Bio::Search::HSP::GenericHSP::new 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/GenericHSP.pm:231
STACK Bio::Search::HSP::FastaHSP::new 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/FastaHSP.pm:97
STACK Bio::Factory::ObjectFactory::create_object 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Factory/ObjectFactory.pm:150
STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/SearchResultEventBuilder.pm:275
STACK Bio::SearchIO::fasta::end_element 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/fasta.pm:872
STACK Bio::SearchIO::fasta::next_result 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/fasta.pm:403
STACK toplevel 
/Users/budd/scripts/test_scripts/test_parsing_fasta_output.pl:22

--------------------------------------

-- 
----------------------------------------------------------------------
Aidan Budd, PhD                               tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany

URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html
-------------- next part --------------
# fasta34 -m 2 creb1_human.fasta yeast_bzips_from_ensembl.fasta
FASTA searches a protein or DNA sequence data bank
 version 34.26 January 12, 2007
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query library creb1_human.fasta vs yeast_bzips_from_ensembl.fasta library
searching yeast_bzips_from_ensembl.fasta library

  1>>>CREB1_MONKEY 341 aa - 341 aa
 vs  yeast_bzips_from_ensembl.fasta library

   3683 residues in    10 sequences
 MLE_cen statistics: Lambda= 0.0338;  K=8.757e-05 (cen=0)

FASTA (3.5 Sept 2006) function [optimized, BL50 matrix (15:-5)] ktup: 2
 join: 37, opt: 25, open/ext: -10/-2, width:  16
 Scan time:  0.000
The best scores are:                                      opt bits E(10)
YFL031W                                            ( 238)  122 19.4     1.1
YEL009C                                            ( 281)  121 19.4     1.3
YIL036W                                            ( 587)  129 19.8       2
YIR017C                                            ( 187)   83 17.5     2.9
YVNL167C                                           ( 647)  119 19.3     2.9
YIR018W                                            ( 245)   67 16.7     5.3
YER045C                                            ( 489)   73 17.0     7.1
YDR259C                                            ( 383)   62 16.5     7.5
YOR028C                                            ( 296)   41 15.5     8.9
YHL009C                                            ( 330)   33 15.1     9.6

>>YFL031W                                                 (238 aa)
 initn: 107 init1: 107 opt: 122  Z-score: 62.4  bits: 19.4 E():  1.1
Smith-Waterman score: 122;  27.660% identity (63.830% similar) in 94 aa overlap (248-337:2-95)

       220       230       240       250       260       270       
CREB1_ GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS--PALP
YFL031                              MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKR

         280       290       300       310       320        330    
CREB1_ TQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD
YFL031 AKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RKCSL...LL.SVNL.K.ADHE.

           340                                                     
CREB1_ -LYCHKSD                                                    
YFL031 A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTMEPATLSPKSMR

>>YEL009C                                                 (281 aa)
 initn: 138 init1:  83 opt: 121  Z-score: 60.8  bits: 19.4 E():  1.3
Smith-Waterman score: 121;  29.412% identity (55.462% similar) in 119 aa overlap (219-335:165-277)

      190       200       210       220       230       240        
CREB1_ GAIQLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGD
YEL009 VSLADKAIESTEEVSLVPSNLEVSTTSFLP.PV.ED.KL.QTRKVKK.NS--..KKSHHV

      250       260       270         280       290       300      
CREB1_ VQTYQIRTAPTSTIAPGVVMASSPALPTQP--AEEAARKREVRLMKNREAARECRRKKKE
YEL009 GKDDES.LDHLGVV.YNRKQR.I.LS.IV.ESSDP..L..----AR.T....RS.AR.LQ

        310       320       330       340 
CREB1_ YVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
YEL009 RM.Q..DK.EE.LSK.YH.EN.VAR..K.VGER  

>>YIL036W                                                 (587 aa)
 initn: 132 init1:  70 opt: 129  Z-score: 57.2  bits: 19.8 E():    2
Smith-Waterman score: 129;  18.750% identity (55.682% similar) in 352 aa overlap (2-335:137-477)

                                            10        20           
CREB1_                              MTMESGAENQQSGDAAVTEAENQQM--TVQA
YIL036 RVVKPSANSNYQQAAYLRQQQQQDQRQQSPS.KTEE.S.LY..ILMNSGVV.D.HQNLAT

      30        40        50        60        70        80         
CREB1_ QPQIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGVIQAAQPSVIQSPQVQTVQSS
YIL036 HTNLSQ.SSTRKS.PNDSTT...-NASNIA.--.AS.NKQMYFMNMNMNNN.HALNDP.I

      90         100       110         120       130       140     
CREB1_ CKDLKRLFS--GTQISTIAESEDS--QESVDSVTDSQKRREILSRRPSYRKILNDL----
YIL036 LET.SPF.QPF.VDVAHLPMTNPPIF.S.LPGCDEPIR..R.SISNGQISQLGE.IETLE

                150       160          170        180       190    
CREB1_ ---SSDAPGVPRIEEEKSEEET---SAPAITTVTVP-TPIYQTSSGQYIAITQGGAIQLA
YIL036 NLHNTQP.PM.NFHNYNGLSQ.RNV.NKPVFNQA..VSS.P.YNAKKV.NP.KDS.--.G

          200       210       220       230       240       250    
CREB1_ NNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQI
YIL036 DQSVIYSKSQ.RNFVNAPSKNT.AES.----SDLE.MTTFA.TTGGENRGK.ALRESHSN

           260       270       280       290       300       310   
CREB1_ RT-APTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLEN
YIL036 PSFT.K.QGSHLNLA.NTQGN.I-.GT-T.W..ARL.ER..I..SK..QR..VAQLQ.QK

           320       330       340                                 
CREB1_ RVAVLENQNKTLIEELKALKDLYCHKSD                                
YIL036 EFNEIKDE.RI.LKK.NYYEK.ISKFKKFSKIHLREHEKLNKDSDNNVNGTNSSNKNESM

>>YIR017C                                                 (187 aa)
 initn:  43 init1:  43 opt:  83  Z-score: 54.0  bits: 17.5 E():  2.9
Smith-Waterman score: 84;  22.785% identity (56.962% similar) in 158 aa overlap (176-330:9-148)

         150       160       170       180       190       200     
CREB1_ PGVPRIEEEKSEEETSAPAITTVTVPTPIYQTSSGQYIAITQGGAIQLANNGTDGVQGLQ
YIR017                       MSAKQGWEKK.TNID..SRK.MNV---..LSEHL.N.I

         210       220       230       240        250       260    
CREB1_ TLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASG-DVQTYQIRTAPTS--TI
YIR017 S------SDSEL.SRL.SLLLVSS.N-----AEELISMINN.Q..SQFKKLRE.RKGKVA

            270       280       290       300       310       320  
CREB1_ APGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQN
YIR017 .TTA.VVKEEEA.VSTSN.LDKIKQE.RR..T..SQRF.IR..Q--.NF..-MNK.Q.L.

            330       340                             
CREB1_ KTLIEELKALKDLYCHKSD                            
YIR017 -.Q.NK.RDRIEQLNKENEFWKAKLNDINEIKSLKLLNDIKRRNMGR

>>YVNL167C                                                (647 aa)
 initn: 142 init1: 119 opt: 119  Z-score: 53.8  bits: 19.3 E():  2.9
Smith-Waterman score: 119;  39.623% identity (62.264% similar) in 53 aa overlap (280-332:426-478)

     250       260       270       280       290       300         
CREB1_ QTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVK
YVNL16 RKNSAVTTAPAQKDDVENNKISNNVTLDEN..QE...KEF.ER..V..SKF.KR....I.

     310       320       330       340                             
CREB1_ CLENRVAVLENQNKTLIEELKALKDLYCHKSD                            
YVNL16 KI..DLQFY.SEYDD.TQVIGK.CGIIPSSSSNSQFNVNVSTPSSSSPPSTSLIALLESS

>>YIR018W                                                 (245 aa)
 initn:  61 init1:  61 opt:  67  Z-score: 47.6  bits: 16.7 E():  5.3
Smith-Waterman score: 67;  25.455% identity (61.818% similar) in 55 aa overlap (280-334:55-109)

     250       260       270       280       290       300         
CREB1_ QTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVK
YIR018 SKNWKLPPRLPHRAAQRRKRVHRLHEDYET..NDEELQKKKRQ..D.Q.AY.ER.NNKLQ

     310       320       330       340                             
CREB1_ CLENRVAVLENQNKTLIEELKALKDLYCHKSD                            
YIR018 V..ETIES.SKVV.NYETK.NR.QNELQAKESENHALKQKLETLTLKQASVPAQDPILQN

>>YER045C                                                 (489 aa)
 initn: 111 init1:  70 opt:  73  Z-score: 43.8  bits: 17.0 E():  7.1
Smith-Waterman score: 97;  22.826% identity (67.391% similar) in 92 aa overlap (3-92:210-300)

                                           10        20         30 
CREB1_                             MTMESGAENQQSGDAAVTEAE-NQQMTVQAQP
YER045 QTGSKNIYAAMTPYDSNIKLNIPAVAATCDIP.ATPSIP...STMNQ.YI.M.LRL...M

              40        50        60         70        80        90
CREB1_ QIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGV-IQAAQPSVIQSPQVQTVQSSC
YER045 .TKAWKNAQL-NV.PCTP.SNSSVSSSSSC.NIND.NIEN.SVHS.ISHGVNHH..NN..

              100       110       120       130       140       150
CREB1_ KDLKRLFSGTQISTIAESEDSQESVDSVTDSQKRREILSRRPSYRKILNDLSSDAPGVPR
YER045 QNAELNISSSLPYESKCPDVNLTHANSKPQYKDATSALKNNINSEKDVHTAPFSSMHTTA

>>YDR259C                                                 (383 aa)
 initn:  84 init1:  52 opt:  62  Z-score: 42.8  bits: 16.5 E():  7.5
Smith-Waterman score: 81;  33.333% identity (64.583% similar) in 48 aa overlap (289-330:227-274)

      260       270       280       290       300       310        
CREB1_ TSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVL
YDR259 NDNNDNVTKPVPDKDTQLISSSGKTLRNTR.AAQ..T.QKAF.QR.EK.I.N..QKSKIF

           320        330       340                                
CREB1_ -----ENQN-KTLIEELKALKDLYCHKSD                               
YDR259 DDLLA..N.F.S.NDS.RNDNNILIAQHEAIRNAITMLRSEYDVLCNENNMLKNENSIIK

>>YOR028C                                                 (296 aa)
 initn:  35 init1:  35 opt:  41  Z-score: 39.3  bits: 15.5 E():  8.9
Smith-Waterman score: 80;  33.962% identity (66.038% similar) in 53 aa overlap (289-334:243-295)

      260       270       280       290       300       310        
CREB1_ TSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVL
YOR028 LSEQVFNEGERYNNDGQLIGKTGKPLRNTK.AAQ..S.QKAF.QRREK.I.N..EKSKLF

           320        330        340 
CREB1_ -----ENQN-KTLIEELKA-LKDLYCHKSD
YOR028 DGLMK..SEL.KM..S..SK..E*      

>>YHL009C                                                 (330 aa)
 initn:  33 init1:  33 opt:  33  Z-score: 36.4  bits: 15.1 E():  9.6
Smith-Waterman score: 91;  21.667% identity (57.500% similar) in 120 aa overlap (222-333:79-194)

             200       210       220       230             240     
CREB1_ QLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQI-LVP-----SNQVVVQAA
YHL009 EQTAPFPILEDQCPALNLDRSNNDLLLQNNISFPKGS.L.A.Q.T.ISGDY.TY.MADNN

         250         260       270       280       290       300   
CREB1_ SGDVQTYQIRT--APTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRK
YHL009 NN.NDS.SNTNYFSKNNG.S.SSRSP.VAHNENV.DDSK.K.KA----Q..A.QKAF.ER

           310       320       330       340                       
CREB1_ KKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD                      
YHL009 .EARM.E.QDKLLES.RNRQS.LK.IEE.RKANTEINAENRLLLRSGNENFSKDIEDDTN


341 residues in 1 query   sequences
3683 residues in 10 library sequences
 Scomplib [34.26]
 start: Thu Apr 26 11:52:16 2007 done: Thu Apr 26 11:52:16 2007
 Total Scan time:  0.000 Total Display time:  0.010

Function used was FASTA [version 34.26 January 12, 2007]
-------------- next part --------------
>CREB1_MONKEY
MTMESGAENQQSGDAAVTEAENQQMTVQAQPQIATLAQVSMPAAHATSSAPTVTLVQLPN
GQTVQVHGVIQAAQPSVIQSPQVQTVQSSCKDLKRLFSGTQISTIAESEDSQESVDSVTD
SQKRREILSRRPSYRKILNDLSSDAPGVPRIEEEKSEEETSAPAITTVTVPTPIYQTSSG
QYIAITQGGAIQLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQV
VVQAASGDVQTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAAREC
RRKKKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
-------------- next part --------------
>YIL036W
MFTGQEYHSVDSNSNKQKDNNKRGIDDTSKILNNKIPHSVSDTSAAATTTSTMNNSALSR
SLDPTDINYSTNMAGVVDQIHDYTTSNRNSLTPQYSIAAGNVNSHDRVVKPSANSNYQQA
AYLRQQQQQDQRQQSPSMKTEEESQLYGDILMNSGVVQDMHQNLATHTNLSQLSSTRKSA
PNDSTTAPTNASNIANTASVNKQMYFMNMNMNNNPHALNDPSILETLSPFFQPFGVDVAH
LPMTNPPIFQSSLPGCDEPIRRRRISISNGQISQLGEDIETLENLHNTQPPPMPNFHNYN
GLSQTRNVSNKPVFNQAVPVSSIPQYNAKKVINPTKDSALGDQSVIYSKSQQRNFVNAPS
KNTPAESISDLEGMTTFAPTTGGENRGKSALRESHSNPSFTPKSQGSHLNLAANTQGNPI
PGTTAWKRARLLERNRIAASKCRQRKKVAQLQLQKEFNEIKDENRILLKKLNYYEKLISK
FKKFSKIHLREHEKLNKDSDNNVNGTNSSNKNESMTVDSLKIIEELLMIDSDVTEVDKDT
GKIIAIKHEPYSQRFGSDTDDDDIDLKPVEGGKDPDNQSLPNSEKIK
>YIR017C
MSAKQGWEKKSTNIDIASRKGMNVNNLSEHLQNLISSDSELGSRLLSLLLVSSGNAEELI
SMINNGQDVSQFKKLREPRKGKVAATTAVVVKEEEAPVSTSNELDKIKQERRRKNTEASQ
RFRIRKKQKNFENMNKLQNLNTQINKLRDRIEQLNKENEFWKAKLNDINEIKSLKLLNDI
KRRNMGR
>YVNL167C
MSSEERSRQPSTVSTFDLEPNPFEQSFASSKKALSLPGTISHPSLPKELSRNNSTSTITQ
HSQRSTHSLNSIPEENGNSTVTDNSNHNDVKKDSPSFLPGQQRPTIISPPILTPGGSKRL
PPLLLSPSILYQANSTTNPSQNSHSVSVSNSNPSAIGVSSTSGSLYPNSSSPSGTSLIRQ
PRNSNVTTSNSGNGFPTNDSQMPGFLLNLSKSGLTPNESNIRTGLTPGILTQSYNYPVLP
SINKNTITGSKNVNKSVTVNGSIENHPHVNIMHPTVNGTPLTPGLSSLLNLPSTGVLANP
VFKSTPTTNTTDGTVNNSISNSNFSPNTSTKAAVKMDNPAEFNAIEHSAHNHKENENLTT
QIENNDQFNNKTRKRKRRMSSTSSTSKASRKNSISRKNSAVTTAPAQKDDVENNKISNNV
TLDENEEQERKRKEFLERNRVAASKFRKRKKEYIKKIENDLQFYESEYDDLTQVIGKLCG
IIPSSSSNSQFNVNVSTPSSSSPPSTSLIALLESSISRSDYSSAMSVLSNMKQLICETNF
YRRGGKNPRDDMDGQEDSFNKDTNVVKSENAGYPSVNSRPIILDKKYSLNSGANISKSNT
TTNNVGNSAQNIINSCYSVTNPLVINANSDTHDTNKHDVLSTLPHNN
>YER045C
MDYKHNFATSPDSFLDGRQNPLLYTDFLSSNKELIYKQPSGPGLVDSAYNFHHQNSLHDR
SVQENLGPMFQPFGVDISHLPITNPPIFQSSLPAFDQPVYKRRISISNGQISQLGEDLET
VENLYNCQPPILSSKAQQNPNPQQVANPSAAIYPSFSSNELQNVPQPHEQATVIPEAAPQ
TGSKNIYAAMTPYDSNIKLNIPAVAATCDIPSATPSIPSGDSTMNQAYINMQLRLQAQMQ
TKAWKNAQLNVHPCTPASNSSVSSSSSCQNINDHNIENQSVHSSISHGVNHHTVNNSCQN
AELNISSSLPYESKCPDVNLTHANSKPQYKDATSALKNNINSEKDVHTAPFSSMHTTATF
QIKQEARPQKIENNTAGLKDGAKAWKRARLLERNRIAASKCRQRKKMSQLQLQREFDQIS
KENTMMKKKIENYEKLVQKMKKISRLHMQECTINGGNNSYQSLQNKDSDVNGFLKMIEEM
IRSSSLYDE
>YIR018W
MALPLIKPKESEESHLALLSKIHVSKNWKLPPRLPHRAAQRRKRVHRLHEDYETEENDEE
LQKKKRQNRDAQRAYRERKNNKLQVLEETIESLSKVVKNYETKLNRLQNELQAKESENHA
LKQKLETLTLKQASVPAQDPILQNLIENFKPMKAIPIKYNTAIKRHQHSTELPSSVKCGF
CNDNTTCVCKELETDHRKSDDGVATEQKDMSMPHAECNNKDNPNGLCSNCTNIDKSCIDI
RSIIH
>YHL009C
MTPSNMDDNTSGFMKFINPQCQEEDCCIRNSLFQEDSKCIKQQPDLLSEQTAPFPILEDQ
CPALNLDRSNNDLLLQNNISFPKGSDLQAIQLTPISGDYSTYVMADNNNNDNDSYSNTNY
FSKNNGISPSSRSPSVAHNENVPDDSKAKKKAQNRAAQKAFRERKEARMKELQDKLLESE
RNRQSLLKEIEELRKANTEINAENRLLLRSGNENFSKDIEDDTNYKYSFPTKDEFFTSMV
LESKLNHKGKYSLKDNEIMKRNTQYTDEAGRHVLTVPATWEYLYKLSEERDFDVTYVMSK
LQGQECCHTHGPAYPRSLIDFLVEEATLNE
>YOR028C
MLMQIKMDNHPFNFQPILASHSMTRDSTKPKKMTDTAFVPSPPVGFIKEENKADLHTISV
VASNVTLPQIQLPKIATLEEPGYESRTGSLTDLSGRRNSVNIGALCEDVPNTAGPHIARP
VTINNLIPPSLPRLNTYQLRPQLSDTHLNCHFNSNPYTTASHAPFESSYTTASTFTSQPA
ASYFPSNSTPATRKNSATTNLPSEERRRVSVSLSEQVFNEGERYNNDGQLIGKTGKPLRN
TKRAAQNRSAQKAFRQRREKYIKNLEEKSKLFDGLMKENSELKKMIESLKSKLKE*
>YEL009C
MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
TSLFDNDIPVTTDDVSLADKAIESTEEVSLVPSNLEVSTTSFLPTPVLEDAKLTQTRKVK
KPNSVVKKSHHVGKDDESRLDHLGVVAYNRKQRSIPLSPIVPESSDPAALKRARNTEAAR
RSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
>YDR259C
MQNPPLIRPDMYNQGSSSMATYNASEKNLNEHPSPQIAQPSTSQKLPYRINPTTTNGDTD
ISVNSNPIQPPLPNLMHLSGPSDYRSMHQSPIHPSYIIPPHSNERKQSASYNRPQNAHVS
IQPSVVFPPKSYSISYAPYQINPPLPNGLPNQSISLNKEYIAEEQLSTLPSRNTSVTTAP
PSFQNSADTAKNSADNNDNNDNVTKPVPDKDTQLISSSGKTLRNTRRAAQNRTAQKAFRQ
RKEKYIKNLEQKSKIFDDLLAENNNFKSLNDSLRNDNNILIAQHEAIRNAITMLRSEYDV
LCNENNMLKNENSIIKNEHNMSRNENENLKLENKRFHAEYIRMIEDIENTKRKEQEQRDE
IEQLKKKIRSLEEIVGRHSDSAT
>YFL031W
MEMTDFELTSNSQSNLAIPTNFKSTLPPRKRAKTKEEKEQRRIERILRNRRAAHQSREKK
RLHLQYLERKCSLLENLLNSVNLEKLADHEDALTCSHDAFVASLDEYRDFQSTRGASLDT
RASSHSSSDTFTPSPLNCTMEPATLSPKSMRDSASDQETSWELQMFKTENVPESTTLPAV
DNNNLFDAVASPLADPLCDDIAGNSLPFDNSIDLDNWRNPEAQSGLNSFELNDFFITS

From jason at bioperl.org  Thu Apr 26 15:27:24 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Apr 2007 12:27:24 -0700
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
In-Reply-To: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
References: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
Message-ID: <7C782DA2-5A80-413A-9B5A-94EEBEA9EF6E@bioperl.org>

Unfortunately there are some changes in the FASTA output in that  
version. The latest version of Bioperl, 1.5.2, can handle it though, so  
you'll need to upgrade Bioperl.

-jason
On Apr 26, 2007, at 3:18 AM, Aidan Budd wrote:

> Hi Bioperlers,
>
> I'm trying to parse a FASTA search output file (see attached .out  
> file)
> using Bioperl 1.4. My Bioperl installation has otherwise been working
> fine, however I currently get the following error when running a  
> simple
> script that attempts to access result from this outfile via bioperl.
>
> Is this a problem with the parser?
> Or have I executed FASTA wrongly creating output that isn't covered  
> by the
> parser?
>
> Any suggestions on how to deal with this much appreciated.
>
> Best wishes,
>
> Aidan
>
> Script:
>
> #!/usr/bin/perl -w
> $^W=1;
> use strict;
> use Bio::SearchIO;
>
> my $fasta_report = new Bio::SearchIO ('-format' => 'fasta',
>                                       '-file'   => $ARGV[0]);
>
> my $result = $fasta_report->next_result();
>
> Errors:
>
> Use of uninitialized value in concatenation (.) or string at
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/ 
> GenericHSP.pm
> line 231, <GEN3> line 47.
>
> ------------- EXCEPTION  -------------
> MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm
> FASTP -score 62.4 -hit_frame 0 -hsp_length 180 -hit_seq  -hit_length 0
> -query_length 128 -query_frame 0 -swscore 122 -rank 1 -query_seq
> GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS-- 
> PALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD- 
> LYCHKSD
> -homology_seq
> MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKRAKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RK 
> CSL...LL.SVNL.K.ADHE.A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTM 
> EPATLSPKSMR
> -hit_name YFL031W -bits 19.4 -query_name CREB1_MONKEY -evalue 1.1  
> (qs='
> STACK Bio::Search::HSP::GenericHSP::new
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/ 
> GenericHSP.pm:231
> STACK Bio::Search::HSP::FastaHSP::new
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/ 
> FastaHSP.pm:97
> STACK Bio::Factory::ObjectFactory::create_object
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Factory/ 
> ObjectFactory.pm:150
> STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/ 
> SearchResultEventBuilder.pm:275
> STACK Bio::SearchIO::fasta::end_element
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/ 
> fasta.pm:872
> STACK Bio::SearchIO::fasta::next_result
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/ 
> fasta.pm:403
> STACK toplevel
> /Users/budd/scripts/test_scripts/test_parsing_fasta_output.pl:22
>
> --------------------------------------
>
> -- 
> ----------------------------------------------------------------------
> Aidan Budd, PhD                               tel:+49 (0)6221 387 8530
> EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
> Meyerhofstr. 1, 69117 Heidelberg, Germany
>
> URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html
> <creb_vs_yeast_manual_fasta_changed_infile_formats.out>
> <creb1_human.fasta>
> <yeast_bzips_from_ensembl.fasta>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From dmessina at wustl.edu  Thu Apr 26 15:42:02 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 26 Apr 2007 14:42:02 -0500
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
In-Reply-To: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
References: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
Message-ID: <D41F5BDD-B992-4787-91C5-732B41683908@wustl.edu>

Hi Aidan,

Bioperl 1.4 is ~3 years old now, and FASTA output has probably  
changed since then. Your code should work if you install Bioperl  
1.5.2, the latest release.

	http://www.bioperl.org/wiki/Installing_BioPerl

Please let us know if that doesn't solve the problem.
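
A quick way to confirm which release Perl is actually picking up (useful when more than one copy is installed) is sketched below. It assumes the installed BioPerl ships Bio::Root::Version, which may not be true of every old release, so a failure to load is reported rather than treated as an error. The sub name bioperl_version is just for illustration.

```perl
use strict;
use warnings;

# Return the BioPerl version string, or undef if Bio::Root::Version
# cannot be loaded (BioPerl missing, or a release without that module).
sub bioperl_version {
    my $loaded = eval { require Bio::Root::Version; 1 };
    return unless $loaded;
    no warnings 'once';
    return $Bio::Root::Version::VERSION;
}

my $version = bioperl_version();
print defined $version
    ? "BioPerl version: $version\n"
    : "Bio::Root::Version could not be loaded from \@INC\n";
```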

Dave


From gopu_36 at yahoo.com  Thu Apr 26 21:29:03 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Thu, 26 Apr 2007 18:29:03 -0700 (PDT)
Subject: [Bioperl-l] check for the continous segments to extract the
	sequences
Message-ID: <10211951.post@talk.nabble.com>


As a newbie to programming, thanks for the support from this group. Please
ignore this message if it is not relevant to the group, as my
problem may be a typical computer science recursion problem (I am not sure).

I have an array like @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
6001, 7000, 7001, 8000, 12001, 13000);
The array gives the positions of sequences: the first element, '1', is a
start position and the second element, '1000', is the end of that sequence,
and so on. The even indices (0, 2, 4, ...) hold the start points of the
sequences and the odd indices (1, 3, 5, ...) hold the END positions (1000,
2000, 5000, ...) of the sequences to be retrieved. Basically I have to see
whether any contiguous segments lie in the list and join them together to
form one whole chunk.
For example, 1-1000 and 1001-2000 can be joined together to extract the
sequence from 1-2000. In the same way 4001-8000 should be extracted, then
12001-13000, and so on. As I said earlier, after checking the positions, I
will be able to extract that part of the sequence from a whole genome.
Thanks for taking your time. Any tip or help would be greatly appreciated.

Regards
Gopu 


From jason at bioperl.org  Thu Apr 26 21:54:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Apr 2007 18:54:59 -0700
Subject: [Bioperl-l] check for the continous segments to extract the
	sequences
In-Reply-To: <10211951.post@talk.nabble.com>
References: <10211951.post@talk.nabble.com>
Message-ID: <EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>

You want a connectivity algorithm.  One can be found on perlmonks.org  
as well as in Bio::Search::SearchUtils the method collapse_nums().  
You'll have to modify aspects of it to deal with ranges.
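
For the range case, a minimal sketch in plain Perl (not the
collapse_nums() code itself; merge_ranges is a hypothetical name, and it
assumes the flat list always alternates start, end) could look like
this:

```perl
use strict;
use warnings;

# Merge adjacent or overlapping ranges given as a flat
# (start, end, start, end, ...) list. Returns a list of [start, end].
sub merge_ranges {
    my @flat = @_;
    die "need an even number of coordinates\n" if @flat % 2;

    # Turn the flat list into [start, end] pairs, sorted by start.
    my @pairs;
    while (@flat) {
        my ($start, $end) = splice(@flat, 0, 2);
        push @pairs, [$start, $end];
    }
    @pairs = sort { $a->[0] <=> $b->[0] } @pairs;

    my @merged;
    for my $pair (@pairs) {
        if (@merged && $pair->[0] <= $merged[-1][1] + 1) {
            # Touches or overlaps the previous range: extend it.
            $merged[-1][1] = $pair->[1] if $pair->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$pair];
        }
    }
    return @merged;
}

my @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
             6001, 7000, 7001, 8000, 12001, 13000);
print join(',', map { "$_->[0]-$_->[1]" } merge_ranges(@array)), "\n";
```

For the array in the question this prints 1-2000,4001-8000,12001-13000.
Sorting first means the input pairs do not have to arrive in order.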

Good luck.
-jason

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From MEC at stowers-institute.org  Fri Apr 27 09:52:10 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 27 Apr 2007 08:52:10 -0500
Subject: [Bioperl-l] check for the continous segments to extract
	thesequences
In-Reply-To: <EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>
References: <10211951.post@talk.nabble.com>
	<EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>
Message-ID: <CED81D34E37D5043A1211565277A51E507E22F28@exchkc02.stowers-institute.org>

Gopu/Jason,

Another option is Set::IntSpan, available on CPAN at
http://search.cpan.org/~swmcd/Set-IntSpan-1.11/IntSpan.pm

Here's a perl one-liner that shows you how easy it is:

perl -MSet::IntSpan -e 'my @array = ( 1, 1000, 1001, 2000, 4001, 5000,
5001, 6000, 6001, 7000, 7001, 8000, 12001, 13000); my $is =
Set::IntSpan->new;  while (@array) {$is->U(shift(@array) . "-" .
shift(@array))}; print $is;'
1-2000,4001-8000,12001-13000

I use it all the time to great effect and have utility functions that
convert between bioperl split locations and IntSpans.

There is another module which extends it nicely, Set::IntSpan::Island,
worth a gander.

Cheers,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  



From lstein at cshl.edu  Fri Apr 27 13:44:59 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 27 Apr 2007 13:44:59 -0400
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>

Hi Malcolm,

This is absolutely ok and you can go ahead and commit. Thanks for figuring
this out!

Lincoln

On 4/26/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Lincoln, et al,
>
> I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
> from a Bio::DB::SeqFeature::Store that were initially created with
> -segments (i.e. whose location was discontiguous) does not display any
> attributes in column 9 other than "Name".
>
> What do you think of the following patch to Bio::Graphics::FeatureBase,
> whose effect is to "contrive to return (duplicated) common group values"
> (which otherwise get lost when "collapsing" "homogeneous" parent/child
> features)?
>
> Another approach would be to copy the attributes from the parent to the
> children when the -segments are first created.
>
> Another approach would be to use Bio::SeqFeature::Generic as the db's
> -seqfeature_class and save with -location being a Bio::Location::Split,
> but this was fraught with other problems.
>
> Any other suggestions?  Do you want me to commit this patch?
>
> Cheers,
>
> Malcolm
>
> Patch follows:
>
>
>
>
> Index: FeatureBase.pm
> ===================================================================
> RCS file:
> /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
> retrieving revision 1.29
> diff -c -r1.29 FeatureBase.pm
> *** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
> --- FeatureBase.pm      26 Apr 2007 16:30:23 -0000
> ***************
> *** 581,587 ****
>       foreach (@children) {
>         s/Parent=/ID=/g;
>       } # replace Parent tag with ID
> !     return join "\n",@children;
>     }
>
>     return join("\n",$p,@children);
> --- 581,589 ----
>       foreach (@children) {
>         s/Parent=/ID=/g;
>       } # replace Parent tag with ID
> !     #return join "\n",@children;
> !     # Instead of above, additionally, contrive to return (duplicated) common group values
> !     return(join("$group\n",@children) . $group);
>     }
>
>     return join("\n",$p,@children);
>
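
To see what the changed return line in the quoted patch does to the
output string, here is a small stand-alone sketch. The child lines and
the contents of $group are made up for illustration (in particular, the
leading ';' in $group is an assumption); join_old and join_new are
hypothetical names, not part of FeatureBase.pm.

```perl
use strict;
use warnings;

# Behaviour before the patch: child lines joined as they are.
sub join_old {
    my ($group, @children) = @_;
    return join "\n", @children;
}

# Behaviour after the patch: $group is appended to every child line,
# including the last one.
sub join_new {
    my ($group, @children) = @_;
    return join("$group\n", @children) . $group;
}

# Made-up GFF3 child lines and a made-up shared attribute string.
my @children = (
    "ctg1\t.\tmatch_part\t1\t100\t.\t+\t.\tID=f1",
    "ctg1\t.\tmatch_part\t201\t300\t.\t+\t.\tID=f1",
);
my $group = ';Note=example';

print "before:\n", join_old($group, @children), "\n";
print "after:\n",  join_new($group, @children), "\n";
```

Using "$group\n" as the join separator puts the shared attributes at
the end of every line except the last, which is why $group is
concatenated once more at the end.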


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From MEC at stowers-institute.org  Fri Apr 27 14:45:04 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 27 Apr 2007 13:45:04 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
	<6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E507E22F59@exchkc02.stowers-institute.org>

Hi Lincoln,
 
Cool.
 
The principle of what I figured out still holds, I think, but the
implementation is slightly broken.  An improved patch is forthcoming next week.
 

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  



From bernd at kirx.de  Sat Apr 28 10:36:07 2007
From: bernd at kirx.de (Bernd Mueller)
Date: Sat, 28 Apr 2007 16:36:07 +0200
Subject: [Bioperl-l] bioperl::db
Message-ID: <46335BD7.8040306@kirx.de>

Hi,

I followed the instructions on bioperl.org for installing bioperl via 
cpan. But it is actually impossible for me to install the bioperl::db 
module.

How does this work?

Moreover, none of these Birney distributions is installable on my system. 
After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always 
fail. So I have to install the CRAFFI bundle, but the Bio::DB module does 
not seem to be included in this bundle, because my programs using that 
module do not work.

Help would be appreciated :)

Cheers,
Bernd

Appendix:

cpan[6]> d /bioperl/
Distribution    BIRNEY/bioperl-1.2.1.tar.gz
Distribution    BIRNEY/bioperl-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-1.2.3.tar.gz
Distribution    BIRNEY/bioperl-1.2.tar.gz
Distribution    BIRNEY/bioperl-1.4.tar.gz
Distribution    BIRNEY/bioperl-db-0.1.tar.gz
Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-run-1.4.tar.gz
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
12 items found


-- 
Dipl.-Inform.(FH)
Bernd Mueller
phone: +49 179 2336692
email: bernd at kirx.de


From cydeweys at gmail.com  Sun Apr 29 09:43:55 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 09:43:55 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
Message-ID: <4634A11B.6090809@umd.edu>

I'm trying to load up a table of codon usage frequencies I've downloaded
from the web using Bio::CodonUsage::IO.  My code looks like this:

    use Bio::CodonUsage::Table;
    use Bio::CodonUsage::IO;
    # ...
    my $io = Bio::CodonUsage::IO->new(-file=>$freqFile);
    my $codonTable = $io->next_data();

Unfortunately, I can't seem to find any documentation on what format the
codon usage table file is expected to be in, and all of my best guesses
seem to be invalid, yielding the following error message:

-------------------- WARNING ---------------------
MSG: probable parsing error - should be 21 entries for 20aa + stop codon
---------------------------------------------------

I've tried using both formats that are available from the Codon Usage
Database (easily the largest source of codon usage frequencies),
available here: http://www.kazusa.or.jp/codon/

The two formats I've tried and failed look like this:

UUU 32.5( 45732)  UCU 15.3( 21588)  UAU 27.8( 39146)  UGU  6.3(  8796)
UUC 14.3( 20101)  UCC  3.2(  4458)  UAC  9.3( 13016)  UGC  2.1(  2971)
...


AND

AmAcid  Codon      Number    /1000     Fraction   ..

Gly     GGG     13198.00      9.38      0.14
Gly     GGA     34123.00     24.26      0.36
...


So, anyone know how to get this downloaded codon usage data loaded up
into a Bio::CodonUsage::Table object?  Bio::CodonUsage::IO doesn't seem
to like parsing the standard formats.  Thanks.
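
Independently of Bio::CodonUsage::IO, the first of the two layouts
shown above can be read with a few lines of plain Perl. This is only a
sketch based on the two sample lines in this message; the function name
parse_codon_lines and the hash layout are made up for illustration, and
it does not build a Bio::CodonUsage::Table object.

```perl
use strict;
use warnings;

# Parse lines such as "UUU 32.5( 45732)  UCU 15.3( 21588)" into
# codon => { per_thousand => ..., count => ... }.
sub parse_codon_lines {
    my @lines = @_;
    my %usage;
    for my $line (@lines) {
        # Each entry is: codon, frequency per thousand, (raw count).
        while ($line =~ /\b([ACGU]{3})\s+(\d+(?:\.\d+)?)\(\s*(\d+)\)/g) {
            $usage{$1} = { per_thousand => $2, count => $3 };
        }
    }
    return \%usage;
}

my $usage = parse_codon_lines(
    'UUU 32.5( 45732)  UCU 15.3( 21588)  UAU 27.8( 39146)  UGU  6.3(  8796)',
    'UUC 14.3( 20101)  UCC  3.2(  4458)  UAC  9.3( 13016)  UGC  2.1(  2971)',
);

printf "%s: %.1f per thousand, %d occurrences\n",
    $_, $usage->{$_}{per_thousand}, $usage->{$_}{count}
    for sort keys %$usage;
```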


From cjfields at uiuc.edu  Sun Apr 29 10:05:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 09:05:59 -0500
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <4634A11B.6090809@umd.edu>
References: <4634A11B.6090809@umd.edu>
Message-ID: <469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>

One example file (MmCT) can be found in the test data directory in  
the bioperl distribution (t/data directory) and some tests relevant  
to codon table usage are found in DBCUTG.t.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cydeweys at gmail.com  Sun Apr 29 10:06:12 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 10:06:12 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
Message-ID: <4634A654.7010708@gmail.com>

Chris Fields wrote:
> One example file (MmCT) can be found in the test data directory in the
> bioperl distribution (t/data directory) and some tests relevant to codon
> table usage are found in DBCUTG.t.

I still get the same warning message even when running on the given test
data?  That doesn't sound right.

-------------------- WARNING ---------------------
MSG: probable parsing error - should be 21 entries for 20aa + stop codon
---------------------------------------------------


From cjfields at uiuc.edu  Sun Apr 29 17:50:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 16:50:15 -0500
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <4634A654.7010708@gmail.com>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
	<4634A654.7010708@gmail.com>
Message-ID: <DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>

Odd, when I run 'perl -I. t/DBCUTG.t' from CVS it works fine.  Of  
course, I am assuming that you are running the latest release (1.5.2).

Could you post a bug report with a script that generates the error?

chris

On Apr 29, 2007, at 9:06 AM, Ben McIlwain wrote:

> Chris Fields wrote:
>> One example file (MmCT) can be found in the test data directory in  
>> the
>> bioperl distribution (t/data directory) and some tests relevant to  
>> codon
>> table usage are found in DBCUTG.t.
>
> I still get the same warning message even when running on the given  
> test
> data?  That doesn't sound right.
>
> -------------------- WARNING ---------------------
> MSG: probable parsing error - should be 21 entries for 20aa + stop  
> codon
> ---------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cydeweys at gmail.com  Sun Apr 29 18:15:32 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 18:15:32 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
	<4634A654.7010708@gmail.com>
	<DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>
Message-ID: <46351904.4070202@gmail.com>

Chris Fields wrote:
> Odd, when I run 'perl -I. t/DBCUTG.t' from CVS it works fine.  Of
> course, I am assuming that you are running the latest release (1.5.2).
> 
> Could you post a bug report with a script that generates the error?

Sorry, it was my mistake.  I had turned off warnings and strict earlier
for debugging purposes and then forgot to turn them back on.  It turns
out I was trying to read in the codon frequencies when the filename was
an uninitialized string variable (I typoed the name).  Whoops.  Now that
I've spelled the variable name correctly, it is working.
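
For anyone who hits the same thing: with strict enabled the misspelled
variable would have been a compile-time error. A small guard like the
one below (a sketch; require_readable_file is a hypothetical helper,
not part of BioPerl) also catches an empty or unreadable file name
before it reaches the parser.

```perl
use strict;
use warnings;

# Die early with a clear message instead of handing an undefined or
# unreadable path to a parser.
sub require_readable_file {
    my ($path) = @_;
    die "no input file name given\n" unless defined $path && length $path;
    die "cannot read '$path'\n"      unless -f $path && -r $path;
    return $path;
}

# Hypothetical usage with the code from this thread:
#   my $io = Bio::CodonUsage::IO->new(
#       -file => require_readable_file($freqFile));

my $ok = eval { require_readable_file(undef); 1 };
print "caught: $@" unless $ok;
```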


From bernd at kirx.de  Sun Apr 29 18:57:53 2007
From: bernd at kirx.de (Bernd Mueller)
Date: Mon, 30 Apr 2007 00:57:53 +0200
Subject: [Bioperl-l] bioperl::db
In-Reply-To: <46335BD7.8040306@kirx.de>
References: <46335BD7.8040306@kirx.de>
Message-ID: <463522F1.2010406@kirx.de>

Hello list,

I figured out my problem. It was caused by confusion over the 
versioning of bioperl. The instructions describe how to list the available 
versions of bioperl in CPAN, but afterwards they describe installing a 
much higher version which is not listed as a distribution in CPAN. So it 
works fine now. Thanks anyway. Proficiency in reading results in success ;-)

But I have another question: Does anyone know how to retrieve only free 
fulltext documents with EUtilities from PubMed Central? All my queries 
return a mixed corpus of free and non-free articles.

Thanks and regards,

Bernd



-- 
Dipl.-Inform.(FH)
Bernd Mueller
phone: +49 179 2336692
email: bernd at kirx.de


From cjfields at uiuc.edu  Sun Apr 29 20:16:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:16:11 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
Message-ID: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>

Allen (or anyone),

What is the status of this module?  It requires a module not listed  
in the dependencies (WWW::Mechanize) and has no tests.

chris


From allenday at ucla.edu  Sun Apr 29 20:21:19 2007
From: allenday at ucla.edu (Allen Day)
Date: Sun, 29 Apr 2007 17:21:19 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
Message-ID: <5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>

Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
ago.  I only implemented for a few journals, so it never worked for a
large fraction of publications.  Probably it barely works or does not
work at all now b/c of how the PDF are scraped out of the HTML.  The
publisher sites are always modifying their HTML, presumably trying to
prevent automated download like this.

-Allen

On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Allen (or anyone),
>
> What is the status of this module?  It requires a module not listed
> in the dependencies (WWW::Mechanize) and has no tests.
>
> chris
>


From cjfields at uiuc.edu  Sun Apr 29 20:28:47 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:28:47 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
Message-ID: <29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>

Quick response!  Yep, I've run into this with a few publishers.   
Though they're supposed to have 'permanent' links for those of us who  
like to link to our pubs they frequently change (scary if that's  
their definition of permanent).

Did you want us to remove the code from CVS?

chris

On Apr 29, 2007, at 7:21 PM, Allen Day wrote:

> Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
> ago.  I only implemented for a few journals, so it never worked for a
> large fraction of publications.  Probably it barely works or does not
> work at all now b/c of how the PDF are scraped out of the HTML.  The
> publisher sites are always modifying their HTML, presumably trying to
> prevent automated download like this.
>
> -Allen
>
> On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> Allen (or anyone),
>>
>> What is the status of this module?  It requires a module not listed
>> in the dependencies (WWW::Mechanize) and has no tests.
>>
>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Apr 29 20:31:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:31:15 -0500
Subject: [Bioperl-l] PMC and EUtilities, was  bioperl::db
In-Reply-To: <463522F1.2010406@kirx.de>
References: <46335BD7.8040306@kirx.de> <463522F1.2010406@kirx.de>
Message-ID: <01DC1D72-8AFA-4C11-8795-D4787506C602@uiuc.edu>

There may be a way to limit the initial query to full text docs from  
esearch, then use the history to retrieve only the XML docs you  
want.  Is that what you mean?
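
As a rough sketch of that first step, the snippet below only builds an
esearch URL with usehistory=y and does not contact NCBI. The search
term, and in particular the "open access[filter]" limit, is an
assumption that should be checked against the NCBI documentation; the
helper names are hypothetical and this does not use the
Bio::DB::EUtilities API.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved ASCII characters.
sub uri_escape_ascii {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Build an esearch URL from name => value pairs.
sub esearch_url {
    my (%arg) = @_;
    my $base  = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
    my @query = map { "$_=" . uri_escape_ascii($arg{$_}) } sort keys %arg;
    return $base . '?' . join('&', @query);
}

my $url = esearch_url(
    db         => 'pmc',
    term       => 'codon usage AND open access[filter]',  # assumed filter
    usehistory => 'y',
);
print "$url\n";
```

With usehistory=y the esearch reply carries a WebEnv and query_key,
which a later efetch request can reuse to retrieve only the records
that matched the limited query.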

BioPerl-based access to PMC is limited at best.  Bio::DB::EUtilities  
only returns raw PMC XML with no post-processing of raw data (for  
good reason, as EUtilities is meant to be an intermediate step).   
Allen Day's Bio::DB::Biblio::eutils module supposedly allows PMC  
queries.  I'm also pretty sure that PubMedXML != PMC XML, in other  
words the Bio::Biblio XML format parsers may not work on PMC XML.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From allenday at ucla.edu  Sun Apr 29 20:57:55 2007
From: allenday at ucla.edu (Allen Day)
Date: Sun, 29 Apr 2007 17:57:55 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
Message-ID: <5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>

Doesn't matter to me if it stays or not.  If you're cleaning house
feel free to get rid of it.

-Allen

On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Quick response!  Yep, I've run into this with a few publishers.
> Though they're supposed to have 'permanent' links for those of us who
> like to link to our pubs they frequently change (scary if that's
> their definition of permanent).
>
> Did you want us to remove the code from CVS?
>
> chris
>
> On Apr 29, 2007, at 7:21 PM, Allen Day wrote:
>
> > Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
> > ago.  I only implemented for a few journals, so it never worked for a
> > large fraction of publications.  Probably it barely works or does not
> > work at all now b/c of how the PDF are scraped out of the HTML.  The
> > publisher sites are always modifying their HTML, presumably trying to
> > prevent automated download like this.
> >
> > -Allen
> >
> > On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> Allen (or anyone),
> >>
> >> What is the status of this module?  It requires a module not listed
> >> in the dependencies (WWW::Mechanize) and has no tests.
> >>
> >> chris
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Mon Apr 30 11:15:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Apr 2007 10:15:16 -0500
Subject: [Bioperl-l] PMC and EUtilities, was  bioperl::db
In-Reply-To: <4635B1BD.9030402@kirx.de>
References: <46335BD7.8040306@kirx.de> <463522F1.2010406@kirx.de>
	<01DC1D72-8AFA-4C11-8795-D4787506C602@uiuc.edu>
	<4635B1BD.9030402@kirx.de>
Message-ID: <D11CE380-EDEC-4F7F-80EA-09D915EA79F0@uiuc.edu>

Bernd,

As a preface to this discussion, I am in the middle of refactoring
EUtilities; the next incarnation should have a similar API but will  
likely set parameters via simpler methods (no need for all the getter/ 
setters).

You'll likely have to parse out the tags yourself; AFAIK there is no
BioPerl XML parser for PMC XML, and a quick grep search turns up
nothing but PubMed parsers.  If you aren't familiar with XML parsing  
you could try XML::Simple to get at what you want.  I would pass the  
XML in as small chunks (maybe by retrieving them in batches of 100 or  
less) and initially use Data::Dumper to determine the data structure  
XML::Simple returns (PMC XML has attributes and elements, so the  
structure will be more complex).  Then just iterate through articles  
and grab what you want.
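
A minimal sketch of that approach, for what it's worth. The XML below is a hypothetical, heavily trimmed stand-in for a batch of PMC records (the IDs and titles are made up), and the ForceArray/KeyAttr choices are just one reasonable way to get a predictable structure out of XML::Simple. Real records are much richer, and a title containing inline markup will come back as a hash rather than a plain string, so dump the structure first.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin);
use Data::Dumper;

# Hypothetical stand-in for one small batch of PMC XML (placeholder values).
my $xml = <<'XML';
<pmc-articleset>
  <article>
    <front>
      <article-meta>
        <article-id pub-id-type="pmc">111111</article-id>
        <article-id pub-id-type="doi">10.0000/example.1</article-id>
        <title-group>
          <article-title>First placeholder title</article-title>
        </title-group>
      </article-meta>
    </front>
  </article>
  <article>
    <front>
      <article-meta>
        <article-id pub-id-type="pmc">222222</article-id>
        <title-group>
          <article-title>Second placeholder title</article-title>
        </title-group>
      </article-meta>
    </front>
  </article>
</pmc-articleset>
XML

# ForceArray keeps 'article' and 'article-id' as arrays even when only one
# is present; KeyAttr => [] stops XML::Simple folding on id/name attributes.
my $data = XMLin($xml,
                 ForceArray => [ 'article', 'article-id' ],
                 KeyAttr    => []);

# Uncomment to inspect the structure before writing any accessors.
# print Dumper($data);

for my $article ( @{ $data->{article} } ) {
    my $meta = $article->{front}{'article-meta'};

    # Elements with attributes become hashes; their text is under 'content'.
    my ($pmcid) = map  { $_->{content} }
                  grep { $_->{'pub-id-type'} eq 'pmc' }
                  @{ $meta->{'article-id'} };

    my $title = $meta->{'title-group'}{'article-title'};
    print "$pmcid\t$title\n";
}
```

Feeding XMLin one small batch at a time, as suggested above, keeps the in-memory structure manageable.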

I think most (if not all) articles in PubMed Central are available as
free full text:

http://www.pubmedcentral.nih.gov/about/faq.html#q9

You can retrieve them via ftp:

ftp://ftp.ncbi.nlm.nih.gov/pub/pmc

which contains an index file of all articles and their dir. location  
(the readme gives more info).

chris

On Apr 30, 2007, at 4:07 AM, Bernd Mueller wrote:

> Hello,
>
> I think so. The ids from my wanted articles are retrieved by  
> Bio::DB::EUtilities::esearch. Afterwards I download the articles  
> with Bio::DB::EUtilities::efetch. It is only possible to download  
> in XML format from PMC. So post processing is actually needed  
> because I want the articles in plain format.
>
> But I don't know why I have results of non-free articles, i.e.  
> abstracts where full articles should be found with a query  
> constraining to only free fulltext. In the query I limit the search
> with the filter "AND free fulltext[filter]". Probably this is a
> matter concerning not directly bioperl but the eutilities interface
> of PMC.
>
> Regards,
> Bernd


From allenday at ucla.edu  Mon Apr 30 12:44:12 2007
From: allenday at ucla.edu (Allen Day)
Date: Mon, 30 Apr 2007 09:44:12 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <4635FDD8.8030704@jouy.inra.fr>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
	<5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>
	<4635FDD8.8030704@jouy.inra.fr>
Message-ID: <5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>

DOI is definitely the right way to do this.  It wasn't implemented
widely at the time I wrote this module.

-Allen

On 4/30/07, Stéphane Téletchéa <steletch at jouy.inra.fr> wrote:
> Allen Day wrote:
> > Doesn't matter to me if it stays or not.  If you're cleaning house
> > feel free to get rid of it.
> >
> > -Allen
> >
>
> I've worked on something the other way around: getting information about
> a PDF from its DOI, if present. Most recent publications do have a DOI,
> and I use this as the target for my request.
>
> This does not solve the problem, but may help others. Feel free to ask
> if it can help the ongoing work; the code is quite dirty ...
>
> Stéphane
>
>
> --
> Stéphane Téletchéa, PhD.                  http://www.steletch.org
> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>


From cjfields at uiuc.edu  Mon Apr 30 13:55:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Apr 2007 12:55:01 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
	<5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>
	<4635FDD8.8030704@jouy.inra.fr>
	<5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>
Message-ID: <34F19F02-1B7B-41A1-90B1-F373C49BC012@uiuc.edu>

Agreed; even some seq. records may have DOI now.  PubMed and PMC XML  
contain this, so it is possible to parse the DOI out if one were  
inclined to incorporate this into Bio::Biblio (I added a doi() getter/ 
setter into Bio::Annotation::Reference a few months back).

chris

On Apr 30, 2007, at 11:44 AM, Allen Day wrote:

> DOI is definitely the right way to do this.  It wasn't implemented
> widely at the time I wrote this module.
>
> -Allen
>
> On 4/30/07, Stéphane Téletchéa <steletch at jouy.inra.fr> wrote:
>> Allen Day wrote:
>>> Doesn't matter to me if it stays or not.  If you're cleaning house
>>> feel free to get rid of it.
>>>
>>> -Allen
>>>
>>
>> I've worked on something the other way around: getting information about
>> a PDF from its DOI, if present. Most recent publications do have a DOI,
>> and I use this as the target for my request.
>>
>> This does not solve the problem, but may help others. Feel free to ask
>> if it can help the ongoing work; the code is quite dirty ...
>>
>> Stéphane
>>
>>
>> --
>> Stéphane Téletchéa, PhD.                  http://www.steletch.org
>> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
>> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
>> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr 30 16:05:45 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 30 Apr 2007 13:05:45 -0700 (PDT)
Subject: [Bioperl-l] generate a fasta file from the blast report
Message-ID: <10259461.post@talk.nabble.com>


Hi all,
If I have the following script working on my BLAST report, can anyone please
tell me how I can generate a FASTA-format file of just the hit (subject)
sequences?
Thanks a lot.
 
use strict;
 use Bio::SearchIO;
   
    my $in = new Bio::SearchIO(-format => 'blast', 
                               -file   => 'report.bls');
    while( my $result = $in->next_result ) {
      while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
          if( $hsp->length('total') > 100 &&
              $hsp->percent_identity >= 75 ) {
              print "Hit= ", $hit->name, 
                    ", len=",$hsp->length('total'), 
                    ", percent_id=", $hsp->percent_identity, "\n";
          }
        }  
      }
    }


From Francoise.LECOMTE at biogemma.com  Mon Apr 30 06:35:03 2007
From: Francoise.LECOMTE at biogemma.com (Francoise.LECOMTE at biogemma.com)
Date: Mon, 30 Apr 2007 12:35:03 +0200
Subject: [Bioperl-l] Pb makefile
Message-ID: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>

Hi
I am trying to install BioPerl 1.4 on a Tru64 platform and I get the message
"make: line too long" when I run the command make install.
How can I solve it? How can I disable man page installation in Makefile.PL,
if that can solve this problem?

Best regards 

Françoise Lecomte


From torsten.seemann at infotech.monash.edu.au  Mon Apr 30 20:22:35 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 1 May 2007 10:22:35 +1000
Subject: [Bioperl-l] generate a fasta file from the blast report
In-Reply-To: <10259461.post@talk.nabble.com>
References: <10259461.post@talk.nabble.com>
Message-ID: <a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>

> if i have the following script working on my blast report, can anyone plz
> tell me how can i generate a fasta format file of just the hits (subject)
> sequence.

Do you want the WHOLE subject sequence, or just the region that hit the query?

The hit is available as $hsp->hit_string();
http://doc.bioperl.org/bioperl-live/Bio/Search/HSP/GenericHSP.html#CODE11

The whole subject sequence would require the original Fasta input file.
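
For the "just the region that hit" case, something along these lines might do as a starting point. It is only a sketch: it reuses the filters from the posted script, the sub name and the ID scheme (hit name plus subject coordinates) are my own invention, and the file names are whatever you pass on the command line. Note that hit_string is the subject line of the alignment, so gap characters have to be stripped before writing FASTA.

```perl
use strict;
use warnings;
use Bio::SearchIO;
use Bio::SeqIO;
use Bio::Seq;

# Write the aligned subject region of every HSP that passes the filters.
# Returns the number of sequences written.
sub write_hit_regions {
    my ($report, $fasta) = @_;
    my $in  = Bio::SearchIO->new(-format => 'blast', -file => $report);
    my $out = Bio::SeqIO->new(-format => 'fasta', -file => ">$fasta");
    my $written = 0;

    while ( my $result = $in->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                next unless $hsp->length('total') > 100
                         && $hsp->percent_identity >= 75;

                # Drop alignment gaps from the subject string.
                ( my $residues = $hsp->hit_string ) =~ tr/-//d;

                # Illustrative ID scheme: hitname_start_end on the subject.
                my $id = join '_', $hit->name,
                                   $hsp->start('hit'), $hsp->end('hit');

                $out->write_seq(
                    Bio::Seq->new(-display_id => $id, -seq => $residues) );
                $written++;
            }
        }
    }
    return $written;
}

# Usage: perl hits2fasta.pl report.bls hits.fa   (script name is a placeholder)
if ( @ARGV == 2 ) {
    my $n = write_hit_regions(@ARGV);
    print "wrote $n sequences to $ARGV[1]\n";
}
```

A hit with several qualifying HSPs produces several FASTA records, one per HSP.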

By the way, are your questions for work related issues, or is this
your homework or assignment for a study course?

--Torsten


From dmessina at wustl.edu  Sun Apr  1 22:54:58 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Apr 2007 21:54:58 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <6EFFF13A-66E7-418F-8B8E-A8AA8826DE83@wustl.edu>

We need more information to be able to help you. Could you please  
show us the actual output you see when trying to install Bioperl?

Also, we need to know:

- what operating system you have
- what version of Bioperl you are trying to install

See

http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance

and please read the rest of the document, too.

Dave


From aharry2001 at yahoo.com  Mon Apr  2 06:09:25 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 03:09:25 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <B04E1B58-9BE1-407A-91D2-6EA9C0BA2A38@uiuc.edu>
Message-ID: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>

Hello All,
I have some problems parsing KEGG using BioPerl. I get an out-of-memory error. I currently have 1 GB of RAM. Can someone tell me why this is happening and how it can be solved? Is it because the objects passed to BioPerl are so big, or what?

Best regards,
Ambrose

 


From cjfields at uiuc.edu  Mon Apr  2 08:43:18 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 07:43:18 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
References: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
Message-ID: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>

This doesn't really explain much beyond stating you are having  
problems.  You need to post some code (to the mail list!) and let us  
know what version of BioPerl you are using.

chris

On Apr 2, 2007, at 5:09 AM, Ambrose wrote:

> Hello All,
> I have some problems parsing KEGG using BioPerl. I get an
> out-of-memory error. I currently have 1 GB of RAM. Can someone tell me
> why this is happening and how it can be solved? Is it because the
> objects passed to BioPerl are so big, or what?
>
> Best regards,
> Ambrose
>
>


From aharry2001 at yahoo.com  Mon Apr  2 09:56:33 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 06:56:33 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>
Message-ID: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>


Hello all,

I have the code below, which parses my KEGG files. Many of the files are parsed and their information is inserted into my database, but unfortunately, after the program has run for some hours, it stops with an out-of-memory message. I assume this happens because the BioPerl object is too big. Please check the code below.

Best regards,
Ambrose


#!/usr/local/ActivePerl/bin/perl
#
#

use strict;
use Bio::SeqIO;
use Bio::FASTASequence;
use DBI;
use Benchmark  qw(:all) ;

my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1,$pathway_id1,$pathway_name1,$ec_num);
my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
my( @kg_id);
my $db="gbdb";
my $host="localhost";
my $userid="root";
my $passwd="ubuntu";
my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/mysqld.sock";
my ($t1,$t2);
my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
my $time_used;
 
 
 eval { $dbh->do("DROP TABLE kegginfo") };
 print "Dropping kegginfo failed: $@\n" if $@;
 $dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL AUTO_INCREMENT,
                                   up_id INT UNSIGNED REFERENCES uniprotinfo(up_id),
                                   filename VARCHAR(50),
                                   kegg_id VARCHAR(50),
                                   keggaccn VARCHAR(50),
                                   description VARCHAR(250),
                                   ec_numbers VARCHAR(250),
                                   pathway_id VARCHAR(250),
                                   pathway_name VARCHAR(250),
                                   crc64 VARCHAR(50),
                                   ko_id VARCHAR(50),
                                   pfam_id VARCHAR(50),
                                   ncbigi_id VARCHAR(50),
                                   ncbigeneid_id VARCHAR(50),
                                   uniprot_id VARCHAR(50),
                                   prosite_id VARCHAR(50),
                                   PRIMARY KEY (kg_id)
                                 )");
                                 

eval { $dbh->do("DROP TABLE keggntsequence") };
print "Dropping keggntsequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                  keggaccn VARCHAR(50),
                                  nucleotidesequence text
                                 )");

eval { $dbh->do("DROP TABLE keggaasequence") };
print "Dropping keggaasequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                  keggaccn VARCHAR(50),
                                  crc64 VARCHAR(50),
                                  aminoacidsequence text
                                 )");
eval { $dbh->do("DROP TABLE timestable") };
print "Dropping timestable failed: $@\n" if $@;
$dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT NULL AUTO_INCREMENT,
                                   genome VARCHAR(100),
                                   totaltime_seconds int(100),
                                   PRIMARY KEY(aut_id))");


open (LIST, "genomes.list") || die "Cannot open input kegg genomes file genomes.list\n $! \n";
$t1=new Benchmark;
my @genomelist = ();
while (my $line=<LIST>) {
    #ignore comment lines
    if ($line !~ /^#/) {
        chomp $line;
                
        push (@genomelist, $line); #store the filename
    }
}

close LIST;
my $count=0;
foreach my $genomefile (@genomelist) {

    #in case the user fails to remove some strange files from
    #the genomes.list file.. check for the KEGG format
    my $check=checkKeggFormat($genomefile);
    if ($check==0) {
        #if file is not kegg, start with the next one...
        print "ERROR: $genomefile doesn't look like a KEGG file to me! \n";
        #<stdin>;
        next;
    }
#print $genomefile,"\n";
    my $stream = Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG');

    while ( my $seq = $stream->next_seq() ) {

        my $primary_id = $seq->primary_id;
        my $display_id = $seq->display_id; #name
        my $keggaccn   = $seq->accession; #accn
        my @description = $seq->annotation->get_Annotations('description');
        
        my @dblinks     = $seq->annotation->get_Annotations('dblink');
        # KO cross-references, taken from the dblink annotations
        my @orthologs   = grep {$_->database eq 'KO'} $seq->annotation->get_Annotations('dblink');
        my @class       = $seq->annotation->get_Annotations('pathway');
        $ntseq{$keggaccn} = $seq->seq;
        $aaseq{$keggaccn} = $seq->translate->seq;
        $aaseq{$keggaccn} =~ s/\*$//;
        my $fasta = ">".$count."\n".$aaseq{$keggaccn};
        my $newseq = Bio::FASTASequence->new($fasta);
        $crc64{$keggaccn}=$newseq->getCrc64();
#print $keggaccn,"crc64:$crc64{$keggaccn}\n";
        
        $count++;
        if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no keggaccn\n";
        next;}    

        if(@dblinks)
        {
                my @dblink_KO=();
                my @dblink_Pfam=();
                my @dblink_PROSITE=();
                my @dblink_NCBIGI=();
                my @dblink_NCBIGENEID=();
                my @dblink_UniProt=();
        
                foreach my $ele (@dblinks) {
                    if ($ele =~ /^KO:/){
                        $ele=~s/KO://;
                        push (@dblink_KO,$ele);
                        $dblink_KO{$keggaccn}=$ele;
                        next;
                    }
                        #parse Pfam: dblink
                    if ($ele =~ /^Pfam:/){
                        $ele=~s/Pfam://;
                        push (@dblink_Pfam,$ele);
                        $dblink_Pfam{$keggaccn}=$ele;
                        next;
                    }
                        #parse PROSITE: dblink
                    if ($ele =~ /^PROSITE:/){
                        $ele=~s/PROSITE://;
                        push (@dblink_PROSITE,$ele);
                        $dblink_PROSITE{$keggaccn}=$ele;
                        next;
                    }
                        #parse NCBI-GI: dblink
                    if ($ele =~ /^NCBI-GI:/){
                        $ele=~s/NCBI-GI://;
                        push (@dblink_NCBIGI,$ele);
                        $dblink_NCBIGI{$keggaccn}=$ele;
                        next;
                    }
                        #parse NCBI-GeneID: dblink
                    if ($ele =~ /^NCBI-GeneID:/){
                        $ele=~s/NCBI-GeneID://;
                        push (@dblink_NCBIGENEID,$ele);
                        $dblink_NCBIGENEID{$keggaccn}=$ele;
                        next;
                        }
                        #parse UniProt: dblink
                    if ($ele =~ /^UniProt:/){
                        $ele=~s/UniProt://;
                        push (@dblink_UniProt,$ele);
                        $dblink_UniProt{$keggaccn}=$ele;
                        next;
                    }
            
                }#end foreach     #finished parsing all dblinks    
        }#end if @dblinks
        if(@class)
        {
            foreach my $pathway (@class) {
    
                $pathway=~s/^\s+|\s+$//g;
                my @arr = split (/\s+/,$pathway);
                my $pathway_id = $arr[0];
                shift @arr;
                my $pathway_name = join(" ",@arr);
                $pathway_name{$keggaccn}=$pathway_name;
                $pathway_id{$keggaccn}=$pathway_id;
                #print $pathway_id{$keggaccn},"\t",$pathway_name{$keggaccn},"\n";
                                    
            }
            
        }
        
        my @ecnumbers=();
        @ecnumbers = extractECnumbers(@description);
        if(@ecnumbers)
        {
                if (@ecnumbers!=0) 
                {
                    foreach my $ecn (@ecnumbers) 
                    {
                       $ecnumbers{$keggaccn}=$ecn;
                    }#end foreach
                }
                else {
                    #print "ECnumbers:\n";
                     }
        }
        
        
#         print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t",$dblink_NCBIGENEID{$keggaccn},
#                 "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers{$keggaccn}","\t",
#                  "p1:$pathway_id{$keggaccn}","\t","p2:$pathway_name{$keggaccn}","\n";
#         
        $dbh->do("INSERT INTO kegginfo VALUES (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
            undef,"NULL","NULL",$genomefile,$display_id,$keggaccn,@description,$ecnumbers{$keggaccn},
            $pathway_id{$keggaccn},$pathway_name{$keggaccn},$crc64{$keggaccn},$dblink_KO{$keggaccn},
            $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn},$dblink_NCBIGENEID{$keggaccn},
            $dblink_UniProt{$keggaccn},$dblink_PROSITE{$keggaccn});
         

        $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
            undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});
                        

        $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
            undef,"",$keggaccn,$ntseq{$keggaccn});
                
               
    }
     $t2=new Benchmark;
    $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
    $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
    undef,"NULL",$genomefile,$time_used);
 
}


$dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
print "Index created on kegginfo\n";

$dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence (kg_id,keggaccn)");
print "Index created on keggaasequence\n";

$dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence (kg_id,keggaccn)");
print "Index created on keggntsequence\n";


print"Updating the tables................\n";

    
$dbh->do("update kegginfo,keggaasequence set keggaasequence.kg_id=kegginfo.kg_id 
         where 
                kegginfo.keggaccn=keggaasequence.keggaccn");
        print " keggaasequence kg_id\n";

$dbh->do("update kegginfo,keggntsequence set keggntsequence.kg_id=kegginfo.kg_id 
         where 
                kegginfo.keggaccn=keggntsequence.keggaccn");
        print " keggntsequence kg_id\n";


sub extractECnumbers ($) {
    #sample description lines
     #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]
    #ATP synthase F0 subunit c [EC:3.6.3.14]

    my @desc=shift;
    my $description = join ("",@desc);
    my @ecnumbers=();
    #print "parsing ec for $description..\n";
    #check if EC number exists
    if ($description=~/\[EC:/) {
        
        my @array = split (/\[EC:/,$description);
        $array[1]=~s/]//g;
        shift @array; #remove the annotation , only EC numbers remain
        foreach my $ele (@array) {
            $ele=~s/^\s+|\s+$//g;
            $ele= "EC:".$ele;
            push (@ecnumbers,$ele);
        }    
        return @ecnumbers;
    }
    else {
        #return an empty value
        return ;

    }

}


sub checkKeggFormat ($) {
=head2

checkKeggFormat

make sure that the file is a valid KEGG file
function checks the first two lines,
1st must start with ENTRY
2nd must start with DEFINITION

returns 0 or 1

=cut
    my $genomefile=shift;

    open (TEST,$genomefile) || die "Cannot open file $genomefile for reading \n";
    my $testline=<TEST>;
#print "$testline\n";
    if ($testline=~/^ENTRY/) {
        #continue
        #$testline=<TEST>;#double check
        #if ($testline=~/^NAME/) {
            #this looks like a valid kegg file
            close TEST;
            return 1;
        #}
        #else {
        #    close TEST;
        #    return 0;
        #}
    }
    else {
        close TEST;    
        return 0;
    }

}

sub timeThis ($$$) 
{
    my ($start,$end,$message) = @_;
    my $td = timediff($end, $start);
    my $t = timestr($td);    
        print "$message : ",$t,"\n";
        my @array = split (/\s+/,$t);
#20 wallclock secs (14.23 usr +  0.84 sys = 15.07 CPU)
        return $array[0]; #return the no. of seconds.
}

   


From e-just at northwestern.edu  Mon Apr  2 10:12:33 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:12:33 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package
	"Bio::DB::GenBank"
Message-ID: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>

Hello,

I am getting this error while running a bioperl script that I had been using
in bioperl 1.4.  On upgrading to bioperl 1.5.2 I get the following fatal
error

Can't locate object method "seq_start" via package "Bio::DB::GenBank"

My script is as follows:


use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $gb = new Bio::DB::GenBank();

my $query = Bio::DB::Query::GenBank->new(
      -query   =>'txid44689[Organism:noexp]',
      -reldate => 60,
      -db      => 'nucleotide'

);

my $in = $gb->get_Stream_by_query($query);

while ( my $seq = $in->next_seq()) {
      print "do something";
      #....
}


I noticed that seq_start is created in the BEGIN block of
Bio::DB::NCBIHelper (inherited by Bio::DB::GenBank), but I do not have
experience troubleshooting this kind of autoloaded method.  Any idea where
to start?

Thanks

Eric


From e-just at northwestern.edu  Mon Apr  2 10:15:28 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:15:28 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package
	"Bio::DB::GenBank"
In-Reply-To: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>
References: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>
Message-ID: <fa1fe35c0704020715u1f14f273n100d4e21f848603d@mail.gmail.com>

Sorry about that.

As soon as I sent the email I found my problem (an old NCBIHelper in my
inheritance path).  There is no bug here.
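
In case it helps anyone who hits the same thing: assuming the cause is a stale copy of a module being picked up ahead of the new one, a quick check like the sketch below shows which file Perl actually loaded and every copy visible on @INC. The two helper names are made up for this example, and File::Spec is only a core stand-in; for this case you would substitute Bio::DB::NCBIHelper.

```perl
use strict;
use warnings;

# Return the file a module was loaded from, or undef if it is not loaded.
sub loaded_from {
    my ($module) = @_;
    ( my $key = "$module.pm" ) =~ s{::}{/}g;
    return $INC{$key};
}

# Return every copy of a module found along @INC, in search order.
# Code references in @INC (import hooks) are skipped.
sub copies_on_inc {
    my ($module) = @_;
    ( my $rel = "$module.pm" ) =~ s{::}{/}g;
    return grep { -f } map { "$_/$rel" } grep { !ref } @INC;
}

# Core module used as a stand-in so the example runs anywhere.
require File::Spec;
print "loaded from: ", loaded_from('File::Spec'), "\n";
print "on \@INC:     $_\n" for copies_on_inc('File::Spec');
```

If more than one path is printed, the first one listed is the copy that wins.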

Eric


On 4/2/07, Eric Just <e-just at northwestern.edu> wrote:
>
> Hello,
>
> I am getting this error while running a bioperl script that I had been
> using in bioperl 1.4.  On upgrading to bioperl 1.5.2 I get the following
> fatal error
>
> Can't locate object method "seq_start" via package "Bio::DB::GenBank"
>
> My script is as follows:
>
>
> use Bio::DB::GenBank;
> use Bio::DB::Query::GenBank;
>
> my $gb = new Bio::DB::GenBank();
>
> my $query = Bio::DB::Query::GenBank->new(
>       -query   =>'txid44689[Organism:noexp]',
>       -reldate => 60,
>       -db      => 'nucleotide'
>
> );
>
> my $in = $gb->get_Stream_by_query($query);
>
> while ( my $seq = $in->next_seq()) {
>       print "do something";
>       #....
> }
>
>
>
> I noticed that seq_start is created in the BEGIN block of
> Bio::DB::NCBIHelper (inherited by Bio::DB::GenBank), but I do not have
> experience troubleshooting this kind of autoloaded method.  Any idea where
> to start?
>
> Thanks
>
> Eric
>


From cjfields at uiuc.edu  Mon Apr  2 11:32:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 10:32:59 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
References: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
Message-ID: <38475C93-FB21-4BC4-BF5D-7F48493E8EE2@uiuc.edu>

Ambrose,

Data is persisting in your hashes (in particular DBLink objects),  
which is eating away at your memory.  If I take a sample KEGG gene  
file and simply parse it:

while (my $seq = $io->next_seq) {
     print $seq->accession,"\n";
}

there are no memory issues, but if I store the data in hashes  
declared outside the loop:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,
   %dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

while (my $seq = $io->next_seq) {
     # store Bio::Seq data in hashes
}

I see problems with only one genome file with KEGG records.  You'll  
definitely run into memory issues if you are parsing many genome  
files, which you appear to be:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,
   %dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

for my $genomefile (@genomelist) {
     my $io = Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG');
     while (my $seq = $io->next_seq) {
         # store Bio::Seq data in hashes
     }
}

Localizing the hashes to the genome or sequence loops should prevent  
the memory problem.
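
A rough sketch of what I mean by localizing. To keep it runnable without BioPerl, the record stream is mocked with plain hashes (hypothetical data, not real KEGG entries) and the function and variable names are invented for the example. In the real script the iterator would wrap $stream->next_seq and the sink would do the $dbh->do(...) inserts.

```perl
use strict;
use warnings;

# Consume records one at a time. Everything collected for a record lives in
# lexicals declared inside the loop, so it is released before the next
# record is read instead of piling up in file-scoped hashes.
sub process_records {
    my ($next_record, $sink) = @_;    # both are code references
    my $seen = 0;
    while ( my $rec = $next_record->() ) {
        my %dblink;                   # scoped to this record only
        for my $link ( @{ $rec->{dblinks} } ) {
            my ($db, $id) = split /:\s*/, $link, 2;
            $dblink{$db} = $id;
        }
        $sink->( $rec->{accession}, \%dblink );   # e.g. a DBI insert
        $seen++;
    }
    return $seen;
}

# Hypothetical stand-in data.
my @records = (
    { accession => 'gene0001', dblinks => [ 'KO: K00001', 'Pfam: PF00001' ] },
    { accession => 'gene0002', dblinks => [ 'UniProt: P00001' ] },
);

my $count = process_records(
    sub { shift @records },           # iterator: undef when exhausted
    sub {
        my ($acc, $links) = @_;
        print join( "\t", $acc,
                    map { "$_=$links->{$_}" } sort keys %$links ), "\n";
    },
);
print "processed $count records\n";
```

A side benefit: values from one record cannot leak into the next (the second record above has no KO entry, and none is printed for it).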

Note that the DBLink Annotation objects are overloaded so they act  
like a string ($ele =~ /^KO:/) but are actually  
Bio::Annotation::DBLink objects, something we will likely get rid of  
in the near future.

chris

On Apr 2, 2007, at 8:56 AM, Ambrose wrote:

>
>
> Hello ALL,
>
> I have the code below, which parses my KEGG files. Many of the
> files are parsed and their information is inserted into my database,
> but unfortunately, after the program has run for some hours, it stops
> with an out-of-memory message. I assume this happens because the
> BioPerl object is too big. Please check the code below.
>
> best regards Ambrose
>
>
> #!/usr/local/ActivePerl/bin/perl
> #
> #
>
> use strict;
> use Bio::SeqIO;
> use Bio::FASTASequence;
> use DBI;
> use Benchmark  qw(:all) ;
>
> my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1, 
> $pathway_id1,$pathway_name1,$ec_num);
> my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,% 
> dblink_NCBIGENEID,%dblink_UniProt);
> my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
> my( @kg_id);
> my $db="gbdb";
> my $host="localhost";
> my $userid="root";
> my $passwd="ubuntu";
> my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/ 
> mysqld.sock";
> my ($t1,$t2);
> my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
> my $time_used;
>
>
>
>  eval { $dbh->do("DROP TABLE kegginfo") };
>  print "Dropping kegginfo failed: $@\n" if $@;
>  $dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL  
> AUTO_INCREMENT,
>                                    up_id INT UNSIGNED REFERENCES  
> uniprotinfo(up_id),
>                                                                    
> filename VARCHAR(50),
>                                                     kegg_id VARCHAR 
> (50),
>                                    keggaccn VARCHAR(50),
>                                                                    
> description VARCHAR(250),
>                                    ec_numbers VARCHAR(250),
>                                               pathway_id VARCHAR(250),
>                                               pathway_name VARCHAR 
> (250),
>                                               crc64 VARCHAR(50),
>                                    ko_id VARCHAR(50),
>                                    pfam_id VARCHAR(50),
>                                    ncbigi_id VARCHAR(50),
>                                    ncbigeneid_id VARCHAR(50),
>                                    uniprot_id VARCHAR(50),
>                                    prosite_id VARCHAR(50),
>                                    PRIMARY KEY (kg_id)
>                                  )");
>
>
> eval { $dbh->do("DROP TABLE keggntsequence") };
> print "Dropping keggntsequence failed: $@\n" if $@;
> $dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED  
> REFERENCES uniprotinfo(kg_id),
>                                                     keggaccn VARCHAR 
> (50),
>                                   nucleotidesequence text
>                                  )");
>
> eval { $dbh->do("DROP TABLE keggaasequence") };
> print "Dropping keggaasequence failed: $@\n" if $@;
> $dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED  
> REFERENCES uniprotinfo(kg_id),
>                                                     keggaccn VARCHAR 
> (50),
>                                                     crc64 VARCHAR(50),
>                                   aminoacidsequence text
>                                  )");
> eval { $dbh->do("DROP TABLE timestable") };
> print "Dropping timestable failed: $@\n" if $@;
> $dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT  
> NULL AUTO_INCREMENT,
>                                    genome VARCHAR(100),
>                                     totaltime_seconds int(100),
>                                                                    
> PRIMARY KEY(aut_id))");
>
>
>
> open (LIST, "genomes.list") || die "Cannot open input kegg genomes  
> file genomes.list\n $! \n";
> $t1=new Benchmark;
> my @genomelist = ();
> while (my $line=<LIST>) {
>     #ignore comment lines
>     if ($line !~ /^#/) {
>         chomp $line;
>
>         push (@genomelist, $line); #store the filename
>     }
> }
>
> close LIST;
> my $count=0;
> foreach my $genomefile (@genomelist) {
>
>     #in case the user fails to remove some strange files from
>     #the genomes.list file.. check for the KEGG format
>     my $check=checkKeggFormat($genomefile);
>     if ($check==0) {
>         #if file is not kegg, start with the next one...
>         print "ERROR: $genomefile doesn't look like a KEGG file to  
> me! \n";
>         #<stdin>;
>         next;
>     }
> #print $genomefile,"\n";
>     my $stream = Bio::SeqIO->new(-file => $genomefile, -format =>  
> 'KEGG');
>
>     while ( my $seq = $stream->next_seq() ) {
>
>         my $primary_id = $seq->primary_id;
>         my $display_id = $seq->display_id; #name
>         my $keggaccn   = $seq->accession; #accn
>         my @description = $seq->annotation->get_Annotations 
> ('description');
>
>         my @dblinks     = $seq->annotation->get_Annotations('dblink');
>         my @orthologs   = $seq->annotation->get_Annotations 
> ('ortholog');
>         my @orthologs   = grep {$_->database eq 'KO'} $seq- 
> >annotation->get_Annotations('dblink');
>         my @class       = $seq->annotation->get_Annotations 
> ('pathway');
>          $ntseq{$keggaccn} = $seq->seq;
>          $aaseq{$keggaccn} = $seq->translate->seq;
>          $aaseq{$keggaccn} =~s /\*$//;
>                  my $fasta = ">".$count."\n".$aaseq{$keggaccn};
>          my $newseq = Bio::FASTASequence->new($fasta);
>          $crc64{$keggaccn}=$newseq->getCrc64();
> #print $keggaccn,"crc64:$crc64{$keggaccn}\n";
>
>         $count++;
>         if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no  
> keggaccn\n";
>         next;}
>
>         if(@dblinks)
>         {
>                 my @dblink_KO=();
>                 my @dblink_Pfam=();
>                 my @dblink_PROSITE=();
>                 my @dblink_NCBIGI=();
>                 my @dblink_NCBIGENEID=();
>                 my @dblink_UniProt=();
>
>                 foreach my $ele (@dblinks) {
>                     if ($ele =~ /^KO:/){
>                         $ele=~s/KO://;
>                         push (@dblink_KO,$ele);
>                         $dblink_KO{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse Pfam: dblink
>                     if ($ele =~ /^Pfam:/){
>                         $ele=~s/Pfam://;
>                         push (@dblink_Pfam,$ele);
>                         $dblink_Pfam{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse PROSITE: dblink
>                     if ($ele =~ /^PROSITE:/){
>                         $ele=~s/PROSITE://;
>                         push (@dblink_PROSITE,$ele);
>                         $dblink_PROSITE{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse NCBI-GI: dblink
>                     if ($ele =~ /^NCBI-GI:/){
>                         $ele=~s/NCBI-GI://;
>                         push (@dblink_NCBIGI,$ele);
>                         $dblink_NCBIGI{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse NCBI-GeneID: dblink
>                     if ($ele =~ /^NCBI-GeneID:/){
>                         $ele=~s/NCBI-GeneID://;
>                         push (@dblink_NCBIGENEID,$ele);
>                         $dblink_NCBIGENEID{$keggaccn}=$ele;
>                         next;
>                         }
>                         #parse UniProt: dblink
>                     if ($ele =~ /^UniProt:/){
>                         $ele=~s/UniProt://;
>                         push (@dblink_UniProt,$ele);
>                         $dblink_UniProt{$keggaccn}=$ele;
>                         next;
>                     }
>
>                 }#end foreach     #finished parsing all dblinks
>         }#end if @dblinks
>         if(@class)
>         {
>             foreach my $pathway (@class) {
>
>                 $pathway=~s/^\s+|\s+$//;
>                 my @arr = split (/\s+/,$pathway);
>                 my $pathway_id = $arr[0];
>                 shift @arr;
>                 my $pathway_name = join(" ",@arr);
>                 $pathway_name{$keggaccn}=$pathway_name;
>                 $pathway_id{$keggaccn}=$pathway_id;
>                 #print $pathway_id{$keggaccn},"\t",$pathway_name 
> {$keggaccn},"\n";
>
>             }
>
>         }
>
>         my @ecnumbers=();
>         @ecnumbers = extractECnumbers(@description);
>         if(@ecnumbers)
>         {
>                 if (@ecnumbers!=0)
>                 {
>                     foreach my $ecn (@ecnumbers)
>                     {
>                        $ecnumbers{$keggaccn}=$ecn;
>                     }#end foreach
>                 }
>                 else {
>                     #print "ECnumbers:\n";
>                      }
>         }
>
>
> #         print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t", 
> $dblink_NCBIGENEID{$keggaccn},
> #                 "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers 
> {$keggaccn}","\t",
> #                  "p1:$pathway_id{$keggaccn}","\t","p2: 
> $pathway_name{$keggaccn}","\n";
> #
>                 $dbh->do("INSERT INTO kegginfo VALUES  
> (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
>          undef,"NULL","NULL",$genomefile,$display_id, 
> $keggaccn,@description,$ecnumbers{$keggaccn},
>                   $pathway_id{$keggaccn},$pathway_name{$keggaccn}, 
> $crc64{$keggaccn},$dblink_KO{$keggaccn},
>                  $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn}, 
> $dblink_NCBIGENEID{$keggaccn},
>                  $dblink_UniProt{$keggaccn},$dblink_PROSITE 
> {$keggaccn});
>
>
>         $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
>             undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});
>
>
>         $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
>             undef,"",$keggaccn,$ntseq{$keggaccn});
>
>
>     }
>      $t2=new Benchmark;
>     $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
>     $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
>     undef,"NULL",$genomefile,$time_used);
>
> }
>
>
> $dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
> print "Index created on kegginfo\n";
>
> $dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence  
> (kg_id,keggaccn)");
> print "Index created on keggaasequence\n";
>
> $dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence  
> (kg_id,keggaccn)");
> print "Index created on keggntsequence\n";
>
>
> print"Updating the tables................\n";
>
>
> $dbh->do("update kegginfo,keggaasequence set  
> keggaasequence.kg_id=kegginfo.kg_id
>          where
>                 kegginfo.keggaccn=keggaasequence.keggaccn");
>         print " keggaasequence kg_id\n";
>
> $dbh->do("update kegginfo,keggntsequence set  
> keggntsequence.kg_id=kegginfo.kg_id
>          where
>                 kegginfo.keggaccn=keggntsequence.keggaccn");
>         print " keggaasequence kg_id\n";
>
>
>
> sub extractECnumbers ($) {
>     #sample description lines
>      #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26  
> 2.7.7.2]
>     #ATP synthase F0 subunit c [EC:3.6.3.14]
>
>     my @desc=shift;
>     my $description = join ("",@desc);
>     my @ecnumbers=();
>     #print "parsing ec for $description..\n";
>     #check if EC number exists
>     if ($description=~/\[EC:/) {
>
>         my @array = split (/\[EC:/,$description);
>         $array[1]=~s/]//g;
>         shift @array; #remove the annotation , only EC numbers remain
>         foreach my $ele (@array) {
>             $ele=~s/^\s+|\s+$//g;
>             $ele= "EC:".$ele;
>             push (@ecnumbers,$ele);
>         }
>         return @ecnumbers;
>     }
>     else {
>         #return an empty value
>         return ;
>
>     }
>
> }
>
>
>
>
>
>
>
> sub checkKeggFormat ($) {
> =head2
>
> checkKeggFormat
>
> make sure that the file is a valid KEGG file
> function checks the first two lines,
> 1st must start with ENTRY
> 2nd must start with DEFINITION
>
> returns 0 or 1
>
> =cut
>     my $genomefile=shift;
>
>     open (TEST,$genomefile) || die "Cannot open file $genomefile  
> for reading \n";
>     my $testline=<TEST>;
> #print "$testline\n";
>     if ($testline=~/^ENTRY/) {
>         #continue
>         #$testline=<TEST>;#double check
>         #if ($testline=~/^NAME/) {
>             #this looks like a valid kegg file
>             return 1;
>         #}
>         #else {
>         #    close TEST;
>         #    return 0;
>         #}
>     }
>     else {
>         close TEST;
>         return 0;
>     }
>
> }
>
> sub timeThis ($$$)
> {
>     my ($start,$end,$message) = @_;
>     my $td = timediff($end, $start);
>     my $t = timestr($td);
>         print "$message : ",$t,"\n";
>         my @array = split (/\s+/,$t);
> #20 wallclock secs (14.23 usr +  0.84 sys = 15.07 CPU)
>         return $array[0]; #return the no. of seconds.
> }
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
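
One likely contributor to the memory growth in the quoted script: %ntseq, %aaseq, %crc64 and the %dblink_* hashes are declared once at file scope and gain an entry for every record, so every sequence from every genome file stays in memory until the script exits. Below is a minimal sketch of the alternative, where each record's values live in lexicals scoped to the loop body and are handed straight to the insert. The names make_stream and load_records are hypothetical, and the stream is a plain closure standing in for $stream->next_seq(), so the sketch runs without BioPerl or a database:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $stream->next_seq(): returns one record per call,
# then undef when the input is exhausted.
sub make_stream {
    my @records = @_;
    return sub { return shift @records };
}

# Process records one at a time. Every per-record value is a lexical inside
# the loop, so nothing accumulates across records or across genome files.
sub load_records {
    my ($next, $insert) = @_;
    my $count = 0;
    while ( my $rec = $next->() ) {
        my $accn = $rec->{accession};
        # Skip records without an accession before doing any other work.
        next unless defined $accn && length $accn;
        # Strip a trailing stop codon symbol from the translation.
        (my $aa = $rec->{translation}) =~ s/\*$//;
        # In the real script this callback would wrap the $dbh->do(...) calls.
        $insert->($accn, $rec->{seq}, $aa);
        $count++;
    }
    return $count;
}

my @rows;
my $next = make_stream(
    { accession => 'eco:b0001', seq => 'ATGAAATAA', translation => 'MK*' },
    { accession => '',          seq => 'ATG',       translation => 'M'   },
    { accession => 'eco:b0002', seq => 'ATGCGTTAA', translation => 'MR*' },
);
my $n = load_records($next, sub { push @rows, [@_] });
print "loaded $n records\n";
```

Whether this alone cures the out-of-memory error is not certain from the post; it only removes the per-record accumulation that is visible in the script.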

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Mon Apr  2 12:19:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 2 Apr 2007 11:19:51 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>

Hi Fahmi,

Please include the list on the reply so that others can comment, too.

Yes, it appears the machine you are installing on does not have an  
internet connection. You probably will want to resolve that problem  
before dealing with Bioperl. Alternatively, you could simply install  
and use Bioperl  on the machine which does have an internet connection.

If you really need to get Bioperl installed on that machine, however,  
probably the easiest way would be to find a machine that does have an  
internet connection, install CPAN::Mini, and use it to make a local  
mirror of CPAN. You could then copy that local mirror over to the  
machine without the internet connection and point that machine's cpan  
at the local mirror (read the CPAN documentation to find out how to  
do this). Also, the BioPerl install instructions list several  
external packages that you will need to use some parts of Bioperl  
(e.g. GD). Again, you can download those distributions using the  
machine with the internet connection and copy them over.
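
The mirroring step could look roughly like the sketch below, run on the machine that does have a connection. It assumes CPAN::Mini is already installed there; the remote URL is just one public mirror, and mirror_args is a hypothetical helper name. The module is loaded lazily and only when a target directory is given, so the script does nothing network-related unless asked to:

```perl
use strict;
use warnings;

# Build the argument list for CPAN::Mini->update_mirror.
# 'remote' and 'local' are the two options the module requires.
sub mirror_args {
    my ($local, $remote) = @_;
    return ( remote => $remote, local => $local );
}

my $remote = 'http://www.cpan.org/';   # assumed mirror; any CPAN mirror should do

if (@ARGV) {
    # Loaded lazily so the script still compiles where CPAN::Mini is absent.
    require CPAN::Mini;
    CPAN::Mini->update_mirror( mirror_args($ARGV[0], $remote) );
}
else {
    print "usage: $0 /path/to/local/minicpan\n";
}
```

After copying the mirror directory across, the offline machine's cpan shell can usually be pointed at it with something like "o conf urllist unshift file:///path/to/minicpan" followed by "o conf commit"; check the CPAN documentation for your version.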

Dave


On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:

> Thank you for your answer. I will give you as much information as I  
> can so that you can diagnose the problem:
>
> I use Linux Mandriva 2006.
> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained  
> from the URL:
> http://www.bioperl.org/wiki/Release_1.5.2
> After that I ran these commands, which I found at the URL
> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph  
> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL ')
>
> >gunzip bioperl-1.5.2_102.tar.gz
> >tar xvf bioperl-1.5.2_102.tar
> >cd bioperl-1.5.2_102
> After that I ran the command
> >perl Build.PL
> I obtained the text
> this package requires Module::Build v0.2805 or greater to install  
> itself
> install Module::Build now from CPAN?[y]
> I pressed Enter and obtained many lines such as
> System call"/usr/bin/wget -0-"ftp://.perl.org/pub/CPAN/modules/ 
> modlist.data.gz">home/fahmi/.cpan/sources/modules/03modlist.data
> Not connected
> cant access URL ftp://ftp.perl.org/CPAN/modules/modlist.data.gz
> ...
> I'm trying to install BioPerl without an internet connection  
> because I don't know why Linux didn't detect my Ethernet card.
> Please tell me what I should do.
> Thank you for your help.


From cjfields at uiuc.edu  Mon Apr  2 14:10:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 13:10:30 -0500
Subject: [Bioperl-l] Fwd: BLAST beta, URLAPI, and BioPerl (RemoteBlast users)
References: <CD04BF03C87B6240A342461CDE1DEC0304091DB4@NIHCESMLBX8.nih.gov>
Message-ID: <002E7937-10DF-43CE-96F6-71DC743C1314@uiuc.edu>

This may be of interest to anyone using RemoteBlast.

The new changes to NCBI's BLAST interface shouldn't affect anything  
(Scott tested it out). If you see any abnormalities with RemoteBlast  
queries over the next few weeks, let us know.

chris

Begin forwarded message:

> From: "Mcginnis, Scott \(NIH/NLM/NCBI\) [E]"  
> <mcginnis at ncbi.nlm.nih.gov>
> Date: April 2, 2007 12:53:33 PM CDT
> To: "Chris Fields" <cjfields at uiuc.edu>
> Subject: RE: BLAST beta, URLAPI, and BioPerl
>
> Hi Chris:
>
> We are ready to make the new pages the default on April 16th. An  
> announcement is going out shortly. There are some very minor  
> changes to the URL API, and I have listed them below. They will be  
> part of the announcement. Please note we actually tested BioPerl  
> and it seems to be fine with the new pages. If you have a news page on  
> your site or a mailing list, you might want to pass this on.
>
> A Note About URLAPI
>
> The new BLAST pages support URLAPI, a protocol that scripts and
> programs use to run BLAST searches and retrieve results over
> HTTP. (For more on URLAPI, see
> http://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html). The following
> information only applies to you if you develop or are responsible
> for software that uses URLAPI.
>
> The new pages have been tested and produce correct results with
> the following URLAPI client programs:
>
> * the BioPERL RemoteBlast module
> * the NCBI demo script http://ncbi.nlm.nih.gov/blast/docs/web_blast.pl
> * various scripts used in-house at NCBI
>
> Users of URLAPI should be aware of the following minor
> changes. In the new interface:
>
> 1. The Request ID (RID) format will be shorter.  The new format
>     is 11 alphanumeric characters (e.g. RDEFEA5012) and will have no
>     internal structure. The previous RID format was 36 or more
>     characters long, including punctuation (e.g.,
>     1175172712-21345-42512597310.BLASTQ3).
>
> 2. BLAST reports will show masked regions as lower-case letters
>     by default (see
>     http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W6,
>     figure 2). The current default behavior is to show masked
>     regions as N's or X's. Users may recover the current behavior
>     by adding &MASK_CHAR=0 to the query string for a URLAPI
>     request.
>
> 3. BLAST reports will show alignments for 100 database sequences
>     by default. The current reports show only 50 alignments by
>     default.
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Mon 3/5/2007 11:50 AM
> To: Mcginnis, Scott (NIH/NLM/NCBI) [E]
> Subject: BLAST beta, URLAPI, and BioPerl
>
> The BioPerl project has several modules and parsers
> which currently parse XML/text/tabular BLAST output, as well as a
> module which is capable of posting BLAST queries via the URLAPI
> interface.  Will any of the BLAST changes affect these (particularly
> URLAPI)?
>
> Thanks!
>
> chris
>
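
To illustrate item 2 of the announcement: a sketch of building a URLAPI results request with MASK_CHAR=0 appended. The base URL and the helper names url_escape and urlapi_get_url are assumptions, not taken from the announcement. CMD=Get and RID are URLAPI parameters as described in the URLAPI document linked in the announcement, and the RID value is the example given in item 1:

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters.
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build a CMD=Get request URL for a given RID, plus any extra parameters.
# Keys are sorted only so that the output is reproducible.
sub urlapi_get_url {
    my ($base, $rid, %extra) = @_;
    my %param = ( CMD => 'Get', RID => $rid, %extra );
    my $query = join '&',
        map { url_escape($_) . '=' . url_escape($param{$_}) } sort keys %param;
    return "$base?$query";
}

# Assumed endpoint; MASK_CHAR => 0 restores the old N/X masking display.
my $url = urlapi_get_url(
    'http://www.ncbi.nlm.nih.gov/blast/Blast.cgi',
    'RDEFEA5012',
    MASK_CHAR => 0,
);
print "$url\n";
```

RemoteBlast users should not need this; it only shows what the extra parameter looks like on the wire for anyone calling URLAPI directly.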

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From steletch at jouy.inra.fr  Tue Apr  3 08:28:39 2007
From: steletch at jouy.inra.fr (=?ISO-8859-1?Q?St=E9phane_T=E9letch=E9a?=)
Date: Tue, 03 Apr 2007 14:28:39 +0200
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
Message-ID: <46124877.4020605@jouy.inra.fr>

Alex Lancaster wrote:
> Hello bioperl,
> 
> I'm new to the bioperl world, having just started a research position
> in which I need to manage a large bioperl-based codebase.  To this
> end, I'm working on packaging bioperl as an official Fedora Package
> (formerly "Fedora Extras") and I'm currently wading through and
> packaging the long laundry list of Perl dependencies (then I'm going
> to try and do the same for biopython).  You can see my some of my
> progress (including links to the reviews) here:
> 
> http://fedoraproject.org/wiki/AlexLancaster
> 
> Several issues have arisen during the packaging that I hope the
>

Nice, I was on my way to do it :-)
I'm a Mandriva packager and have been kindly "spushed" into maintaining 
the bioperl package for Mandriva.

You can have a look at the work already done by Mandriva at the addresses:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/

(Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).

Feel free to contact me if you need more input for dependencies, since 
there are quite a lot of them.

Cheers,
Stéphane

-- 
Stéphane Téletchéa, PhD.                  http://www.steletch.org
Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901


From cjfields at uiuc.edu  Tue Apr  3 10:58:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 09:58:44 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <46124877.4020605@jouy.inra.fr>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
Message-ID: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>

Once these are set up we should add a page to the bioperl wiki to  
describe them in more detail (along with Allen's Biopackages).

chris

On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote:

> Alex Lancaster wrote:
>> Hello bioperl,
>>
>> I'm new to the bioperl world, having just started a research position
>> in which I need to manage a large bioperl-based codebase.  To this
>> end, I'm working on packaging bioperl as an official Fedora Package
>> (formerly "Fedora Extras") and I'm currently wading through and
>> packaging the long laundry list of Perl dependencies (then I'm going
>> to try and do the same for biopython).  You can see my some of my
>> progress (including links to the reviews) here:
>>
>> http://fedoraproject.org/wiki/AlexLancaster
>>
>> Several issues have arisen during the packaging that I hope the
>>
>
> Nice, I was on my way to do it :-)
> I'm a Mandriva packager and have been kindly "spushed" into maintaining
> the bioperl package for Mandriva.
>
> You can have a look at the work already done by Mandriva at the  
> addresses:
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/
>
> (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).
>
> Feel free to contact me if you need more input for dependencies, since
> there are quite a lot of them.
>
> Cheers,
> Stéphane
>
> -- 
> Stéphane Téletchéa, PhD.                  http://www.steletch.org
> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From allenday at gmail.com  Tue Apr  3 13:54:51 2007
From: allenday at gmail.com (Allen Day)
Date: Tue, 3 Apr 2007 10:54:51 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
Message-ID: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>

You can link Biopackages now, it's been done for nearly 2 years.

-Allen

On 4/3/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Once these are set up we should add a page to the bioperl wiki to
> describe them in more detail (along with Allen's Biopackages).
>
> chris
>
> On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote:
>
> > Alex Lancaster wrote:
> >> Hello bioperl,
> >>
> >> I'm new to the bioperl world, having just started a research position
> >> in which I need to manage a large bioperl-based codebase.  To this
> >> end, I'm working on packaging bioperl as an official Fedora Package
> >> (formerly "Fedora Extras") and I'm currently wading through and
> >> packaging the long laundry list of Perl dependencies (then I'm going
> >> to try and do the same for biopython).  You can see my some of my
> >> progress (including links to the reviews) here:
> >>
> >> http://fedoraproject.org/wiki/AlexLancaster
> >>
> >> Several issues have arisen during the packaging that I hope the
> >>
> >
> > Nice, I was on my way to do it :-)
> > I'm a Mandriva packager and have been kindly "spushed" into maintaining
> > the bioperl package for Mandriva.
> >
> > You can have a look at the work already done by Mandriva at the
> > addresses:
> > http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
> > http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/
> >
> > (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).
> >
> > Feel free to contact me if you need more input for dependencies, since
> > there are quite a lot of them.
> >
> > Cheers,
> > Stéphane
> >
> > --
> > Stéphane Téletchéa, PhD.                  http://www.steletch.org
> > Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> > INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
> > 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>
>


From cjfields at uiuc.edu  Tue Apr  3 14:11:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 13:11:19 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
	<5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
Message-ID: <0802E2EB-5E94-42D2-9CE1-B82DC103A5D1@uiuc.edu>

I added a small piece on Biopackages to the wiki installation page:

http://www.bioperl.org/wiki/Installing_BioPerl

We can move links to RPM (or similar) installations to their own page  
or section in the INSTALL docs when we have time.

chris

On Apr 3, 2007, at 12:54 PM, Allen Day wrote:

> You can link Biopackages now, it's been done for nearly 2 years.
>
> -Allen
>
> On 4/3/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> Once these are set up we should add a page to the bioperl wiki to
>> describe them in more detail (along with Allen's Biopackages).
>>
>> chris
>>
>> On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote:
>>
>>> Alex Lancaster wrote:
>>>> Hello bioperl,
>>>>
>>>> I'm new to the bioperl world, having just started a research  
>>>> position
>>>> in which I need to manage a large bioperl-based codebase.  To this
>>>> end, I'm working on packaging bioperl as an official Fedora Package
>>>> (formerly "Fedora Extras") and I'm currently wading through and
>>>> packaging the long laundry list of Perl dependencies (then I'm  
>>>> going
>>>> to try and do the same for biopython).  You can see my some of my
>>>> progress (including links to the reviews) here:
>>>>
>>>> http://fedoraproject.org/wiki/AlexLancaster
>>>>
>>>> Several issues have arisen during the packaging that I hope the
>>>>
>>>
>>> Nice, I was on my way to do it :-)
>>> I'm a Mandriva packager and have been kindly "spushed" into  
>>> maintaining
>>> the bioperl package for Mandriva.
>>>
>>> You can have a look at the work already done by Mandriva at the
>>> addresses:
>>> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
>>> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/
>>>
>>> (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).
>>>
>>> Feel free to contact me if you need more input for dependencies,  
>>> since
>>> there are quite a lot of them.
>>>
>>> Cheers,
>>> Stéphane
>>>
>>> --
>>> Stéphane Téletchéa, PhD.                  http://www.steletch.org
>>> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
>>> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
>>> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Tue Apr  3 18:18:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:18:56 +0100
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>	<1175258897.2668.21.camel@localhost.localdomain>	<6d648ierkz.fsf@delpy.biol.berkeley.edu>	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
Message-ID: <4612D2D0.7030202@sendu.me.uk>

Chris Fields wrote:
> On Mar 30, 2007, at 11:02 PM, Allen Day wrote:
> 
>> The majority of the Bioperl classes are file parsers, or manipulate
>> data that comes from the file parsers.  Yes there are exceptions like
>> the Eutils and Ensembl-intefacing classes, but they are the minority.
>> The types of files that are worked with are generally either A)
>> primary data sets such as genome data, or B) derivative data, such as
>> sequence alignments that are derived from primary data using an
>> algorithm.
>>
>> If we're in agreement that the primary data sets and
>> libraries/applications for producing derivative data should not be
>> present in Fedora Extras, then it follows that the Bioperl classes for
>> manipulating these primary and derivative data  should also not be
>> present in Fedora Extras as they are of little use without data to
>> manipulate.
>
> I respectfully disagree.

Likewise, but in a slightly different way: for myself and surely many 
others the primary data used either isn't publicly released or isn't in 
some major database and therefore won't be in any kind of repository. 
That doesn't mean I wouldn't want the parser for my files to be 
somewhere convenient.


From bix at sendu.me.uk  Tue Apr  3 18:09:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:09:27 +0100
Subject: [Bioperl-l] installation bioperl
In-Reply-To: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
References: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
Message-ID: <4612D097.9060400@sendu.me.uk>

> On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:
> 
>> Thank you for your answer. I will give you as much information as I  
>> can so that you can diagnose the problem:
>>
>> I use Linux Mandriva 2006.
>> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained  
>> from the URL:
>> http://www.bioperl.org/wiki/Release_1.5.2
>> After that I ran these commands, which I found at the URL
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (paragraph  
>> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL ')
[snip]
>> I'm trying to install BioPerl without an internet connection  
>> because I don't know why Linux didn't detect my Ethernet card.
>> Please tell me what I should do.
>> Thank you for your help.

David's suggestion was a good one, but quite a lot of BioPerl (possibly 
all you need) is usable with just the bioperl-1.5.2_102.tar.gz file 
you already have.

Just follow the 'hard way' instructions:
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_MODULES_THE_HARD_WAY

Actually, it's not that hard. Just extract the files from the .tar.gz and 
have your perl lib path include the extracted directory, i.e. the one 
that contains the resulting Bio directory.
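
As a rough sketch of that last step: the snippet below adds an unpacked BioPerl directory to Perl's module search path and reports whether Bio::SeqIO can be found there. The BIOPERL_DIR environment variable, the default path, and the find_module helper are all hypothetical; substitute wherever you actually extracted the tarball.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical location of the unpacked tarball; adjust to your system.
my $bioperl_dir = $ENV{BIOPERL_DIR}
    || File::Spec->catdir( ($ENV{HOME} // '.'), 'bioperl-1.5.2_102' );

# Add the directory that *contains* Bio/ to the search path.
# This is the runtime equivalent of: use lib '/path/to/bioperl-1.5.2_102';
unshift @INC, $bioperl_dir;

# Return the path of a module's .pm file if it is visible in @INC,
# or nothing if it is not. Does not load the module.
sub find_module {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir ( grep { !ref } @INC ) {
        my $path = File::Spec->catfile( $dir, @parts );
        return $path if -f $path;
    }
    return;
}

my $found = find_module('Bio::SeqIO');
print defined $found
    ? "Bio::SeqIO found at $found\n"
    : "Bio::SeqIO not found; check the directory added to \@INC\n";
```

Setting the PERL5LIB environment variable to the same directory has the same effect without editing any script.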


From t.r-a_ckright1 at tiscali.co.uk  Wed Apr  4 08:00:12 2007
From: t.r-a_ckright1 at tiscali.co.uk (Michael Pain)
Date: Wed, 4 Apr 2007 13:00:12 +0100
Subject: [Bioperl-l]  Re: read it immediately
Message-ID: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>

I have received three discs but I cannot access the files, as no ID or password was included in the package. I have paid for all this! Can you sort it out?

Regards Michael Pain


From thiago.venancio at gmail.com  Wed Apr  4 14:14:04 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Wed, 4 Apr 2007 15:14:04 -0300
Subject: [Bioperl-l] read it immediately
In-Reply-To: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
References: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
Message-ID: <44255ea80704041114pc284522tef2d3a3944763b90@mail.gmail.com>

I think you emailed the wrong list...

On 4/4/07, Michael Pain <t.r-a_ckright1 at tiscali.co.uk> wrote:
>
> I have received three discs but I cannot access the files, as no ID or
> password was included in the package. I have paid for all this! Can you
> sort it out?
>
> Regards Michael Pain
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From gdorjee at hotmail.com  Wed Apr  4 14:17:57 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 11:17:57 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
Message-ID: <9842643.post@talk.nabble.com>


Hi all,
Can anyone please help me with a problem that I've been dealing with for
quite a while now? The following is part of my script, which is not working
for some reason. It is supposed to get the sequence from 'result/fasta.faa'
and run the BLAST.

###my script ###########
......
my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                 'database' => '/export/home/database/nr',
                                                 _READMETHOD => 'Blast'
                                                  );
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

When I paste the protein sequence into the textarea of my HTML page and save
it as 'result/fasta.faa', so that the above script can run the BLAST, I get
the following error:

Software error:
------------- EXCEPTION  -------------
MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastpgp
/usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
--------------------------------------
I would appreciate your help.
I would also like to add that 'result/fasta.faa' does have the sequence saved
in it.

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9842643
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From gowthaman.ramasamy at sbri.org  Wed Apr  4 14:57:09 2007
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Wed, 4 Apr 2007 11:57:09 -0700
Subject: [Bioperl-l] How to patch something in installed bioperl module
Message-ID: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>


Hi List,
I have been advised to patch the GFF.pm BioPerl module (comment out some lines and add some).
How do I go about it?
I have the latest BioPerl version, 1.5.2, installed via CPAN.

I find GFF.pm in the following location:
/root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm


Do I have to recompile it after editing?
I am completely clueless; I have not done this before.
Can anyone help me do this?

Many thanks in advance.

gowthaman


From dmessina at wustl.edu  Wed Apr  4 15:42:43 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Apr 2007 14:42:43 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>

The code snippet worked fine for me. I believe the problem is that  
'result/fasta.faa' is not getting passed to your code properly. You  
might try specifying a complete path to your input and output file --  
relative paths, especially through a web app, can be tricky.

> when i paste the protein sequence into the textarea of my html page  
> and save
> the same as 'result/fasta.faa', so that the above script would do  
> the blast,

I'm not sure from what you wrote -- did you try running your script  
on the command line (having created 'result/fasta.faa' manually  
first)? If that is working for you, then the problem is with getting  
the data from the webpage into the script, not with the blasting part.

Dave

This is what I did:

  % ls test.pl testp*
test.pl       testp.fa

% formatdb -i testp.fa

% ls test.pl testp*
test.pl       testp.fa      testp.fa.phr  testp.fa.pin  testp.fa.psq

% perl test.pl testp.fa
%  head -10 out.blast
BLASTP 2.2.10 [Oct-19-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|64654269|gb|AAH96193.1| HOXB1 protein [Homo sapiens]
          (235 letters)


Your code: I changed only the input filename and the input database  
name, and saved the script as test.pl
-----------------------
#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], '-format' => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                  'database' => 'testp.fa',
                                                  _READMETHOD => 'Blast'
                                                   );
$factory->outfile("out.blast");
my $blastreport = $factory->blastall($queryin);
-----------------------


From gdorjee at hotmail.com  Wed Apr  4 17:44:27 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 14:44:27 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
References: <9842643.post@talk.nabble.com>
	<35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
Message-ID: <9846257.post@talk.nabble.com>


Thanks for your reply, Dave. I don't think there's anything wrong with
the open(OUTPUT,">result/fasta.faa"); line, as I do get the 'fasta.faa'
file with the sequence in it; I can see it. It looks like BLAST is not
able to read from result/fasta.faa.
^ ^* 




From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 20:17:10 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:17:10 +1000
Subject: [Bioperl-l] How to patch something in installed bioperl module
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>
Message-ID: <a79f6a4b0704041717q160be28eu472d32d3cd704eba@mail.gmail.com>

> I am advised to patch (comment out some lines and add some) GFF.pm bioperl module.
> How do i go about it?.

First, make a backup of the original file.
Then just edit the original (add/remove lines).

> I have the latest Bioperl 1.5.2 version installed....via CPAN
> I find GFF.pm in the following location...
> /root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm

This is not where it is installed. That is where the CPAN program
uncompressed it to before installing. It is more likely in a directory
like this:
/usr/lib/perl5/site_perl/5.8.5/Bio/Tools/GFF.pm
But it depends on how your Perl setup arranges things!
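
One way to find out where your Perl would actually load the module from is
to walk @INC yourself. This is only a sketch using core Perl;
installed_path() is a name made up for this example, not a BioPerl
function:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the first file on @INC that matches a module name,
# e.g. 'Bio::Tools::GFF' -> '.../Bio/Tools/GFF.pm', or undef if absent.
sub installed_path {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir (@INC) {
        next if ref $dir;                  # skip code-ref hooks in @INC
        my $candidate = File::Spec->catfile($dir, @parts);
        return $candidate if -f $candidate;
    }
    return;
}

my $module = @ARGV ? $ARGV[0] : 'Bio::Tools::GFF';
my $path   = installed_path($module);
print defined $path ? "$module would be loaded from $path\n"
                    : "$module was not found on \@INC\n";
```

Running perldoc -l Bio::Tools::GFF on the command line should report the
same file. Whichever copy this prints is the one to back up and edit.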

> Do i have to recompile it after editing........

No.

--Torsten


From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 20:22:37 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:22:37 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>

> Software error:
> ------------- EXCEPTION  -------------
> MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
> /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => 'Fasta');

Does this still happen if you give the full path to the FASTA file?
e.g. -file => '/usr/local/apache2/htdocs/result/fasta.faa'
(I'm guessing what the full path is here)

--Torsten


From gilbertd at cricket.bio.indiana.edu  Wed Apr  4 20:59:23 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Wed, 4 Apr 2007 19:59:23 -0500 (EST)
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
Message-ID: <200704050059.l350xNF07452@cricket.bio.indiana.edu>


Dear Bioperl list,

There is a small bug in what I think is the current Bio::Tools::GFF.pm,
that blocks output of Target attributes (in gff3 at least).  See a patch
here

http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 21:34:17 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 11:34:17 +1000
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
Message-ID: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>

Dear all,

I have been migrating all our BLAST infrastructure to use the XML
output mode, the "blastpgp -m 7" option, referred to as the 'blastxml'
format in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML
report before, and encountered some issues I hope you can help me with:

1. When loading with Bio::SearchIO->new(-format=>'blastxml') I get back a
Bio::Search::Result::GenericResult object. This means I cannot use
the PSI-BLAST functions like iterations() and psiblast() provided by
Bio::Search::Result::BlastResult. I'm guessing this is because the
XML output reports itself as plain BLASTP output:
<BlastOutput_program>blastp</BlastOutput_program>

How do I determine if it is a PSI-BLAST report?

2. Usually a PSI-BLAST report has multiple Iterations. The XML output
has <Iteration> tags but it took me a while to figure out that these
get mapped to Bio::Search::Result objects accessible via
Bio::SearchIO->next_result().

Is this the proper way to process the iterations?

3. I also notice that only the first result (iteration) has the
query_name set. Subsequent ones are empty:
RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP,
query=MyProtein , db=uniprot_sprot
RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=
, db=uniprot_sprot

Is this a bug or expected?

I'm guessing a lot of these problems are simply due to limitations of
the NCBI BLAST XML DTD?

--Torsten


From gdorjee at hotmail.com  Wed Apr  4 20:59:08 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 17:59:08 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>
Message-ID: <9848412.post@talk.nabble.com>


Hi Torsten,
Yes, it still gives me the same error even if I give the full path to the
FASTA file. Here is what I did:

####### part of my script #######
my $Seq_in = Bio::SeqIO->new (-file => '/export/home/local/apache2/htdocs/result/fasta.faa',
                              -format => 'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                 'database' => '/export/home/dorjee/database/nrpart',
                                                 _READMETHOD => 'Blast'
                                                   );
$factory->outfile("/export/home/local/apache2/htdocs/result/out.blast");
my $blastreport = $factory->blastall($queryin);
....

thanks man.




From torsten.seemann at infotech.monash.edu.au  Wed Apr  4 22:57:09 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 12:57:09 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>

DeeGee,

Please add the following lines to help deduce the problem:

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
> 'Fasta');

die "could not open fasta" if not defined $Seq_in;

> my $queryin = $Seq_in->next_seq();

die "could not get seq" if not defined $queryin;

Does anything happen now?

...

Some other comments:

> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp

I'm not sure why it is in the blastpgp() method when you chose
$factory->blastall() ?

>                                                  _READMETHOD => 'Blast'

I don't think this is required anymore in modern Bioperl. Are you
using 1.5.x or bioperl-live ?

> when i paste the protein sequence into the textarea of my html page and
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

So this is a CGI script?
Does the script run as user 'apache' or 'httpd', or as yourself via SuEXEC?
Does 'apache' have permissions to READ/WRITE the result/ directory?
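
To answer those questions from inside the CGI script itself, something
along these lines could be dropped in just before the Bio::SeqIO call. It
is a sketch using only core Perl; file_problems() and the path are
placeholders invented for this example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Describe what the current process can see of a file. Returns a list of
# problem strings; an empty list means the basic checks passed.
sub file_problems {
    my ($path) = @_;
    return ("does not exist")      unless -e $path;
    return ("is not a plain file") unless -f $path;
    my @problems;
    push @problems, "is not readable by this user" unless -r $path;
    push @problems, "is empty (0 bytes)"           if -z $path;
    return @problems;
}

# Hypothetical path; use the absolute path of your own query file.
my $fasta = '/path/to/result/fasta.faa';

# Under CGI the effective user is the web server's, not yours.
# getpwuid is not available on every platform, hence the eval.
my $user = eval { scalar getpwuid($>) };
$user = "uid $>" unless defined $user;

my @problems = file_problems($fasta);
print "Running as $user\n";
if (@problems) {
    print "$fasta $_\n" for @problems;
} else {
    print "$fasta exists, is readable, and is not empty\n";
}
```

One case worth keeping in mind, as a possibility rather than a diagnosis: if
the same script writes the file with open() and print just beforehand, the
data can still be sitting in the output buffer until that filehandle is
closed, in which case the file would show up here as empty.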

--Torsten


From cjfields at uiuc.edu  Thu Apr  5 00:14:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Apr 2007 23:14:46 -0500
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
In-Reply-To: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>
References: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>
Message-ID: <8EA4D933-9B99-485E-9CEA-AB39297F90B4@uiuc.edu>

On Apr 4, 2007, at 8:34 PM, Torsten Seemann wrote:

> Dear all,
>
> I have been migrating all our BLAST infrastructure to use the XML
> output mode, the "blastpgp -m 7" option, referred to 'blastxml' format
> in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML report
> before, and encountered some issues I hope you can help me with:
>
> 1. When loading with Bio::SearchIO(-format=>'blastxml') I get back a
> Bio::Search::Result::GenericResult object. This means I can not use
> the PSI-BLAST functions like iterations() and psiblast() provided by
> Bio::Search::Result::BlastResult. I'm guessing this is because the the
> XML output reports itself as a plain BLASTP output:
> <BlastOutput_program>blastp</BlastOutput_program>
>
> How do I determine if it is a PSI-BLAST report?

I don't know if you can very easily, though I haven't tried myself.   
If I remember correctly there wasn't a substantial difference in the  
XML output between regular BLAST XML and PSI-BLAST XML.  We could add  
a parameter to the parser to treat the report as PSI-BLAST.

> 2. Usually a PSI-BLAST report has multiple Iterations. The XML output
> has <Iteration> tags but it took me a while to figure out that these
> get mapped to Bio::SearchIO::Result objects accessible via
> Bio::SearchIO->next_result().
>
> Is this the proper way to process the iterations?

The problem is in the way that NCBI now outputs multiple-query BLAST  
XML reports, which apparently changed sometime in the last year w/o  
notice.  This was also a problem with other Bio* parsers (I remember  
seeing something about it on the BioPython list).  Previously  
multiquery BLAST requests were output like single XML reports  
concatenated together, each with their own XML declaration, etc.  Now  
they are treated like iterations (query 1 = iteration 1, query 2 =  
iteration 2, etc) all in one long BLAST report.  There's an example  
of one in the SearchIO tests which I added to CVS in Jan-Feb,  
post-1.5.2.  The current parser handles both old and new cases.

The current behavior of the parser is to parse everything up front,  
building up the ResultI's and then returning them one-by-one upon  
next_result(), which is horrible on memory if you have tons of XML to  
wade through.  I will probably change that to carve the data up into  
report-sized chunks of XML and parse them piecemeal, but I haven't  
had time to work on it yet.
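
To make the chunking idea concrete, here is a rough sketch in plain Perl. It
is not how the blastxml parser is implemented, and each_iteration() is a
name made up for illustration. It assumes the <Iteration> and </Iteration>
tags each sit on their own line, as in NCBI's pretty-printed output, and it
relies on the <Iteration_iter-num> element from the NCBI BLAST XML format:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read BLAST XML from a filehandle and call $callback once per
# <Iteration>...</Iteration> block, holding only one block in memory.
# This is a line-based splitter, not a general XML parser.
sub each_iteration {
    my ($fh, $callback) = @_;
    my ($count, $chunk) = (0, undef);
    while (my $line = <$fh>) {
        $chunk = '' if $line =~ /<Iteration>/;       # start a new block
        $chunk .= $line if defined $chunk;
        if (defined $chunk && $line =~ m{</Iteration>}) {
            $callback->($chunk);
            $count++;
            undef $chunk;                            # release before the next
        }
    }
    return $count;
}

# Usage: perl split_iterations.pl report.xml
my $file = shift @ARGV;
if (defined $file) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $n = each_iteration($fh, sub {
        my ($xml) = @_;
        my ($num) = $xml =~ m{<Iteration_iter-num>(\d+)</Iteration_iter-num>};
        $num = '?' unless defined $num;
        print "iteration $num: ", length($xml), " bytes\n";
    });
    close $fh;
    print "$n iteration(s)\n";
}
```

In a real parser each chunk would presumably be handed to the XML handler
rather than just measured, but the memory behaviour is the point here: only
one iteration's worth of XML is held at a time.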

> 3. I also notice that only the first result (iteration) has the
> query_name set. Subsequent ones are empty:
> RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP,
> query=MyProtein , db=uniprot_sprot
> RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=
> , db=uniprot_sprot
>
> Is this a bug or expected?

If you are using 1.5.2 then there is a bug related to that which was  
fixed in CVS a few months back (related to the multiquery issue  
above).  If it isn't let me know.

> I'm guessing a lot of these problems are simply due to limitations of
> the NCBI BLAST XML DTD?
>
> --Torsten

To tell the truth I'm not sure.  One would think they could add some  
designation to the report for PSI-BLAST!

chris


From cjfields at uiuc.edu  Thu Apr  5 13:40:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 12:40:41 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
Message-ID: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>

Roy Chaudhuri has raised an interesting question in a bug report  
filed regarding 'bless'-ing objects into another (similar) class.   
The bug report on this is here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2262

The following code (from the bug report) illustrates the problem.   
Note some of this is taken from the Bio::Seq::Meta::Array POD, though  
the example sequence object is a LocatableSeq (PrimarySeqI) and not a  
SeqI:

use Bio::SeqIO;
use Bio::Seq::Meta::Array;
# $seq isa Bio::SeqI
my $seq=Bio::SeqIO->new(-fh=>\*ARGV, -format=>'genbank')->next_seq;
# $seq is still a Bio::SeqI
bless $seq, 'Bio::Seq::Meta::Array';
Bio::SeqIO->new(-format=>'genbank')->write_seq($seq);

This produces sequence output missing sequence data, a definition,  
and other odds and ends.  $seq is first a Bio::Seq::RichSeq and is  
blessed into a Bio::Seq::Meta::Array; both times $seq remains  
Bio::SeqI.  However, Bio::Seq::Meta::Array has an odd inheritance  
tree which also makes it a Bio::PrimarySeqI and a Bio::Seq::MetaI (ick):

use base qw(Bio::LocatableSeq Bio::Seq Bio::Seq::MetaI);

Bio::LocatableSeq has a seq() method inherited from Bio::PrimarySeq,  
for instance, so using $seq->seq() invokes Bio::PrimarySeq::seq()  
instead of Bio::Seq::seq().  No problem in most cases as long as  
PrimarySeqI is blessed into another PrimarySeqI, but if one blesses a  
Bio::SeqI into a Bio::Seq::Meta::Array (as in the example) then  
PrimarySeq::seq() expects a raw sequence and gets none (since the  
data is stored internally as a PrimarySeq in a different location)  
and no sequence is output.  This happens similarly for other stored  
object data.
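
The same effect can be reproduced without BioPerl. The Toy::* classes below
are invented stand-ins, not the real modules: Toy::Primary keeps the raw
sequence in its own hash (like PrimarySeq), Toy::Container delegates to an
inner object (like Seq), and Toy::Meta inherits from both with the
PrimarySeq-like class listed first:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Toy stand-ins, NOT the real BioPerl classes.
package Toy::Primary;      # keeps the raw sequence in its own hash
sub new { my ($class, %a) = @_; return bless { seq => $a{seq} }, $class }
sub seq { return $_[0]{seq} }

package Toy::Container;    # delegates to an inner Toy::Primary
sub new {
    my ($class, %a) = @_;
    return bless { primary => Toy::Primary->new(seq => $a{seq}) }, $class;
}
sub seq { return $_[0]{primary}->seq }

package Toy::Meta;         # method lookup tries Toy::Primary first
our @ISA = ('Toy::Primary', 'Toy::Container');

package main;

my $obj = Toy::Container->new(seq => 'ACGT');
print "before: ", $obj->seq, "\n";        # Toy::Container::seq delegates

bless $obj, 'Toy::Meta';   # same hash, different method lookup
my $after = $obj->seq;     # now Toy::Primary::seq, which reads {seq}
print "after:  ", (defined $after ? $after : '(undef)'), "\n";
```

This prints ACGT before the re-bless and (undef) after it. The data is
still in the hash under the 'primary' key, but the seq() that now gets
called looks for it somewhere else.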

I'm not sure why Bio::Seq::Meta::Array is set up this way.  Do we  
want to support using 'bless $obj, Class' with Bio::SeqI/PrimarySeqI,  
or should Bio::Seq::Meta::Array be changed so that it follows one  
interface or the other?

chris


From hlapp at gmx.net  Thu Apr  5 14:27:39 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Apr 2007 14:27:39 -0400
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
In-Reply-To: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
Message-ID: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>


On Apr 5, 2007, at 1:40 PM, Chris Fields wrote:

> Do we want to support using 'bless $obj, Class'

This smacks of over-clever programming and looks like a sure way to  
obfuscate what you're doing. I'm not sure why we need to support this  
construct.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Apr  5 14:44:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 13:44:38 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
In-Reply-To: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
	<421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
Message-ID: <F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>

I tend to agree on that front as it seems too prone to subtle issues  
with inheritance (as the bug demonstrates).

Related to that, do we want to have Bio::Seq::Meta::Array implement  
either PrimarySeqI or SeqI?  Having it implement both is definitely  
not working as expected.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mkiwala at watson.wustl.edu  Thu Apr  5 15:11:22 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 05 Apr 2007 14:11:22 -0500
Subject: [Bioperl-l] Mixed bless-ings with
	Bio::Seq/Bio::PrimarySeq	(Bio::Seq::Meta::Array)
In-Reply-To: <F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>	<421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
	<F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>
Message-ID: <461549DA.90709@watson.wustl.edu>

My vote is for SeqI.

I was using the SeqWithQuality class and more recently switched over to 
Bio::Seq::Quality as we are upgrading from 1.4 to 1.5.2. The sequences 
I'm working with are destined for GenBank and have features and quality 
values. I've written a module (that I call GenBank::Tbl2Asn) that 
accepts a Bio::Seq::Quality with features and runs tbl2asn on it to 
produce a file that we send to GenBank. I don't know of any other class
that suits my needs better than Bio::Seq::Quality inheriting from
Bio::SeqI.




From gdorjee at hotmail.com  Thu Apr  5 17:09:14 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 14:09:14 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
Message-ID: <9864004.post@talk.nabble.com>


Thanks again, Torsten. I tried (die "could not get seq" if not defined
$queryin;) as you suggested, and now I get the following error message:

Software error:
could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.

Does this mean that next_seq() method in 'my $queryin =
$Seq_in->next_seq();' has some problem? How can I fix it? I would appreciate
your help.
Cheers!




From cjfields at uiuc.edu  Thu Apr  5 19:32:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 18:32:55 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
Message-ID: <3ED7F1E9-FE21-4796-99AC-0CD0EA418563@uiuc.edu>


On Apr 5, 2007, at 4:09 PM, DeeGee wrote:

>
> Thanks again, Torsten. I tried (die "could not get seq" if not defined
> $queryin;) as you suggested, and now I get the following error  
> message:
>
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
>
> Does this mean that next_seq() method in 'my $queryin =
> $Seq_in->next_seq();' has some problem? How can I fix it? I would  
> appreciate
> your help.
> Cheers!

This indicates there is likely some problem with your sequence file
(either it isn't FASTA or something else is wrong), but without actually
seeing it we can't be sure.  I don't think it is a next_seq() issue.
Also, if there were problems accessing the file, the stream object
should throw an error, so I don't think it is that either...

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From torsten.seemann at infotech.monash.edu.au  Thu Apr  5 20:40:32 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 6 Apr 2007 10:40:32 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
Message-ID: <a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>

Dorjee,

> thanks alot for your reply again. as per your suggestion (using 'die "could
> not get seq" if not defined $queryin;'), i now get the following error
> message:
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
> i've attached the script. could you plz have a look at it and see where am i
> going wrong.
> cheers mate!

This strongly suggests that your FASTA file is not actually in FASTA format.
http://en.wikipedia.org/wiki/Fasta_format

Does it work if you pass it to blastall on the command line?
eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
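
A quick way to check the "is it really FASTA" question without BioPerl is a few lines of core Perl. This is only a rough sketch: fasta_problems() is a made-up helper name, and the set of characters it accepts as residues is an assumption, not a full definition of the format.

```perl
use strict;
use warnings;

# Return a list of problems found in FASTA-like text (empty list = looks OK).
# Hypothetical helper for illustration; it is not part of BioPerl.
sub fasta_problems {
    my ($text) = @_;
    my @lines = split /\r?\n/, $text;
    shift @lines while @lines && $lines[0] =~ /^\s*$/;   # skip leading blank lines
    return ('input is empty') unless @lines;

    my @problems;
    push @problems, 'first line does not start with ">"'
        unless $lines[0] =~ /^>/;

    my $residue_lines = 0;
    for my $i (1 .. $#lines) {
        my $line = $lines[$i];
        next if $line =~ /^\s*$/ || $line =~ /^>/;   # blank line or next header
        if ($line =~ /^[A-Za-z*-]+\s*$/) {           # assumed residue alphabet
            $residue_lines++;
        }
        else {
            # line number counted after any leading blank lines were dropped
            push @problems, 'line ' . ($i + 1) . ' has unexpected characters';
        }
    }
    push @problems, 'no sequence lines found' unless $residue_lines;
    return @problems;
}

my @problems = fasta_problems(">query test protein\nMKVLLAAG\nHLSAQKAS\n");
print @problems ? join("\n", @problems) . "\n" : "looks like FASTA\n";
```

An empty or header-less file shows up here immediately, which is the case to rule out before looking at next_seq().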

> Saier Lab.
> 858-534-2457

Are you working at UCSD?

--Torsten


From gdorjee at hotmail.com  Thu Apr  5 23:26:16 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 20:26:16 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
Message-ID: <9867402.post@talk.nabble.com>


hi Torsten,  
blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works
perfectly fine on the command line, and the 'fasta.faa' is in fasta format:

>gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
AQGAVAPGPDGGGPFPPWPLG

it seems like i'm just one bloody step away from success. ^ ^* can't figure
out the prob. 
thanks for your help.


Torsten Seemann wrote:
> 
> Dorjee,
> 
>> thanks alot for your reply again. as per your suggestion (using 'die
>> "could
>> not get seq" if not defined $queryin;'), i now get the following error
>> message:
>> Software error:
>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
>> i've attached the script. could you plz have a look at it and see where
>> am i
>> going wrong.
>> cheers mate!
> 
> This strongly suggests that your FASTA file is not actually in FASTA
> format.
> http://en.wikipedia.org/wiki/Fasta_format
> 
> Does it work if you pass it to blastall on the command line?
> eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
> 
>> Saier Lab.
>> 858-534-2457
> 
> Are you working at UCSD?
> 
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9867402
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From tuco at pasteur.fr  Fri Apr  6 09:33:08 2007
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Fri, 06 Apr 2007 15:33:08 +0200
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
Message-ID: <46164C14.8040701@pasteur.fr>

Hi folks,

I see some strange behavior from Bio::SeqIO::embl.
When I read an EMBL file as input and write it to another one, the tags
in the output file (EMBL format) are not in the same order as in the
original file.
Is this a normal and expected result?

If anyone wants to test it, here is the code as a Perl one-liner:

perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "file.embl", -format 
=> "EMBL"); $o = Bio::SeqIO->new(-file => ">new.embl", -format => 
"EMBL"); while($e = $i->next_seq()){ $o->write_seq($e);  }'

I checked the embl.pm code but was unable to find where this behavior 
comes from.

Does anyone have a solution or any clue?

Thanks

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Softwares and data banks
Pasteur Insititue
tuco at_ pasteur dot fr	
-------------------------


From dmessina at wustl.edu  Fri Apr  6 11:09:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 6 Apr 2007 10:09:51 -0500
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
In-Reply-To: <46164C14.8040701@pasteur.fr>
References: <46164C14.8040701@pasteur.fr>
Message-ID: <7C67D287-DE2A-488A-8636-01EFF468368D@wustl.edu>

> Is this a normal and expected result?

Yes, unfortunately. Due to the complexity of the parsing, it is  
surprisingly difficult to "round-trip" some sequence file formats.

http://www.bioperl.org/wiki/HOWTO:SeqIO#Caveats
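
If the worry is whether a round trip lost anything or merely reordered it, one rough check is to compare the two files line by line while ignoring order. The sketch below uses core Perl only; same_lines_ignoring_order() is a hypothetical helper, and it assumes the writer did not re-wrap or reformat lines, which may well not hold for real EMBL output.

```perl
use strict;
use warnings;

# Sorted, non-empty lines of a text with trailing whitespace removed.
sub normalized_lines {
    my ($text) = @_;
    my @lines;
    for my $line (split /\n/, $text) {
        $line =~ s/\s+$//;
        push @lines, $line if length $line;
    }
    my @sorted = sort @lines;
    return @sorted;
}

# True if both texts contain the same lines, in any order.
sub same_lines_ignoring_order {
    my ($text_a, $text_b) = @_;
    return join("\n", normalized_lines($text_a))
        eq join("\n", normalized_lines($text_b));
}

# Toy records: the same two qualifier lines in a different order.
my $original  = "ID   X1\nFT   /gene=\"a\"\nFT   /note=\"b\"\n";
my $reordered = "ID   X1\nFT   /note=\"b\"\nFT   /gene=\"a\"\n";
print same_lines_ignoring_order($original, $reordered)
    ? "only the order differs\n"
    : "content differs\n";
```

If this reports a content difference on real files, the next step is an ordinary diff of the sorted lines to see what changed.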


Dave


From jason at bioperl.org  Fri Apr  6 11:42:41 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 08:42:41 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9867402.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
Message-ID: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>

When/how are you writing your sequences to this file result.faa?   
Are you using SeqIO or BioPerl to write the sequence to a file?
I'm wondering if this is an I/O buffering problem.

On Apr 5, 2007, at 8:26 PM, DeeGee wrote:

>
> hi Torsten,
> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr  
> works
> perfectly fine on the command line, and the 'fasta.faa' is in fasta  
> format:
>
>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASV 
> SPSMTVASSQ
> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLA 
> GTAPGAEGPA
> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAF 
> RRKEHLRRHR
> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRH 
> QRIHGRAAAS
> AQGAVAPGPDGGGPFPPWPLG
>
> it seems like i'm just one bloody step away from success. ^ ^*  
> can't figure
> out the prob.
> thanks for your help.
>
>
> Torsten Seemann wrote:
>>
>> Dorjee,
>>
>>> thanks alot for your reply again. as per your suggestion (using 'die
>>> "could
>>> not get seq" if not defined $queryin;'), i now get the following  
>>> error
>>> message:
>>> Software error:
>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl  
>>> line 50.
>>> i've attached the script. could you plz have a look at it and see  
>>> where
>>> am i
>>> going wrong.
>>> cheers mate!
>>
>> This strongly suggests that your FASTA file is not actually in FASTA
>> format.
>> http://en.wikipedia.org/wiki/Fasta_format
>>
>> Does it work if you pass it to blastall on the command line?
>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/ 
>> database/nr
>>
>>> Saier Lab.
>>> 858-534-2457
>>
>> Are you working at UCSD?
>>
>> --Torsten
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/blastall- 
> problem-tf3527412.html#a9867402
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From bernd.web at gmail.com  Fri Apr  6 14:00:18 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 6 Apr 2007 20:00:18 +0200
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <716af09c0704061100n1555915bw18050639d25cbf89@mail.gmail.com>

Hi Dorjee,

Do you now use complete file paths everywhere (instead of the
relative paths that were in your script)?  Did you check all read and
execute permissions (turn r, x on for group and others)? And regarding
the fasta file, I'd suggest closing the filehandle after you have
printed the fasta sequence to the file.

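# Don't use this relative path; use the full path, and "die" on failure
# as was suggested earlier.
open(OUTPUT, ">result/fasta.faa") or die "could not open result/fasta.faa: $!";
# .... your other code lines
print OUTPUT "$desc\n$seqo\n";
close(OUTPUT); # close the file so the buffered data is written out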

Also check whether your complete script runs from the command line, to be
sure your problems are not related to the webserver environment.


BTW I do think you do not want to parse your fasta file like you do:
if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
$fasta_file=~s/[\n\r]//g;
if ($fasta_file =~ /([A-Z]{10}.+)/){$seqo=$1;}

$seqo will contain the description as well, so your sequence starts
with the description.
BioPerl provides code for fasta file parsing too ;-) If you really
want to stick to your code you can catch the $desc and $seqo in one
RegExp, or replace this line:
if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
with
if ($fasta_file =~ s/^(\>.+)\s+//){$desc=$1;}
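
For what it's worth, the header/sequence split can also be done without regex captures on the whole string. This is just a sketch in core Perl under the assumption that the textarea holds a single record; split_fasta_record() is a made-up name, not a BioPerl function.

```perl
use strict;
use warnings;

# Split one pasted FASTA record into (header, sequence).
# Returns an empty list if the first line is not a ">" header.
sub split_fasta_record {
    my ($text) = @_;
    $text =~ s/\r\n?/\n/g;    # browsers often send CRLF line endings
    $text =~ s/^\s+//;        # drop leading blank lines and spaces
    my ($header, @seq_lines) = split /\n/, $text;
    return unless defined $header && $header =~ /^>/;
    my $seq = join '', @seq_lines;
    $seq =~ s/\s+//g;         # remove any whitespace left inside the sequence
    return ($header, $seq);
}

my ($desc, $seqo) = split_fasta_record(">gi|1| test protein\r\nMKV LL\r\nAAG\r\n");
print "$desc\n$seqo\n" if defined $desc;
```

Because the header is taken off as its own line first, the sequence can no longer start with the description.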


I hope you will get your script working now.

Regards,
Bernd

On 4/6/07, Jason Stajich <jason at bioperl.org> wrote:
> When/How are are you writing your sequences to this file result.faa?  are
> you using seqIO or bioperl to write the sequence  to a file?
> I'm wondering if this is I/O buffering problem.
>
>
>
> On Apr 5, 2007, at 8:26 PM, DeeGee wrote:
>
>
> hi Torsten,
> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works
> perfectly fine on the command line, and the 'fasta.faa' is in fasta format:
>
>
> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
> AQGAVAPGPDGGGPFPPWPLG
>
> it seems like i'm just one bloody step away from success. ^ ^* can't figure
> out the prob.
> thanks for your help.
>
>
> Torsten Seemann wrote:
>
> Dorjee,
>
>
> thanks alot for your reply again. as per your suggestion (using 'die
> "could
> not get seq" if not defined $queryin;'), i now get the following error
> message:
> Software error:
> could not get seq at
> /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
> i've attached the script. could you plz have a look at it and see where
> am i
> going wrong.
> cheers mate!
>
> This strongly suggests that your FASTA file is not actually in FASTA
> format.
> http://en.wikipedia.org/wiki/Fasta_format
>
> Does it work if you pass it to blastall on the command line?
> eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
>
>
> Saier Lab.
> 858-534-2457
>
> Are you working at UCSD?
>
> --Torsten
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>
> --
> View this message in context:
> http://www.nabble.com/blastall-problem-tf3527412.html#a9867402
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From gdorjee at hotmail.com  Fri Apr  6 13:39:38 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 10:39:38 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <9875685.post@talk.nabble.com>


Following is the part of my script, which is in the 'htdocs' directory:

####### part of my script #############
#generate a new CGI object from the input to the CGI script
my $query=new CGI;

open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");

print STDOUT $query->header();
print STDOUT $query->start_html(-title=>"Response from blast",
-BGCOLOR=>"#FFFFFF");
print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";

#gets the sequence from the html textarea with 'post' method
my $fasta_file=$query->param('sequence');
print OUTPUT $fasta_file;

#Local blast of the input sequence against nr database
my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
'Fasta');
die "could not open fasta" if not defined $Seq_in;
my $queryin = $Seq_in->next_seq();
die "could not get seq" if not defined $queryin;
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                 'database' =>
'/export/home/dorjee/database/nr',
                                                 _READMETHOD => 'Blast'
                                                   );
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

Thank you.


Jason Stajich-3 wrote:
> 
> When/How are are you writing your sequences to this file result.faa?   
> are you using seqIO or bioperl to write the sequence  to a file?
> I'm wondering if this is I/O buffering problem.
> 
> On Apr 5, 2007, at 8:26 PM, DeeGee wrote:
> 
>>
>> hi Torsten,
>> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr  
>> works
>> perfectly fine on the command line, and the 'fasta.faa' is in fasta  
>> format:
>>
>>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
>> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASV 
>> SPSMTVASSQ
>> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLA 
>> GTAPGAEGPA
>> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAF 
>> RRKEHLRRHR
>> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRH 
>> QRIHGRAAAS
>> AQGAVAPGPDGGGPFPPWPLG
>>
>> it seems like i'm just one bloody step away from success. ^ ^*  
>> can't figure
>> out the prob.
>> thanks for your help.
>>
>>
>> Torsten Seemann wrote:
>>>
>>> Dorjee,
>>>
>>>> thanks alot for your reply again. as per your suggestion (using 'die
>>>> "could
>>>> not get seq" if not defined $queryin;'), i now get the following  
>>>> error
>>>> message:
>>>> Software error:
>>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl  
>>>> line 50.
>>>> i've attached the script. could you plz have a look at it and see  
>>>> where
>>>> am i
>>>> going wrong.
>>>> cheers mate!
>>>
>>> This strongly suggests that your FASTA file is not actually in FASTA
>>> format.
>>> http://en.wikipedia.org/wiki/Fasta_format
>>>
>>> Does it work if you pass it to blastall on the command line?
>>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/ 
>>> database/nr
>>>
>>>> Saier Lab.
>>>> 858-534-2457
>>>
>>> Are you working at UCSD?
>>>
>>> --Torsten
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/blastall- 
>> problem-tf3527412.html#a9867402
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
> 
> 
>  
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9875685
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Fri Apr  6 14:40:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 11:40:42 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9875685.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
	<9875685.post@talk.nabble.com>
Message-ID: <A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>

Looks like you need to deal with buffering:

http://perl.plover.com/FAQs/Buffering.html

So you need to add this:
close(OUTPUT);

Alternatively you can build a sequence object and pass that in to the  
BLAST factory, then you don't have to mess around with creating  
temporary files or run into this sort of problem.
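
To see the buffering effect in isolation, here is a small core-Perl sketch (the file name and the toy sequence are made up). With default buffering a short write normally stays in Perl's buffer until the handle is flushed or closed, so the file looks empty to anything that reads it in between.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $path = "$dir/fasta.faa";

open(my $out, '>', $path) or die "cannot write $path: $!";
print {$out} ">query\nMKVLLAAG\n";

# The data is normally still in Perl's output buffer at this point,
# so the file on disk is empty.
my $size_before_close = -s $path;

close($out) or die "cannot close $path: $!";

# close() flushes the buffer; now the record is really in the file.
my $size_after_close = -s $path;

print "before close: ", ($size_before_close || 0), " bytes\n";
print "after close:  $size_after_close bytes\n";
```

Reading the file between the print and the close is what the script in this thread does, which would explain next_seq() returning nothing.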

-jason
On Apr 6, 2007, at 10:39 AM, DeeGee wrote:

>
> Following is the part of my script, which is in the 'htdocs'  
> directory:
>
> ####### part of my script #############
> #generate a new CGI object from the input to the CGI script
> my $query=new CGI;
>
> open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");
>
> print STDOUT $query->header();
> print STDOUT $query->start_html(-title=>"Response from blast",
> -BGCOLOR=>"#FFFFFF");
> print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";
>
> #gets the sequence from the html textarea with 'post' method
> my $fasta_file=$query->param('sequence');
> print OUTPUT $fasta_file;
>
close(OUTPUT);
> #Local blast of the input sequence against nr database
> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
> 'Fasta');
> die "could not open fasta" if not defined $Seq_in;
> my $queryin = $Seq_in->next_seq();
> die "could not get seq" if not defined $queryin;
> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>  
> 'blastp',
>                                                  'database' =>
> '/export/home/dorjee/database/nr',
>                                                  _READMETHOD =>  
> 'Blast'
>                                                    );
> $factory->outfile("result/out.blast");
> my $blastreport = $factory->blastall($queryin);
> .....
>
> Thank you.
>
>
>
> Jason Stajich-3 wrote:
>>
>> When/How are are you writing your sequences to this file result.faa?
>> are you using seqIO or bioperl to write the sequence  to a file?
>> I'm wondering if this is I/O buffering problem.
>>
>> On Apr 5, 2007, at 8:26 PM, DeeGee wrote:
>>
>>>
>>> hi Torsten,
>>> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
>>> works
>>> perfectly fine on the command line, and the 'fasta.faa' is in fasta
>>> format:
>>>
>>>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
>>> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAA 
>>> SV
>>> SPSMTVASSQ
>>> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIP 
>>> LA
>>> GTAPGAEGPA
>>> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGK 
>>> AF
>>> RRKEHLRRHR
>>> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVL 
>>> RH
>>> QRIHGRAAAS
>>> AQGAVAPGPDGGGPFPPWPLG
>>>
>>> it seems like i'm just one bloody step away from success. ^ ^*
>>> can't figure
>>> out the prob.
>>> thanks for your help.
>>>
>>>
>>> Torsten Seemann wrote:
>>>>
>>>> Dorjee,
>>>>
>>>>> thanks alot for your reply again. as per your suggestion (using  
>>>>> 'die
>>>>> "could
>>>>> not get seq" if not defined $queryin;'), i now get the following
>>>>> error
>>>>> message:
>>>>> Software error:
>>>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl
>>>>> line 50.
>>>>> i've attached the script. could you plz have a look at it and see
>>>>> where
>>>>> am i
>>>>> going wrong.
>>>>> cheers mate!
>>>>
>>>> This strongly suggests that your FASTA file is not actually in  
>>>> FASTA
>>>> format.
>>>> http://en.wikipedia.org/wiki/Fasta_format
>>>>
>>>> Does it work if you pass it to blastall on the command line?
>>>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/
>>>> database/nr
>>>>
>>>>> Saier Lab.
>>>>> 858-534-2457
>>>>
>>>> Are you working at UCSD?
>>>>
>>>> --Torsten
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>>> -- 
>>> View this message in context: http://www.nabble.com/blastall-
>>> problem-tf3527412.html#a9867402
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> Miller Research Fellow
>> University of California, Berkeley
>> lab: 510.642.8441
>> http://pmb.berkeley.edu/~taylor/people/js.html
>> http://fungalgenomes.org/
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
> -- 
> View this message in context: http://www.nabble.com/blastall- 
> problem-tf3527412.html#a9875685
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From MEC at stowers-institute.org  Fri Apr  6 16:34:37 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 6 Apr 2007 15:34:37 -0500
Subject: [Bioperl-l] Bio/DB/SeqFeature/Store/DBI/mysql.pm patched
Message-ID: <CED81D34E37D5043A1211565277A51E507E22BAF@exchkc02.stowers-institute.org>

Lincoln,

I just committed a patch to Bio/DB/SeqFeature/Store/DBI/mysql.pm which
avoids a potential problem which, unless fixed, can generate warnings
that look like this:

prepare_cached(SELECT f.id,f.object
  FROM feature as f, typelist AS tl
  WHERE (   tl.id=f.typeid
   AND   (tl.tag LIKE ?)
)
  
) statement handle DBI::st=HASH(0x16f61c0) still Active at
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line
1427
DBD::mysql::st fetchrow_array failed: fetch() without execute() at
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line
1416.

... as well as other downstream aberrant program behaviour.

I encountered what the DBI manpage suggests might happen: "The results
will certainly not be what you expect"

This can happen, for example, when you open an iterator using
Bio::DB::SeqFeature::Store->get_seq_stream, and then while iterating,
perform other queries against the store.  My understanding of the DBI
doc is that this should only occur if the 2nd iterator is for the same
sql statement identically parameterized as the 1st, but I have not
proven beyond a doubt that this is what Bio::DB::SeqFeature::Store is
doing the way I am using it.  Nonetheless, the patch fixes my pipeline.
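
To illustrate why sharing one cached handle between an outer iterator and an inner query goes wrong, here is a toy model in plain Perl. It does not use DBI at all: ToyStatement, toy_prepare_cached() and toy_prepare() are invented for this sketch, and real DBI behaves differently in detail (it warns and finishes the still-active handle). The point is only that the inner query resets the cursor the outer loop is still reading from.

```perl
use strict;
use warnings;

package ToyStatement;   # invented stand-in for a statement handle

sub new {
    my ($class, @rows) = @_;
    return bless { rows => [@rows], pos => undef }, $class;
}

sub execute { my ($self) = @_; $self->{pos} = 0; return 1 }

sub fetch {
    my ($self) = @_;
    return unless defined($self->{pos}) && $self->{pos} < @{ $self->{rows} };
    return $self->{rows}[ $self->{pos}++ ];
}

package main;

my %cache;

# Same SQL text => same handle object (the cached case).
sub toy_prepare_cached {
    my ($sql) = @_;
    $cache{$sql} ||= ToyStatement->new(qw(f1 f2 f3));
    return $cache{$sql};
}

# Always a fresh handle (the uncached case).
sub toy_prepare { return ToyStatement->new(qw(f1 f2 f3)) }

# Iterate over an outer query and run the same query again inside the loop.
sub count_outer_rows {
    my ($prepare) = @_;
    my $sql   = 'SELECT id FROM feature';
    my $outer = $prepare->($sql);
    $outer->execute;
    my $seen = 0;
    while (defined(my $row = $outer->fetch)) {
        $seen++;
        my $inner = $prepare->($sql);
        $inner->execute;
        1 while defined($inner->fetch);
    }
    return $seen;
}

my $with_cache    = count_outer_rows(\&toy_prepare_cached);
my $without_cache = count_outer_rows(\&toy_prepare);
print "outer rows seen with shared handle:    $with_cache\n";
print "outer rows seen with separate handles: $without_cache\n";
```

In this model the outer loop sees only its first row when the handle is shared, because the inner query has already run the shared cursor to the end.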

Cheers,

Malcolm


From gdorjee at hotmail.com  Fri Apr  6 18:27:54 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 15:27:54 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
	<9875685.post@talk.nabble.com>
	<A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>
Message-ID: <9879110.post@talk.nabble.com>


I added the line: 
close(OUTPUT);
and now the following error comes up. 'out.blast' is supposed to be the
blast result file, but it is not being created. 

Software error:
------------- EXCEPTION  -------------
MSG: Could not open /export/home/dorjee/result/out.blast: No such file or
directory
STACK Bio::Root::IO::_initialize_io /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:167
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:53

--------------------------------------
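
A "No such file or directory" error on an output file often means the directory part of the path is missing or is being resolved against an unexpected working directory. The core-Perl sketch below checks this before any BLAST run; output_path_problem() is a hypothetical helper, and the temporary directory only stands in for the real result/ directory.

```perl
use strict;
use warnings;
use File::Spec;
use File::Basename qw(dirname);
use File::Temp qw(tempdir);

# Return a description of what is wrong with an output path, or nothing if
# its directory exists and is writable. Relative paths are resolved against
# the current working directory, which under a web server may not be the
# script's own directory.
sub output_path_problem {
    my ($path) = @_;
    my $abs = File::Spec->rel2abs($path);
    my $dir = dirname($abs);
    return "directory $dir does not exist"  unless -d $dir;
    return "directory $dir is not writable" unless -w $dir;
    return;
}

my $base = tempdir(CLEANUP => 1);

# Before the result/ directory exists the check reports a problem ...
my $missing = output_path_problem("$base/result/out.blast");

# ... and after creating it the check passes.
mkdir "$base/result" or die "cannot create $base/result: $!";
my $present = output_path_problem("$base/result/out.blast");

print defined $missing ? "$missing\n" : "ok\n";
print defined $present ? "$present\n" : "ok\n";
```

Running such a check as the web server user, with the same relative path the script uses, should show which directory is actually being looked at.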


Jason Stajich-3 wrote:
> 
> Looks like you need to deal with buffering:
> 
> http://perl.plover.com/FAQs/Buffering.html
> 
> So you need to add this:
> close(OUTPUT);
> 
> Alternatively you can build a sequence object and pass that in to the  
> BLAST factory, then you don't have to mess around with creating  
> temporary files or run into this sort of problem.
> 
> -jason
> On Apr 6, 2007, at 10:39 AM, DeeGee wrote:
> 
>>
>> Following is the part of my script, which is in the 'htdocs'  
>> directory:
>>
>> ####### part of my script #############
>> #generate a new CGI object from the input to the CGI script
>> my $query=new CGI;
>>
>> open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");
>>
>> print STDOUT $query->header();
>> print STDOUT $query->start_html(-title=>"Response from blast",
>> -BGCOLOR=>"#FFFFFF");
>> print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";
>>
>> #gets the sequence from the html textarea with 'post' method
>> my $fasta_file=$query->param('sequence');
>> print OUTPUT $fasta_file;
>>
> close(OUTPUT);
>> #Local blast of the input sequence against nr database
>> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
>> 'Fasta');
>> die "could not open fasta" if not defined $Seq_in;
>> my $queryin = $Seq_in->next_seq();
>> die "could not get seq" if not defined $queryin;
>> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>  
>> 'blastp',
>>                                                  'database' =>
>> '/export/home/dorjee/database/nr',
>>                                                  _READMETHOD =>  
>> 'Blast'
>>                                                    );
>> $factory->outfile("result/out.blast");
>> my $blastreport = $factory->blastall($queryin);
>> .....
>>
>> Thank you.
>>
>>
>>
>> Jason Stajich-3 wrote:
>>>
>>> When/How are are you writing your sequences to this file result.faa?
>>> are you using seqIO or bioperl to write the sequence  to a file?
>>> I'm wondering if this is I/O buffering problem.
>>>
>>> On Apr 5, 2007, at 8:26 PM, DeeGee wrote:
>>>
>>>>
>>>> hi Torsten,
>>>> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
>>>> works
>>>> perfectly fine on the command line, and the 'fasta.faa' is in fasta
>>>> format:
>>>>
>>>>> gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
>>>> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAA 
>>>> SV
>>>> SPSMTVASSQ
>>>> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIP 
>>>> LA
>>>> GTAPGAEGPA
>>>> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGK 
>>>> AF
>>>> RRKEHLRRHR
>>>> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVL 
>>>> RH
>>>> QRIHGRAAAS
>>>> AQGAVAPGPDGGGPFPPWPLG
>>>>
>>>> it seems like i'm just one bloody step away from success. ^ ^*
>>>> can't figure
>>>> out the prob.
>>>> thanks for your help.
>>>>
>>>>
>>>> Torsten Seemann wrote:
>>>>>
>>>>> Dorjee,
>>>>>
>>>>>> thanks alot for your reply again. as per your suggestion (using  
>>>>>> 'die
>>>>>> "could
>>>>>> not get seq" if not defined $queryin;'), i now get the following
>>>>>> error
>>>>>> message:
>>>>>> Software error:
>>>>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl
>>>>>> line 50.
>>>>>> i've attached the script. could you plz have a look at it and see
>>>>>> where
>>>>>> am i
>>>>>> going wrong.
>>>>>> cheers mate!
>>>>>
>>>>> This strongly suggests that your FASTA file is not actually in  
>>>>> FASTA
>>>>> format.
>>>>> http://en.wikipedia.org/wiki/Fasta_format
>>>>>
>>>>> Does it work if you pass it to blastall on the command line?
>>>>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/
>>>>> database/nr
>>>>>
>>>>>> Saier Lab.
>>>>>> 858-534-2457
>>>>>
>>>>> Are you working at UCSD?
>>>>>
>>>>> --Torsten
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>
>>>>>
>>>>
>>>> -- 
>>>> View this message in context: http://www.nabble.com/blastall-
>>>> problem-tf3527412.html#a9867402
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> --
>>> Jason Stajich
>>> Miller Research Fellow
>>> University of California, Berkeley
>>> lab: 510.642.8441
>>> http://pmb.berkeley.edu/~taylor/people/js.html
>>> http://fungalgenomes.org/
>>>
>>>
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/blastall- 
>> problem-tf3527412.html#a9875685
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>>
> 
> --
> Jason Stajich
> Miller Research Fellow
> University of California, Berkeley
> lab: 510.642.8441
> http://pmb.berkeley.edu/~taylor/people/js.html
> http://fungalgenomes.org/
> 
> 
>  
> 

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9879110
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From gilbertd at cricket.bio.indiana.edu  Fri Apr  6 23:31:29 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Fri, 6 Apr 2007 22:31:29 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704070331.l373VTI22000@cricket.bio.indiana.edu>


Dear Bioperlers,

There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
files have fixed line widths, but that isn't a requirement of Fasta
format. The documentation notes this package requirement, but I was
bitten by this, and I'd guess not many people check their data (esp.
if from someone else) to see it meets this requirement.

Simple tools can easily produce fasta with ragged line formatting:
e.g. genome assemblers that paste together contig fasta with spacers
to make assemblies.

It would be nice if B:D:Fasta would check and die when it can't handle
this ragged input.  Here is a suggested addition:

  package Bio::DB::Fasta;

=head1 DESCRIPTION
  
  Entries may have any line length up to 65,536 characters, and
  different line lengths are allowed in the same file.  However, within
  a sequence entry, all lines must be the same length except for the
  last.  
+ An error will be thrown if this is not the case.

=cut

  use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want 
  
  sub calculate_offsets {
  
     my ($offset,$id,$linelength,$type,$firstline,$count,$termination_length,%offsets);
  +  my ($l3_len,$l2_len,$l_len)=(0,0,0);
  
         $self->_check_linelength($linelength);
  +      ($l3_len,$l2_len,$l_len)=(0,0,0);
       } else {
  +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
  +      if(DIE_ON_MISSMATCHED_LINES &&
  +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
  +         my $fap= substr($_,0,20)."..";
  +         $self->throw("Each line of the fasta entry must be the same length except the last.
  +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
  +         }
  
         $linelength ||= length($_);
  
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


From hlapp at gmx.net  Sat Apr  7 12:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 12:42:13 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <05D43C56-8B30-41C9-8C35-2CD77419DE7F@gmx.net>

Wouldn't it be easier (and more robust) to just reformat the file to  
meet the constant line width requirement? The code required to do  
that should be fewer lines than your addition below, I think.

For example, one could do a fast first-pass through the file simply  
checking that all sequence lines not followed by a description line  
or eof have the same length, stopping at the first line that fails  
the test. If unequal lengths, use Bio::SeqIO to read and write back  
out the fasta file, then continue as for well-formatted files.
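
A rough sketch of what such a first pass might look like, in plain Perl. The helper name first_ragged_line is hypothetical and not part of BioPerl; it only assumes the rule quoted from the Bio::DB::Fasta documentation (within an entry, all sequence lines have the same length except the last).

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns the line number at which
# a violation of the fixed-width rule is detected, or 0 if none is found.
sub first_ragged_line {
    my ($fh) = @_;
    my ($width, $short_seen) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/[\r\n]+\z//;            # ignore line terminators
        if ($line =~ /^>/) {               # description line starts a new entry
            ($width, $short_seen) = (0, 0);
            next;
        }
        my $len = length $line;
        # A sequence line after a short one means the short one was not last.
        return $. if $short_seen;
        $width ||= $len;                   # first sequence line sets the width
        return $. if $len > $width;        # longer than the first line: ragged
        $short_seen = 1 if $len < $width;  # allowed only as the entry's last line
    }
    return 0;
}

my $fasta = ">a\nACGT\nACGT\nAC\n>b\nACGTAC\nAC\nACGTAC\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
my $bad = first_ragged_line($fh);
print $bad ? "ragged at line $bad\n" : "line widths look fine\n";
```

If the check reports a problem, the file could then be rewritten (for instance with Bio::SeqIO) before indexing; the sketch stops at the first failure so well-formatted files cost only a single read.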

	-hilmar

On Apr 6, 2007, at 11:31 PM, Don Gilbert wrote:

>
> Dear Bioperlers,
>
> There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
> files have fixed line widths, but that isn't a requirement of Fasta
> format. The documentation notes this package requirement, but I was
> bitten by this, and I'd guess not many people check their data (esp.
> if from someone else) to see it meets this requirement.
>
> Simple tools can easily produce fasta with ragged line formatting:
> e.g. genome assemblers that paste together contig fasta with spacers
> to make assemblies.
>
> It would be nice if B:D:Fasta would check and die when it can't handle
> this ragged input.  Here is a suggested addition:
>
>   package Bio::DB::Fasta;
>
> =head1 DESCRIPTION
>
>   Entries may have any line length up to 65,536 characters, and
>   different line lengths are allowed in the same file.  However,  
> within
>   a sequence entry, all lines must be the same length except for the
>   last.
> + An error will be thrown if this is not the case.
>
> =cut
>
>   use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want
>
>   sub calculate_offsets {
>
>      my ($offset,$id,$linelength,$type,$firstline,$count, 
> $termination_length,%offsets);
>   +  my ($l3_len,$l2_len,$l_len)=(0,0,0);
>
>          $self->_check_linelength($linelength);
>   +      ($l3_len,$l2_len,$l_len)=(0,0,0);
>        } else {
>   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); #  
> need to check every line :(
>   +      if(DIE_ON_MISSMATCHED_LINES &&
>   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
>   +         my $fap= substr($_,0,20)."..";
>   +         $self->throw("Each line of the fasta entry must be the  
> same length except the last.
>   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
>   +         }
>
>          $linelength ||= length($_);
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr  7 17:13:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 17:13:24 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
References: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
Message-ID: <8177CF47-558F-4891-97B5-69F327EF8A4A@gmx.net>

What I was suggesting was that the indexer does the reformatting
automatically, i.e., that it touches/changes the input data if necessary
(and obviously one would be able to turn this feature off when the
correctness of the input formatting is known).

Are you suggesting that this automatic reformatting isn't possible?

	-hilmar

On Apr 7, 2007, at 1:11 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> I have added reformatting where appropriate (in code that installs the
> files for indexing by Bio::DB::Fasta).  What I'm suggesting is a patch
> to Bio::DB::Fasta to warn and die when the documented fixed width
> that Bio::DB::Fasta requires isn't met.  I.e., keep other folks from
> being bitten by this hard to identify requirement.  Then when they
> see that this indexer is failing on inappropriate inputs, they also  
> can reformat
> their Fasta to meet this requirement, and not continue to use the  
> software with
> bad results.  The operation of Bio::DB::Fasta is reading a sequence  
> stream
> and it doesn't touch/change the input data, so it would be hard to  
> patch it
> to re-format the input data.
>
> - Don
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr  7 21:00:51 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 21:00:51 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
References: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
Message-ID: <B8009E72-30C5-479B-B7B9-456E859B80CB@gmx.net>

Since you'd have to reformat it though, how would you do it then  
(presumably offline)?

	-hilmar

On Apr 7, 2007, at 8:06 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> Yes, basically automatic reformatting isn't possible. If you are
> indexing a large genome of fasta data, I'd not want a bioperl script
> to rewrite that data, or create a new version, automatically.
>
> - Don

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From gilbertd at cricket.bio.indiana.edu  Sat Apr  7 13:11:11 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 12:11:11 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704071711.l37HBB823983@cricket.bio.indiana.edu>


Hilmar,

I have added reformatting where appropriate (in code that installs the 
files for indexing by Bio::DB::Fasta).  What I'm suggesting is a patch
to Bio::DB::Fasta to warn and die when the documented fixed width
that Bio::DB::Fasta requires isn't met.  I.e., keep other folks from
being bitten by this hard to identify requirement.  Then when they
see that this indexer is failing on inappropriate inputs, they also can reformat 
their Fasta to meet this requirement, and not continue to use the software with
bad results.  The operation of Bio::DB::Fasta is reading a sequence stream
and it doesn't touch/change the input data, so it would be hard to patch it
to re-format the input data.

- Don

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


From gilbertd at cricket.bio.indiana.edu  Sat Apr  7 20:06:34 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 19:06:34 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>


Hilmar,

Yes, basically automatic reformatting isn't possible. If you are
indexing a large genome of fasta data, I'd not want a bioperl script
to rewrite that data, or create a new version, automatically.

- Don


From gdorjee at hotmail.com  Mon Apr  9 00:18:39 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sun, 8 Apr 2007 21:18:39 -0700 (PDT)
Subject: [Bioperl-l] parse blast report for the best evalue
Message-ID: <9898358.post@talk.nabble.com>


hi all, 
i'm trying to parse a blast report using Bio::SearchIO as follows. since the
report was generated by running many query sequences against many (database)
fasta sequences, it contains many individual blast reports (one for each
sequence in the query file). i was wondering if there is a way to get only
the best hit (with the best evalue) from each one of them.

##### part of my script ######
my $in = new Bio::SearchIO(-format => 'blast',  -file   => $blast_report);
while( my $result = $in->next_result ) {
        while( my $hit = $result->next_hit ) {
              ...........

thanks.


-- 
View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9898358
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From staffa at niehs.nih.gov  Mon Apr  9 11:43:19 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Mon, 09 Apr 2007 11:43:19 -0400
Subject: [Bioperl-l] Retrieve mRNA from Genome
Message-ID: <C23FD757.3FAB%staffa@niehs.nih.gov>

I have been retrieving sub-sequence from Genbank genomic records by use of
Bio::SeqIO
and ->get_SeqFeatures, ->start ->end ,
but now I'm looking for a quick way to extract CDS or mRNA from
a multi-segmented annotation, e.g.
     mRNA            join(72458..72791,84573..84613,93279..94419,94481..94656,
                     94719..94992,95056..95350,95438..95553,95614..96056)

Is there such a method?
Please point me to appropriate documentation.


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: John D. Grovenstein (grovens1 at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From Kevin.M.Brown at asu.edu  Mon Apr  9 12:19:19 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 9 Apr 2007 09:19:19 -0700
Subject: [Bioperl-l] Retrieve mRNA from Genome
In-Reply-To: <C23FD757.3FAB%staffa@niehs.nih.gov>
References: <C23FD757.3FAB%staffa@niehs.nih.gov>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCAED7@EX02.asurite.ad.asu.edu>

I believe that is what the spliced_seq method is for

$feat->spliced_seq    # the "joined" sequence, when there are
                      # multiple sub-locations

http://www.bioperl.org/wiki/Bptutorial.pl 
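
To illustrate what "joined" means here, a small plain-Perl sketch follows. It is not how spliced_seq is implemented; join_segments is a hypothetical helper that only concatenates 1-based, inclusive sub-ranges of a string on the forward strand, which is the idea behind a join(...) location.

```perl
use strict;
use warnings;

# Illustration only: concatenate 1-based, inclusive sub-ranges of a string.
# Strand handling and remote locations are deliberately ignored.
sub join_segments {
    my ($sequence, @ranges) = @_;
    return join '', map {
        my ($start, $end) = @$_;
        substr($sequence, $start - 1, $end - $start + 1);
    } @ranges;
}

my $genomic = 'AAACCCGGGTTT';
print join_segments($genomic, [1, 3], [7, 9]), "\n";   # prints AAAGGG
```

With a real feature object, the equivalent call would presumably be $feat->spliced_seq, which returns a sequence object rather than a string.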

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Staffa, Nick (NIH/NIEHS)
> Sent: Monday, April 09, 2007 8:43 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Retrieve mRNA from Genome
> 
> I have been retrieving sub-sequence from Genbank genomic 
> records by use of Bio::SeqIO and ->get_SeqFeatures, ->start 
> ->end , but now I'm looking for a quick way to extract CDS or 
> mRNA from a multi-segmented annotation, e.g.
>      mRNA          
> join(72458..72791,84573..84613,93279..94419,94481..94656,
>                      
> 94719..94992,95056..95350,95438..95553,95614..96056)
> 
> Is there such a method?
> Please point me to appropriate documentation.


From cjfields at uiuc.edu  Mon Apr  9 12:50:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Apr 2007 11:50:05 -0500
Subject: [Bioperl-l] parse blast report for the best evalue
In-Reply-To: <9898358.post@talk.nabble.com>
References: <9898358.post@talk.nabble.com>
Message-ID: <C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>

You should probably use sort_hits() with a coderef that sorts by  
evalue to ensure that you retrieve the best evalue (significance()  
for hits) (see POD for Bio::Search::Result::ResultI).  You could then  
do something like:

my $hit;

unless ($result->no_hits_found) {
    # pass coderef to sort by evalue
    $result->sort_hits(\&sort_by_evalue);
    # retrieve first (best) hit
    $hit = $result->next_hit;
}

# do whatever you want with the best Hit

If you plan on retaining data from hits over a ton of different  
reports it may be best (memory-wise) to only retain the data you want  
for each hit instead of retaining the actual object.  For instance,  
if you only care about the description and evalue set up a simple  
data structure to house what you want by the query data instead of  
retaining all the extra stuff in the Hit object you don't need (all  
the HSP data, etc).
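
A minimal sketch of such a data structure, using plain hashes as stand-ins for the Hit objects (the query names, descriptions and evalues below are made up for illustration). With real objects the comparison would presumably use $hit->significance instead of a hash key.

```perl
use strict;
use warnings;

# Stand-in data: each "hit" is a plain hash here, not a BioPerl Hit object.
my %hits_by_query = (
    query1 => [
        { description => 'hypothetical protein', evalue => 1e-5  },
        { description => 'zinc finger protein',  evalue => 3e-40 },
    ],
    query2 => [
        { description => 'kinase', evalue => 0.002 },
    ],
);

my %best;   # query name => { description, evalue } of the lowest-evalue hit
for my $query (keys %hits_by_query) {
    my ($top) = sort { $a->{evalue} <=> $b->{evalue} } @{ $hits_by_query{$query} };
    # Copy only the fields of interest so the full hit can be discarded.
    $best{$query} = {
        description => $top->{description},
        evalue      => $top->{evalue},
    };
}

printf "%s\t%s\t%g\n", $_, $best{$_}{description}, $best{$_}{evalue}
    for sort keys %best;
```

Only two small values per query are kept, so memory use stays flat no matter how much HSP data each report carries.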

chris

On Apr 8, 2007, at 11:18 PM, DeeGee wrote:

>
> hi all,
> i'm trying to parse a blast report using Bio::SearchIO as follows,  
> but since
> this blast report is generated with many against many (database) fasta
> sequences, there're many individual blast reports (one for each of the
> sequence from the query file). i was wondering if there is a way to  
> get only
> the best hit (with best evalue) from each one of them.
>
> ##### part of my script ######
> my $in = new Bio::SearchIO(-format => 'blast',  -file   =>  
> $blast_report);
> while( my $result = $in->next_result ) {
>         while( my $hit = $result->next_hit ) {
>               ...........
>
> thanks.
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr  9 15:40:02 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 9 Apr 2007 12:40:02 -0700 (PDT)
Subject: [Bioperl-l] parse blast report for the best evalue
In-Reply-To: <C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>
References: <9898358.post@talk.nabble.com>
	<C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>
Message-ID: <9907757.post@talk.nabble.com>


thank you, Chris.
^ ^*

Chris Fields wrote:
> 
> You should probably use sort_hits() with a coderef that sorts by  
> evalue to ensure that you retrieve the best evalue (significance()  
> for hits) (see POD for Bio::Search::Result::ResultI).  You could then  
> do something like:
> 
> my $hit;
> 
> unless ($result->no_hits_found) {
>     # pass coderef to sort by evalue
>     $result->sort_hits(\&sort_by_evalue);
>     # retrieve first (best) hit
>     $hit = $result->next_hit;
> }
> 
> # do whatever you want with the best Hit
> 
> If you plan on retaining data from hits over a ton of different  
> reports it may be best (memory-wise) to only retain the data you want  
> for each hit instead of retaining the actual object.  For instance,  
> if you only care about the description and evalue set up a simple  
> data structure to house what you want by the query data instead of  
> retaining all the extra stuff in the Hit object you don't need (all  
> the HSP data, etc).
> 
> chris
> 
> On Apr 8, 2007, at 11:18 PM, DeeGee wrote:
> 
>>
>> hi all,
>> i'm trying to parse a blast report using Bio::SearchIO as follows,  
>> but since
>> this blast report is generated with many against many (database) fasta
>> sequences, there're many individual blast reports (one for each of the
>> sequence from the query file). i was wondering if there is a way to  
>> get only
>> the best hit (with best evalue) from each one of them.
>>
>> ##### part of my script ######
>> my $in = new Bio::SearchIO(-format => 'blast',  -file   =>  
>> $blast_report);
>> while( my $result = $in->next_result ) {
>>         while( my $hit = $result->next_hit ) {
>>               ...........
>>
>> thanks.
>>
>>
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9907757
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bosborne11 at verizon.net  Tue Apr 10 09:55:37 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 10 Apr 2007 09:55:37 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <C2410F99.DA34%bosborne11@verizon.net>

OK, applied.


On 4/6/07 11:31 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu> wrote:

>   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to
> check every line :(
>   +      if(DIE_ON_MISSMATCHED_LINES &&
>   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
>   +         my $fap= substr($_,0,20)."..";
>   +         $self->throw("Each line of the fasta entry must be the same length
> except the last.
>   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
>   +         }


From MEC at stowers-institute.org  Tue Apr 10 12:21:45 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 10 Apr 2007 11:21:45 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option
Message-ID: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>

Lincoln,

In `perldoc Bio::DB::SeqFeature::Store` I read:

"Caching requires the Tie::Cacher module to be installed. If the module
is not installed, then caching will silently be disabled."

I am wondering about the design motivation for silently disabling
caching when Tie::Cacher is not installed.  Perhaps at least emitting a
warning when -cache is requested and Tie::Cacher is not present is a
good idea?
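
Something along these lines is what I have in mind. This is only a sketch: load_optional is a hypothetical helper, not the actual Bio::DB::SeqFeature::Store code, and the wording of the warning is made up.

```perl
use strict;
use warnings;

# Hypothetical helper: try to load an optional module and warn, rather than
# stay silent, when it is missing. Returns 1 on success and 0 on failure.
sub load_optional {
    my ($module, $feature) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Tie::Cacher -> Tie/Cacher.pm
    return 1 if eval { require $file; 1 };
    warn "$feature requested but $module is not installed; continuing without it\n";
    return 0;
}

my $caching_available = load_optional('Tie::Cacher', '-cache');
```

A caller that truly depends on caching could then check the return value and die instead of carrying on.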

I am writing code that depends upon caching (i.e. upon the equality of
in-memory objects).

Do you advise that I don't depend upon Tie::Cacher working?  I
understand that it will NOT work as hoped if the cache is too small for
my application.

Thanks,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
 

From cjfields at uiuc.edu  Tue Apr 10 12:31:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 11:31:43 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
Message-ID: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>

At the moment we do not have a comprehensive list up on the wiki.  I  
have been slowly working (alphabetically!) to switch them over, so  
any help would be appreciated.

I have CC'd this to the main mail list for anyone else interested.

chris

On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:

> Hey guys,
>
> I noticed there's an open task regarding moving testing code to use
> Test::More etc and that Chris and Nathan are already on to it. Is
> there any kind of wiki page that you keep track of which modules you
> are already working on? I am new to this and want to contribute,
> having a fair amount of unit testing from work, but don't want to step
> over other people's work and avoid duplication as well.
> Any pointers where i could get started would be much appreciated :-)
>
> Thanks,
> Spiros
>
> ps. apologies if this is not the correct list to post this, just
> seemed the most intuitive choice.
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From spiros at lokku.com  Tue Apr 10 12:34:49 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Tue, 10 Apr 2007 17:34:49 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
Message-ID: <bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>

Okay, awesome, thank you for the info. I'll get started and see how it goes!

Spiros

On 4/10/07, Chris Fields <cjfields at uiuc.edu> wrote:
> At the moment we do not have a comprehensive list up on the wiki.  I
> have been slowly working (alphabetically!) to switch them over, so
> any help would be appreciated.
>
> I have CC'd this to the main mail list for anyone else interested.
>
> chris
>
> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>
> > Hey guys,
> >
> > I noticed there's an open task regarding moving testing code to use
> > Test::More etc and that Chris and Nathan are already on to it. Is
> > there any kind of wiki page that you keep track of which modules you
> > are already working on? I am new to this and want to contribute,
> > having a fair amount of unit testing from work, but don't want to step
> > over other people's work and avoid duplication as well.
> > Any pointers where i could get started would be much appreciated :-)
> >
> > Thanks,
> > Spiros
> >
> > ps. apologies if this is not the correct list to post this, just
> > seemed the most intuitive choice.
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Tue Apr 10 12:34:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 11:34:12 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>
Message-ID: <0D396A53-9911-4304-88FE-CCD6884A2699@uiuc.edu>


On Apr 10, 2007, at 11:21 AM, Cook, Malcolm wrote:

> Lincoln,
>
> In `perldoc Bio::DB::SeqFeature::Store` I read:
>
> "Caching requires the Tie::Cacher module to be installed. If the  
> module
> is not installed, then caching will silently be disabled."
>
> I am wondering about the design motivation for silently disabling
> caching when Tie::Cacher is not installed.  Perhaps at least  
> emitting a
> warning when -cache is requested and Tie::Cacher is not present is a
> good idea?

...

Maybe this should be added to the optional BioPerl dependencies?   
It's not listed in Build.PL in CVS...

chris


From cjfields at uiuc.edu  Tue Apr 10 13:22:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 12:22:33 -0500
Subject: [Bioperl-l] ] moving tests to use Test::More
In-Reply-To: <bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>
Message-ID: <DFAA7C75-BC52-4027-9816-5970404D1558@uiuc.edu>

When moving tests over be particularly careful of 'ok' tests which  
should be 'is'; a few older tests have display messages which make  
things tricky.  Use 'isa_ok', 'use_ok', 'require_ok', 'like', etc.  
where appropriate.
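
As a small illustration of the difference (the values are stand-ins, not taken from any existing BioPerl test file):

```perl
use strict;
use warnings;
use Test::More tests => 3;

my $length = 42;    # stand-in for a value returned by the code under test

# Old style: only passes or fails, with no report of what was actually seen.
ok($length == 42, 'length via ok');

# Preferred: 'is' reports the got/expected values when the test fails.
is($length, 42, 'length via is');

# 'like' for pattern checks, instead of ok($string =~ /.../).
like('Bio::Seq', qr/^Bio::/, 'name starts with Bio::');
```

The payoff shows up on failure, when 'is' and 'like' print what they received, which 'ok' cannot do.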

Also, we are not supporting TODO blocks at this time due to the  
upgrade needed for Test::Harness (which isn't necessary for BioPerl  
functionality).  Just use a skip block with a message if you run into  
something, like this (from RNA_SearchIO.t):

SKIP: {
     skip('Working on meta string building; TODO', 3);
     is($hsp->meta, 'blahblahblah', "HSP meta");
     # two more tests...
}

Thanks for helping out!

chris

On Apr 10, 2007, at 11:34 AM, Spiros Denaxas wrote:

> Okay, awesome, thank you for the info. I'll get started and see how  
> it goes!
>
> Spiros
...


From gopu_36 at yahoo.com  Tue Apr 10 03:42:26 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Tue, 10 Apr 2007 00:42:26 -0700 (PDT)
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole genome
Message-ID: <9915265.post@talk.nabble.com>


Hi,
I am a newbie trying out bioperl for my research. I have the whole genome
sequence of a pathogen and I am trying to break it into non-overlapping
1000 bp subsequences. For example, if the genome is 400000 bp long, I
should get 400 subsequences of 1000 bp each, non-overlapping but
contiguous. To be precise, the first substring would run from 1 to 1000,
the second from 1001 to 2000, and so on. Could anyone help me?
I tried the following code, but it gives me only the first substring and
none of the rest! I would appreciate it very much if someone could help me!
.........
.
.
my $start =1;
my $finish =100;
my $inseq  = Bio::SeqIO->new(-file => "$in_file");
while( my $seq = $inseq->next_seq ) {
	
	my $cleseq = $seq->seq();
	
	$seqlength = $seq->length();
	if ($finish<$seqlength){	
	print "The length of the sequence is $seqlength\n";	
	my $ordseq = $cleseq->subseq($start,$finish);
          push(@seq_array,$ordseq);
          $start=+100;
          $finish=+100;
          $counter++;
          next;          	             
       } 
}
-- 
View this message in context: http://www.nabble.com/extract-nonoverlapping-subsequences-from-a-whole-genome-tf3551560.html#a9915265
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bix at sendu.me.uk  Tue Apr 10 16:10:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 10 Apr 2007 21:10:35 +0100
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
 genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <461BEF3B.3080708@sendu.me.uk>

gopu_36 wrote:
> Hi,
> I am a newbie trying out bioperl for my research. I have the whole genome
> sequence of a pathogen and I am trying to break it into non-overlapping
> 1000 bp subsequences.
[snip]
> I tried the following code, but it gives me only the first substring and
> none of the rest! I would appreciate it very much if someone could help me!
[snip]
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	             
>        } 
> }

Unless I've misunderstood, there are a few problems here.

I'm guessing $in_file is a file containing the entire genome sequence as 
a single sequence. This means your while loop will only loop once. To do 
what you want you then need another loop that acts on the single $seq 
object you're going to get. You don't need $cleseq, and in fact your 
script ought to crash on the $cleseq->subseq line because $cleseq is a 
string which has no subseq() method. $seq->subseq is what you want.

I didn't look at the remaining code.
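
For what it's worth, here is a minimal sketch of the inner loop described above, assuming the whole genome arrives as one sequence. To stay runnable without BioPerl it works on a plain string with substr(); the helper name chunk_coords and the 10-base chunk size are illustrative only. With a real Bio::Seq object the call inside the loop would be $seq->subseq($start, $end), which takes 1-based inclusive coordinates. Note also that the quoted "$start=+100" assigns 100 each time round; "+=" is the operator that increments.

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) covering a sequence of
# $length bases in non-overlapping windows of $size; the last window may
# be shorter. chunk_coords is a hypothetical helper, not a BioPerl call.
sub chunk_coords {
    my ($length, $size) = @_;
    my @coords;
    for (my $start = 1; $start <= $length; $start += $size) {   # += increments
        my $end = $start + $size - 1;
        $end = $length if $end > $length;
        push @coords, [$start, $end];
    }
    return @coords;
}

my $genome = 'ACGTACGTACGTACGTACGTACGTA';    # 25 bases, stand-in for $seq->seq
my @chunks;
for my $c (chunk_coords(length($genome), 10)) {
    my ($start, $end) = @$c;
    # With a Bio::Seq object this line would be $seq->subseq($start, $end)
    push @chunks, substr($genome, $start - 1, $end - $start + 1);
    print "$start-$end $chunks[-1]\n";
}
```

Keeping the coordinate arithmetic in one place makes it easy to change the chunk size (1000 in the original question) without touching the rest of the loop.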


Hope that helps,
Sendu.


From cjfields at uiuc.edu  Tue Apr 10 16:22:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 15:22:15 -0500
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
	genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <88E9CC63-48FD-444B-877D-12BB1D944214@uiuc.edu>

There is a script in the BioPerl scripts directory which does this,  
with optional overlaps (split_seq.PLS).  It's in /scripts/seq.

chris

On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:

>
> Hi,
> I am one of the newbee venturingout bioperl for my research  
> purposes. I have
> a whole genome sequence of a pathogen. I am trying to break them into
> non-overlapping 1000bps subsequences. For example if my whole genome
> sequence is 400000 bps length, then I should be beak them into 4000
> subsequences of each 1000 bps and they should be non-overlapping  
> but at the
> same time continous. To be precise, my first substring would be  
> from 1 to
> 1000 bps, second substing would be from 1001 to 2000 etcc.. Could  
> anyone
> help me.
> I tried with the following code but it gives me only the first  
> substring and
> rest are not! I would appreciate very much if someone could help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	
>        }
> }
> -- 
> View this message in context: http://www.nabble.com/extract- 
> nonoverlapping-subsequences-from-a-whole-genome- 
> tf3551560.html#a9915265
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Apr 10 16:57:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 15:57:20 -0500
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
	genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <18529D36-C772-474A-9CE6-A29FA0C59ABA@uiuc.edu>

Okay, I was bored!  This is a little shorter than that script:

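use strict;
use warnings;
use Bio::SeqIO;

# The input FASTA file name is taken from the command line
my $seqin = Bio::SeqIO->new(-format => 'fasta',
                             -file => shift);

my $seqout = Bio::SeqIO->new(-format => 'fasta',
                             -file => '>split.fas');

while( my $seq = $seqin->next_seq ) {
     my $seqlength = $seq->length();
     print STDERR "Length is $seqlength\n";
     my $start = 1;
     my $end = 100;   # chunk size; sequences shorter than this are skipped
     my $desc = defined $seq->description ? $seq->description : '';
     CHUNK:
     while ($end <= $seqlength){
         # trunc() returns a new sequence object covering $start..$end
         my $ordseq = $seq->trunc($start,$end);
         $ordseq->description("$start-$end $desc");
         $seqout->write_seq($ordseq);
         last CHUNK if $end >= $seqlength;
         $start += 100;
         # the final chunk may be shorter than 100
         $end = ($end + 100 > $seqlength) ? $seqlength : $end + 100;
     }
}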

chris

On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:

>
> Hi,
> I am one of the newbee venturingout bioperl for my research  
> purposes. I have
> a whole genome sequence of a pathogen. I am trying to break them into
> non-overlapping 1000bps subsequences. For example if my whole genome
> sequence is 400000 bps length, then I should be beak them into 4000
> subsequences of each 1000 bps and they should be non-overlapping  
> but at the
> same time continous. To be precise, my first substring would be  
> from 1 to
> 1000 bps, second substing would be from 1001 to 2000 etcc.. Could  
> anyone
> help me.
> I tried with the following code but it gives me only the first  
> substring and
> rest are not! I would appreciate very much if someone could help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	
>        }
> }
> -- 
> View this message in context: http://www.nabble.com/extract- 
> nonoverlapping-subsequences-from-a-whole-genome- 
> tf3551560.html#a9915265
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Tue Apr 10 18:01:37 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 10 Apr 2007 18:01:37 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <C2410F99.DA34%bosborne11@verizon.net>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
	<C2410F99.DA34%bosborne11@verizon.net>
Message-ID: <6dce9a0b0704101501y15b96e20w89c4b9ef4abc1b48@mail.gmail.com>

I'm happy I didn't catch this thread until just now, but my preferred course
of action was to do exactly what Brian did and accept the patch.

Lincoln

On 4/10/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
> OK, applied.
>
>
> On 4/6/07 11:31 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu>
> wrote:
>
> >   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need
> to
> > check every line :(
> >   +      if(DIE_ON_MISSMATCHED_LINES &&
> >   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
> >   +         my $fap= substr($_,0,20)."..";
> >   +         $self->throw("Each line of the fasta entry must be the same
> length
> > except the last.
> >   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
> >   +         }
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From heikki at sanbi.ac.za  Wed Apr 11 05:14:27 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 11 Apr 2007 11:14:27 +0200
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
Message-ID: <200704111114.27839.heikki@sanbi.ac.za>

What is going on here? Can anyone remember doing this?

	-Heikki 

Please can I ask what is the purpose of the line @pos = sort @pos; in
the select_noncont subroutine of SimpleAlign.pm?

In previous versions this line was not present and I could use the
function to reorder the alignment, e.g. in an alignment with 5 sequences I
could reorder it to put the second sequence last using
$aln->select_noncont(1,3,4,5,2). The sort function stops this, but even
if the idea is to sort numerically this does not work, since the sort
function as is will put 10 before 2, so that
->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.

 
Many thanks

 
Anthony


From cjfields at uiuc.edu  Wed Apr 11 08:33:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 07:33:42 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za>
References: <200704111114.27839.heikki@sanbi.ac.za>
Message-ID: <F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>

Don't know when this was added.  Maybe we should make the sorting  
optional?  In other words, pass an optional 'nosort' string as the  
first arg, defaulting to numerical sort.

Either way the sort needs to be changed by the looks of it.  I'll  
verify the bug and commit today.

chris

On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote:

> What is going on here? Can anyone remember doing this?
>
> 	-Heikki
>
> Please can I ask what is the purpose of the line @pos = sort @pos; in
> the select_noncont subroutine of SimpleAlign.pm.
>
>
>
> In previous versions this line was not present and I could use the
> function to reorder the alignment e.g in an alignment with 5  
> sequences I
> could reorder it to put the second sequence last using
> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but  
> even
> if the idea is to sort numerically this dos not work since the sort
> function as is will put 10 before 2, so that
> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>
>
>
> Many thanks
>
>
>
> Anthony
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lzlgboy at gmail.com  Wed Apr 11 08:48:30 2007
From: lzlgboy at gmail.com (kenzy ken)
Date: Wed, 11 Apr 2007 20:48:30 +0800
Subject: [Bioperl-l] How to Remove root node from a tree, ???
Message-ID: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>

Hi all:
    I wrote a script that uses the Bio::Tree module. I want to remove some
nodes from the tree, so I used the "$tree->remove_Node($node_object);" method.
It works fine, but when I try to remove the root node, a problem occurs. It
seems that this method cannot remove the root node, so if you have any idea
how to remove the root, it would be very much appreciated.

-- 
Chen,Kenian
===========================
School of Life Science, Sun Yat-Sen University
===========================
Xingang Xilu 135
Guangzhou, Guangdong 510275
P. R. China
===========================
Phone: (86) 20-84113677; (86) 20-34474683;
Fax: (86) 20-34022356
===========================
Email:lzlgboy at gmail.com;
chenkn at mail2.sysu.edu.cn


From cjfields at uiuc.edu  Wed Apr 11 09:13:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 08:13:40 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>
Message-ID: <9DE1A554-4F33-45D1-9043-732FEB86ECD5@uiuc.edu>

I confirmed this; it is now fixed in CVS.  I have also added the  
option to prevent sorting if needed:

$aln2 = $aln->select_noncont(6,7,8,9,10,1,2,3,4,5);

sorts numerically by default.

$aln2 = $aln->select_noncont('nosort',6,7,8,9,10,1,2,3,4,5);

prevents sorting.  I have added a few tests to SimpleAlign.t for  
these.  It doesn't change the default behavior so shouldn't break  
anything.
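
For readers curious how such an optional leading flag can be handled, here is a rough sketch. It is not the code committed to SimpleAlign.pm; order_positions is a hypothetical stand-alone function that only mimics the argument handling described above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the positions in the order they would be used.
# A leading 'nosort' string keeps the caller's order; otherwise the
# positions are sorted numerically (<=>), not as strings.
sub order_positions {
    my @pos = @_;
    my $sort = 1;
    if (@pos && $pos[0] eq 'nosort') {
        shift @pos;
        $sort = 0;
    }
    @pos = sort { $a <=> $b } @pos if $sort;
    return @pos;
}

print join(',', order_positions(6,7,8,9,10,1,2,3,4,5)), "\n";
print join(',', order_positions('nosort',6,7,8,9,10,1,2,3,4,5)), "\n";
```

Checking only the first argument keeps existing calls that pass nothing but numbers working unchanged.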

chris

On Apr 11, 2007, at 7:33 AM, Chris Fields wrote:

> Don't know when this was added.  Maybe we should make the sorting
> optional?  In other words, pass an optional 'nosort' string as the
> first arg, defaulting to numerical sort.
>
> Either way the sort needs to be changed by the looks of it.  I'll
> verify the bug and commit today.
>
> chris
>
> On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote:
>
>> What is going on here? Can anyone remember doing this?
>>
>> 	-Heikki
>>
>> Please can I ask what is the purpose of the line @pos = sort @pos; in
>> the select_noncont subroutine of SimpleAlign.pm.
>>
>>
>>
>> In previous versions this line was not present and I could use the
>> function to reorder the alignment e.g in an alignment with 5
>> sequences I
>> could reorder it to put the second sequence last using
>> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
>> even
>> if the idea is to sort numerically this dos not work since the sort
>> function as is will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>>
>>
>>
>> Many thanks
>>
>>
>>
>> Anthony
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 11 09:21:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 11 Apr 2007 14:21:25 +0100
Subject: [Bioperl-l] How to Remove root node from a tree, ???
In-Reply-To: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>
References: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>
Message-ID: <461CE0D5.9040001@sendu.me.uk>

kenzy ken wrote:
> Hi all:
>    I write a script which used the Bio::Tree module. I want to remove some
> nodes from the tree, so I used "$tree->remove_Node($node_object);method 
> . It
> works ok, but when I remove root node, problem happened. It seens that this
> method can not remove root node, so ,if you guys have any idea about how to
> remove the root ,it will be very appreciated.

You'll have to re-root the tree to some other node in the tree. See the 
reroot() method.

(I don't think Bio::Tree::Tree objects can be unrooted.)


From emeric.sevin at univ-rennes1.fr  Wed Apr 11 09:32:38 2007
From: emeric.sevin at univ-rennes1.fr (Emeric Sevin)
Date: Wed, 11 Apr 2007 15:32:38 +0200
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
Message-ID: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>

Hi everybody,

I'm sorry to bug, but either I missed something so obvious nobody 
bothered to answer, or I'm being a little boycotted here...
A little help would be very much appreciated.

On 22 March 07, at 16:07, Emeric Sevin wrote:

> Hello,
>
> I am new to this community, and apologize if this subject has been 
> posted before.
>
> I want to print out only selected results from a multiple 
> blast-alignments results file. Problem is, the algorithm used is 
> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the actual 
> writing task yields "unclean" warnings. Although an ouput is actually 
> written, the writer (Bio::SearchIO::Writer::TextResultWriter) seems to 
> be disturbed by the fact rpsblast DBs are not labeled with 
> "protein"/"nucleic"/"translated".
> Does anybody know of an easy fix to that bug, or of another way to 
> come around it?
>
> Thank you very much
>
> Emeric SEVIN
> Université de Rennes 1
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu  Wed Apr 11 10:44:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 09:44:27 -0500
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
Message-ID: <D0E54B3C-A345-4A90-9571-25144622265D@uiuc.edu>

We could ignore this post... oh the irony!  ;>

It has nothing to do with ignoring you.  Read this:

http://en.wikipedia.org/wiki/Warnock's_Dilemma

Basically your question probably fell on deaf ears b/c no one has  
time to look into it and post a fix.  Realize that BioPerl is, for  
the most part, a volunteer effort and we all have $jobs to worry  
about.  If you want, you are more than welcome to file a bug on this  
(if it isn't already filed), which is the best way to make sure  
something is done:

http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

chris


On Apr 11, 2007, at 8:32 AM, Emeric Sevin wrote:

> Hi everybody,
>
> I'm sorry to bug, but either I missed something so obvious nobody  
> bothered to answer, either I'm being a little boycotted here...
> A little help would be very much appreciated
>
> On 22 March 07, at 16:07, Emeric Sevin wrote:
>
>> Hello,
>>
>> I am new to this community, and apologize if this subject has been  
>> posted before.
>>
>> I want to print out only selected results from a multiple blast- 
>> alignments results file. Problem is, the algorithm used is  
>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the  
>> actual writing task yields "unclean" warnings. Although an ouput  
>> is actually written, the writer  
>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by  
>> the fact rpsblast DBs are not labeled with  
>> "protein"/"nucleic"/"translated".
>> Does anybody know of an easy fix to that bug, or of another way to  
>> come around it?
>>
>> Thank you very much
>>
>> Emeric SEVIN
>> Université de Rennes 1
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Wed Apr 11 10:30:11 2007
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Wed, 11 Apr 2007 15:30:11 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
Message-ID: <461CF0F3.1010708@sheffield.ac.uk>

It should be easy enough to find those t/*.t files that have "use Test;" 
or "require Test;". This should provide a list of files still needing to 
be converted over to Test::More. As discussed previously, it may also be 
useful to use Test::Exception to test for situations where 
exceptions/warnings are thrown. If you add additional tests using this 
module, you should add the Test::Exception module to t/lib/.
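
As a rough illustration of the search described above, something along these lines would list the candidates. It is only a sketch: uses_old_test is a made-up helper name, the script assumes it is run from the top of a BioPerl checkout, and the pattern will also match a commented-out "use Test;".

```perl
use strict;
use warnings;

# True if the given source text loads the old Test module
# (matches "use Test;" or "require Test;", but not "use Test::More;").
sub uses_old_test {
    my ($source) = @_;
    return $source =~ /\b(?:use|require)\s+Test\s*;/ ? 1 : 0;
}

# Assumes the tests live in t/*.t relative to the current directory;
# prints nothing if no such files exist.
for my $file (sort glob('t/*.t')) {
    open my $fh, '<', $file or do { warn "Cannot read $file: $!\n"; next };
    my $source = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    next unless defined $source;
    print "$file\n" if uses_old_test($source);
}
```

The output is one file name per line, which could be pasted into a tracking list as is.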

Good luck, and feel free to mail the list with questions/comments etc.

Nath


Chris Fields wrote:
> At the moment we do not have a comprehensive list up on the wiki.  I  
> have been slowly working (alphabetically!) to switch them over, so  
> any help would be appreciated.
>
> I have CC'd this to the main mail list for anyone else interested.
>
> chris
>
> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>
>   
>> Hey guys,
>>
>> I noticed there's an open task regarding moving testing code to use
>> Test::More etc and that Chris and Nathan are already on to it. Is
>> there any kind of wiki page that you keep track of which modules you
>> are already working on? I am new to this and want to contribute,
>> having a fair amount of unit testing from work, but don't want to step
>> over other people's work and avoid duplication as well.
>> Any pointers where i could get started would be much appreciated :-)
>>
>> Thanks,
>> Spiros
>>
>> ps. apologies if this is not the correct list to post this, just
>> seemed the most intuitive choice.
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   


From spiros at lokku.com  Wed Apr 11 10:56:22 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Wed, 11 Apr 2007 15:56:22 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <461CF0F3.1010708@sheffield.ac.uk>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
Message-ID: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>

Yep! I have some rough stats at home; I will post them later on
tonight. Roughly, if I remember correctly, 50% of the tests were still
using Test; all the others were using Test::More.

More to follow later on,
Spiros

On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> It should be easy enough to find those t/*.t files that have "use Test;"
> or "require Test;" This should provide a list of files still needing to
> be converted over to Test::More. As discussed previously, it may also be
> useful to use Test::Exception to test for situations where
> exceptions/warnings are thrown. If you add additional tests using this
> module, you should add the Test::Exception module to t/lib/
>
> Good luck, and feel free to mail the list with questions/comments etc.
>
> Nath
>
>
> Chris Fields wrote:
> > At the moment we do not have a comprehensive list up on the wiki.  I
> > have been slowly working (alphabetically!) to switch them over, so
> > any help would be appreciated.
> >
> > I have CC'd this to the main mail list for anyone else interested.
> >
> > chris
> >
> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> >
> >
> >> Hey guys,
> >>
> >> I noticed there's an open task regarding moving testing code to use
> >> Test::More etc and that Chris and Nathan are already on to it. Is
> >> there any kind of wiki page that you keep track of which modules you
> >> are already working on? I am new to this and want to contribute,
> >> having a fair amount of unit testing from work, but don't want to step
> >> over other people's work and avoid duplication as well.
> >> Any pointers where i could get started would be much appreciated :-)
> >>
> >> Thanks,
> >> Spiros
> >>
> >> ps. apologies if this is not the correct list to post this, just
> >> seemed the most intuitive choice.
> >> _______________________________________________
> >> Bioperl-guts-l mailing list
> >> Bioperl-guts-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
> >>
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>


From Kevin.M.Brown at asu.edu  Wed Apr 11 11:14:07 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 11 Apr 2007 08:14:07 -0700
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za>
References: <200704111114.27839.heikki@sanbi.ac.za>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>

> What is going on here? Can anyone remember doing this?
> 
> 	-Heikki 
> 
> Please can I ask what is the purpose of the line @pos = sort 
> @pos; in the select_noncont subroutine of SimpleAlign.pm.
> 
>  
> 
> In previous versions this line was not present and I could 
> use the function to reorder the alignment e.g in an alignment 
> with 5 sequences I could reorder it to put the second 
> sequence last using $aln->select_noncont(1,3,4,5,2). The sort 
> function stops this, but even if the idea is to sort 
> numerically this dos not work since the sort function as is 
> will put 10 before 2, so that
> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .

Not sure why 10 would come before 2 since perl would interpret that list
as a series of integers even if they were entered as strings and do the
sort.


From spiros at lokku.com  Wed Apr 11 11:51:27 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Wed, 11 Apr 2007 16:51:27 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <bba689ec0704110851qb1aa272m5db4e01356f28e92@mail.gmail.com>

This looks like a case of cmp vs <=>, I think!

my @array = (1,10,2,3,4,5,6,7,8,9) ;
print join(",", @array), "\n";
my @sorted1 = sort(@array) ;
print join(",", @sorted1), "\n";
my @sorted2 = (sort { $a <=> $b } @array);
print join(",", @sorted2), "\n";

idaru:/tmp spiros$ perl koko.pl
1,10,2,3,4,5,6,7,8,9 # normal array
1,10,2,3,4,5,6,7,8,9 # sorted with sort
1,2,3,4,5,6,7,8,9,10 # sorted with <=>

Spiros


On 4/11/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> > What is going on here? Can anyone remember doing this?
> >
> >       -Heikki
> >
> > Please can I ask what is the purpose of the line @pos = sort
> > @pos; in the select_noncont subroutine of SimpleAlign.pm.
> >
> >
> >
> > In previous versions this line was not present and I could
> > use the function to reorder the alignment e.g in an alignment
> > with 5 sequences I could reorder it to put the second
> > sequence last using $aln->select_noncont(1,3,4,5,2). The sort
> > function stops this, but even if the idea is to sort
> > numerically this dos not work since the sort function as is
> > will put 10 before 2, so that
> > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> > the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From ak at ebi.ac.uk  Wed Apr 11 11:58:52 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Wed, 11 Apr 2007 16:58:52 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <20070411155852.GC24537@ebi.ac.uk>

On Wed, Apr 11, 2007 at 08:14:07AM -0700, Kevin Brown wrote:
> > What is going on here? Can anyone remember doing this?
> > 
> > 	-Heikki 
> > 
> > Please can I ask what is the purpose of the line @pos = sort 
> > @pos; in the select_noncont subroutine of SimpleAlign.pm.
> > 
> >  
> > 
> > In previous versions this line was not present and I could 
> > use the function to reorder the alignment e.g in an alignment 
> > with 5 sequences I could reorder it to put the second 
> > sequence last using $aln->select_noncont(1,3,4,5,2). The sort 
> > function stops this, but even if the idea is to sort 
> > numerically this dos not work since the sort function as is 
> > will put 10 before 2, so that
> > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> > the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
> 
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.

Really?

$ perl -e 'print join(" ", sort(1..20)), "\n"';
1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9


-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
-------------------*=<>=*-------------------


From mkiwala at watson.wustl.edu  Wed Apr 11 11:51:35 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Wed, 11 Apr 2007 10:51:35 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <461D0407.8050105@watson.wustl.edu>

Kevin Brown wrote:
>> What is going on here? Can anyone remember doing this?
>>
>> 	-Heikki 
>>
>> Please can I ask what is the purpose of the line @pos = sort 
>> @pos; in the select_noncont subroutine of SimpleAlign.pm.
>>
>>  
>>
>> In previous versions this line was not present and I could 
>> use the function to reorder the alignment e.g in an alignment 
>> with 5 sequences I could reorder it to put the second 
>> sequence last using $aln->select_noncont(1,3,4,5,2). The sort 
>> function stops this, but even if the idea is to sort 
>> numerically this dos not work since the sort function as is 
>> will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>>     
>
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.
>
>   
Because, according to the documentation for Perl's sort function, 
sorting occurs "in standard string comparison order" unless the user 
specifies another comparison function to use.
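
A small sketch of the difference, using only core Perl (the variable names are arbitrary): without a comparison block sort behaves as if cmp were used, so the values are compared as strings even when they look like integers.

```perl
use strict;
use warnings;

# cmp compares as strings: "10" sorts before "2" because "1" lt "2".
# <=> compares as numbers.
print "cmp: ", (10 cmp 2), "\n";    # -1
print "<=>: ", (10 <=> 2), "\n";    # 1

my @pos = (1 .. 10);
my @by_string = sort @pos;                  # same as sort { $a cmp $b } @pos
my @by_number = sort { $a <=> $b } @pos;
print "@by_string\n";
print "@by_number\n";
```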


From cjfields at uiuc.edu  Wed Apr 11 12:45:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 11:45:11 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
Message-ID: <FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>

We should probably place something on the wiki to prevent overlaps  
(i.e. make sure no two devs are working on the same tests).  I  
planned on working on the G's last night but got bogged down.

Spiros, if you haven't already, go ahead and create a list on a wiki  
page for tracking.  We can lay claim to them by tagging with our sigs  
and cross them off once complete.

chris

On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:

> Yep! I have some rough stats I have at home, I will post them later on
> tonight. Roughly, if i remember correctly, 50% of the tests were still
> using Test, all the others were using Test::More.
>
> More to follow later on,
> Spiros
>
> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
>> It should be easy enough to find those t/*.t files that have "use Test;"
>> or "require Test;" This should provide a list of files still needing to
>> be converted over to Test::More. As discussed previously, it may also be
>> useful to use Test::Exception to test for situations where
>> exceptions/warnings are thrown. If you add additional tests using this
>> module, you should add the Test::Exception module to t/lib/
>>
>> Good luck, and feel free to mail the list with questions/comments etc.
>>
>> Nath
>>
>>
>> Chris Fields wrote:
>> > At the moment we do not have a comprehensive list up on the wiki.  I
>> > have been slowly working (alphabetically!) to switch them over, so
>> > any help would be appreciated.
>> >
>> > I have CC'd this to the main mail list for anyone else interested.
>> >
>> > chris
>> >
>> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>> >
>> >
>> >> Hey guys,
>> >>
>> >> I noticed there's an open task regarding moving testing code to use
>> >> Test::More etc and that Chris and Nathan are already on to it. Is
>> >> there any kind of wiki page that you keep track of which modules you
>> >> are already working on? I am new to this and want to contribute,
>> >> having a fair amount of unit testing from work, but don't want to step
>> >> over other people's work and avoid duplication as well.
>> >> Any pointers where i could get started would be much appreciated :-)
>> >>
>> >> Thanks,
>> >> Spiros
>> >>
>> >> ps. apologies if this is not the correct list to post this, just
>> >> seemed the most intuitive choice.
>> >> _______________________________________________
>> >> Bioperl-guts-l mailing list
>> >> Bioperl-guts-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>> >>
>> >
>> > Christopher Fields
>> > Postdoctoral Researcher
>> > Lab of Dr. Robert Switzer
>> > Dept of Biochemistry
>> > University of Illinois Urbana-Champaign
>> >
>> >
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 11 12:09:54 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 11 Apr 2007 17:09:54 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <461D0852.9070802@sendu.me.uk>

Kevin Brown wrote:
>>  but even if the idea is to sort
>> numerically this does not work since the sort function as is 
>> will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
> 
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.

The default sort for sort() is { $a cmp $b } (standard string comparison 
order): 10 comes before 2.

The fix was to explicitly say sort { $a <=> $b } for a numeric sort.
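
For anyone who wants to see the difference directly, here is a minimal sketch in plain Perl. It is not the actual SimpleAlign.pm code; the variable names are made up for illustration.

```perl
use strict;
use warnings;

# Positions as they might be passed to select_noncont().
my @pos = (1 .. 10);

# Default sort compares elements as strings, so "10" sorts before "2".
my @string_order = sort @pos;

# An explicit numeric comparison gives the order most people expect.
my @numeric_order = sort { $a <=> $b } @pos;

print "string:  @string_order\n";   # string:  1 10 2 3 4 5 6 7 8 9
print "numeric: @numeric_order\n";  # numeric: 1 2 3 4 5 6 7 8 9 10
```

Lists of single-digit numbers come out the same either way, which is probably why the default often goes unnoticed.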


From cjfields at uiuc.edu  Wed Apr 11 12:46:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 11:46:46 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <461D0407.8050105@watson.wustl.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
	<461D0407.8050105@watson.wustl.edu>
Message-ID: <7001A1A4-5CF4-4C70-8EFA-94AF0D16864C@uiuc.edu>

I have confirmed the bug and fixed this in CVS.  Michael's right; sort  
defaults to string comparison if no subroutine or sort block is  
specified.

perldoc -f sort:

sort SUBNAME LIST
sort BLOCK LIST
sort LIST
...
If SUBNAME or BLOCK is omitted, "sort"s in standard string
comparison order.
...

chris

On Apr 11, 2007, at 10:51 AM, Michael Kiwala wrote:

> Kevin Brown wrote:
>>> What is going on here? Can anyone remember doing this?
>>>
>>> 	-Heikki
>>>
>>> Please can I ask what is the purpose of the line @pos = sort
>>> @pos; in the select_noncont subroutine of SimpleAlign.pm.
>>>
>>>
>>>
>>> In previous versions this line was not present and I could
>>> use the function to reorder the alignment e.g in an alignment
>>> with 5 sequences I could reorder it to put the second
>>> sequence last using $aln->select_noncont(1,3,4,5,2). The sort
>>> function stops this, but even if the idea is to sort
>>> numerically this does not work since the sort function as is
>>> will put 10 before 2, so that
>>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>>> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
>>>
>>
>> Not sure why 10 would come before 2 since perl would interpret that list
>> as a series of integers even if they were entered as strings and do the
>> sort.
>>
>>
> Because, according to the documentation for Perl's sort function,
> sorting occurs "in standard string comparison order" unless the user
> specifies another comparison function to use.


From heikki at sanbi.ac.za  Wed Apr 11 12:39:57 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 11 Apr 2007 18:39:57 +0200
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
Message-ID: <200704111839.58940.heikki@sanbi.ac.za>

A bit more than half are still using Test:

~/src/bioperl/core/t>  perl -lne 'print $1 if /use +(Test[^\sO;]*);/' *t | sort | uniq -c | sort -nr
    147 Test
     97 Test::More


Feel free to add scripts and functionality to the core/maintenance directory of 
bioperl-live if you want to keep track of things in modules and tests.
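
As a starting point, a script along these lines could list the test files that still load plain Test. This is only a sketch: the helper name uses_plain_test is hypothetical, and it assumes it is run from the directory that contains t/.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the source text loads plain Test
# (use/require Test) rather than Test::More or another Test::* module.
sub uses_plain_test {
    my ($source) = @_;
    return $source =~ /^\s*(?:use|require)\s+Test\b(?!::)/m ? 1 : 0;
}

# Print every t/*.t file that still needs converting.
my @files = glob('t/*.t');
for my $file (@files) {
    open my $fh, '<', $file or do { warn "Cannot read $file: $!\n"; next };
    my $source = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    print "$file\n" if uses_plain_test($source);
}
```

Unlike the one-liner above, this also catches "require Test;", and it prints file names rather than counts, which is closer to what a tracking list on the wiki would need.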

	-Heikki


On Wednesday 11 April 2007 16:56:22 Spiros Denaxas wrote:
> Yep! I have some rough stats I have at home, I will post them later on
> tonight. Roughly, if i remember correctly, 50% of the tests were still
> using Test, all the others were using Test::More.
>
> More to follow later on,
> Spiros
>
> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> > It should be easy enough to find those t/*.t files that have "use Test;"
> > or "require Test;" This should provide a list of files still needing to
> > be converted over to Test::More. As discussed previously, it may also be
> > useful to use Test::Exception to test for situations where
> > exceptions/warnings are thrown. If you add additional tests using this
> > module, you should add the Test::Exception module to t/lib/
> >
> > Good luck, and feel free to mail the list with questions/comments etc.
> >
> > Nath
> >
> > Chris Fields wrote:
> > > At the moment we do not have a comprehensive list up on the wiki.  I
> > > have been slowly working (alphabetically!) to switch them over, so
> > > any help would be appreciated.
> > >
> > > I have CC'd this to the main mail list for anyone else interested.
> > >
> > > chris
> > >
> > > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> > >> Hey guys,
> > >>
> > >> I noticed there's an open task regarding moving testing code to use
> > >> Test::More etc and that Chris and Nathan are already on to it. Is
> > >> there any kind of wiki page that you keep track of which modules you
> > >> are already working on? I am new to this and want to contribute,
> > >> having a fair amount of unit testing from work, but don't want to step
> > >> over other people's work and avoid duplication as well.
> > >> Any pointers where i could get started would be much appreciated :-)
> > >>
> > >> Thanks,
> > >> Spiros
> > >>
> > >> ps. apologies if this is not the correct list to post this, just
> > >> seemed the most intuitive choice.


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From stewarta at nmrc.navy.mil  Wed Apr 11 14:40:18 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Wed, 11 Apr 2007 14:40:18 -0400
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
Message-ID: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>

First of all, mucho kudos to those who revamped this module.  It  
works really nicely.  I have a couple of thoughts:

* The .predict file from Glimmer provides frame and score information  
which could be parsed and included in the generated feature prediction

* It'd be nice to include the orfID somewhere on the feature  
prediction..  maybe the seqID ? (these could be post-processed into  
locus_tags for those using Glimmer as a preliminary annotation tool)

* Options to set the source and primary tags to something other than  
the defaults (i.e. Glimmer_3.X and 'transcript').  This could always be  
done post-Bio::Tools::Glimmer, though, of course.

* This section..

         elsif (
                # Glimmer 2.X prediction
                (/^\s+(\d+)\s+      # gene num
                 (\d+)\s+(\d+)\s+   # start, end
                 \[([\+\-])\d{1}\s+ # strand
                 /ox ) ||
                # Glimmer 3.X prediction
                (/\w+(\d+)\s+       # orf (numeric portion)
                 (\d+)\s+(\d+)\s+   # start, end
                 ([\+\-])\d{1}\s+   # strand
                /ox)) {
	    my ($genenum,$start,$end,$strand) =
		( $1,$2,$3,$4 );

...isn't picking up more than the last digit in the orf-number.  Not  
sure if that's intentional.  A sample of the feature output using
->gff_string shows up as ...

test-pseudocontig       Glimmer_3.X     transcript      1018    8       .       -       .       Group GenePrediction_1
test-pseudocontig       Glimmer_3.X     transcript      1134    1736    .       +       .       Group GenePrediction_2
test-pseudocontig       Glimmer_3.X     transcript      1832    2596    .       +       .       Group GenePrediction_4
test-pseudocontig       Glimmer_3.X     transcript      2710    3225    .       +       .       Group GenePrediction_5
test-pseudocontig       Glimmer_3.X     transcript      3246    4016    .       +       .       Group GenePrediction_6
test-pseudocontig       Glimmer_3.X     transcript      4177    5064    .       +       .       Group GenePrediction_7
test-pseudocontig       Glimmer_3.X     transcript      5083    5673    .       +       .       Group GenePrediction_8
test-pseudocontig       Glimmer_3.X     transcript      6001    7275    .       +       .       Group GenePrediction_9
test-pseudocontig       Glimmer_3.X     transcript      7530    8081    .       +       .       Group GenePrediction_0
test-pseudocontig       Glimmer_3.X     transcript      8785    8117    .       -       .       Group GenePrediction_1
test-pseudocontig       Glimmer_3.X     transcript      9423    8788    .       -       .       Group GenePrediction_2
test-pseudocontig       Glimmer_3.X     transcript      10088   9549    .       -       .       Group GenePrediction_3

...which was parsed originally from...

orf00001     1018        8  -2     2.95
orf00002     1134     1736  +3     2.91
orf00004     1832     2596  +2     2.93
orf00005     2710     3225  +1     2.90
orf00006     3246     4016  +3     2.93
orf00007     4177     5064  +1     2.94
orf00008     5083     5673  +1     2.91
orf00009     6001     7275  +1     2.96
orf00010     7530     8081  +3     2.58
orf00011     8785     8117  -2     2.92
orf00012     9423     8788  -1     2.81
orf00013    10088     9549  -3     2.90

* It'd also be nice if you could somehow set the string that is  
placed in front of the orf-number in the line...

                  '-tag'         => { 'Group' => "GenePrediction_$genenum"},

...seeing as how these tag/values can't seem to be changed manually  
anymore without getting into AnnotationCollection stuff, which is no  
longer a simple matter of changing a tag/value string.  (By the way,  
where can I find a list of AnnotationCollectionI compliant objects?)


Any thoughts on the suggestions?  (I don't mind taking a stab at  
incorporating them into the code.. I've never submitted anything to  
BioPerl before)


-Andrew


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From cjfields at uiuc.edu  Wed Apr 11 15:53:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 14:53:54 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
Message-ID: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>

I'm posting this to the mail list in case anyone has any ideas on  
what is going on...

I have noticed an odd (read: annoying) rash of spam on the wiki.   
Jason also ran some spam reversions, so maybe he can chime in.   
Essentially it looks like the responsible spambots 'correct' the wiki  
text and links, so that '+' is being removed and URI-encoded symbols  
in links are reverted to symbols.  Unfortunately the removal occurs  
in all text, so places where '+' is intended (for instance, raw text  
for showing example record formats) are also changed.  My guess is  
we'll need to block the IP address or add to the blacklist if possible.

Between Jason and me we have blocked ~9 spambots and counting.   
Couldn't find anything via Google yet...

chris


From torsten.seemann at infotech.monash.edu.au  Wed Apr 11 20:33:02 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 12 Apr 2007 10:33:02 +1000
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
Message-ID: <a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>

Andrew,

>                 # Glimmer 3.X prediction
>                 (/\w+(\d+)\s+       # orf (numeric portion)
> ...isn't picking up more than the last digit in the orf-number.  Not
> sure if that's intentional.  A sample of the feature output using
> ->gff_string shows up as ...

I think that regexp should be \w+?(\d+)

i.e. the \w+ should be non-greedy; otherwise it will swallow every digit
except the last one, leaving only that for \d+ (as \d is a subset of \w)
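
A small sketch of the difference, using one of the lines from your sample output. Only the first part of the pattern is shown here, not the full regexp from the module.

```perl
use strict;
use warnings;

# One line of Glimmer 3 output, copied from the sample in this thread.
my $line = "orf00010     7530     8081  +3     2.58";

# Greedy: \w+ grabs "orf0001" and leaves only the final digit for (\d+).
my ($greedy) = $line =~ /\w+(\d+)\s+/;

# Non-greedy: \w+? stops as early as it can, so (\d+) gets every digit.
my ($lazy) = $line =~ /\w+?(\d+)\s+/;

print "greedy:     $greedy\n";  # greedy:     0
print "non-greedy: $lazy\n";    # non-greedy: 00010
```

The greedy result matches the GenePrediction_0 seen in the GFF output for orf00010.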

I've CC:ed this to Mark Johnson who made the recent changes to this module.

Thanks for your feedback,

--Torsten Seemann


From spiros at lokku.com  Wed Apr 11 21:08:47 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 02:08:47 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
Message-ID: <bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>

Good idea Chris. Just got back home so will probably do it tomorrow
morning or so.

Spiros

On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
> We should probably place something on the wiki to prevent overlaps
> (i.e. make sure no two devs are working on the same tests).  I
> planned on working on the G's last night but got bogged down.
>
> Spiros, if you haven't already go ahead and create a list on a wiki
> page for tracking.  We can lay claim to them by tagging with our sigs
> and cross them off once complete.
>
> chris
>
> On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
>
> > Yep! I have some rough stats I have at home, I will post them later on
> > tonight. Roughly, if i remember correctly, 50% of the tests were still
> > using Test, all the others were using Test::More.
> >
> > More to follow later on,
> > Spiros
> >
> > On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> >> It should be easy enough to find those t/*.t files that have "use Test;"
> >> or "require Test;" This should provide a list of files still needing to
> >> be converted over to Test::More. As discussed previously, it may also be
> >> useful to use Test::Exception to test for situations where
> >> exceptions/warnings are thrown. If you add additional tests using this
> >> module, you should add the Test::Exception module to t/lib/
> >>
> >> Good luck, and feel free to mail the list with questions/comments etc.
> >>
> >> Nath
> >>
> >>
> >> Chris Fields wrote:
> >> > At the moment we do not have a comprehensive list up on the wiki.  I
> >> > have been slowly working (alphabetically!) to switch them over, so
> >> > any help would be appreciated.
> >> >
> >> > I have CC'd this to the main mail list for anyone else interested.
> >> >
> >> > chris
> >> >
> >> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> >> >
> >> >
> >> >> Hey guys,
> >> >>
> >> >> I noticed there's an open task regarding moving testing code to use
> >> >> Test::More etc and that Chris and Nathan are already on to it. Is
> >> >> there any kind of wiki page that you keep track of which modules you
> >> >> are already working on? I am new to this and want to contribute,
> >> >> having a fair amount of unit testing from work, but don't want to step
> >> >> over other people's work and avoid duplication as well.
> >> >> Any pointers where i could get started would be much appreciated :-)
> >> >>
> >> >> Thanks,
> >> >> Spiros
> >> >>
> >> >> ps. apologies if this is not the correct list to post this, just
> >> >> seemed the most intuitive choice.


From Kevin.M.Brown at asu.edu  Thu Apr 12 11:24:15 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 12 Apr 2007 08:24:15 -0700
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <461D0407.8050105@watson.wustl.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
	<461D0407.8050105@watson.wustl.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>

> >> What is going on here? Can anyone remember doing this?

> >> Please can I ask what is the purpose of the line @pos = sort @pos; in
> >> the select_noncont subroutine of SimpleAlign.pm.
> >>
> >> In previous versions this line was not present and I could use the
> >> function to reorder the alignment e.g in an alignment with 5
> >> sequences I could reorder it to put the second sequence last using
> >> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
> >> even if the idea is to sort numerically this does not work since the
> >> sort function as is will put 10 before 2, so that
> >> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> >> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
> >>
> >
> > Not sure why 10 would come before 2 since perl would interpret that
> > list as a series of integers even if they were entered as strings and
> > do the sort.
> >
> Because, according to the documentation for Perl's sort function,
> sorting occurs "in standard string comparison order" unless the user
> specifies another comparison function to use.

OK, guess I never realized that since I've used just "sort @array" and
gotten things back how I expected them to be.


From bix at sendu.me.uk  Thu Apr 12 11:58:53 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 12 Apr 2007 16:58:53 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>	<461D0407.8050105@watson.wustl.edu>
	<1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>
Message-ID: <461E573D.1060906@sendu.me.uk>

Kevin Brown wrote:
>> Because, according to the documentation for Perl's sort 
>> function, sorting occurs "in standard string comparison 
>> order" unless the user specifies another comparison function to use.
> 
> OK, guess I never realized that since I've used just "sort @array" and
> gotten things back how I expected them to be.

If you were sorting numbers, getting the order wrong either didn't 
matter or you didn't notice the problem. Not realizing sort won't do 
what you expect in this case is a common source of bugs.

It might be worth it for you (and anyone else) to go through your old 
code to make sure you haven't been bitten.


From johnsonm at gmail.com  Thu Apr 12 13:26:33 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 12:26:33 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
Message-ID: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>

    I'd call that a buggy regexp.  Sounds like a good (but minimal)
fix.  Torsten, I don't have cvs write access, I think you do, can you
fix that up?  Andrew, can you file that as a bug:

http://bugzilla.bioperl.org/

    Everything else sounds like enhancements.  I'm not necessarily
opposed, but a little discussion is probably in order before putting
any tickets in for any of that.  Also, I'm not sure when I'll be able
to spare some time to work on the module.  It was easy to justify
spending time from my day job getting the module up to where it is now,
as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
quite well for my purposes.  Again, I'm not opposed to further
enhancements, but if I'm going to work on any of them, they'll have to
fit into everything else I'm doing and it could be a while.  However,
there's no reason somebody else can't do what I did.  Discuss the
changes here, work out a plan, implement it, send along the diff(s)
attached to a bug in bugzilla.  Next thing you know, your changes are
in cvs.  8)

On 4/11/07, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:
> Andrew,
>
> >                 # Glimmer 3.X prediction
> >                 (/\w+(\d+)\s+       # orf (numeric portion)
> > ...isn't picking up more than the last digit in the orf-number.  Not
> > sure if that's intentional.  A sample of the feature output using
> > ->gff_string shows up as ...
>
> I think that regexp should be \w+?(\d+)
>
> ie. the \w+ should be non-greedy, otherwise it will swallow up all but
> one of the following \d+ (as \d is a subset of \w)
>
> I've CC:ed this to Mark Johnson who made the recent changes to this module.
>
> Thanks for your feedback,
>
> --Torsten Seemann


From cjfields at uiuc.edu  Thu Apr 12 14:11:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 13:11:33 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
Message-ID: <7314C1CD-8AD5-4400-A495-6C8124833D0D@uiuc.edu>

Agreed; anyone can suggest code enhancements and bug fixes and submit  
patches for these:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

You'll see a long list of unimplemented enhancement requests in  
Bugzilla.  These are the ones where no patch is given; you'll find  
that very few are willing to go through the effort to work on them  
unless there is something in it for them!  Enhancement requests that  
come with patches and tests tend to get committed fairly rapidly  
(sometimes within hours).

chris

On Apr 12, 2007, at 12:26 PM, Mark Johnson wrote:

>     I'd call that a buggy regexp.  Sounds like a good (but minimal)
> fix.  Torsten, I don't have cvs write access, I think you do, can you
> fix that up?  Andrew, can you file that as a bug:
>
> http://bugzilla.bioperl.org/
>
>     Everything else sounds like enhancements.  I'm not necessarily
> opposed, but a little discussion is probably in order before putting
> any tickets in for any of that.  Also, I'm not sure when I'll be able
> to spare some time to work on the module.  It was easy to justify
> spending time from my day job getting the module up to where it is now,
> as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
> quite well for my purposes.  Again, I'm not opposed to further
> enhancements, but if I'm going to work on any of them, they'll have to
> fit into everything else I'm doing and it could be a while.  However,
> there's no reason somebody else can't do what I did.  Discuss the
> changes here, work out a plan, implement it, send along the diff(s)
> attached to a bug in bugzilla.  Next thing you know, your changes are
> in cvs.  8)
>
> On 4/11/07, Torsten Seemann  
> <torsten.seemann at infotech.monash.edu.au> wrote:
>> Andrew,
>>
>>>                 # Glimmer 3.X prediction
>>>                 (/\w+(\d+)\s+       # orf (numeric portion)
>>> ...isn't picking up more than the last digit in the orf-number.  Not
>>> sure if that's intentional.  A sample of the feature output using
>>> ->gff_string shows up as ...
>>
>> I think that regexp should be \w+?(\d+)
>>
>> ie. the \w+ should be non-greedy, otherwise it will swallow up all but
>> one of the following \d+ (as \d is a subset of \w)
>>
>> I've CC:ed this to Mark Johnson who made the recent changes to this module.
>>
>> Thanks for your feedback,
>>
>> --Torsten Seemann


From stewarta at nmrc.navy.mil  Thu Apr 12 14:35:00 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 12 Apr 2007 14:35:00 -0400
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
Message-ID: <DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>

I'm willing to do the coding and testing, I'm just not familiar with  
the submission process yet (I see there's a HOWTO now, nice).   Let's  
discuss first.

So to reiterate, I'm suggesting that the module also parse out the  
frame and score information from Glimmer output.  I take back my  
suggestion of overriding the source / primary tags through the module  
as this can easily be done post-parser.  Other annotations can also  
be edited post-parser easily enough.

Reasons for:  Parsing everything out of the output and letting the  
user determine what's useful or not.

Reasons against:  Extra information may not be relevant to the format  
of the generated feature type?


-Andrew


On Apr 12, 2007, at 1:26 PM, Mark Johnson wrote:

>    I'd call that a buggy regexp.  Sounds like a good (but minimal)
> fix.  Torsten, I don't have cvs write access, I think you do, can you
> fix that up?  Andrew, can you file that as a bug:
>
> http://bugzilla.bioperl.org/
>
>    Everything else sounds like enhancements.  I'm not necessarily
> opposed, but a little discussion is probably in order before putting
> any tickets in for any of that.  Also, I'm not sure when I'll be able
> to spare some time to work on the module.  It was easy to justify
> spending time from my day job getting the module up to where it is now,
> as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
> quite well for my purposes.  Again, I'm not opposed to further
> enhancements, but if I'm going to work on any of them, they'll have to
> fit into everything else I'm doing and it could be a while.  However,
> there's no reason somebody else can't do what I did.  Discuss the
> changes here, work out a plan, implement it, send along the diff(s)
> attached to a bug in bugzilla.  Next thing you know, your changes are
> in cvs.  8)
>
> On 4/11/07, Torsten Seemann  
> <torsten.seemann at infotech.monash.edu.au> wrote:
>> Andrew,
>>
>> >                 # Glimmer 3.X prediction
>> >                 (/\w+(\d+)\s+       # orf (numeric portion)
>> > ...isn't picking up more than the last digit in the orf-number.  Not
>> > sure if that's intentional.  A sample of the feature output using
>> > ->gff_string shows up as ...
>>
>> I think that regexp should be \w+?(\d+)
>>
>> ie. the \w+ should be non-greedy, otherwise it will swallow up all but
>> one of the following \d+ (as \d is a subset of \w)
>>
>> I've CC:ed this to Mark Johnson who made the recent changes to this module.
>>
>> Thanks for your feedback,
>>
>> --Torsten Seemann


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From johnsonm at gmail.com  Thu Apr 12 15:11:18 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 14:11:18 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
	<DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>
Message-ID: <ebf5eb170704121211s19062ac8hb9b510d440fcfe44@mail.gmail.com>

> So to reiterate, I'm suggesting that the module also parse out the frame and
> score information from Glimmer output.  I take back my suggestion of
> overriding the source / primary tags through the module as this can easily
> be done post-parser.  Other annotations can also be edited post-parser
> easily enough.

The reason the source tags are what they are for my addition(s) is
that the output from glimmer2/glimmer3 does not include a version
string.  You can figure out the major version from the output
formatting, but that's about it.  Also, being my first significant
contribution, I wasn't out to break new ground.  I did what some of
the other gene predictors seem to do, and what the existing code
already did.  Maybe there should be a method to pass in the exact
version if you know it.  Further than that, I think the Glimmer module
should stay consistent with what the other gene predictors do.  No
reason, though, that they couldn't *all* be enhanced similarly, if you
want to be able to further control the source tag.  8)

Part of the reason I didn't parse out the frame / score info for
either glimmer2 or glimmer3 was that I didn't need it.  The other part
being that my regexp kung-fu is nothing special.  This sounds like a
no-brainer to me.  Extend the regexps to capture it and tag it (and
the tests).
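
As a rough illustration of extending the pattern, the sketch below
pulls the frame and score out of a single prediction line. It assumes
the line holds an ORF id, start, end, signed frame and raw score in that
order; the sample line and the hash keys are illustrative only and are
not the pattern or tag names used in Bio::Tools::Glimmer.

```perl
use strict;
use warnings;

# Illustrative prediction line: id, start, end, frame, raw score.
my $line = "orf00012      337     2799  +1    13.06";

my %pred;
if ($line =~ /^\w+?(\d+)\s+      # orf (numeric portion)
               (\d+)\s+          # start
               (\d+)\s+          # end
               ([+-])([123])\s+  # strand sign and frame number
               (-?[\d.]+)        # raw score
             /x) {
    # Store the captures under hypothetical tag names.
    @pred{qw(orf start end strand frame score)} = ($1, $2, $3, $4, $5, $6);
}

printf "%s=%s\n", $_, $pred{$_} for qw(orf start end strand frame score);
```

The captured values could then be attached to each prediction feature as
tags, with matching cases added to the test file.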

As far as the ORFs go, I guess you could just use
Bio::SeqFeature::Generic to represent them.  I haven't been keeping
track of the relevant feature/annotation interfaces, but maybe there
should be some kind of relation between the ORFs and predictions?

The glimmer3 detail file is a little trickier.  The least disruptive
thing to do, interface-wise, might be to specify that as a separate
input via an argument to the constructor.  Then you've got *two* input
files, and are going to have to override the automagic stuff that
expects one input file and takes care of it all.

As far as process, I just got on the list and started pestering
people, and they haven't thrown me off yet.  8)  I'm afraid that
you're going to find that while people are happy to discuss
implementation details, when it comes time to fire up the editor,
you're usually on your own, if it's an enhancement.

I'd love to work on Bioperl more, but so far, it's only been for what
I need for my job.


From spiros at lokku.com  Thu Apr 12 15:16:39 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 20:16:39 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
	<bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
Message-ID: <bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>

Hey guys,

I have added a link, as per Chris's nice suggestion, for keeping track
of what's going on regarding the migration:
http://www.bioperl.org/wiki/TestMoreProgress
There's also a link to this page from the project priority list.
However, adding our signature for each module etc., in my humble
opinion, seems tedious. May I suggest we just split up the list into
'starting letter sections' and each of us does their part.
I volunteer to work on all tests starting with the letter R down to
the bottom of the list.

Let me know if this makes sense or not. I will work on
removing/flagging all the files that have already been migrated on
that list as well.
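
One way to build the list of files still to be migrated, following
Nathan's suggestion of looking for "use Test;" or "require Test;", is a
small script like the one below. It is a sketch only: the helper name
uses_old_test is made up for illustration, and it assumes it is run from
the top of a checkout with the tests in t/*.t.

```perl
use strict;
use warnings;

# Return true if the given test file still loads the old Test module.
sub uses_old_test {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    while (my $line = <$fh>) {
        # Matches "use Test;" or "require Test;" but not Test::More,
        # Test::Exception and friends (the lookahead rejects "Test::").
        return 1 if $line =~ /\b(?:use|require)\s+Test\b(?!::)/;
    }
    return 0;
}

# Print every test file that has not been switched over yet.
print "$_\n" for grep { uses_old_test($_) } sort glob('t/*.t');
```

Note that a commented-out "use Test;" line would also be reported, so
the output is best treated as a starting list rather than a final one.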

-spiros

On 4/12/07, Spiros Denaxas <spiros at lokku.com> wrote:
> Good idea Chris. Just got back home so will probably do it tomorrow
> morning or so.
>
> Spiros
>
> On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
> > We should probably place something on the wiki to prevent overlaps
> > (i.e. make sure no two devs are working on the same tests).  I
> > planned on working on the G's last night but got bogged down.
> >
> > Spiros, if you haven't already go ahead and create a list on a wiki
> > page for tracking.  We can lay claim to them by tagging with our sigs
> > and cross them off once complete.
> >
> > chris
> >
> > On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
> >
> > > Yep! I have some rough stats I have at home, I will post them later on
> > > tonight. Roughly, if i remember correctly, 50% of the tests were still
> > > using Test, all the others were using Test::More.
> > >
> > > More to follow later on,
> > > Spiros
> > >
> > > On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> > >> It should be easy enough to find those t/*.t files that have "use
> > >> Test;"
> > >> or "require Test;" This should provide a list of files still
> > >> needing to
> > >> be converted over to Test::More. As discussed previously, it may
> > >> also be
> > >> useful to use Test::Exception to test for situations where
> > >> exceptions/warnings are thrown. If you add additional tests using
> > >> this
> > >> module, you should add the Test::Exception module to t/lib/
> > >>
> > >> Good luck, and feel free to mail the list with questions/comments
> > >> etc.
> > >>
> > >> Nath
> > >>
> > >>
> > >> Chris Fields wrote:
> > >> > At the moment we do not have a comprehensive list up on the
> > >> wiki.  I
> > >> > have been slowly working (alphabetically!) to switch them over, so
> > >> > any help would be appreciated.
> > >> >
> > >> > I have CC'd this to the main mail list for anyone else interested.
> > >> >
> > >> > chris
> > >> >
> > >> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> > >> >
> > >> >
> > >> >> Hey guys,
> > >> >>
> > >> >> I noticed there's an open task regarding moving testing code to
> > >> use
> > >> >> Test::More etc and that Chris and Nathan are already on to it. Is
> > >> >> there any kind of wiki page that you keep track of which
> > >> modules you
> > >> >> are already working on? I am new to this and want to contribute,
> > >> >> having a fair amount of unit testing from work, but don't want
> > >> to step
> > >> >> over other people's work and avoid duplication as well.
> > >> >> Any pointers where i could get started would be much
> > >> appreciated :-)
> > >> >>
> > >> >> Thanks,
> > >> >> Spiros
> > >> >>
> > >> >> ps. apologies if this is not the correct list to post this, just
> > >> >> seemed the most intuitive choice.
> > >> >> _______________________________________________
> > >> >> Bioperl-guts-l mailing list
> > >> >> Bioperl-guts-l at lists.open-bio.org
> > >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
> > >> >>
> > >> >
> > >> > Christopher Fields
> > >> > Postdoctoral Researcher
> > >> > Lab of Dr. Robert Switzer
> > >> > Dept of Biochemistry
> > >> > University of Illinois Urbana-Champaign
> > >> >
> > >> >
> > >> >
> > >> > _______________________________________________
> > >> > Bioperl-l mailing list
> > >> > Bioperl-l at lists.open-bio.org
> > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >> >
> > >>
> > >>
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> >
>


From marian.thieme at lycos.de  Wed Apr 11 12:02:14 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Apr 2007 16:02:14 +0000
Subject: [Bioperl-l] Affys ReseqChip
Message-ID: <188661178017404@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070411/bc2eb3aa/attachment-0003.html>

From johnsonm at gmail.com  Thu Apr 12 15:35:35 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 14:35:35 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
Message-ID: <ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>

Looks like MediaWiki has some built-in functionality:

    http://meta.wikimedia.org/wiki/Anti-spam_Features
    http://www.mediawiki.org/wiki/Extension:ConfirmEdit

I'm not sure I'd call what they're doing spam, more like vandalism,
but either way, I don't see the point (though I only looked at a
couple examples via Recent Changes).

If they're indeed bots, maybe it's time to enable Captchas? Depending
on who they are and what their goals are, that may get rid of them
completely or just slow them down.

On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
> I'm posting this to the mail list in case anyone has any ideas on
> what is going on...
>
> I have noticed an odd (read: annoying) rash of spam on the wiki.
> Jason also ran some spam reversions, so maybe he can chime in.
> Essentially it looks like the responsible spambots 'correct' the wiki
> text and links, so that '+' is being removed and URI-encoded symbols
> in links are reverted to symbols.  Unfortunately the removal occurs
> in all text, so places where '+' is intended (for instance, raw text
> for showing example record formats) are also changed.  My guess is
> we'll need to block the IP address or add to the blacklist if possible.
>
> Between Jason and I we have blocked ~9 spambots and counting.
> Couldn't find anything via Google yet...
>
> chris


From cjfields at uiuc.edu  Thu Apr 12 15:44:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 14:44:28 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
Message-ID: <BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>


On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:

> Looks like MediaWiki has some built in functionality:
>
>    http://meta.wikimedia.org/wiki/Anti-spam_Features
>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
>
> I'm not sure I'd call what they're doing spam, more like vandalism,
> but either way, I don't see the point (though I only looked at a
> couple examples via Recent Changes).
>
> If they're indeed bots, maybe it's time to enable Captchas? Depending
> on who they are and what their goals are, that may get rid of them
> completely or just slow them down.

Already done; Mauricio installed ConfirmEdit yesterday after a bit of  
off-list discussion (thanks again Mauricio!).

If you create a new account you'll encounter a simple captcha (it
isn't configured for each edit yet).  We may implement confirmations
per edit or install picture captchas at a later point, depending on
how well the current system works.

We may start granting sysop privs to anyone interested in maintaining
the wiki, which makes handling spam easier.  If so we'll probably
announce something along those lines here first.

chris


From cjfields at uiuc.edu  Thu Apr 12 15:48:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 14:48:41 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
	<bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
	<bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>
Message-ID: <3B4500DD-CAB6-4FD6-ABF9-A0160981F7E3@uiuc.edu>

Sounds good!  I'll finish up the P's (halfway through now...) and  
move on to other things; got plenty to do, believe me!

Appreciate all the help, Spiros!

chris

On Apr 12, 2007, at 2:16 PM, Spiros Denaxas wrote:

> Hey guys,
>
> I have added a link as per Chris's nice suggestion for keeping track
> on whats going on regarding the migration:
> http://www.bioperl.org/wiki/TestMoreProgress
> There's also a link to this page from the project priority list.
> However, adding our signature for each module etc , in my humble
> opinion, seems tedious. May i suggest we just split up the list in
> 'starting letter sections' and each one does his part.
> I volunteer to work on all tests starting with the letter R down to
> the bottom of the list.
>
> Let me know if this makes sense or not. I will work on
> removing/flagging all the files that have already been migrated on
> that list as well.
>
> -spiros
>
> On 4/12/07, Spiros Denaxas <spiros at lokku.com> wrote:
>> Good idea Chris. Just got back home so will probably do it tomorrow
>> morning or so.
>>
>> Spiros
>>
>> On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>> We should probably place something on the wiki to prevent overlaps
>>> (i.e. make sure no two devs are working on the same tests).  I
>>> planned on working on the G's last night but got bogged down.
>>>
>>> Spiros, if you haven't already go ahead and create a list on a wiki
>>> page for tracking.  We can lay claim to them by tagging with our  
>>> sigs
>>> and cross them off once complete.
>>>
>>> chris
>>>
>>> On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
>>>
>>>> Yep! I have some rough stats I have at home, I will post them  
>>>> later on
>>>> tonight. Roughly, if i remember correctly, 50% of the tests were  
>>>> still
>>>> using Test, all the others were using Test::More.
>>>>
>>>> More to follow later on,
>>>> Spiros
>>>>
>>>> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
>>>>> It should be easy enough to find those t/*.t files that have "use
>>>>> Test;"
>>>>> or "require Test;" This should provide a list of files still
>>>>> needing to
>>>>> be converted over to Test::More. As discussed previously, it may
>>>>> also be
>>>>> useful to use Test::Exception to test for situations where
>>>>> exceptions/warnings are thrown. If you add additional tests using
>>>>> this
>>>>> module, you should add the Test::Exception module to t/lib/
>>>>>
>>>>> Good luck, and feel free to mail the list with questions/comments
>>>>> etc.
>>>>>
>>>>> Nath
>>>>>
>>>>>
>>>>> Chris Fields wrote:
>>>>>> At the moment we do not have a comprehensive list up on the
>>>>> wiki.  I
>>>>>> have been slowly working (alphabetically!) to switch them  
>>>>>> over, so
>>>>>> any help would be appreciated.
>>>>>>
>>>>>> I have CC'd this to the main mail list for anyone else  
>>>>>> interested.
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>>>>>>
>>>>>>
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> I noticed there's an open task regarding moving testing code to
>>>>> use
>>>>>>> Test::More etc and that Chris and Nathan are already on to  
>>>>>>> it. Is
>>>>>>> there any kind of wiki page that you keep track of which
>>>>> modules you
>>>>>>> are already working on? I am new to this and want to contribute,
>>>>>>> having a fair amount of unit testing from work, but don't want
>>>>> to step
>>>>>>> over other people's work and avoid duplication as well.
>>>>>>> Any pointers where i could get started would be much
>>>>> appreciated :-)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Spiros
>>>>>>>
>>>>>>> ps. apologies if this is not the correct list to post this, just
>>>>>>> seemed the most intuitive choice.
>>>>>>> _______________________________________________
>>>>>>> Bioperl-guts-l mailing list
>>>>>>> Bioperl-guts-l at lists.open-bio.org
>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>>>>>>>
>>>>>>
>>>>>> Christopher Fields
>>>>>> Postdoctoral Researcher
>>>>>> Lab of Dr. Robert Switzer
>>>>>> Dept of Biochemistry
>>>>>> University of Illinois Urbana-Champaign
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>
>>>>>
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From spiros at lokku.com  Thu Apr 12 16:19:18 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 21:19:18 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
Message-ID: <bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>

Nice idea, I saw it a bit earlier. However, any chance of implementing
white lists so that regular and/or trusted users can skip it each time
we add something to the wiki?

Spiros

On 4/12/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:
>
> > Looks like MediaWiki has some built in functionality:
> >
> >    http://meta.wikimedia.org/wiki/Anti-spam_Features
> >    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
> >
> > I'm not sure I'd call what they're doing spam, more like vandalism,
> > but either way, I don't see the point (though I only looked at a
> > couple examples via Recent Changes).
> >
> > If they're indeed bots, maybe it's time to enable Captchas? Depending
> > on who they are and what their goals are, that may get rid of them
> > completely or just slow them down.
>
> Already done; Mauricio installed ConfirmEdit yesterday after a bit of
> off-list discussion (thanks again Mauricio!).
>
> If you create a new account you'll encounter a simple captcha (it
> isn't configured for each edit yet).  We may implement confirmations
> per edit or install picture captchas at a later point, dep. on how
> well the current system works.
>
> We may start granting anyone interested in maintaining the wiki sysop
> privs which makes handling spam easier.  If so we'll probably
> announce something along those lines here first.
>
> chris
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From Jonathan_Epstein at nih.gov  Thu Apr 12 16:22:40 2007
From: Jonathan_Epstein at nih.gov (Jonathan Epstein)
Date: Thu, 12 Apr 2007 16:22:40 -0400
Subject: [Bioperl-l] Affys ReseqChip
In-Reply-To: <188661178017404@lycos-europe.com>
References: <188661178017404@lycos-europe.com>
Message-ID: <6.2.3.4.2.20070412161407.04a38b60@mail.nih.gov>

This sounds great to me.

Resequencing in general (whether by Affy or by other technology such as Solexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handling this.  But I suggest that you proceed full-speed-ahead, and we can sort this out in the future.

Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.

Jonathan


At 12:02 PM 4/11/2007, Marian Thieme wrote:
>Hi,
>
>I am working on a piece of software that analyses the output of Affymetrix DNA Resequencing Arrays (in particular Mitochip V2). The main goal of the software is to take the redundant fragments into account. The software is able to align the redundant fragments to the entire sequence, to call bases which aren't called by the entire sequence, and to detect insertions/deletions, depending on the design of the redundant fragments.
>
>I would be glad to contribute the software to the bioperl package; otherwise, if anybody is interested, I can share the code and/or develop some features further.
>
>Marian

Jonathan Epstein                                Jonathan_Epstein at nih.gov
Head, Unit on Biologic Computation              (301)402-4563
Office of the Scientific Director               Bldg 31, Room 2A47
Nat. Inst. of Child Health & Human Development  31 Center Drive
National Institutes of Health                   Bethesda, MD 20892  


From spiros at lokku.com  Thu Apr 12 17:35:43 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 22:35:43 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EA4FA.8010504@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
	<461EA4FA.8010504@campus.iztacala.unam.mx>
Message-ID: <bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>

Mauricio, thanks for your response. I actually edited a page several
times today and I got the captcha. More specifically, it was displayed
because "the page I edited contained external links", which is true
since I included a {{CPAN}} link.

Spiros

On 4/12/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> wrote:
> The chance of having white lists exists but as far as I tested last
> night, the captcha is working only at the Create Account pages, not at
> the time of applying changes to wiki content (I tested as a regular user
> and not as a wiki admin).
>
> The idea at this moment is only to block automated methods for account
> creation (bots). Registered users who haven't been blocked and/or have
> confirmed their email wouldn't be bothered while adding/changing wiki
> content.
>
> Regards,
> Mauricio.
>
> Spiros Denaxas wrote:
> > Nice idea, i saw it a bit before. However, any chance of implementing
> > white lists with regular and/or trusted users to skip it each time we
> > add something to the wiki ?
> >
> > Spiros
> >
> > On 4/12/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:
> >>
> >>> Looks like MediaWiki has some built in functionality:
> >>>
> >>>    http://meta.wikimedia.org/wiki/Anti-spam_Features
> >>>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
> >>>
> >>> I'm not sure I'd call what they're doing spam, more like vandalism,
> >>> but either way, I don't see the point (though I only looked at a
> >>> couple examples via Recent Changes).
> >>>
> >>> If they're indeed bots, maybe it's time to enable Captchas? Depending
> >>> on who they are and what their goals are, that may get rid of them
> >>> completely or just slow them down.
> >> Already done; Mauricio installed ConfirmEdit yesterday after a bit of
> >> off-list discussion (thanks again Mauricio!).
> >>
> >> If you create a new account you'll encounter a simple captcha (it
> >> isn't configured for each edit yet).  We may implement confirmations
> >> per edit or install picture captchas at a later point, dep. on how
> >> well the current system works.
> >>
> >> We may start granting anyone interested in maintaining the wiki sysop
> >> privs which makes handling spam easier.  If so we'll probably
> >> announce something along those lines here first.
> >>
> >> chris
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
>


From arareko at campus.iztacala.unam.mx  Thu Apr 12 17:30:34 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 12 Apr 2007 16:30:34 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
Message-ID: <461EA4FA.8010504@campus.iztacala.unam.mx>

The chance of having white lists exists but as far as I tested last 
night, the captcha is working only at the Create Account pages, not at 
the time of applying changes to wiki content (I tested as a regular user 
and not as a wiki admin).

The idea at this moment is only to block automated methods for account 
creation (bots). Registered users who haven't been blocked and/or have 
confirmed their email wouldn't be bothered while adding/changing wiki 
content.

Regards,
Mauricio.

Spiros Denaxas wrote:
> Nice idea, i saw it a bit before. However, any chance of implementing
> white lists with regular and/or trusted users to skip it each time we
> add something to the wiki ?
> 
> Spiros
> 
> On 4/12/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:
>>
>>> Looks like MediaWiki has some built in functionality:
>>>
>>>    http://meta.wikimedia.org/wiki/Anti-spam_Features
>>>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
>>>
>>> I'm not sure I'd call what they're doing spam, more like vandalism,
>>> but either way, I don't see the point (though I only looked at a
>>> couple examples via Recent Changes).
>>>
>>> If they're indeed bots, maybe it's time to enable Captchas? Depending
>>> on who they are and what their goals are, that may get rid of them
>>> completely or just slow them down.
>> Already done; Mauricio installed ConfirmEdit yesterday after a bit of
>> off-list discussion (thanks again Mauricio!).
>>
>> If you create a new account you'll encounter a simple captcha (it
>> isn't configured for each edit yet).  We may implement confirmations
>> per edit or install picture captchas at a later point, dep. on how
>> well the current system works.
>>
>> We may start granting anyone interested in maintaining the wiki sysop
>> privs which makes handling spam easier.  If so we'll probably
>> announce something along those lines here first.
>>
>> chris
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From arareko at campus.iztacala.unam.mx  Thu Apr 12 17:53:51 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 12 Apr 2007 16:53:51 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>	
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>	
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
Message-ID: <461EAA6F.1090805@campus.iztacala.unam.mx>

I've reconfigured the extension to display captchas exclusively for 
account creation and disabled it when adding URLs in pages. Don't know 
why this didn't happen to me while testing last night...

Please try it again to see if the change works. Thanks for pointing 
this out Spiros :)

Mauricio.

Spiros Denaxas wrote:
> Mauricio, thanks for your response. I actually edited a page several
> times today and i got the captcha. More specifically, it was displayed
> because "the page i edited contained external links" which is true
> since i included a {{CPAN}} link.
> 
> Spiros
> 
> On 4/12/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> 
> wrote:
>> The chance of having white lists exists but as far as I tested last
>> night, the captcha is working only at the Create Account pages, not at
>> the time of applying changes to wiki content (I tested as a regular user
>> and not as a wiki admin).
>>
>> The idea at this moment is only to block automated methods for account
>> creation (bots). Registered users who haven't been blocked and/or have
>> confirmed their email wouldn't be bothered while adding/changing wiki
>> content.
>>
>> Regards,
>> Mauricio.
>>
>> Spiros Denaxas wrote:
>> > Nice idea, i saw it a bit before. However, any chance of implementing
>> > white lists with regular and/or trusted users to skip it each time we
>> > add something to the wiki ?
>> >
>> > Spiros
>> >
>> > On 4/12/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> >> On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:
>> >>
>> >>> Looks like MediaWiki has some built in functionality:
>> >>>
>> >>>    http://meta.wikimedia.org/wiki/Anti-spam_Features
>> >>>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
>> >>>
>> >>> I'm not sure I'd call what they're doing spam, more like vandalism,
>> >>> but either way, I don't see the point (though I only looked at a
>> >>> couple examples via Recent Changes).
>> >>>
>> >>> If they're indeed bots, maybe it's time to enable Captchas? Depending
>> >>> on who they are and what their goals are, that may get rid of them
>> >>> completely or just slow them down.
>> >> Already done; Mauricio installed ConfirmEdit yesterday after a bit of
>> >> off-list discussion (thanks again Mauricio!).
>> >>
>> >> If you create a new account you'll encounter a simple captcha (it
>> >> isn't configured for each edit yet).  We may implement confirmations
>> >> per edit or install picture captchas at a later point, dep. on how
>> >> well the current system works.
>> >>
>> >> We may start granting anyone interested in maintaining the wiki sysop
>> >> privs which makes handling spam easier.  If so we'll probably
>> >> announce something along those lines here first.
>> >>
>> >> chris
>> >>
>> >>
>> >> _______________________________________________
>> >> Bioperl-l mailing list
>> >> Bioperl-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>>
>> -- 
>> MAURICIO HERRERA CUADRA
>> arareko at campus.iztacala.unam.mx
>> Laboratorio de Gen?tica
>> Unidad de Morfofisiolog?a y Funci?n
>> Facultad de Estudios Superiores Iztacala, UNAM
>>
>>
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From spiros at lokku.com  Thu Apr 12 18:11:46 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 23:11:46 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
	<461EAA6F.1090805@campus.iztacala.unam.mx>
Message-ID: <bba689ec0704121511y135f0da0j26d520a11dd3ffa1@mail.gmail.com>

You're welcome, Mauricio. It's all cool now, works without the captcha
for internal edits. Thanks for changing it over :-)

-spiros

On 4/12/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> wrote:
> I've reconfigured the extension to display captchas exclusively for
> account creation and disabled it when adding URLs in pages. Don't know
> why this didn't happened to me while testing last night...
>
> Please try do it again to see if the change works. Thanks for pointing
> this out Spiros :)
>
> Mauricio.
>
> Spiros Denaxas wrote:
> > Mauricio, thanks for your response. I actually edited a page several
> > times today and i got the captcha. More specifically, it was displayed
> > because "the page i edited contained external links" which is true
> > since i included a {{CPAN}} link.
> >
> > Spiros
> >
> > On 4/12/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx>
> > wrote:
> >> The chance of having white lists exists but as far as I tested last
> >> night, the captcha is working only at the Create Account pages, not at
> >> the time of applying changes to wiki content (I tested as a regular user
> >> and not as a wiki admin).
> >>
> >> The idea at this moment is only to block automated methods for account
> >> creation (bots). Registered users who haven't been blocked and/or have
> >> confirmed their email wouldn't be bothered while adding/changing wiki
> >> content.
> >>
> >> Regards,
> >> Mauricio.
> >>
> >> Spiros Denaxas wrote:
> >> > Nice idea, i saw it a bit before. However, any chance of implementing
> >> > white lists with regular and/or trusted users to skip it each time we
> >> > add something to the wiki ?
> >> >
> >> > Spiros
> >> >
> >> > On 4/12/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> >> On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:
> >> >>
> >> >>> Looks like MediaWiki has some built in functionality:
> >> >>>
> >> >>>    http://meta.wikimedia.org/wiki/Anti-spam_Features
> >> >>>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
> >> >>>
> >> >>> I'm not sure I'd call what they're doing spam, more like vandalism,
> >> >>> but either way, I don't see the point (though I only looked at a
> >> >>> couple examples via Recent Changes).
> >> >>>
> >> >>> If they're indeed bots, maybe it's time to enable Captchas? Depending
> >> >>> on who they are and what their goals are, that may get rid of them
> >> >>> completely or just slow them down.
> >> >> Already done; Mauricio installed ConfirmEdit yesterday after a bit of
> >> >> off-list discussion (thanks again Mauricio!).
> >> >>
> >> >> If you create a new account you'll encounter a simple captcha (it
> >> >> isn't configured for each edit yet).  We may implement confirmations
> >> >> per edit or install picture captchas at a later point, dep. on how
> >> >> well the current system works.
> >> >>
> >> >> We may start granting anyone interested in maintaining the wiki sysop
> >> >> privs which makes handling spam easier.  If so we'll probably
> >> >> announce something along those lines here first.
> >> >>
> >> >> chris
> >> >>
> >> >>
> >> >> _______________________________________________
> >> >> Bioperl-l mailing list
> >> >> Bioperl-l at lists.open-bio.org
> >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> >>
> >> > _______________________________________________
> >> > Bioperl-l mailing list
> >> > Bioperl-l at lists.open-bio.org
> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> >
> >>
> >> --
> >> MAURICIO HERRERA CUADRA
> >> arareko at campus.iztacala.unam.mx
> >> Laboratorio de Genética
> >> Unidad de Morfofisiología y Función
> >> Facultad de Estudios Superiores Iztacala, UNAM
> >>
> >>
> >
>
> --
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>
>


From cjfields at uiuc.edu  Thu Apr 12 18:02:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 17:02:51 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>	
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>	
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
	<461EAA6F.1090805@campus.iztacala.unam.mx>
Message-ID: <E1139262-84C3-4282-8E9D-643BF91A3656@uiuc.edu>

You disabled yourself as sysop last night, IIRC.  Don't know; could  
be what Spiros suggested, eg. adding external links trips it.

chris

On Apr 12, 2007, at 4:53 PM, Mauricio Herrera Cuadra wrote:

> I've reconfigured the extension to display captchas exclusively for
> account creation and disabled it when adding URLs in pages. Don't
> know why this didn't happen to me while testing last night...
>
> Please try it again to see if the change works. Thanks for
> pointing this out Spiros :)
>
> Mauricio.
>
> Spiros Denaxas wrote:
>> Mauricio, thanks for your response. I actually edited a page several
>> times today and i got the captcha. More specifically, it was  
>> displayed
>> because "the page i edited contained external links" which is true
>> since i included a {{CPAN}} link.
>> Spiros
>> On 4/12/07, Mauricio Herrera Cuadra  
>> <arareko at campus.iztacala.unam.mx> wrote:
>>> The chance of having white lists exists but as far as I tested last
>>> night, the captcha is working only at the Create Account pages,  
>>> not at
>>> the time of applying changes to wiki content (I tested as a  
>>> regular user
>>> and not as a wiki admin).
>>>
>>> The idea at this moment is only to block automated methods for  
>>> account
>>> creation (bots). Registered users who haven't been blocked and/or  
>>> have
>>> confirmed their email wouldn't be bothered while adding/changing  
>>> wiki
>>> content.
>>>
>>> Regards,
>>> Mauricio.
>>>
>>> Spiros Denaxas wrote:
>>> > Nice idea, i saw it a bit before. However, any chance of  
>>> implementing
>>> > white lists with regular and/or trusted users to skip it each  
>>> time we
>>> > add something to the wiki ?
>>> >
>>> > Spiros
>>> >
>>> > On 4/12/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>> >> On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:
>>> >>
>>> >>> Looks like MediaWiki has some built in functionality:
>>> >>>
>>> >>>    http://meta.wikimedia.org/wiki/Anti-spam_Features
>>> >>>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
>>> >>>
>>> >>> I'm not sure I'd call what they're doing spam, more like  
>>> vandalism,
>>> >>> but either way, I don't see the point (though I only looked at a
>>> >>> couple examples via Recent Changes).
>>> >>>
>>> >>> If they're indeed bots, maybe it's time to enable Captchas?  
>>> Depending
>>> >>> on who they are and what their goals are, that may get rid of  
>>> them
>>> >>> completely or just slow them down.
>>> >> Already done; Mauricio installed ConfirmEdit yesterday after a  
>>> bit of
>>> >> off-list discussion (thanks again Mauricio!).
>>> >>
>>> >> If you create a new account you'll encounter a simple captcha (it
>>> >> isn't configured for each edit yet).  We may implement  
>>> confirmations
>>> >> per edit or install picture captchas at a later point, dep. on  
>>> how
>>> >> well the current system works.
>>> >>
>>> >> We may start granting anyone interested in maintaining the  
>>> wiki sysop
>>> >> privs which makes handling spam easier.  If so we'll probably
>>> >> announce something along those lines here first.
>>> >>
>>> >> chris
>>> >>
>>> >>
>>> >> _______________________________________________
>>> >> Bioperl-l mailing list
>>> >> Bioperl-l at lists.open-bio.org
>>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> >>
>>> > _______________________________________________
>>> > Bioperl-l mailing list
>>> > Bioperl-l at lists.open-bio.org
>>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> >
>>>
>>> -- 
>>> MAURICIO HERRERA CUADRA
>>> arareko at campus.iztacala.unam.mx
>>> Laboratorio de Genética
>>> Unidad de Morfofisiología y Función
>>> Facultad de Estudios Superiores Iztacala, UNAM
>>>
>>>
>
> -- 
> MAURICIO HERRERA CUADRA
> arareko at campus.iztacala.unam.mx
> Laboratorio de Genética
> Unidad de Morfofisiología y Función
> Facultad de Estudios Superiores Iztacala, UNAM
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Apr 13 04:30:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 13 Apr 2007 09:30:50 +0100
Subject: [Bioperl-l] GenericHit->start/end needs tiled hsps?
Message-ID: <461F3FBA.2010101@sendu.me.uk>

Hi all,

I want to double-check my thinking regarding 
Bio::Search::Hit::GenericHit->start() and end(). Right now the docs 
claim that hsps of the hit object must be tiled before the answer can be 
produced. The code is implemented in that way 
(Bio::Search::SearchUtils::tile_hsps($self)).

Yet as far as I can see, all you need to do is loop through all hsps and 
pick out the smallest start and largest end respectively in terms of 
subject and query.
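
A minimal sketch of that loop, assuming each HSP is represented by a plain hash whose hypothetical `query` and `hit` keys hold [start, end] pairs. These hashes are stand-ins for real HSP objects, and `hsp_extent` is an illustrative name, not the GenericHit implementation:

```perl
use strict;
use warnings;

# Return the smallest start and largest end over all HSPs for one
# sequence type ('query' or 'hit'). No tiling is involved.
sub hsp_extent {
    my ($hsps, $seq_type) = @_;
    my ($min_start, $max_end);
    for my $hsp (@$hsps) {
        my ($start, $end) = @{ $hsp->{$seq_type} };
        # Normalise reverse-strand coordinates so that start <= end
        ($start, $end) = ($end, $start) if $start > $end;
        $min_start = $start if !defined $min_start || $start < $min_start;
        $max_end   = $end   if !defined $max_end   || $end   > $max_end;
    }
    return ($min_start, $max_end);
}

# Made-up coordinates, for illustration only
my @hsps = (
    { query => [ 120, 300 ], hit => [ 5000, 5180 ] },
    { query => [  10,  90 ], hit => [ 7000, 7080 ] },
    { query => [ 250, 410 ], hit => [ 6200, 6040 ] },
);
my ($q_start, $q_end) = hsp_extent(\@hsps, 'query');
my ($h_start, $h_end) = hsp_extent(\@hsps, 'hit');
print "query: $q_start-$q_end, hit: $h_start-$h_end\n";
```

This is a single pass over the HSPs, so the cost grows linearly with their number.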

This comes up because I have a blast report where a single hit contains 
over 80000 hsps and the tiling takes over an hour (I gave up on it, 
don't know how long it really takes). The simple loop through hsps takes 
seconds or less.

Now in this situation the answer isn't especially useful (to me). An 
alternative way of fixing the problem would be to re-write the tiling 
algorithm (again) to somehow make it hundreds of times faster, then 
provide some way in start() and end() for the user to request the start 
and end of the best contig, or other contig of choice. Easier said than 
done though!


What do people think?


From marian.thieme at lycos.de  Fri Apr 13 06:12:51 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Fri, 13 Apr 2007 10:12:51 +0000
Subject: [Bioperl-l] Affys ReseqChip
Message-ID: <18866117804894@lycos-europe.com>

Hi,

To give a better understanding of the matter and to let you assess the approach, I will briefly present
1.) the problem and 2.) my approach.


1.)
Given: fragments (strings of a certain length), each with a description of its location within some reference sequence. For instance:

- redundant fragment: acgtnna--gcta (deletion: pos12, pos13)
- start position: 5
- end position: 17
- and some suitable reference sequence

Fragments are assumed to map 1:1 to the reference sequence and can contain gaps and n's; the latter indicate that the base wasn't determined, maybe because of failed hybridization or something like this.
Thus, coping with insertions/deletions only requires parsing an array design file (a description of all insertions and deletions in each redundant fragment) and, according to that description, inserting gaps in the reference sequence and in the fragments where required.
So from my point of view, and in the case of the Affy MitoChip v2, we only need to process the description file rather than calculate an alignment via a dynamic programming matrix.


2.)
My current approach consists of the following 5 steps:

1.) Read the reference sequence and the redundant fragments into a SeqIO object.

2.) Calculate a hash of all insertions, defined by length and position.

3.) Insert the longest insertion of each position into the appropriate fragments and into the reference sequence. Hence each fragment/reference sequence gets as many gaps as given by

length(max_insertion(position_x))-length(insertion(fragment_y, position_x))

(This is done by iterating over each sequence in the SeqIO and inserting gaps according to the insertion hash.)

4.) Create a SimpleAlign object from LocatableSeq objects.

5.) Afterwards we can do some statistical analysis and calculate a consensus base for each column in the SimpleAlign. (I use a Statistics module from CPAN.)
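
Steps 2.) and 3.) can be sketched in plain Perl as follows. The `%insertions` layout, the fragment names and the `gaps_needed` helper are hypothetical, chosen only to illustrate the gap-count formula; the real array design file will look different:

```perl
use strict;
use warnings;

# Hypothetical input: for each fragment, the insertions it carries,
# keyed by position in the reference sequence.
my %insertions = (
    frag_a => { 12 => 'ac' },
    frag_b => { 12 => 'acgt', 30 => 'g' },
    frag_c => {},
);

# Step 2: length of the longest insertion seen at each position
my %max_len;
for my $frag (values %insertions) {
    for my $pos (keys %$frag) {
        my $len = length $frag->{$pos};
        $max_len{$pos} = $len if $len > ($max_len{$pos} // 0);
    }
}

# Step 3: number of gap characters a fragment needs at a position, i.e.
# length(max_insertion(position_x)) - length(insertion(fragment_y, position_x))
sub gaps_needed {
    my ($frag_id, $pos) = @_;
    my $own = length($insertions{$frag_id}{$pos} // '');
    return ($max_len{$pos} // 0) - $own;
}

for my $frag_id (sort keys %insertions) {
    for my $pos (sort { $a <=> $b } keys %max_len) {
        printf "%s needs %d gap(s) at position %d\n",
            $frag_id, gaps_needed($frag_id, $pos), $pos;
    }
}
```

A fragment that carries no insertion at a position simply receives the full length of the longest insertion as gaps.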

Unfortunately I didn't manage to find a method that gives me the set of bases (the column) for a given position in the alignment (did I overlook something? Is SimpleAlign not appropriate?), so I iterate over each position (base) of the reference sequence and over each fragment which covers that particular position.
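
The per-column iteration described above can be sketched without any alignment object, assuming each row is a plain hash with a hypothetical `start` (1-based alignment column of its first character) and a gapped `seq` string. The majority vote in `consensus_base` is only a placeholder for the real statistical analysis:

```perl
use strict;
use warnings;

# Collect the characters of all rows that cover a given alignment column.
sub column_at {
    my ($rows, $col) = @_;
    my @bases;
    for my $row (@$rows) {
        my $offset = $col - $row->{start};
        # Skip rows that do not cover this column
        next if $offset < 0 || $offset >= length($row->{seq});
        push @bases, lc substr($row->{seq}, $offset, 1);
    }
    return @bases;
}

# Simple majority vote that ignores undetermined bases (n) and gaps (-).
# Ties are broken alphabetically; 'n' is returned if nothing informative is left.
sub consensus_base {
    my @bases = grep { $_ ne 'n' && $_ ne '-' } @_;
    return 'n' unless @bases;
    my %count;
    $count{$_}++ for @bases;
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return $best;
}

# Made-up rows, for illustration only
my @rows = (
    { id => 'ref',    start => 1, seq => 'acgtacgtacgt' },
    { id => 'frag_a', start => 3, seq => 'gtanGt' },
    { id => 'frag_b', start => 5, seq => 'a-gta' },
);
for my $col (1 .. 12) {
    my @column = column_at(\@rows, $col);
    print "$col: @column => ", consensus_base(@column), "\n";
}
```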


Marian


Jonathan Epstein wrote:

> This sounds great to me.
>
> Resequencing in general (whether by Affy or by other technology such as Solexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handling this.  But I suggest that you proceed full-speed-ahead, and we can sort this out in the future.
>
> Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.
>
> Jonathan


From thiago.venancio at gmail.com  Fri Apr 13 15:05:12 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 13 Apr 2007 16:05:12 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
Message-ID: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>

Hi all.

What is the best way to extract coding region from a nucleotide sequence
based on a BLASTX or TBLASTX comparisons ?

Thanks in advance.

Thiago
-- 
"The way to get started is to quit talking and begin doing."
      Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


From jason at bioperl.org  Fri Apr 13 16:05:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:05:42 -0700
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
Message-ID: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>

Depends on how far away the query protein is, but I don't trust BLAST
for the actual alignment.  Find the boundaries, add a little slop,
and refine the alignment of protein to genome with a good alignment
program designed for this, like genewise or exonerate or even FASTX/Y.
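
The "find the boundaries, add a little slop" step might look like the sketch below, assuming 1-based coordinates taken from the BLAST hit. The function name and the slop size are illustrative only:

```perl
use strict;
use warnings;

# Widen a region by $slop bases on each side, clamped to the sequence bounds.
sub padded_region {
    my ($start, $end, $slop, $seq_len) = @_;
    # Accept reverse-strand coordinates given as start > end
    ($start, $end) = ($end, $start) if $start > $end;
    my $from = $start - $slop;
    $from = 1 if $from < 1;
    my $to = $end + $slop;
    $to = $seq_len if $to > $seq_len;
    return ($from, $to);
}

# Hypothetical hit boundaries on a 2000 bp sequence, with 300 bp of slop
my ($from, $to) = padded_region(250, 1800, 300, 2000);
print "extract $from..$to and pass it to the aligner\n";
```

The resulting subsequence, not the BLAST alignment itself, is what would then be handed to genewise, exonerate or FASTX/Y.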

-jason
On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:

> Hi all.
>
> What is the best way to extract coding region from a nucleotide  
> sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
> -- 
> "The way to get started is to quit talking and begin doing."
>       Walt Disney
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From jason at bioperl.org  Fri Apr 13 16:13:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:13:07 -0700
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
Message-ID: <7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>

I think it just needs an edit to the code in to_string, which checks
for the type of algorithm.  You'd need to add a branch to the if/elsif
cascade for the RPSBLAST type that codes the query and target dbs and
the query and target sequence types properly.  This would be very
trivial to code in; have you tried adding this to see if it works?
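
As a rough illustration of such a cascade (this is NOT the actual TextResultWriter code; `seq_types_for` is a hypothetical helper, and the RPS-BLAST branch assumes a protein query searched against a protein profile database):

```perl
use strict;
use warnings;

# Map an algorithm name to (query type, database type).
# Returns an empty list for algorithms it does not know.
sub seq_types_for {
    my ($algorithm) = @_;
    my $alg = uc $algorithm;
    if    ( $alg =~ /^RPS-?BLAST/ ) { return ('protein',    'protein')    }  # the new branch
    elsif ( $alg =~ /^TBLASTX/ )    { return ('translated', 'translated') }
    elsif ( $alg =~ /^TBLASTN/ )    { return ('protein',    'translated') }
    elsif ( $alg =~ /^BLASTX/ )     { return ('translated', 'protein')    }
    elsif ( $alg =~ /^BLASTP/ )     { return ('protein',    'protein')    }
    elsif ( $alg =~ /^BLASTN/ )     { return ('nucleic',    'nucleic')    }
    return;
}

my ($query_type, $db_type) = seq_types_for('RPS-BLAST');
print "query: $query_type, db: $db_type\n";
```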

If you submit a bug with an example report, we'd be able to make
appropriate changes faster.

-jason
On Apr 11, 2007, at 6:32 AM, Emeric Sevin wrote:

> Hi everybody,
>
> I'm sorry to bug, but either I missed something so obvious nobody
> bothered to answer, or I'm being a little boycotted here...
> A little help would be very much appreciated
>
> On 22 March 07, at 16:07, Emeric Sevin wrote:
>
>> Hello,
>>
>> I am new to this community, and apologize if this subject has been  
>> posted before.
>>
>> I want to print out only selected results from a multiple blast- 
>> alignments results file. Problem is, the algorithm used is  
>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the  
>> actual writing task yields "unclean" warnings. Although an output
>> is actually written, the writer  
>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by  
>> the fact rpsblast DBs are not labeled with  
>> "protein"/"nucleic"/"translated".
>> Does anybody know of an easy fix to that bug, or of another way to  
>> come around it?
>>
>> Thank you very much
>>
>> Emeric SEVIN
>> Université de Rennes 1
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From thiago.venancio at gmail.com  Fri Apr 13 16:20:32 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 13 Apr 2007 17:20:32 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
Message-ID: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>

Thanks Jason.

I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
comparisons and want to extract some translated coding regions for further
multiple alignment and phylogenetic analysis.

Best.

Thiago

On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
> Depends on how far away the query protein is, but I don't trust BLAST for
> the actual alignment.  Find the boundaries, add a little slop, and refine
> the alignment of protein to genome with a good alignment program designed
> for this, like genewise or exonerate or even FASTX/Y.
> -jason
> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>
> Hi all.
>
> What is the best way to extract coding region from a nucleotide sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>


From jason at bioperl.org  Fri Apr 13 16:47:50 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:47:50 -0700
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
	<44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
Message-ID: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>

Hi -

There are some tools that do this for you -- I've listed a few from a
google search or from what I remember reading.  It would be great if
you (and others!) were willing to contribute a little info on what
works for you to the wiki.  A little HOWTO would be cool - here or on
openwetware.org.

Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
EST-PAC:  doi: http://dx.doi.org/10.1186/1751-0473-1-2

Ewan Birney's estwise as part of wise package also can help if you  
have a likely protein from BLAST you want to align to the est -  
estwise can handle frameshifts, but can be too slow for some people.   
Exonerate's protein2dna model may also work here, but I haven't tried  
it.

-jason
On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote:

> Thanks Jason.
>
> I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
> comparisons and want to extract some translated coding regions for further
> multiple alignment and phylogenetic analysis.
>
> Best.
>
> Thiago
>
> On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>>
>> Depends on how far away the query protein is, but I don't trust BLAST for
>> the actual alignment.  Find the boundaries, add a little slop, and refine
>> the alignment of protein to genome with a good alignment program designed
>> for this, like genewise or exonerate or even FASTX/Y.
>> -jason
>> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>>
>> Hi all.
>>
>> What is the best way to extract coding region from a nucleotide  
>> sequence
>> based on a BLASTX or TBLASTX comparisons ?
>>
>> Thanks in advance.
>>
>> Thiago
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>> http://jason.open-bio.org/
>>
>>
>>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From gopu_36 at yahoo.com  Fri Apr 13 12:48:48 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Fri, 13 Apr 2007 09:48:48 -0700 (PDT)
Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score
Message-ID: <9982570.post@talk.nabble.com>


Can anyone help me to collect the value of the second best hit score
(i.e. raw_score) from BLAST results which contain multiple queries? I have
used a SearchIO object to parse my BLAST report. I am only interested in the
second best hit/raw_score and not the first hit!

Thanks in advance!


-- 
View this message in context: http://www.nabble.com/How-to-parse-blast-result-to-get-2nd-best-hit-score-tf3572717.html#a9982570
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Sat Apr 14 13:53:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 14 Apr 2007 10:53:42 -0700
Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score
In-Reply-To: <9982570.post@talk.nabble.com>
References: <9982570.post@talk.nabble.com>
Message-ID: <67974DCD-B1F9-4286-86A4-5E4C4DBA3914@bioperl.org>

Try reading the HOWTO.

http://bioperl.org/wiki/HOWTO:SearchIO
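
For reference, a minimal sketch of the idea, assuming hits come back in report order (best first). The helper works on any object with a next_hit() iterator; the commented usage assumes BioPerl is installed and uses a hypothetical file name:

```perl
use strict;
use warnings;

# Skip the best hit of a result and return the second one,
# or undef if the result has fewer than two hits.
sub second_best_hit {
    my ($result) = @_;
    my $best = $result->next_hit or return;
    return $result->next_hit;
}

# Usage with Bio::SearchIO ('report.bls' is a made-up file name):
#
#   use Bio::SearchIO;
#   my $in = Bio::SearchIO->new(-format => 'blast', -file => 'report.bls');
#   while ( my $result = $in->next_result ) {      # one result per query
#       my $hit = second_best_hit($result) or next;
#       print $result->query_name, "\t", $hit->name, "\t", $hit->raw_score, "\n";
#   }
```

Queries with fewer than two hits are silently skipped in the usage loop.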

On Apr 13, 2007, at 9:48 AM, gopu_36 wrote:

>
> Can anyone help me to collect the value of the second best hit score
> (ie)raw_score from the blast results which contains multiple  
> queries? I have
> used searchIO object to parse my blast report. I am only interested  
> in the
> second best hit/raw_score and not the first hit!
>
> Thanks in advance!
>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-parse- 
> blast-result-to-get-2nd-best-hit-score-tf3572717.html#a9982570
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From gdorjee at hotmail.com  Sat Apr 14 17:39:50 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sat, 14 Apr 2007 14:39:50 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
Message-ID: <9997343.post@talk.nabble.com>


Hi all,
Can anyone please tell me why the following script fails and how I can fix it? It
gives me an error like:
waiting... 5 units of time
Can't call method "database_name" on an undefined value at
test1_remote_swissblast.pl line 41, <GEN4> line 31.
cheers!

use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], 
                              -format => 'fasta');
my $query = $Seq_in->next_seq(); 

my $factory = Bio::Tools::Run::RemoteBlast->new(
                                '-prog'  => 'blastp',
                                '-data' => 'swissprot',
                                 _READMETHOD => "Blast"
                         );
my $blast_report = $factory->submit_blast($query);
my $max_number = 100;
my $trial = 0;


while ( my @rids = $factory->each_rid ) {

    print STDERR "\nSorry, maximum number of retries $max_number exceeded\n" if $trial >= $max_number;
    last if $trial >= $max_number;
    $trial++;

    print STDERR "waiting... ".(5*$trial)." units of time\n" ;

    # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                # retrieve_blast returns -1 on error
                $factory->remove_rid($rid);
            }
            # retrieve_blast returns 0 on 'job not finished'
            sleep 5*$trial;
        } else {

            #---- Blast done ----
            $factory->remove_rid($rid);
            my $result = $rc->next_result;
            print "database: ", $result->database_name(), "\n";
            while( my $hit = $result->next_hit ) {
                print "hit name is: ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "score is: ", $hsp->score, "\n";
                }
            }
        }
    }
}
-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a9997343
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From dmessina at wustl.edu  Sun Apr 15 12:02:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 15 Apr 2007 11:02:51 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <9997343.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
Message-ID: <ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>

Hi DeeGee,

Your script worked fine for me. Perhaps the problem is in your input  
fasta file?

Dave

% perl test.pl AAC12660.fa
waiting... 5 units of time
waiting... 10 units of time
waiting... 15 units of time
database: Non-redundant SwissProt sequences
hit name is: sp|Q15750|TAB1_HUMAN
score is: 2413
hit name is: sp|Q8CF89|TAB1_MOUSE
score is: 2352
hit name is: sp|P49444|PP2C_PARTE
score is: 159
hit name is: sp|Q6ING9|PP2CK_XENLA
[...etc...]


From spiros at lokku.com  Sun Apr 15 12:12:05 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Sun, 15 Apr 2007 17:12:05 +0100
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
Message-ID: <bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>

Yep, it must be in the input file. The

$result->database_name()

method gets called on $result, the result object.

The error you get,

Can't call method "database_name" on an undefined value at
test1_remote_swissblast.pl line 41, <GEN4> line 31.

means the result object is not defined, so the method call fails since
there is no data to operate on.
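
One way to guard against this is sketched below. `first_result_or_warn` is a hypothetical helper, not part of RemoteBlast; it only relies on the report object having a next_result() method, as the object returned by retrieve_blast does in the script above:

```perl
use strict;
use warnings;

# Return the first result of a parsed report, or warn and return undef
# when the parser produced nothing (e.g. an empty or malformed report).
sub first_result_or_warn {
    my ($report, $rid) = @_;
    my $result = $report->next_result;
    warn "No result parsed for RID $rid; check the query sequence and the raw report\n"
        unless defined $result;
    return $result;
}

# In the script above this would replace the bare next_result call:
#
#   my $result = first_result_or_warn($rc, $rid) or next;
#   print "database: ", $result->database_name(), "\n";
```

This does not fix the underlying cause, but it turns the fatal error into a message that names the affected RID.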

Spiros

On 4/15/07, David Messina <dmessina at wustl.edu> wrote:
> Hi DeeGee,
>
> Your script worked fine for me. Perhaps the problem is in your input
> fasta file?
>
> Dave
>
> % perl test.pl AAC12660.fa
> waiting... 5 units of time
> waiting... 10 units of time
> waiting... 15 units of time
> database: Non-redundant SwissProt sequences
> hit name is: sp|Q15750|TAB1_HUMAN
> score is: 2413
> hit name is: sp|Q8CF89|TAB1_MOUSE
> score is: 2352
> hit name is: sp|P49444|PP2C_PARTE
> score is: 159
> hit name is: sp|Q6ING9|PP2CK_XENLA
> [...etc...]
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From dr.hogart at gmail.com  Sun Apr 15 12:13:29 2007
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Sun, 15 Apr 2007 20:13:29 +0400
Subject: [Bioperl-l] error with blast parsing by searchIO
Message-ID: <op.tqt10r17avnppr@hogart.img.ras.ru>

Hello all,

script (parsing blastn report) that previously had worked today "tell" me  
that:

------------- EXCEPTION  -------------
MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
: No such file or directory
STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
STACK toplevel parse-te-lib2.pl:3

--------------------------------------

What does it mean??

ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8


From cjfields at uiuc.edu  Sun Apr 15 13:40:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 15 Apr 2007 12:40:24 -0500
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqt10r17avnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
Message-ID: <460926E6-0EEA-45D9-838E-70706062857C@uiuc.edu>

You have to update to bioperl 1.5.2 or CVS.  BLAST parsing is broken  
for recent BLAST versions (> v.2.2, I believe).

chris

On Apr 15, 2007, at 11:13 AM, sergei ryazansky wrote:

> Hello all,
>
> script (parsing blastn report) that previously had worked today  
> "tell" me
> that:
>
> ------------- EXCEPTION  -------------
> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
> : No such file or directory
> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm: 
> 273
> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
> STACK toplevel parse-te-lib2.pl:3
>
> --------------------------------------
>
> What does it mean??
>
> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Sun Apr 15 14:24:56 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 15 Apr 2007 11:24:56 -0700
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqt10r17avnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
Message-ID: <C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>

It looks like something is broken in how your script passes the  
filename - it is trying to open a file called "BLASTN  
2.2.13 [Nov-27-2005]".
Did you already open the file, and are you perhaps passing data from the  
first line of the file to SearchIO?
Sending the relevant part of your script to the list will help us  
diagnose the problem better.
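
A minimal sketch of a check that would catch this kind of mistake before SearchIO is called. The helper name searchio_source is made up for the example; -file and -fh are the two input arguments Bio::SearchIO accepts. It assumes the script gets either a path or an already opened filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one script argument into the named arguments
# for Bio::SearchIO->new. A path is passed as -file, a reference (assumed
# to be an open filehandle) as -fh. A line read *from* the report, such as
# "BLASTN 2.2.13 [Nov-27-2005]", is neither and is rejected here instead
# of failing later inside Bio::Root::IO.
sub searchio_source {
    my ($input) = @_;
    die "No input given\n" unless defined $input;
    return ( -fh => $input ) if ref $input;
    die "Not a readable file: '$input'\n" unless -f $input && -r $input;
    return ( -file => $input );
}

# Usage, assuming BioPerl is installed and the report path is in $ARGV[0]:
#   use Bio::SearchIO;
#   my $in = Bio::SearchIO->new( -format => 'blast',
#                                searchio_source( $ARGV[0] ) );
#   while ( my $result = $in->next_result ) {
#       print $result->query_name, "\n";
#   }
```

With this in place, a header line passed by accident produces "Not a readable file: ..." from the script itself rather than a stack trace from the library.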

-jason
On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote:

> Hello all,
>
> script (parsing blastn report) that previously had worked today  
> "tell" me
> that:
>
> ------------- EXCEPTION  -------------
> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
> : No such file or directory
> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
> STACK toplevel parse-te-lib2.pl:3
>
> --------------------------------------
>
> What does it mean??
>
> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/



From gdorjee at hotmail.com  Sun Apr 15 20:40:22 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sun, 15 Apr 2007 17:40:22 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
Message-ID: <10008507.post@talk.nabble.com>


hi guys,
thanks for your replies, but i still don't understand why it doesn't work.
my input fasta sequence looks fine. here, take a look,

>gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
AQGAVAPGPDGGGPFPPWPLG

is it possible that the script is not able to read
RemoteBlast.pm? the thing is, i can run the standalone blast on the
command line, although i've never been able to run the same thing from a
cgi script (getting the input from an html textarea). i don't understand.
i've been trying to get the standalone blast running for a while now, and
i also mentioned it in my previous postings, but all in vain. i haven't
got past it yet.
any help or an example would be much appreciated.


Spiros Denaxas wrote:
> 
> Yep, it must be in the input file. The
> 
> $result->database_name()
> 
> function gets called on $result the result object.
> 
> The error you get,
> 
> Can't call method "database_name" on an undefined value at
> test1_remote_swissblast.pl line 41, <GEN4> line 31.
> 
> means the result object is not defined thus the function fails since
> there are no data to operate on.
> 
> Spiros
> 
> On 4/15/07, David Messina <dmessina at wustl.edu> wrote:
>> Hi DeeGee,
>>
>> Your script worked fine for me. Perhaps the problem is in your input
>> fasta file?
>>
>> Dave
>>
>> % perl test.pl AAC12660.fa
>> waiting... 5 units of time
>> waiting... 10 units of time
>> waiting... 15 units of time
>> database: Non-redundant SwissProt sequences
>> hit name is: sp|Q15750|TAB1_HUMAN
>> score is: 2413
>> hit name is: sp|Q8CF89|TAB1_MOUSE
>> score is: 2352
>> hit name is: sp|P49444|PP2C_PARTE
>> score is: 159
>> hit name is: sp|Q6ING9|PP2CK_XENLA
>> [...etc...]
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10008507
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From dmessina at wustl.edu  Sun Apr 15 22:43:06 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 15 Apr 2007 21:43:06 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10008507.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
Message-ID: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>

You're right, it's not the input sequence. I just tried it with your  
script and it worked.


> is it possible that the script is not being about to read the
> RemoteBlast.pm?

I think the program wouldn't compile if that were the case, and your  
error message would be about not finding RemoteBlast.pm rather than  
the one you got.
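
One way to rule this out is to ask perl directly whether it can load the module and from where. The sketch below uses only core Perl; the function name module_location is invented for the example. Running the same check from inside the CGI script may also be worthwhile, on the assumption that the web server's perl could be using a different @INC.

```perl
use strict;
use warnings;

# Try to load a module by name. Returns (path, undef) on success,
# where path is the file perl actually loaded, or (undef, error) on failure.
sub module_location {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;    # Bio::SearchIO -> Bio/SearchIO.pm
    my $ok = eval { require $file; 1 };
    return $ok ? ( $INC{$file}, undef ) : ( undef, $@ );
}

for my $module ( 'Bio::Tools::Run::RemoteBlast', 'Bio::SearchIO' ) {
    my ( $path, $error ) = module_location($module);
    if ( defined $path ) {
        print "$module loaded from $path\n";
    }
    else {
        print "$module could not be loaded: $error";
    }
}
```

If a module is missing, the error text is the usual "Can't locate ... in @INC" message, which is different from the "undefined value" error reported earlier in this thread.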


> but the thing is, i can run the standalone blast on the
> command line, although i've never been able the run the same with  
> cgi module
> (by gettting the input from an html textarea). i don't understand.

This result really suggests that perl and Bioperl are not the issue.  
I'm not saying the following to give you the brushoff, but given the  
numerous ways in which web-based apps can fail and in which  
webservers can be installed, it might be best for you to find someone  
at your institution who can sit down with you and work through it.

Dave


From cjfields at uiuc.edu  Sun Apr 15 23:51:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 15 Apr 2007 22:51:05 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
Message-ID: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>

This sounds similar to an issue that popped up a few weeks ago,  
related to URLAPI changes for remote BLAST access.  That was fixed on  
NCBI's end, but I also added a fix to RemoteBlast in CVS that works as  
well.

That said, my guess is the same as Dave's: that there are  
connectivity issues.  What happens when you set the RemoteBlast  
factory to a verbosity of 1?  This will print debugging output  
from the repeated queries to the NCBI server (so if there are  
problems, they'll show up there).

...
my $factory = Bio::Tools::Run::RemoteBlast->new(
    -prog       => 'blastp',
    -data       => 'swissprot',
    _READMETHOD => 'Blast',
    -verbose    => 1,    # debugging output
);
...

If you see the BLAST report but get the same error try using the  
RemoteBlast in CVS to see if it fixes the problem.
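
A sketch of a retrieval loop that checks for an undefined result instead of dying on it. The method names each_rid, retrieve_blast, remove_rid and next_result are the ones from the RemoteBlast synopsis; the function collect_results and its delay option are invented here, and the factory is only assumed to provide those three methods.

```perl
use strict;
use warnings;

# Poll a RemoteBlast-style factory until every request ID (RID) is done.
# Returns the result objects that could actually be parsed.
sub collect_results {
    my ( $factory, %opt ) = @_;
    my $delay = defined $opt{delay} ? $opt{delay} : 5;    # seconds between polls
    my @results;
    while ( my @rids = $factory->each_rid ) {
        for my $rid (@rids) {
            my $rc = $factory->retrieve_blast($rid);
            if ( !ref $rc ) {
                # 0 means "not ready yet", a negative value means failure.
                if ( $rc < 0 ) {
                    warn "RID $rid failed\n";
                    $factory->remove_rid($rid);
                }
                sleep $delay;
                next;
            }
            $factory->remove_rid($rid);
            my $result = $rc->next_result;
            if ( !defined $result ) {
                # Without this check the next method call dies with
                # Can't call method "database_name" on an undefined value
                warn "RID $rid: a report came back but no result was parsed\n";
                next;
            }
            push @results, $result;
        }
    }
    return @results;
}
```

After $factory->submit_blast(...), calling collect_results($factory) would return only defined results, so $result->database_name() is safe on each of them.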

chris


On Apr 15, 2007, at 9:43 PM, David Messina wrote:

> You're right, it's not the input sequence. I just tried it with your
> script and it worked.
>
>
>> is it possible that the script is not being about to read the
>> RemoteBlast.pm?
>
> I think the program wouldn't compile if that were the case, and your
> error message would be about not finding RemoteBlast.pm rather than
> the one you got.
>
>
>> but the thing is, i can run the standalone blast on the
>> command line, although i've never been able the run the same with
>> cgi module
>> (by gettting the input from an html textarea). i don't understand.
>
> This result really suggests that perl and Bioperl are not the issue.
> I'm not saying the following to give you the brushoff, but given the
> numerous ways in which web-based apps can fail and in which
> webservers can be installed, it might be best for you to find someone
> at your institution who can sit down with you and work through it.
>
> Dave
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dr.hogart at gmail.com  Mon Apr 16 03:03:46 2007
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Mon, 16 Apr 2007 11:03:46 +0400
Subject: [Bioperl-l] error with blast parsing by searchIO
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
	<C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>
Message-ID: <op.tqu68kvzavnppr@hogart.img.ras.ru>

The problem was resolved by giving the direct path  
(-file=>'d\...\input.txt') to the input file in my script.
I think that Chris is right and I should update my bioperl to version 1.5.
By the way, bioperl-1.5 is not available via ppm. Where can I download it  
for WinXP?

On Sun, 15 Apr 2007 22:24:56 +0400, Jason Stajich <jason at bioperl.org>  
wrote:

> It looks like something is broken in your script as to how you are
> passing it a filename - it is trying to open a file called "BLASTN
> 2.2.13 [Nov-27-2005]".
> did you already open the file and are you passing data from the first
> line of the file to SearchIO perhaps?
> Sending the relevant part of your script to the list will help us
> diagnose the problem better.
>
> -jason
> On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote:
>
>> Hello all,
>>
>> script (parsing blastn report) that previously had worked today
>> "tell" me
>> that:
>>
>> ------------- EXCEPTION  -------------
>> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
>> : No such file or directory
>> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
>> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
>> STACK toplevel parse-te-lib2.pl:3
>>
>> --------------------------------------
>>
>> What does it mean??
>>
>> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>




From bix at sendu.me.uk  Mon Apr 16 04:34:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 16 Apr 2007 09:34:56 +0100
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqu68kvzavnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>	<C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>
	<op.tqu68kvzavnppr@hogart.img.ras.ru>
Message-ID: <46233530.1010109@sendu.me.uk>

sergei ryazansky wrote:
> The problem was resolved by the direct path (-file=>'d\...\input.txt') to  
> input file in the my script.
> I think that Chris right and i should update my bioperl to 1.5 version.
> By the way, bioperl-1.5 is not accessible via ppm. Where I can download it  
> for winXP?

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 10:36:33 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 22:36:33 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>


Dear all,
 
Given a GO ID, is there a way to extract all
the related gene names for that ID with Perl?

Does anybody have experience with that?
I've looked through the GO modules on CPAN, but can't seem
to find any tool that facilitates that search.

I look forward very much to your advice.
 
--
Edward WIJAYA
SINGAPORE

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From spiros at lokku.com  Mon Apr 16 11:10:49 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:10:49 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>

Hi Edward,

What organism are you interested in? I have some code from my PhD
based on the Saccharomyces cerevisiae genome. It basically uses the SGD
flat files and a local MySQL instance of GO. It might be worth turning
into modules if people are interested in it, although it is pretty
organism-specific and the lack of abstraction might introduce a number
of problems.

Spiros

On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>
> Dear all,
>
> Given a GO id, is there a way to extract all
> the related gene names from that id with Perl?
>
> Anybody has experience with that?
> I've looked through GO module in CPAN, but can't seem
> to find any tool that facilitated that searc
>
> Look forward very much for your advice.
>
> --
> Edward WIJAYA
> SINGAPORE
>
>


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 11:14:09 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:14:09 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>


Hi Spiros,
 
Thanks for your reply. I am interested in applying it to
all kinds of organisms related to that particular GO ID.
 
Do you have a CPAN module for that?
--
Edward WIJAYA
SINGAPORE

________________________________

From: s.denaxas at gmail.com on behalf of Spiros Denaxas
Sent: Mon 4/16/2007 11:10 PM
To: Wijaya Edward
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


Hi Edward,

What organism are you interested in? I have some code from my PhD
based on the Saccharomyces cerevisiae genome. Basically uses the SGD
flat files and a local MySQL instance of GO. Might be worth turning
into modules if people are interested in it, although it is pretty
organism oriented and the lack of abstraction might introduce a number
of problems.

Spiros

On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>
> Dear all,
>
> Given a GO id, is there a way to extract all
> the related gene names from that id with Perl?
>
> Anybody has experience with that?
> I've looked through GO module in CPAN, but can't seem
> to find any tool that facilitated that searc
>
> Look forward very much for your advice.
>
> --
> Edward WIJAYA
> SINGAPORE
>
>




From dmessina at wustl.edu  Mon Apr 16 11:21:01 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:21:01 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>

I use BioMART for this kind of thing. If you need to do this for more  
than a couple of GO terms, BioMART has a Perl API you can use to  
connect to their data.

http://www.biomart.org/

http://www.biomart.org/install-overview.html

Dave


From spiros at lokku.com  Mon Apr 16 11:21:40 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:21:40 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID: <bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>

Nope, I don't have a CPAN module for it, and to be honest, I don't
think I will release one until I actually finish my PhD. The
code is really scruffy in some parts, lacks documentation and might
not work under all setups. My plan is to take some time afterwards to
clean it up and release a proper version of it to the public.

What you are talking about, however, if I understand correctly, is a
much, much bigger project. Different genome databases have different
formats, and a potential module must take them all into consideration.
Then there is the issue of the different evidence codes GO annotators use
across different genomes, and which of them you consider to be of higher
or lower quality.

If you happen to stumble upon such a module, please share it; it would
be very interesting!

spiros

On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>
> Hi Spiros,
>
> Thanks for your reply. I am interested to apply it for
> all the kind of organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
> >
> > Dear all,
> >
> > Given a GO id, is there a way to extract all
> > the related gene names from that id with Perl?
> >
> > Anybody has experience with that?
> > I've looked through GO module in CPAN, but can't seem
> > to find any tool that facilitated that searc
> >
> > Look forward very much for your advice.
> >
> > --
> > Edward WIJAYA
> > SINGAPORE
> >
> >
>
>
>
>


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 11:33:27 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:33:27 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>


Hi David, 
 
There seems to be no biomart-perl module on CPAN.

I tried their CVS:
cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login

But it requires a password. Can you suggest another way to get this module?
 
--
Edward WIJAYA

________________________________

From: David Messina [mailto:dmessina at wustl.edu]
Sent: Mon 4/16/2007 11:21 PM
To: Wijaya Edward
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


I use BioMART for this kind of thing. If you need to do this for more 
than a couple of GO terms, BioMART has a Perl API you can use to 
connect to their data.

http://www.biomart.org/

http://www.biomart.org/install-overview.html

Dave




From Kevin.M.Brown at asu.edu  Mon Apr 16 11:44:28 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 16 Apr 2007 08:44:28 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net><BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
	<3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <1A4207F8295607498283FE9E93B775B4030A4914@EX02.asurite.ad.asu.edu>

Did you follow the directions listed at this page?

http://www.biomart.org/install-overview.html 


> There seems to be no biomart-perl module in CPAN.
>  
> I tried their cvs:
> cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login
>  
> But require password. Can suggest if there is another way to 
> get this module?


From dmessina at wustl.edu  Mon Apr 16 11:44:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:44:26 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
	<3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <2D698B2E-49B9-411E-B1FA-C12F4A235EB2@wustl.edu>

The password you need to enter when asked is CVSUSER.

Dave


From sdavis2 at mail.nih.gov  Mon Apr 16 11:55:14 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 16 Apr 2007 11:55:14 -0400
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
	<bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>
Message-ID: <200704161155.14567.sdavis2@mail.nih.gov>


> > On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
> > > Dear all,
> > >
> > > Given a GO id, is there a way to extract all
> > > the related gene names from that id with Perl?

This is a pretty simple problem if you have the data in a useable format.  The 
data that you need are available here:

ftp://ftp.ncbi.nih.gov/gene/DATA

The README file gives details, but the files in this directory are all 
tab-delimited text.  Download the gene2go.gz file, which contains a mapping 
from Entrez Gene ID to GO accession.  Then, download the gene_info.gz file, 
which contains the information about the Entrez Gene ID, including 
description, gene symbol, etc.  If you need to link to other data, you can of 
course download the respective files from NCBI.  You can either load the data 
into a SQL database of some type for general queries, or you can simply read 
them into perl directly (with appropriate data structures) to do your mapping.  
Since they are tab-delimited text, I would choose the database route and then 
use SQL and DBI to run the queries you like.
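
For the "read them into perl directly" route, a minimal sketch using only core Perl. The column positions are an assumption (gene2go: tax_id, GeneID, GO_ID, ...; gene_info: tax_id, GeneID, Symbol, ...) and should be checked against the README, and the function names are made up for the example.

```perl
use strict;
use warnings;

# Collect the Entrez Gene IDs annotated with one GO accession.
# Assumed gene2go columns: tax_id, GeneID, GO_ID, Evidence, ...
sub gene_ids_for_go {
    my ( $fh, $go_id ) = @_;
    my %seen;
    while ( my $line = <$fh> ) {
        next if $line =~ /^#/;    # skip the header line
        chomp $line;
        my ( undef, $gene_id, $go ) = split /\t/, $line;
        $seen{$gene_id} = 1 if defined $go && $go eq $go_id;
    }
    return \%seen;
}

# Look up the gene symbol for each wanted Gene ID.
# Assumed gene_info columns: tax_id, GeneID, Symbol, ...
sub symbols_for_gene_ids {
    my ( $fh, $wanted ) = @_;
    my %symbol;
    while ( my $line = <$fh> ) {
        next if $line =~ /^#/;
        chomp $line;
        my ( undef, $gene_id, $sym ) = split /\t/, $line;
        $symbol{$gene_id} = $sym if defined $gene_id && $wanted->{$gene_id};
    }
    return \%symbol;
}

# Usage: script.pl GO_ID gene2go gene_info (files already unpacked)
if ( @ARGV == 3 ) {
    my ( $go_id, $gene2go, $gene_info ) = @ARGV;
    open my $g2g, '<', $gene2go   or die "Cannot open $gene2go: $!\n";
    open my $gi,  '<', $gene_info or die "Cannot open $gene_info: $!\n";
    my $symbols = symbols_for_gene_ids( $gi, gene_ids_for_go( $g2g, $go_id ) );
    print "$_\t$symbols->{$_}\n" for sort { $a <=> $b } keys %$symbols;
}
```

Both files are read once, line by line, and only the matching Gene IDs are kept in memory. For many repeated queries the database route is likely still the better choice.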

Sean


From cjfields at uiuc.edu  Mon Apr 16 12:25:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 11:25:42 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>

You can limit EntrezGene searches by Gene Ontology ID using the [Gene  
Ontology] field in queries.  The following query:

'9220[Gene Ontology]'

will give 120 gene IDs.  You can get the same list using the still- 
under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm  
still working on this):

use Bio::DB::EUtilities;

# Search Entrez Gene for records annotated with this GO ID
my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                        -db => 'gene',
                                        -term => '9220[Gene Ontology]',
                                        -retmax => 300);
$esearch->get_response;
my @ids = $esearch->get_ids;
print join "\n", @ids;

In my opinion, Sean's idea of using SQL is probably better if you  
have tons of searches to do.

chris

On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:

>
> Dear all,
>
> Given a GO id, is there a way to extract all
> the related gene names from that id with Perl?
>
> Anybody has experience with that?
> I've looked through GO module in CPAN, but can't seem
> to find any tool that facilitated that searc
>
> Look forward very much for your advice.
>
> --
> Edward WIJAYA
> SINGAPORE
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 16 14:34:25 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 13:34:25 -0500
Subject: [Bioperl-l] Bio::Matrix::PSM::ProtPsm
Message-ID: <CA820306-7480-478D-BD3E-A0F094943065@uiuc.edu>

I was going through tests converting to Test::More and found this  
module is largely unimplemented (relevant tests are in t/ProtPsm.t in  
CVS).  It was written by James Thompson a few years ago and the  
module docs seem to indicate some uncertainty on what this class is  
meant to accomplish.  Does anyone know the status of this code?

chris


From cjm at fruitfly.org  Mon Apr 16 14:49:23 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:49:23 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID: <AAF82F3A-3C75-4D51-AFD4-FDE358391A03@fruitfly.org>


Download:
http://search.cpan.org/~cmungall/go-db-perl

or do:

cpan GO::AppHandle

The API call you want is here:
http://search.cpan.org/~cmungall/go-db-perl/GO/AppHandle.pm#get_deep_products

Here is an example snippet:

   use GO::AppHandle;
   my $apph=GO::AppHandle->connect(@ARGV);
   my $go_acc = shift @ARGV;
   my $gps = $apph->get_deep_products({term=>{acc=>$go_acc}});
   foreach my $gp (@$gps) {
     printf "%s %s\n", $gp->xref->acc, $gp->symbol;
   }

You will need to download the GO Database.

Cheers
Chris

On Apr 16, 2007, at 8:14 AM, Wijaya Edward wrote:

>
> Hi Spiros,
>
> Thanks for your reply. I am interested to apply it for
> all the kind of organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO)  
> with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Anybody has experience with that?
>> I've looked through GO module in CPAN, but can't seem
>> to find any tool that facilitates that search.
>>
>> Look forward very much for your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>>
>
>
>
>


From gdorjee at hotmail.com  Mon Apr 16 15:10:01 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:10:01 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022463.post@talk.nabble.com>


Hi Chris,
Thanks for your reply. I set the RemoteBlast factory to a verbosity of 1,
and I get the same error message. I'm new to all of this, so could you please
tell me how to get the RemoteBlast in CVS that you suggested?

cheers!!!
 

Chris Fields wrote:
> 
> This sounds like a similar issue that popped up a few weeks ago  
> related to URLAPI changes for remote BLAST access.  That was fixed on  
> NCBI's end but I also added a fix to RemoteBlast in CVS that works as  
> well.
> 
> Saying that, my guess is the same as Dave's, that there are  
> connectivity issues.  What happens when you set the RemoteBlast  
> factory to a verbosity of 1?  This will spill out debugging output  
> from the repeated queries to the NCBI server (so if there are  
> problems they'll show up there).
> 
> ...
> my $factory = Bio::Tools::Run::RemoteBlast->new(
>                                  '-prog'  => 'blastp',
>                                  '-data' => 'swissprot',
>                                   _READMETHOD => "Blast",
>                                   -verbose => 1    # debugging output
>                           );
> ...
> 
> If you see the BLAST report but get the same error try using the  
> RemoteBlast in CVS to see if it fixes the problem.
> 
> chris
> 
> 
> On Apr 15, 2007, at 9:43 PM, David Messina wrote:
> 
>> You're right, it's not the input sequence. I just tried it with your
>> script and it worked.
>>
>>
>>> is it possible that the script is not being about to read the
>>> RemoteBlast.pm?
>>
>> I think the program wouldn't compile if that were the case, and your
>> error message would be about not finding RemoteBlast.pm rather than
>> the one you got.
>>
>>
>>> but the thing is, i can run the standalone blast on the
>>> command line, although i've never been able the run the same with
>>> cgi module
>>> (by gettting the input from an html textarea). i don't understand.
>>
>> This result really suggests that perl and Bioperl are not the issue.
>> I'm not saying the following to give you the brushoff, but given the
>> numerous ways in which web-based apps can fail and in which
>> webservers can be installed, it might be best for you to find someone
>> at your institution who can sit down with you and work through it.
>>
>> Dave
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10022463
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From gdorjee at hotmail.com  Mon Apr 16 15:11:18 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:11:18 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
Message-ID: <10022464.post@talk.nabble.com>


Thank you, David.


David Messina-2 wrote:
> 
> You're right, it's not the input sequence. I just tried it with your  
> script and it worked.
> 
> 
>> is it possible that the script is not being about to read the
>> RemoteBlast.pm?
> 
> I think the program wouldn't compile if that were the case, and your  
> error message would be about not finding RemoteBlast.pm rather than  
> the one you got.
> 
> 
>> but the thing is, i can run the standalone blast on the
>> command line, although i've never been able the run the same with  
>> cgi module
>> (by gettting the input from an html textarea). i don't understand.
> 
> This result really suggests that perl and Bioperl are not the issue.  
> I'm not saying the following to give you the brushoff, but given the  
> numerous ways in which web-based apps can fail and in which  
> webservers can be installed, it might be best for you to find someone  
> at your institution who can sit down with you and work through it.
> 
> Dave
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10022464
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjm at fruitfly.org  Mon Apr 16 14:41:59 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:41:59 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
Message-ID: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>


Unless the Entrez interface has changed since I last looked, the  
query below for "pyrimidine ribonucleotide biosynthetic process" will  
NOT perform the transitive closure over the graph; this means genes  
and gene products annotated to GO:0009174 "pyrimidine ribonucleoside  
monophosphate biosynthetic process", for example, will be missed.

On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:

> You can limit EntrezGene searches by Gene Ontology ID using the [Gene
> Ontology] field in queries.  The following query:
>
> '9220[Gene Ontology]'
>
> will give 120 gene IDs.  You can get the same list using the still-
> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
> still working on this):
>
> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                         -db => 'gene',
>                                         -term => '9220[Gene Ontology]',
>                                         -retmax => 300);
> $esearch->get_response;
> my @ids = $esearch->get_ids;
> print join "\n", @ids;
>
> In my opinion, Sean's idea of using SQL is probably better if you
> have tons of searches to do.
>
> chris
>
> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Anybody has experience with that?
>> I've looked through GO module in CPAN, but can't seem
>> to find any tool that facilitates that search.
>>
>> Look forward very much for your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at uiuc.edu  Mon Apr 16 15:25:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:25:14 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
	<50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
Message-ID: <3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>

You are correct; it explains why the list is only 120 genes.  The  
only way (currently) to include genes annotated to descendant terms  
would be to perform the closure locally somehow (maybe via go-perl or  
similar).
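
To illustrate what "performing the closure locally" could look like, here is a minimal sketch in plain Perl. It does not use go-perl or any GO database; the parent-to-child map is a toy stand-in. Only the GO:0009220 / GO:0009174 pair is taken from this thread (and their parent/child relationship is assumed from the discussion); the GO:EXAMPLE_* IDs and the closure() helper are hypothetical.

```perl
use strict;
use warnings;

# Toy map of parent term => child terms. Only the 9220 -> 9174 pair comes
# from this thread; the GO:EXAMPLE_* IDs are made-up placeholders.
my %children = (
    'GO:0009220'   => [ 'GO:0009174', 'GO:EXAMPLE_A' ],
    'GO:0009174'   => [ 'GO:EXAMPLE_B' ],
    'GO:EXAMPLE_A' => [ 'GO:EXAMPLE_B' ],   # a DAG: EXAMPLE_B has two parents
);

# Return the starting term plus every descendant, each listed once
# (breadth-first walk with a 'seen' hash to handle shared children).
sub closure {
    my ($root, $graph) = @_;
    my %seen  = ($root => 1);
    my @queue = ($root);
    my @order;
    while (defined(my $term = shift @queue)) {
        push @order, $term;
        for my $child (@{ $graph->{$term} || [] }) {
            next if $seen{$child}++;
            push @queue, $child;
        }
    }
    return @order;
}

my @terms = closure('GO:0009220', \%children);
print join("\n", @terms), "\n";
```

In practice the map would have to be built from the real ontology (for instance from an OBO file or a local GO database), which this sketch leaves out.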

chris

On Apr 16, 2007, at 1:41 PM, Chris Mungall wrote:

>
> Unless the Entrez interface has changed since I last looked, the  
> query below for "pyrimidine ribonucleotide biosynthetic process"  
> will NOT perform the transitive closure over the graph; this means  
> genes and gene products annotated to GO:0009174 "pyrimidine  
> ribonucleoside monophosphate biosynthetic process", for example
>
> On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:
>
>> You can limit EntrezGene searches by Gene Ontology ID using the [Gene
>> Ontology] field in queries.  The following query:
>>
>> '9220[Gene Ontology]'
>>
>> will give 120 gene IDs.  You can get the same list using the still-
>> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
>> still working on this):
>>
>> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>                                         -db => 'gene',
>>                                         -term => '9220[Gene Ontology]',
>>                                         -retmax => 300);
>> $esearch->get_response;
>> my @ids = $esearch->get_ids;
>> print join "\n", @ids;
>>
>> In my opinion, Sean's idea of using SQL is probably better if you
>> have tons of searches to do.
>>
>> chris
>>
>> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>>
>>>
>>> Dear all,
>>>
>>> Given a GO id, is there a way to extract all
>>> the related gene names from that id with Perl?
>>>
>>> Anybody has experience with that?
>>> I've looked through GO module in CPAN, but can't seem
>>> to find any tool that facilitates that search.
>>>
>>> Look forward very much for your advice.
>>>
>>> --
>>> Edward WIJAYA
>>> SINGAPORE
>>>
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr 16 15:27:32 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:27:32 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022661.post@talk.nabble.com>


Hi Chris,
Sorry to bother you again, but could you please check the following script to
see what's wrong? I've been getting errors like:

Content-type: text/html
Software error:
------------- EXCEPTION  -------------
MSG:   (0) not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastall
/usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:532
STACK toplevel /usr/local/apache2/htdocs/rmtest.pl:46
--------------------------------------

#### the script ######
#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::SeqIO;
use Bio::SearchIO;
use Bio::DB::GenPept; 
use Bio::Tools::Run::StandAloneBlast;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

my $cgi = new CGI;

print $cgi->header,
$cgi->start_html(-title=>'A StandAloneBlast Test'),
$cgi->h1('Blast Result'),
$cgi->start_form,
"Enter or paste an amino-acid sequence? ",
$cgi->p,
$cgi->textarea(-name=>'name', rows=>10, -columns=>60),
$cgi->p,
$cgi->submit,
$cgi->end_form,
$cgi->hr;

open(OUTPUT,">result/query.faa");

if ($cgi->param()) {
        my $seq = $cgi->param('name');
        print OUTPUT $seq;

my @params = ('program'=>'blastp', 'database' =>
'/export/home/dorjee/database/nrpart', 'outfile' => 'result/blast.out',
_READMETHOD => 'Blast');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

# Blast a sequence against a database:
my $str = Bio::SeqIO->new(-file => "result/query.faa", '-format' => 'Fasta'
);
my $input = $str->next_seq();
my $blast_report = $factory->blastall($input);
}
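
One possible cause, not confirmed anywhere in this thread: the script prints the pasted sequence to OUTPUT but never closes that filehandle, so result/query.faa may still be empty when Bio::SeqIO opens it, and next_seq() would then return nothing for blastall() to use. The pasted text also has no FASTA header line. The sketch below is plain Perl with no BioPerl calls; write_query_fasta() and the 'query' ID are hypothetical names chosen for illustration, and a temporary file stands in for result/query.faa.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a raw pasted sequence to $path as a one-record FASTA file.
# The 'query' ID is a placeholder chosen for this sketch.
sub write_query_fasta {
    my ($path, $raw) = @_;
    (my $residues = $raw) =~ s/\s+//g;            # drop spaces and newlines from the textarea
    open(my $out, '>', $path) or die "Cannot write $path: $!";
    print {$out} ">query\n$residues\n";
    close($out) or die "Cannot close $path: $!";  # closing flushes the buffer to disk
    return $path;
}

# Demonstration with a temporary file instead of result/query.faa.
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
close $tmp_fh;
write_query_fasta($tmp_path, "MKT AYI\nAKQR\n");

# Read the file back to confirm the record is really on disk.
open(my $in, '<', $tmp_path) or die "Cannot read $tmp_path: $!";
print while <$in>;
close $in;
```

In the CGI script, the equivalent change would be to open, write and close the query file inside the if ($cgi->param()) block, before the Bio::SeqIO object is created.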



-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10022661
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Apr 16 15:37:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:37:58 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10022463.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
Message-ID: <5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>

The 'verbose' setting doesn't change the way the BLAST query is sent,  
it just sends the raw output from the repeated attempts to retrieve  
the report (using the RID) to STDERR.  The error you saw won't be  
fixed by doing so.

What I was interested in was the raw HTML output dumped to the  
screen.  If it is querying the NCBI server it should dump stuff that  
includes something like this:

...
<HTML>
<p></p>
<!--
QBlastInfoBegin
         Status=WAITING
QBlastInfoEnd
--><p></p>
<SCRIPT LANGUAGE="JavaScript"><!--
...

which indicates you have a request in the BLAST queue.  If you aren't  
seeing anything then the problem is likely network-related on your  
end, so getting the latest RemoteBlast won't help.  Do any other  
BioPerl modules requiring network access work (Bio::DB::GenBank, for  
instance)?  If not it could be a proxy issue...

Just in case, here's the browsable CVS location for RemoteBlast:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl

Click on the download link and save over your local version.

chris

On Apr 16, 2007, at 2:10 PM, DeeGee wrote:

>
> hi Chris,
> thanks for your reply. i set the RemoteBlast factory to a verbosity  
> of 1,
> and i get the same error message. i'm new to all these. so, could  
> you plz
> tell me how can i do the RemoteBlast in CVS that you've suggested.
>
> cheers!!!


From gdorjee at hotmail.com  Mon Apr 16 16:42:37 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 13:42:37 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
Message-ID: <10024333.post@talk.nabble.com>


Hi,
I tried the following code just to check the network, and it worked fine
except for the SwissProt part, for which I got this error message instead of
the sequence:

------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq
/usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
STACK toplevel bbbbb.pl:21
--------------------------------------

#### check #####
#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::DB::SwissProt;
use Bio::DB::GenPept;
use Bio::SeqIO;

my $genpeptdb = new Bio::DB::GenPept();
my $genbankdb = new Bio::DB::GenBank();
my $swissdb = new Bio::DB::SwissProt();

my $seqio = new Bio::SeqIO(-format => 'fasta',
                           -fh     => \*STDOUT);

my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
$seqio->write_seq($protseq);

my $seq = $genbankdb->get_Seq_by_acc('AF303112');
$seqio->write_seq($seq);

$protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
$seqio->write_seq($protseq);

thanks a lot.


Chris Fields wrote:
> 
> The 'verbose' setting doesn't change the way the BLAST query is sent,  
> it just sends the raw output from the repeated attempts to retrieve  
> the report (using the RID) to STDERR.  The error you saw won't be  
> fixed by doing so.
> 
> What I was interested in was the raw HTML output dumped to the  
> screen.  If it is querying the NCBI server it should dump stuff that  
> includes something like this:
> 
> ...
> <HTML>
> <p></p>
> <!--
> QBlastInfoBegin
>          Status=WAITING
> QBlastInfoEnd
> --><p></p>
> <SCRIPT LANGUAGE="JavaScript"><!--
> ...
> 
> which indicates you have a request in the BLAST queue.  If you aren't  
> seeing anything then the problem is likely network-related on your  
> end, so getting the latest RemoteBlast won't help.  Do any other  
> BioPerl modules requiring network access work (Bio::DB::GenBank, for  
> instance)?  If not it could be a proxy issue...
> 
> Just in case, here's the browsable CVS location for RemoteBlast:
> 
> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl
> 
> Click on the download link and save over your local version.
> 
> chris
> 
> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
> 
>>
>> hi Chris,
>> thanks for your reply. i set the RemoteBlast factory to a verbosity  
>> of 1,
>> and i get the same error message. i'm new to all these. so, could  
>> you plz
>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>
>> cheers!!!
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10024333
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Apr 16 18:24:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 17:24:11 -0500
Subject: [Bioperl-l] HOWTO:Writing BioPerl Tests
Message-ID: <547A30CD-6BAA-4C08-A935-9975634691B2@uiuc.edu>

I have posted a quickie HOWTO on writing up BioPerl tests using  
Test::More.  If anyone wants to add to it feel free (make sure to  
credit yourself in the authors section).

http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

There is space in there if we decide to add more modules for  
enhancing tests (I think Nathan suggested Test::Exception or similar).

chris


From cjfields at uiuc.edu  Mon Apr 16 19:24:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 18:24:32 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10024333.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
	<10024333.post@talk.nabble.com>
Message-ID: <54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>

What version of BioPerl are you using?  I get an error, but it is  
because the ID doesn't exist.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc KPYK_ECOLI does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/ 
Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /Users/cjfields/src/bioperl- 
live/Bio/DB/WebDBSeqI.pm:181
STACK: genpept.pl:21
-----------------------------------------------------------

The actual accession is 'KPYK1_ECOLI'.

chris

On Apr 16, 2007, at 3:42 PM, DeeGee wrote:

>
> hi
> i tried the following code just to check the network, and it worked  
> fine
> except for the SwissProt part, for which i got the error message  
> instead of
> the sequence:
>
> ------------- EXCEPTION  -------------
> MSG: swissprot stream with no ID. Not swissprot in my book
> STACK Bio::SeqIO::swiss::next_seq
> /usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
> /usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
> STACK toplevel bbbbb.pl:21
> --------------------------------------
>
> #### check #####
> #!/usr/bin/perl -w
> use strict;
> use Bio::DB::GenBank;
> use Bio::DB::SwissProt;
> use Bio::DB::GenPept;
> use Bio::SeqIO;
>
> my $genpeptdb = new Bio::DB::GenPept();
> my $genbankdb = new Bio::DB::GenBank();
> my $swissdb = new Bio::DB::SwissProt();
>
> my $seqio = new Bio::SeqIO(-format => 'fasta',
>                            -fh     => \*STDOUT);
>
> my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
> $seqio->write_seq($protseq);
>
> my $seq = $genbankdb->get_Seq_by_acc('AF303112');
> $seqio->write_seq($seq);
>
> $protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
> $seqio->write_seq($protseq);
>
> thanks a lot.
>
>
> Chris Fields wrote:
>>
>> The 'verbose' setting doesn't change the way the BLAST query is sent,
>> it just sends the raw output from the repeated attempts to retrieve
>> the report (using the RID) to STDERR.  The error you saw won't be
>> fixed by doing so.
>>
>> What I was interested in was the raw HTML output dumped to the
>> screen.  If it is querying the NCBI server it should dump stuff that
>> includes something like this:
>>
>> ...
>> <HTML>
>> <p></p>
>> <!--
>> QBlastInfoBegin
>>          Status=WAITING
>> QBlastInfoEnd
>> --><p></p>
>> <SCRIPT LANGUAGE="JavaScript"><!--
>> ...
>>
>> which indicates you have a request in the BLAST queue.  If you aren't
>> seeing anything then the problem is likely network-related on your
>> end, so getting the latest RemoteBlast won't help.  Do any other
>> BioPerl modules requiring network access work (Bio::DB::GenBank, for
>> instance)?  If not it could be a proxy issue...
>>
>> Just in case, here's the browsable CVS location for RemoteBlast:
>>
>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl
>>
>> Click on the download link and save over your local version.
>>
>> chris
>>
>> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
>>
>>>
>>> hi Chris,
>>> thanks for your reply. i set the RemoteBlast factory to a verbosity
>>> of 1,
>>> and i get the same error message. i'm new to all these. so, could
>>> you plz
>>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>>
>>> cheers!!!
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>
> -- 
> View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10024333
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjm at fruitfly.org  Mon Apr 16 20:59:46 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 17:59:46 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
	<50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
	<3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>
Message-ID: <9612F3E7-F239-49C1-A5BE-E10FF2FC2063@fruitfly.org>


You could perform the closure locally and then iterate over the  
individual IDs or construct a big disjunctive query to Entrez -  
either way it's not so efficient, especially for less specific nodes  
(distributed queries with ontologies are an interesting challenge).

Soon you'll be able to do the same query over the GO Database / AmiGO  
using a REST API.
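
A minimal sketch of the "closure locally, then one big disjunctive query" idea, in plain Perl. The graph here is a toy: only the 9220/9174 relationship comes from this thread, the other IDs are placeholders, and the sub names closure and entrez_term are made up for illustration. In practice the graph would presumably be loaded from the ontology file (e.g. via go-perl), which is not shown.

```perl
use strict;
use warnings;

# Toy graph: parent GO ID => direct children. Only the fact that
# 0009174 sits below 0009220 is taken from this thread; the 99xxxxx
# IDs are placeholders and the exact edges are assumptions.
my %children = (
    '0009220' => ['0009174', '9900001'],
    '0009174' => ['9900002'],
);

# Collect a term plus all of its descendants (the transitive closure
# below $root), visiting each node only once.
sub closure {
    my ($root, $graph) = @_;
    my %seen;
    my @stack = ($root);
    while (@stack) {
        my $id = pop @stack;
        next if $seen{$id}++;
        push @stack, @{ $graph->{$id} || [] };
    }
    return sort keys %seen;
}

# Build one disjunctive Entrez term. Leading zeros are dropped to match
# the '9220[Gene Ontology]' form used elsewhere in this thread.
sub entrez_term {
    my @ids = @_;
    return join ' OR ', map { sprintf('%d[Gene Ontology]', $_) } @ids;
}

print entrez_term(closure('0009220', \%children)), "\n";
```

The resulting string could then be passed as the -term of an esearch. For a node near the root the term list gets very long, which is the inefficiency mentioned above.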

On Apr 16, 2007, at 12:25 PM, Chris Fields wrote:

> You are correct; it explains why the list is only 120 genes.  The
> only way (currently) to do so would be to perform the closure locally
> somehow (maybe via go-perl or similar).
>
> chris
>
> On Apr 16, 2007, at 1:41 PM, Chris Mungall wrote:
>
>>
>> Unless the Entrez interface has changed since I last looked, the
>> query below for "pyrimidine ribonucleotide biosynthetic process"
>> will NOT perform the transitive closure over the graph; this means
>> genes and gene products annotated to GO:0009174 "pyrimidine
>> ribonucleoside monophosphate biosynthetic process", for example
>>
>> On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:
>>
>>> You can limit EntrezGene searches by Gene Ontology ID using the  
>>> [Gene
>>> Ontology] field in queries.  The following query:
>>>
>>> '9220[Gene Ontology]'
>>>
>>> will give 120 gene IDs.  You can get the same list using the still-
>>> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
>>> still working on this):
>>>
>>> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>>                                         -db => 'gene',
>>>                                         -term => '9220[Gene Ontology]',
>>>                                         -retmax => 300);
>>> $esearch->get_response;
>>> my @ids = $esearch->get_ids;
>>> print join "\n", @ids;
>>>
>>> In my opinion, Sean's idea of using SQL is probably better if you
>>> have tons of searches to do.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>>>
>>>>
>>>> Dear all,
>>>>
>>>> Given a GO id, is there a way to extract all
>>>> the related gene names from that id with Perl?
>>>>
>>>> Anybody has experience with that?
>>>> I've looked through GO module in CPAN, but can't seem
>>>> to find any tool that facilitated that searc
>>>>
>>>> Look forward very much for your advice.
>>>>
>>>> --
>>>> Edward WIJAYA
>>>> SINGAPORE
>>>>
>>>> ------------ Institute For Infocomm Research - Disclaimer
>>>> -------------
>>>> This email is confidential and may be privileged.  If you are not
>>>> the intended recipient, please delete it and notify us immediately.
>>>> Please do not copy or use it for any purpose, or disclose its
>>>> contents to any other person. Thank you.
>>>> --------------------------------------------------------
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 23:51:18 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 17 Apr 2007 11:51:18 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
	<50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
	<3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>
	<9612F3E7-F239-49C1-A5BE-E10FF2FC2063@fruitfly.org>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061686@mailbe01.teak.local.net>


Thanks so much for all the suggestions.
They were really helpful to me.
 
--
Edward WIJAYA

________________________________

From: Chris Mungall [mailto:cjm at fruitfly.org]
Sent: Tue 4/17/2007 8:59 AM
To: Chris Fields
Cc: bioperl-l at lists.open-bio.org; Wijaya Edward
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


You could perform the closure locally and then iterate over the 
individual IDs or construct a big disjunctive query to Entrez - 
either way it's not so efficient, especially for less specific nodes 
(distributed queries with ontologies is an interesting challenge).

Soon you'll be able to do the same query over the GO Database / AmiGO 
using a REST API

On Apr 16, 2007, at 12:25 PM, Chris Fields wrote:

> You are correct; it explains why the list is only 120 genes.  The
> only way (currently) to do so would be to perform the closure locally
> somehow (maybe via go-perl or similar).
>
> chris
>
> On Apr 16, 2007, at 1:41 PM, Chris Mungall wrote:
>
>>
>> Unless the Entrez interface has changed since I last looked, the
>> query below for "pyrimidine ribonucleotide biosynthetic process"
>> will NOT perform the transitive closure over the graph; this means
>> genes and gene products annotated to GO:0009174 "pyrimidine
>> ribonucleoside monophosphate biosynthetic process", for example
>>
>> On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:
>>
>>> You can limit EntrezGene searches by Gene Ontology ID using the 
>>> [Gene
>>> Ontology] field in queries.  The following query:
>>>
>>> '9220[Gene Ontology]'
>>>
>>> will give 120 gene IDs.  You can get the same list using the still-
>>> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
>>> still working on this):
>>>
>>> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>>>                                         -db => 'gene',
>>>                                         -term => '9220[Gene Ontology]',
>>>                                         -retmax => 300);
>>> $esearch->get_response;
>>> my @ids = $esearch->get_ids;
>>> print join "\n", @ids;
>>>
>>> In my opinion, Sean's idea of using SQL is probably better if you
>>> have tons of searches to do.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>>>
>>>>
>>>> Dear all,
>>>>
>>>> Given a GO id, is there a way to extract all
>>>> the related gene names from that id with Perl?
>>>>
>>>> Anybody has experience with that?
>>>> I've looked through GO module in CPAN, but can't seem
>>>> to find any tool that facilitated that searc
>>>>
>>>> Look forward very much for your advice.
>>>>
>>>> --
>>>> Edward WIJAYA
>>>> SINGAPORE
>>>>
>>>> ------------ Institute For Infocomm Research - Disclaimer
>>>> -------------
>>>> This email is confidential and may be privileged.  If you are not
>>>> the intended recipient, please delete it and notify us immediately.
>>>> Please do not copy or use it for any purpose, or disclose its
>>>> contents to any other person. Thank you.
>>>> --------------------------------------------------------
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Robert Switzer
>>> Dept of Biochemistry
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From hlapp at gmx.net  Tue Apr 17 00:00:55 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 17 Apr 2007 00:00:55 -0400
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>

Hi Leighton, please see below:

On Apr 16, 2007, at 11:55 AM, Leighton Pritchard wrote:

> Hi,
>
> I've been trying to upload the GO into a clean BioSQL (MySQL, 1.4.1)
> schema using the BioPerl bp_load_ontology.pl script, with the OBOv1.0,
> OBOv1.2, and the most recent flatfiles from
> http://www.geneontology.org/GO.downloads.ontology.shtml - none of my
> attempts have been successful.  The errors below are from a Linux
> installation, but the same errors are thrown on OS X, too.  I am using
> the most recent versions of BioPerl and bioperl-db, installed via  
> CPAN:
>
> [lpritc at lplinuxdev sequence_data]$ perl -MBio::Root::Version -e 'print
> $Bio::Root::Version::VERSION,"\n"'
> 1.005002102
>
> and bioperl-db 1.5.2.
>
> I have attached the traceback below (running with --safe throws a  
> number
> of equivalent errors),

Using --safe will throw the same errors, but will continue loading.  
I.e., you'd lose the one term, but keep everything else.

I do realize that especially for a graph losing an internal node can  
be quite detrimental.

> [...]
> ########
>
> [lpritc at lplinuxdev sequence_data]$ bp_load_ontology.pl --host  
> localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass
> ******** --format obo ~/Downloads/gene_ontology_edit.obo
> Loading ontology gene_ontology:
>         ... terms
>         ... relationships
>         Done with gene_ontology.
> Loading ontology biological_process:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------

This would point to a problem in the BioPerl OBO parser. According to
the message, both the database name and the accession of the db_xref
for the term are - surely erroneously - empty. Apparently the parser
fails to parse out the database and accession for this db_xref of term
GO:0018901.

If you can edit the obo file, you can try deleting the db_xref(s) for  
that term that look odd (or delete all if you don't need them).

I'd have to debug the obo parser to see exactly where it's going  
wrong in parsing.

> Could not store term GO:0018901, name '2,4-dichlorophenoxyacetic acid
> metabolic process':
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> [...]
> [lpritc at lplinuxdev sequence_data]$ bp_load_ontology.pl --host  
> localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass
> ******** --format goflat --fmtargs ~/Downloads/GO.defs

Note that the argument for --fmtargs here should read
"-defs_file,/path/to/Downloads/GO.defs". (Note that within the quotes  
there is no tilde expansion.)

> ~/Downloads/function.ontology
> Loading ontology Gene Ontology:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("MetaCyc","2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN","0","") FKs ()
> Duplicate entry '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0' for
> key 2
> ---------------------------------------------------

This is one of the things that make you love MySQL (am I correct in
inferring that you're using MySQL?). The width of the
dbxref.accession column (which the second value in parentheses is
for) is 40 chars. The apparently pre-existing value
("2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0") is 50 chars,
which when loaded should have resulted in an exception. Instead, MySQL
simply and silently truncates it to 40 chars, which makes it
identical to the first 40 chars of
"2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN" (which is 41 chars in length).
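
A minimal sketch of the truncation effect, in plain Perl with no database involved. It assumes the column silently keeps only the first 40 chars, as described above. The stored_form sub is a made-up stand-in for what the column does, and the lookup-then-insert sequence is only a guess at how the duplicate could arise.

```perl
use strict;
use warnings;

my $COLUMN_WIDTH = 40;   # width of dbxref.accession as stated above

# Mimic a column that silently truncates over-long values
# instead of raising an error.
sub stored_form {
    my ($value) = @_;
    return substr($value, 0, $COLUMN_WIDTH);
}

# Single quotes keep the backslashes literally, as in the error message.
my $accession = '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN';
my $stored    = stored_form($accession);

printf "original: %d chars\n", length $accession;
printf "stored:   %d chars (%s)\n", length $stored, $stored;

# A lookup by the full value would miss the truncated row ...
print "lookup by full value matches: ",
      ($stored eq $accession ? "yes" : "no"), "\n";

# ... while a second insert truncates to the same key and collides.
print "second insert collides: ",
      (stored_form($accession) eq $stored ? "yes" : "no"), "\n";
```

If that is what happens, widening the column so that the full 41 chars fit would make both the lookup and the unique key behave again.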

It may be necessary to widen the dbxref.accession column here, for
example to 80 chars? Let me know if you need help with the DDL
command to do this.

Let me know how far this gets you.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lubapardo at gmail.com  Tue Apr 17 05:16:04 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 17 Apr 2007 10:16:04 +0100
Subject: [Bioperl-l] CVS AND PAML
Message-ID: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>

Dear all,
I have two questions.
1.) I am trying to download some modules from Bioperl-run via CVS but I
cannot log in.

$ cvs -d :pserver:cvs@code.open-bio.org:/home/repository/bioperl login.

The error I get is: time out, failed to connect to the server. I have
no trouble downloading other files, and I installed the bioperl modules
via CPAN and they work.

2) The second question I have is that I am using the PAML:CODEML
module to do phylogenetic analysis.

I have used the example provided in the HOWTO:PAML (also given as
example: pairwise_ka_ks.PL). The program does not crash but it returns
an empty object. I think the problem is in the last part of the
script because I manage to get the sequences and also the alignment, but I
cannot get any Ka or Ks values. I am not sure whether there is a bug in
the last part of the script.

Does anyone have an idea?

Thank you very much

Luba Pardo

$kaks_factory->alignment($dna_aln);

# run the KaKs analysis
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();
# this gives us a mapping from the PAML order of sequences back to
# the input order (since names get truncated)
my @pos = map <http://www.perldoc.com/perl5.6/pod/func/map.html> {
    my $c= 1;
    foreach my $s ( @each ) {
      last if( $s->display_id eq $_->display_id );
      $c++;
    }
    $c;
   } @otus;

print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT join
<http://www.perldoc.com/perl5.6/pod/func/join.html>("\t", qw
<http://www.perldoc.com/perl5.6/pod/func/qw.html>(SEQ1 SEQ2 Ka Ks
Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
for( my $i = 0; $i < (scalar
<http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus -1) ;
$i++) {
  for( my $j = $i+1; $j < (scalar
<http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus); $j++ ) {
    my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
    my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
    print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT
join <http://www.perldoc.com/perl5.6/pod/func/join.html>("\t",
$otus[$i]->display_id,
                         $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
                         $MLmatrix->[$i]->[$j]->{'dS'},
                         $MLmatrix->[$i]->[$j]->{'omega'},
                         sprintf
<http://www.perldoc.com/perl5.6/pod/func/sprintf.html>("%.2f",$sub_aa_aln->percentage_identity),
                         sprintf
<http://www.perldoc.com/perl5.6/pod/func/sprintf.html>("%.2f",$sub_dna_aln->percentage_identity),
                         ), "\n";
  }
}


From avilella at gmail.com  Tue Apr 17 05:25:40 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 17 Apr 2007 10:25:40 +0100
Subject: [Bioperl-l] CVS AND PAML
In-Reply-To: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
References: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
Message-ID: <358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>

hmmm, there are some perldoc links around your code snippet. can you post
the code again? what is the input data you are trying this with?

On 4/17/07, Luba Pardo <lubapardo at gmail.com> wrote:
>
> Dear all,
> I have two questions.
> 1.) I am trying to download some modules from Bioperl-run via CVS but I
> can
> not login.
>
> $ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl login.
>
> The error I get is: time out, failed to connect to the server. I have
> no trouble to download other files and I installed bioperl modules via
> CPAN and it works.
>
> 2) The second question I have is that I am using the PAML:CODEML
> module to do phylogenetic analysis.
>
> I have used the example provided in the HOWTO:PAML (also given as
> example: pairwise_ka_ks.PL). The program does not crash but it returns
> and empty object. I think the problem is in the last part of the
> script because I manage to get sequences and also the alignment, but I
> can not get any ka, ks value. I am not sure whether there is a bug in
> the last part of the script.
>
> Does anyone have an idea?
>
> Thank you very much
>
> Luba Pardo
>
> $kaks_factory->alignment($dna_aln);
>
> # run the KaKs analysis
> my ($rc,$parser) = $kaks_factory->run();
> my $result = $parser->next_result;
> my $MLmatrix = $result->get_MLmatrix();
>
> my @otus = $result->get_seqs();
> # this gives us a mapping from the PAML order of sequences back to
> # the input order (since names get truncated)
> my @pos = map <http://www.perldoc.com/perl5.6/pod/func/map.html> {
>     my $c= 1;
>     foreach my $s ( @each ) {
>       last if( $s->display_id eq $_->display_id );
>       $c++;
>     }
>     $c;
>    } @otus;
>
> print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT join
> <http://www.perldoc.com/perl5.6/pod/func/join.html>("\t", qw
> <http://www.perldoc.com/perl5.6/pod/func/qw.html>(SEQ1 SEQ2 Ka Ks
> Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
> for( my $i = 0; $i < (scalar
> <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus -1) ;
> $i++) {
>   for( my $j = $i+1; $j < (scalar
> <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus); $j++ ) {
>     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
>     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
>     print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT
> join <http://www.perldoc.com/perl5.6/pod/func/join.html>("\t",
> $otus[$i]->display_id,
>
> $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
>                          $MLmatrix->[$i]->[$j]->{'dS'},
>                          $MLmatrix->[$i]->[$j]->{'omega'},
>                          sprintf
> <http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> >("%.2f",$sub_aa_aln->percentage_identity),
>                          sprintf
> <http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> >("%.2f",$sub_dna_aln->percentage_identity),
>                          ), "\n";
>   }
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From IoannisKirmitzoglou at gmail.com  Tue Apr 17 09:05:37 2007
From: IoannisKirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Tue, 17 Apr 2007 06:05:37 -0700 (PDT)
Subject: [Bioperl-l] Parsing FASTA m10 output
Message-ID: <10034698.post@talk.nabble.com>


I apologize if this question has already been answered, but my search came up
with no relevant results.
I am new to the FASTA program, and after reading fasta3x.doc I decided to
run it using the m10 output. The reason for this choice was the following.

Quote from fasta3x.doc:  
     -m 10 is a new, parseable format for use with other
     programs.... 


I ran FASTA in batch mode and waited about 3-4 days for the results.
My problem is that today, when I started writing a Perl script to parse the
output, I realized that SearchIO doesn't support the m10 format.
It seems I should have been more careful...
Before I start coding a module to parse the output (or re-run FASTA with
the -m9 -d0 switches, which will take 4 more days), I would be really
thankful to hear whether any of you knows of another way to parse those
files.
Thanks in advance...

Ioannis Kirmitzoglou, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus

-- 
View this message in context: http://www.nabble.com/Parsing-FASTA-m10-output-tf3590568.html#a10034698
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From ewijaya at i2r.a-star.edu.sg  Tue Apr 17 09:10:00 2007
From: ewijaya at i2r.a-star.edu.sg (Edward WIJAYA)
Date: Tue, 17 Apr 2007 21:10:00 +0800
Subject: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
Message-ID: <462473B7.4070905@i2r.a-star.edu.sg>


Dear all,

How do you usually draw a graph of TFBS (transcription factor binding
site) positions within their sequences? I was thinking of building
something like these visualization tools:

http://research.i2r.a-star.edu.sg/Dragon/Motif_Search/cgi-bin/tmp/29740M1.html

or

http://wingless.cs.washington.edu:8080/assessment/servlet?filenameID=submission/SPACE.D9F26D506DE90E9A0A0010BB6BCCAEF3&pageType=visualizationForm&action=Visualize+It

Is there a BioPerl module to do that?

--
Edward




From lubapardo at gmail.com  Tue Apr 17 10:01:57 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 17 Apr 2007 15:01:57 +0100
Subject: [Bioperl-l] CVS AND PAML
In-Reply-To: <358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>
References: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
	<358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>
Message-ID: <58ff33550704170701p1207ad51r271b0aff235bfd05@mail.gmail.com>

Hi,
Sorry. Below is the code. The part of the code that does not work is the
part that uses the codeml module.
Thanks
Luba
# for projecting alignments from protein to R/DNA space
use Bio::Align::Utilities qw(aa_to_dna_aln);
# for input of the sequence data
use Bio::SeqIO;
use Bio::AlignIO;

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new;
my $seqdata = shift || 'cds.fa';

my $seqio = new Bio::SeqIO(-file   => $seqdata,
                           -format => 'fasta');
my %seqs;
my @prots;
# process each sequence
while ( my $seq = $seqio->next_seq ) {
    $seqs{$seq->display_id} = $seq;
    # translate them into protein
    my $protein = $seq->translate();
    my $pseq = $protein->seq();
    if( $pseq =~ /\*/ &&
        $pseq !~ /\*$/ ) {
          warn("provided a CDS sequence with a stop codon, PAML will choke!");
          exit(0);
    }
    # Tcoffee can't handle '*' even if it is trailing
    $pseq =~ s/\*//g;
    $protein->seq($pseq);
    push @prots, $protein;
}

if( @prots < 2 ) {
    warn("Need at least 2 CDS sequences to proceed");
    exit(0);
}

open(OUT, ">align_output.txt") ||  die("cannot open output align_output for writing");
# Align the sequences with clustalw
my $aa_aln = $aln_factory->align(\@prots);
# project the protein alignment back to CDS coordinates
my $dna_aln = aa_to_dna_aln($aa_aln, \%seqs);

my @each = $dna_aln->each_seq();

my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
                   ( -params => { 'runmode' => -2,
                                  'seqtype' => 1,
                                } );

# set the alignment object
$kaks_factory->alignment($dna_aln);

# run the KaKs analysis
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();
# this gives us a mapping from the PAML order of sequences back to
# the input order (since names get truncated)
my @pos = map {
    my $c= 1;
    foreach my $s ( @each ) {
      last if( $s->display_id eq $_->display_id );
      $c++;
    }
    $c;
   } @otus;

print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
  for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
    my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
    my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
    print OUT join("\t", $otus[$i]->display_id,
                         $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
                         $MLmatrix->[$i]->[$j]->{'dS'},
                         $MLmatrix->[$i]->[$j]->{'omega'},
                         sprintf("%.2f",$sub_aa_aln->percentage_identity),
                         sprintf("%.2f",$sub_dna_aln->percentage_identity),
                         ), "\n";
  }
}
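
A minimal sketch of the ID-to-position mapping step, in plain Perl with made-up IDs and no BioPerl objects. In the loop above, an ID that matches nothing falls through and yields a position one past the end of @each. The hypothetical helper positions_in_input below returns undef for such IDs instead, which may make a name mismatch between the PAML output and the input alignment easier to spot. Whether that is the cause of the empty result here is only a guess.

```perl
use strict;
use warnings;

# Map each ID in @$otu_ids to its 1-based position in @$input_ids.
# IDs that are not found map to undef, so misses stay visible.
sub positions_in_input {
    my ($otu_ids, $input_ids) = @_;
    my %pos_of;
    my $n = 0;
    for my $id (@$input_ids) {
        $n++;
        # keep the first occurrence, as the nested loop does
        $pos_of{$id} = $n unless exists $pos_of{$id};
    }
    return map { $pos_of{$_} } @$otu_ids;
}

my @input = qw(seqA seqB seqC);   # hypothetical display_ids, input order
my @paml  = qw(seqC seqA seqX);   # hypothetical order reported by PAML
my @pos   = positions_in_input(\@paml, \@input);

print join(",", map { defined $_ ? $_ : 'missing' } @pos), "\n";
```

With real data the two lists would come from the display_id of each entry in @otus and @each.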

On 17/04/07, Albert Vilella <avilella at gmail.com> wrote:
>
> hmmm, there are some perldoc links around your code snippet. can you post
> the code again? what is the input data you are trying this with?
>
> On 4/17/07, Luba Pardo <lubapardo at gmail.com> wrote:
>
> > Dear all,
> > I have two questions.
> > 1.) I am trying to download some modules from Bioperl-run via CVS but I
> > can
> > not login.
> >
> > $ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl login.
> >
> > The error I get is: time out, failed to connect to the server. I have
> > no trouble to download other files and I installed bioperl modules via
> > CPAN and it works.
> >
> > 2) The second question I have is that I am using the PAML:CODEML
> > module to do phylogenetic analysis.
> >
> > I have used the example provided in the HOWTO:PAML (also given as
> > example: pairwise_ka_ks.PL). The program does not crash but it returns
> > and empty object. I think the problem is in the last part of the
> > script because I manage to get sequences and also the alignment, but I
> > can not get any ka, ks value. I am not sure whether there is a bug in
> > the last part of the script.
> >
> > Does anyone have an idea?
> >
> > Thank you very much
> >
> > Luba Pardo
> >
> > $kaks_factory->alignment($dna_aln);
> >
> > # run the KaKs analysis
> > my ($rc,$parser) = $kaks_factory->run();
> > my $result = $parser->next_result;
> > my $MLmatrix = $result->get_MLmatrix();
> >
> > my @otus = $result->get_seqs();
> > # this gives us a mapping from the PAML order of sequences back to
> > # the input order (since names get truncated)
> > my @pos = map <http://www.perldoc.com/perl5.6/pod/func/map.html> {
> >     my $c= 1;
> >     foreach my $s ( @each ) {
> >       last if( $s->display_id eq $_->display_id );
> >       $c++;
> >     }
> >     $c;
> >    } @otus;
> >
> > print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT join
> > < http://www.perldoc.com/perl5.6/pod/func/join.html>("\t", qw
> > <http://www.perldoc.com/perl5.6/pod/func/qw.html>(SEQ1 SEQ2 Ka Ks
> > Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
> > for( my $i = 0; $i < (scalar
> > <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus -1) ;
> > $i++) {
> >   for( my $j = $i+1; $j < (scalar
> > <http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus); $j++ ) {
> >     my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
> >     my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
> >     print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT
> > join < http://www.perldoc.com/perl5.6/pod/func/join.html>("\t",
> > $otus[$i]->display_id,
> >
> > $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
> >                          $MLmatrix->[$i]->[$j]->{'dS'},
> >                          $MLmatrix->[$i]->[$j]->{'omega'},
> >                          sprintf
> > < http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> > >("%.2f",$sub_aa_aln->percentage_identity),
> >                          sprintf
> > < http://www.perldoc.com/perl5.6/pod/func/sprintf.html
> > >("%.2f",$sub_dna_aln->percentage_identity),
> >                          ), "\n";
> >   }
> > }
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>


From alexl at users.sourceforge.net  Tue Apr 17 09:54:13 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Tue, 17 Apr 2007 06:54:13 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu> (Chris Fields's
	message of "Fri\, 30 Mar 2007 23\:39\:15 -0500")
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
Message-ID: <5h4pnff6nu.fsf@delpy.biol.berkeley.edu>

On Mar 30, 2007, at 11:02 PM, Allen Day wrote:

[...]

>> If we're in agreement that the primary data sets and
>> libraries/applications for producing derivative data should not be
>> present in Fedora Extras, then it follows that the Bioperl classes
>> for manipulating these primary and derivative data should also not
>> be present in Fedora Extras as they are of little use without data
>> to manipulate.

Chris Fields wrote:

CF> I respectfully disagree.  BioPerl, to me, is a toolkit which helps
CF> accomplish certain tasks.  As with any toolkit, not all parts are
CF> required to do what one needs.  A good number of end-users use
CF> BioPerl for remote database queries
CF> (Bio::DB::GenBank/Taxonomy/etc), remote BLAST, seq analysis,
CF> alignment analysis, phylogenetic tree manipulation, etc, none of
CF> which require outside apps be installed.  For many a remote db is
CF> their primary source of data; not everybody sets up BioPerl for
CF> accessing local db records, running programs, etc (just the smart
CF> ones!).  As for outside apps, the docs are pretty explicit where
CF> certain outside resources (libxml2, expat, libgd) are needed for
CF> functionality.

CF> When we package up a new release we generally have ActiveState PPM
CF> archives available for Win32 users who want an easy way to install
CF> BioPerl.  I wouldn't have a problem if ActiveState wanted to post
CF> these to their repository.  Why would allowing someone to do the
CF> same for fedora extras be any different?

Hi all,

Given that there seems to be a reasonable consensus (including list
discussion here as well as in private e-mail) from bioperl folks that
including bioperl in Fedora is OK, I'm going ahead and building
bioperl for Fedora >= 6 (it's currently in the development branch).  I
thought about the issue carefully and this seems to make sense for
several reasons:

1. Biopackages.net isn't currently building packages for Fedora Core 6
   and later (as Allen said, that may happen later when more build
   resources come online).  I won't build perl-bioperl for FC-5 or
   earlier to make sure that the Fedora package doesn't disrupt
   installations with the biopackages.net version.

2. Currently I've only run the base bioperl (live) package through
   the reviewing gauntlet, but I plan to add the bioperl-run package
   as well.  Even though the bioperl-run package is intended to use
   third party packages (e.g. Clustal etc.) which may not be
   distributed with Fedora, it appears that the bioperl-run package
   contains code that can download those packages directly (albeit
   outside the RPM package system).  And some of the external tools
   could be packaged in Fedora because they have open-source licenses
   (e.g. Wise2, EMBOSS, NCBI toolkit etc.)

   Furthermore it appears the biopackages.net version of that package
   doesn't actually have "Requires:" that would automatically install
   those third-party tools that are run via bioperl (e.g. Clustal) in
   any case, so when biopackages start building for >FC-6 the Fedora
   perl-bioperl* packages can function as a drop-in replacement
   without disturbing other biopackages dependencies such as genome
   databases.

3. Third-party packages that can't be included directly in Fedora
   (such as Clustal) that can be used by bioperl-run could still be
   added via third-party repos like biopackages.net, in the same way
   that the multimedia packages gstreamer and gstreamer-plugins-good
   live in Fedora, but gstreamer-plugins-bad containing patent
   encumbered MP3 codecs will live in Livna.

Cheers,
Alex


From cjfields at uiuc.edu  Tue Apr 17 10:35:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 09:35:10 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
Message-ID: <0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>

On Apr 17, 2007, at 8:54 AM, Alex Lancaster wrote:

> Hi all,
>
> Given that there seems to be a reasonable consensus (including list
> discussion here as well as in private e-mail) from bioperl folks that
> including bioperl in Fedora is OK, I'm going ahead and building
> bioperl for Fedora >= 6 (it's currently in the development branch).  I
> thought about the issue carefully and this seems to make sense for
> several reasons:
>
> ...
> 2. Currently I've only run the base bioperl (live) package through
>    the reviewing gauntlet, but I plan to add the bioperl-run package
>    as well.  Even though the bioperl-run package is intended to use
>    third party packages (e.g. Clustal etc.) which may not be
>    distributed with Fedora, it appears that the bioperl-run package
>    contains code that can download those packages directly (albeit
>    outside the RPM package system).  And some of the external tools
>    could be packaged in Fedora because they have open-source licenses
>    (e.g. Wise2, EMBOSS, NCBI toolkit etc.)
...

Do you mean the bioperl core modules instead of "bioperl-live"?  We  
use the term "bioperl-live" to designate code updated regularly via  
CVS, which can be buggy depending on when it's retrieved.

I'm not sure how others feel about this, but it's probably best to  
stick with either the latest official releases (v 1.5.2 at this time)  
or even GBrowse-sponsored interim releases (which fix GBrowse-related  
bugs and normally pass tests).

chris


From hlapp at gmx.net  Tue Apr 17 11:09:45 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 17 Apr 2007 11:09:45 -0400
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>


On Apr 17, 2007, at 9:35 AM, Leighton Pritchard wrote:

> Hi Hilmar,
>
> Thanks for the very quick response.  Apologies for the long reply, but I
> thought it might be useful if anyone else happens across the same
> problems that I did.

Thanks for reporting all these.

> [...]
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------
> Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate
> lactonase activity':
> [...]
> I tracked this down to an apparently poor formatting of the GO.defs file
> (note that the first and third definition_lines appear to be two halves
> of the same entry):
>
> term: 2-pyrone-4,6-dicarboxylate lactonase activity
> goid: GO:0047554
> definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O
> = 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
> definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN

I wonder whether this is the line that throws the parser off. It  
looks like the database part of the reference is missing - bad.

> definition_reference: EC:3.1.1.57
> definition_reference: MetaCyc:2-PYRONE-4
>
> I found 43 similar errors for other GOIDs, and it appears to result from
> the occurrence of the string "\," in a dbxref - mostly MetaCyc entries,
> but also some UM-BBD_pathwayID entries.

I'm not sure - although the string "\," might indeed trip up the  
parser, would have to investigate to confirm. Could it be a  
coincidence with definition_references that lack the database part  
before the colon?

>
> These errors appear to have followed through into the generation of the
> OBO format files in each case, e.g.:
>
> def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O =
> 4-carboxy-2-hydroxyhexa-2,4-dienedioate." [:6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]

Again, the first db_xref lacks the database in front of the colon. I  
can also see why "\," will trip up the parser in this format.

>
> and so is something for the GO guys to fix, I guess.

The lack of a database for certain xrefs surely is. If the escaped  
comma does throw off the BioPerl parser then that part is for BioPerl  
to fix. It does seem to extract the parts correctly, if the error  
message is any indication, though you may argue that it should remove  
the escaping backslashes (and I'd certainly agree with that).
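
For what it's worth, here is a minimal sketch (in Python rather than
Perl, and not the actual BioPerl parser code) of what I mean: split the
bracketed dbxref list of an OBO def: line on unescaped commas only, then
drop the escaping backslashes.  The function name and the sample string
in the usage note are made up for illustration.

```python
import re


def split_dbxrefs(xref_list):
    """Split 'DB:ACC, DB:ACC' into (db, accession) pairs.

    Commas preceded by a backslash are treated as part of the accession,
    not as separators (hypothetical helper, not BioPerl code).
    """
    pairs = []
    # Split only on commas that are NOT preceded by a backslash.
    for part in re.split(r"(?<!\\),", xref_list):
        part = part.strip()
        if not part:
            continue
        # Everything before the first colon is taken as the database name.
        db, _, accession = part.partition(":")
        # Remove the escaping backslashes, keeping the escaped character.
        accession = re.sub(r"\\(.)", r"\1", accession)
        pairs.append((db, accession))
    return pairs
```

Called on r"MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57"
this keeps the MetaCyc accession in one piece, whereas a plain split on
every comma would cut it in two.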

>
>
> Another error is thrown after fixing the above, though (with the same
> command as before):
>
> Loading ontology Gene Ontology:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were
> ("GO:0006905","vesicle transport","OBSOLETE (was not defined before
> being made obsolete).","X","") FKs (1)
> Duplicate entry 'vesicle transport-1-X' for key 3
> ---------------------------------------------------
> Could not store term GO:0006905, name 'vesicle transport':
> [...]
> There are duplicate terms, identical in the term table except for GOID:
> GO:0006905 and GO:0005480.  They are both "vesicle transport", and
> obsoleted:

That violates the uniqueness constraint, and this sounds more like a  
bug in the GO file. I'm also not sure what motivated them to create  
the same term multiple times only to obsolete it immediately.

> [...]
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("PMID","","0","") FKs ()
> Column 'accession' cannot be null
> ---------------------------------------------------
> Could not store term GO:0032933, name 'SREBP-mediated signaling
> pathway':
> [...]
> with the offending entry being
>
> term: SREBP-mediated signaling pathway
> goid: GO:0032933
> definition: A series of molecular signals from the endoplasmic reticulum
> to the nucleus generated as a consequence of altered levels of one or
> more lipids, and resulting in the activation of transcription by SREBP.
> definition_reference: GOC:mah
> definition_reference: PMID:0
>
> I commented out the definition_reference for PMID:0, which seemed to fix
> matters.

Right, it seems to be a bogus reference.

>
> The process.ontology and component.ontology files then went into the
> database without a hitch.  Thanks again for your help,

Fantastic you got it all loaded!

Note that you also have the --computetc switch which will compute the  
transitive closure for you automatically.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From alexl at users.sourceforge.net  Tue Apr 17 11:13:30 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Tue, 17 Apr 2007 08:13:30 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu> (Chris Fields's
	message of "Tue\, 17 Apr 2007 09\:35\:10 -0500")
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
	<0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>
Message-ID: <nwy7krdof9.fsf@delpy.biol.berkeley.edu>

>>>>> "CF" == Chris Fields  writes:

[...]

CF> Do you mean the bioperl core modules instead of "bioperl-live"?
CF> We use the term "bioperl-live" to designate code updated regularly
CF> via CVS, which can be buggy depending on when it's retrieved.

Yes, I am referring to the core package.  Called perl-bioperl in the
Fedora naming scheme.

CF> I'm not sure how others feel about this, but it's probably best to
CF> stick with either the latest official releases (v 1.5.2 at this
CF> time) or even GBrowse-sponsored interim releases (which fix
CF> GBrowse-related bugs and normally pass tests).

Yes I am sticking to the latest official release 1.5.2_102.  The
package is here:

http://download.fedora.redhat.com/pub/fedora/linux/extras/development/i386/repoview/perl-bioperl.html

and installable via yum (on the development branch) using:

$ yum install perl-bioperl

The FC-6 package will be available soon.

Alex


From cjfields at uiuc.edu  Tue Apr 17 12:18:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:18:19 -0500
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
	<1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>

On Apr 17, 2007, at 11:05 AM, Leighton Pritchard wrote:
...
>
>>> and so is something for the GO guys to fix, I guess.
>>
>> The lack of a database for certain xrefs surely is. If the escaped
>> comma does throw off the BioPerl parser then that part is for BioPerl
>> to fix.
>
> I think the problems are now all in the data I downloaded from
> http://www.geneontology.org/GO.downloads.shtml - I believe the BioPerl
> parser to be innocent of these charges ;)  I've submitted the issue at
> the GO site, and with any luck they'll handle it quite soon (if it is in
> fact their problem).
>
>> Note that you also have the --computetc switch which will compute the
>> transitive closure for you automatically.
>
> :D Excellent!  Thanks for the pointer, and again for your efforts,
>
> L.
...

If you do find anything that is BioSQL- or Bioperl-related then file  
a bug report so we can track it.  I agree with Hilmar that it's  
likely the parser is partly to blame.

http://bugzilla.open-bio.org/

We really appreciate the work you're putting into this!

chris


From cjfields at uiuc.edu  Tue Apr 17 12:19:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:19:02 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <nwy7krdof9.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
	<0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>
	<nwy7krdof9.fsf@delpy.biol.berkeley.edu>
Message-ID: <3963AFE3-68B6-43F0-8A20-82A575CA8806@uiuc.edu>


On Apr 17, 2007, at 10:13 AM, Alex Lancaster wrote:

>
> [...]
>
> CF> Do you mean the bioperl core modules instead of "bioperl-live"?
> CF> We use the term "bioperl-live" to designate code updated regularly
> CF> via CVS, which can be buggy depending on when it's retrieved.
>
> Yes, I am referring to the core package.  Called perl-bioperl in the
> Fedora naming scheme.
>
> CF> I'm not sure how others feel about this, but it's probably best to
> CF> stick with either the latest official releases (v 1.5.2 at this
> CF> time) or even GBrowse-sponsored interim releases (which fix
> CF> GBrowse-related bugs and normally pass tests).
>
> Yes I am sticking to the latest official release 1.5.2_102.  The
> package is here:
>
> http://download.fedora.redhat.com/pub/fedora/linux/extras/development/i386/repoview/perl-bioperl.html
>
> and installable via yum (on the development branch) using:
>
> $ yum install perl-bioperl
>
> The FC-6 package will be available soon.
>
> Alex

Sounds good.  Thanks Alex!

chris


From ioanniskirmitzoglou at gmail.com  Tue Apr 17 12:21:36 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Tue, 17 Apr 2007 19:21:36 +0300
Subject: [Bioperl-l]  Parsing FASTA m10 output
In-Reply-To: <b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
Message-ID: <b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>

Thanks for the prompt reply...
Seems like I will have to "quit talking and begin doing"
I will post the code here in case someone else finds himself in the same
situation...
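
In the meantime, a rough sketch of the kind of parser I have in mind
(Python here, just to show the idea; it is not the code I will post).
It assumes, from my reading of the -m 10 output, that each hit starts
with a ">>" line followed by "; key: value" lines, that single ">" lines
start the per-sequence sub-blocks, and that ">>>" lines mark a new query
or the end of the report.  The function name is made up, and field names
such as fa_opt or fa_expect should be checked against your own output.

```python
def parse_m10_hits(lines):
    """Collect the hit-level '; key: value' fields of a FASTA -m 10 report.

    Returns a list of dicts, one per '>>' hit header (hypothetical helper;
    the layout handled here is an assumption, see the note above).
    """
    hits = []
    current = None
    in_hit_block = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">>>"):
            # New query header or the end-of-report marker.
            current = None
            in_hit_block = False
        elif line.startswith(">>"):
            # Hit header: the first word is taken as the hit identifier.
            words = line[2:].split()
            current = {"hit": words[0] if words else ""}
            hits.append(current)
            in_hit_block = True
        elif line.startswith(">"):
            # Per-sequence sub-block: stop collecting hit-level fields.
            in_hit_block = False
        elif line.startswith(";") and in_hit_block and current is not None:
            key, _, value = line[1:].partition(":")
            current[key.strip()] = value.strip()
    return hits
```

All values are kept as strings; converting scores and E-values to numbers
is left to the caller.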

-- 
Ioannis Kirmitzoglou, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


On 17/04/07, Thiago Venancio < thiago.venancio at gmail.com> wrote:
> I am parsing FASTA outputs these days.
>
> The -m 10 format is a recent implementation, not so popular yet. So, I have
> first tested Bio::SearchIO against a default output and everything is
> fine.
>
> I think future releases of Bio::SearchIO will deal with the m10 output. For
> now, you can run everything again or code a little bit to parse what you
> want (not a hard task).
>
> T.
>
>
> On 4/17/07, Ioannis Kirmitzoglou < IoannisKirmitzoglou at gmail.com> wrote:
> >
> > I apologize if this question has already been answered but my search
> > came up with no relevant results.
> > I am new to the FASTA program and after reading the fasta3x.doc I
> > decided to run it using the m10 output. The reason for this choice was
> >
> > Quote from fasta3x.doc:
> >      -m 10 is a new, parseable format for use with other
> >      programs....
> >
> >
> > I ran FASTA in batch mode and waited about 3-4 days for the results.
> > My problem is that today, when I started writing a Perl script to parse
> > the output, I realized that SearchIO doesn't support the m10 format.
> > Seems like I should have been more careful...
> > Before I start coding a module that will be able to parse the output (or
> > re-running FASTA with the -m9 -d0 switches, which will take 4 more days),
> > I would be really thankful if any of you knows of any other way to parse
> > those files.
> > Thanks in advance...
> > Thanks in advance...
> >
> > Ioannis Kirmitzoglou, MSc
> > PhD. Student,
> > Bioinformatics Research Laboratory
> > Department of Biological Sciences
> > University of Cyprus
> >
>
>
>
> --
> "The way to get started is to quit talking and begin doing."
>       Walt Disney
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================


From cjfields at uiuc.edu  Tue Apr 17 12:49:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:49:53 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
Message-ID: <639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>

You can post here or add it to Bugzilla as an enhancement request if  
the code is particularly long.

chris

On Apr 17, 2007, at 11:21 AM, Ioannis Kirmitzoglou wrote:

> Thanks for the prompt reply...
> Seems like I will have to "quit talking and begin doing"
> I will post the code here in case someone else finds himself in the same
> situation...
>
> -- 
> Ioannis Kirmitzoglou, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
>
>
> On 17/04/07, Thiago Venancio < thiago.venancio at gmail.com> wrote:
>> I am parsing FASTA outputs these days.
>>
>> The -m 10 format is a recent implementation, not so popular yet. So, I have
>> first tested Bio::SearchIO against a default output and everything is
>> fine.
>>
>> I think future releases of Bio::SearchIO will deal with the m10 output. For
>> now, you can run everything again or code a little bit to parse what you
>> want (not a hard task).
>>
>> T.
>>
>>
>> On 4/17/07, Ioannis Kirmitzoglou < IoannisKirmitzoglou at gmail.com> wrote:
>>>
>>> I apologize if this question has already been answered but my search
>>> came up with no relevant results.
>>> I am new to the FASTA program and after reading the fasta3x.doc I
>>> decided to run it using the m10 output. The reason for this choice was
>>>
>>> Quote from fasta3x.doc:
>>>      -m 10 is a new, parseable format for use with other
>>>      programs....
>>>
>>>
>>> I ran FASTA in batch mode and waited about 3-4 days for the results.
>>> My problem is that today, when I started writing a Perl script to parse
>>> the output, I realized that SearchIO doesn't support the m10 format.
>>> Seems like I should have been more careful...
>>> Before I start coding a module that will be able to parse the output (or
>>> re-running FASTA with the -m9 -d0 switches, which will take 4 more days),
>>> I would be really thankful if any of you knows of any other way to parse
>>> those files.
>>> Thanks in advance...
>>>
>>> Ioannis Kirmitzoglou, MSc
>>> PhD. Student,
>>> Bioinformatics Research Laboratory
>>> Department of Biological Sciences
>>> University of Cyprus
>>>
>>
>>
>>
>> --
>> "The way to get started is to quit talking and begin doing."
>>       Walt Disney
>>
>> ========================
>> Thiago Motta Venancio, MSc
>> PhD student in Bioinformatics
>> University of Sao Paulo
>> ========================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lpritc at scri.ac.uk  Tue Apr 17 09:35:44 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 14:35:44 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
Message-ID: <1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>

Hi Hilmar, 

Thanks for the very quick response.  Apologies for the long reply, but I
thought it might be useful if anyone else happens across the same
problems that I did.

On Tue, 2007-04-17 at 00:00 -0400, Hilmar Lapp wrote:
> Apparently the parser fails to parse out database and accession for this
> db_xref of term GO:0018901.
> 
> If you can edit the obo file, you can try deleting the db_xref(s) for  
> that term that look odd (or delete all if you don't need them).

You're spot on - see further down for details...

> Note that the argument for --fmtargs here should read
> "-defs_file,/path/to/Downloads/GO.defs". (Note that within the quotes  
> there is no tilde expansion.)

D'oh!  Thanks for the note - my bad, there.
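
In case it helps anyone scripting the load, here is a small sketch (in
Python, with a made-up helper name) that expands the tilde itself before
building the --fmtargs value, so the quoted argument never depends on the
shell.  The flags are the ones from the bp_load_ontology.pl command used
in this thread; the host, database name and namespace are the values from
my own setup and are only defaults here.

```python
import os


def build_load_command(defs_file, ontology_file, dbuser, dbpass,
                       host="localhost", dbname="biosql",
                       namespace="Gene Ontology"):
    """Return the argument list for bp_load_ontology.pl (hypothetical helper).

    Paths are expanded with os.path.expanduser, because a '~' inside the
    quoted --fmtargs value is not expanded by the shell.
    """
    defs_path = os.path.expanduser(defs_file)
    return [
        "bp_load_ontology.pl",
        "--host", host,
        "--dbname", dbname,
        "--namespace", namespace,
        "--dbuser", dbuser,
        "--dbpass", dbpass,
        "--format", "goflat",
        # The defs file is passed inside --fmtargs, after "-defs_file,".
        "--fmtargs", "-defs_file," + defs_path,
        os.path.expanduser(ontology_file),
    ]
```

The returned list can be handed to subprocess.run() as-is, with no shell
involved.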

> This is one of the things why you've got to love MySQL (and am I correct
> in inferring that you're using MySQL?).

The 'choice' was forced upon me ;)

> It may be necessary to widen the length of dbname.accession here, for  
> example to 80 chars? Let me know if you need help with the DDL  
> command to do this.

I've fixed that now (and added it to my local biosqldb-mysql.sql
schema), but with a clean BioSQL schema and using:

[lpritc at lplinuxdev sql]$ bp_load_ontology.pl --host localhost --dbname
biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass ********
--format goflat --fmtargs
"-defs_file,/home/lpritc/Downloads/GO.defs" /home/lpritc/Downloads/function.ontology 

I was still getting errors with the GO flatfile:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
were ("","","0","") FKs ()
Column 'dbname' cannot be null
---------------------------------------------------
Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate
lactonase activity':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be
found by unique key
STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK:
Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK:
Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0x88497a4)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x897f074)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x8d64ad8)', '-throw',
'CODE(0x851abc8)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

I tracked this down to an apparently poor formatting of the GO.defs file
(note that the first and third definition_lines appear to be two halves
of the same entry):

term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554
definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O
= 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
definition_reference: EC:3.1.1.57
definition_reference: MetaCyc:2-PYRONE-4

I found 43 similar errors for other GOIDs, and it appears to result from
the occurrence of the string "\," in a dbxref - mostly MetaCyc entries,
but also some UM-BBD_pathwayID entries.

These errors appear to have followed through into the generation of the
OBO format files in each case, e.g.:

def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O =
4-carboxy-2-hydroxyhexa-2,4-dienedioate." [:6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]

and so is something for the GO guys to fix, I guess.


Another error is thrown after fixing the above, though (with the same
command as before):

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were
("GO:0006905","vesicle transport","OBSOLETE (was not defined before
being made obsolete).","X","") FKs (1)
Duplicate entry 'vesicle transport-1-X' for key 3
---------------------------------------------------
Could not store term GO:0006905, name 'vesicle transport':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be
found by unique key
STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK:
Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0xbcac418)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x957805c)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x995db20)', '-throw',
'CODE(0x9113bd0)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

There are duplicate terms, identical in the term table except for GOID:
GO:0006905 and GO:0005480.  They are both "vesicle transport", and
obsoleted:

term: vesicle transport
goid: GO:0005480
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because it represents a biological
process and not a molecular function. To update annotations, use the
biological process term 'vesicle-mediated transport ; GO:0016192'.

term: vesicle transport
goid: GO:0006905
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because the meaning of the term is
ambiguous. To update annotations, consider the biological process term
'vesicle-mediated transport ; GO:0016192'.
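
For anyone wanting to check their own copy of GO.defs for this before
loading, a minimal sketch in Python (the function name is made up).  It
assumes each stanza has a "term:" line followed by a "goid:" line, as in
the excerpts above, and it only compares term names; the actual unique
key in the database appears to include more than the name.

```python
from collections import defaultdict


def find_duplicate_term_names(lines):
    """Map each term name that occurs under several GO IDs to those IDs."""
    ids_by_name = defaultdict(list)
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith("term:"):
            name = line[len("term:"):].strip()
        elif line.startswith("goid:") and name is not None:
            # Pair the GO ID with the most recent term name.
            ids_by_name[name].append(line[len("goid:"):].strip())
            name = None
    return {n: ids for n, ids in ids_by_name.items() if len(ids) > 1}
```

Run over the two stanzas above it reports "vesicle transport" with both
GO:0005480 and GO:0006905.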

I used the --noobsolete flag to avoid this error - reasoning that since
I'm populating the database for the first time, ignoring the obsolete
terms won't hurt - but finally this error was thrown:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
were ("PMID","","0","") FKs ()
Column 'accession' cannot be null
---------------------------------------------------
Could not store term GO:0032933, name 'SREBP-mediated signaling
pathway':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be
found by unique key
STACK: Error::throw
STACK:
Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK:
Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK:
Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK:
Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term',
'Bio::Ontology::GOterm=HASH(0xbe18f14)', '-db',
'Bio::DB::BioSQL::DBAdaptor=HASH(0x99bbf2c)', '-termfactory',
'Bio::Ontology::TermFactory=HASH(0x9da0ad8)', '-throw',
'CODE(0x9556bb4)', '-mergeobs', ...) called
at /usr/bin/bp_load_ontology.pl line 610

with the offending entry being 

term: SREBP-mediated signaling pathway
goid: GO:0032933
definition: A series of molecular signals from the endoplasmic reticulum
to the nucleus generated as a consequence of altered levels of one or
more lipids, and resulting in the activation of transcription by SREBP.
definition_reference: GOC:mah
definition_reference: PMID:0

I commented out the definition_reference for PMID:0, which seemed to fix
matters.
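
To find the remaining entries of this kind without waiting for the loader
to fail on each one, something like the following sketch could be run
over GO.defs first (Python, made-up function name).  It flags every
definition_reference whose database part is empty, or whose accession is
empty or "0".  Treating "0" as suspect is my assumption, based on the
loader reporting an empty accession for PMID:0.

```python
def find_suspect_references(lines):
    """Return (goid, reference) pairs that look malformed.

    A reference is flagged when nothing precedes the first colon, or when
    the accession after it is empty or "0" (assumed to be bogus).
    """
    suspects = []
    goid = None
    for line in lines:
        line = line.strip()
        if line.startswith("goid:"):
            goid = line[len("goid:"):].strip()
        elif line.startswith("definition_reference:"):
            ref = line[len("definition_reference:"):].strip()
            db, _, accession = ref.partition(":")
            if not db or accession in ("", "0"):
                suspects.append((goid, ref))
    return suspects
```

Note that this catches ":6-DICARBOXYLATE-LACTONASE-RXN" but not the
matching "MetaCyc:2-PYRONE-4", which looks well-formed on its own.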

The process.ontology and component.ontology files then went into the
database without a hitch.  Thanks again for your help,

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C


From lpritc at scri.ac.uk  Tue Apr 17 12:05:16 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 17:05:16 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
Message-ID: <1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>

Hello again,

On Tue, 2007-04-17 at 11:09 -0400, Hilmar Lapp wrote:
> Thanks for reporting all these.

No problem at all.

> On Apr 17, 2007, at 9:35 AM, Leighton Pritchard wrote:
> > term: 2-pyrone-4,6-dicarboxylate lactonase activity
[...]
> > definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
> 
> I wonder whether this is the line that throws the parser off. It  
> looks like the database part of the reference is missing - bad.

> > definition_reference: MetaCyc:2-PYRONE-4

I don't think the parser is to blame, here.  Note that if you join the
definition_reference strings from the GO.defs file, you get:

MetaCyc:2-PYRONE-4:6-DICARBOXYLATE-LACTONASE-RXN

Then if you replace the second colon with "\," you get what should (I think)
actually be the MetaCyc entry:

MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN

> > I found 43 similar errors for other GOIDs, and it appears to result  
> > from
> > the occurrence of the string "\," in a dbxref - mostly MetaCyc  
> > entries,
> > but also some UM-BBD_pathwayID entries.
> 
> I'm not sure - although the string "\," might indeed trip up the  
> parser, would have to investigate to confirm. Could it be a  
> coincidence with definition_references that lack the database part  
> before the colon?

Inspecting the troublesome entries by eye turns up the same problem as
above consistently: an entry in the GO.defs file is malformed.  The
entry should have a definition_reference field describing a MetaCyc
entry that matches the term field.  That reference string should
contain an escaped comma, but instead it ends where we expect the comma
to be.  The text that should follow the escaped comma is present as a
separate definition_reference, listed first and starting with a colon.

This observation also extends to cases where there should be two
occurrences of "\," in the MetaCyc field, e.g.:

term: 2,3-dihydroxyindole 2,3-dioxygenase activity
goid: GO:0047528
definition: Catalysis of the reaction: 2,3-dihydroxyindole + O2 =
anthranilate + CO2.
definition_reference: :3-DIHYDROXYINDOLE-2
definition_reference: :3-DIOXYGENASE-RXN
definition_reference: EC:1.13.11.2
definition_reference: MetaCyc:2

It then appears as though the GO flatfiles were used automatically to
generate the OBO format files, and propagated the same error into the
square brackets in each case.
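
The check described above can be sketched in shell. This is only an illustration: it assumes each entry lists its goid line before its definition_reference lines, as in the entries quoted here, and the function name is hypothetical. It reports affected entries only and does not try to rejoin the fragments, since their original order cannot be recovered reliably.

```shell
# Sketch: print each goid whose entry contains a definition_reference that
# starts with a colon, i.e. one with no database name in front of it.
# list_split_refs is a hypothetical helper name.
list_split_refs() {
    awk '
        # Remember the current entry and reset the "already printed" flag.
        /^goid: /                  { goid = $2; reported = 0 }
        # A reference beginning with ":" is a fragment of a split dbxref.
        /^definition_reference: :/ { if (!reported) { print goid; reported = 1 } }
    ' "$1"
}

# Usage (file name is an assumption):
#   list_split_refs GO.defs
```

Each goid is printed once, even when its entry holds several fragments.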

> > and so is something for the GO guys to fix, I guess.
> 
> The lack of a database for certain xrefs surely is. If the escaped  
> comma does throw off the BioPerl parser then that part is for BioPerl  
> to fix. 

I think the problems are now all in the data I downloaded from
http://www.geneontology.org/GO.downloads.shtml - I believe the BioPerl
parser to be innocent of these charges ;)  I've submitted the issue at
the GO site, and with any luck they'll handle it quite soon (if it is in
fact their problem).

> Note that you also have the --computetc switch which will compute the  
> transitive closure for you automatically.

:D Excellent!  Thanks for the pointer, and again for your efforts,

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C


From stefan.kirov at bms.com  Tue Apr 17 11:09:30 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 17 Apr 2007 11:09:30 -0400
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph with
	Perl]
Message-ID: <4624E32A.6010704@bms.com>

I forgot to send this to the list...
Stefan
-------------- next part --------------
An embedded message was scrubbed...
From: Stefan Kirov <stefan.kirov at bms.com>
Subject: Re: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
Date: Tue, 17 Apr 2007 10:30:11 -0400
Size: 2262
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070417/cc49d62a/attachment-0003.mht>

From lpritc at scri.ac.uk  Tue Apr 17 12:55:38 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 17:55:38 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
	<1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
	<146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>
Message-ID: <1176828938.988.133.camel@lplinuxdev.scri.sari.ac.uk>

Hi Chris,

On Tue, 2007-04-17 at 11:18 -0500, Chris Fields wrote:
> If you do find anything that is BioSQL- or Bioperl-related then file  
> a bug report so we can track it.  I agree with Hilmar that it's  
> likely the parser is partly to blame.
> 
> http://bugzilla.open-bio.org/

I've submitted a bug report, mostly replicating my first post in this
thread.  I added links to the appropriate point in the list archives so
that the rest of the discussion can be considered, too.

> We really appreciate the work you're putting into this!

Thanks - I'm just grateful that the Bio* repertoire is there at all so
that my problems are relatively minor (as opposed to attempting to
replicate the functionality independently).

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C


From lstein at cshl.edu  Tue Apr 17 13:47:25 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 17 Apr 2007 13:47:25 -0400
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <C2340DDA.D83F%bosborne11@verizon.net>
References: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<C2340DDA.D83F%bosborne11@verizon.net>
Message-ID: <6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>

Hi,

I've been updating the WIKI in anticipation of a new GBrowse release and
have added a "stub" for the biopackages.net install. Since I don't use yum
(I've been running Slackware for ages and have recently started working with
Ubuntu) I'm not sure I got the details right. Could someone check?


        http://www.gmod.org/wiki/index.php/GBrowse_RPM_HOWTO

Also, I think some verbiage on how to use yum to install MySQL and Apache
would be great, since it will be consistent with the Ubuntu install page.

Thanks,

Lincoln

On 3/31/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
> Allen et al.,
>
> What happened to the "GMOD" package or packages? I've had some
> conversations
> in the past few months with you-all suggesting that a GMOD package, or
> packages, would be useful.
>
> Brian O.
>
>
>
>
> On 3/30/07 8:30 PM, "Allen Day" <allenday at gmail.com> wrote:
>
> > Hi Alex,
> >
> > You've aptly noted that there are several classes of packages being
> > discussed here, and that they should not be treated equally.  From my
> > point of view and of specific relevance to the Bioperl community we
> > have at least:
> >
> > 1) "regular" CPAN dependencies and their occasional C/C++/Fortran
> > dependencies.  These should all be in Fedora Extras, as they are of
> > general utility.  Biopackages.net currently hosts about 200 packages
> > (.spec files, specifically) that are like this.  Maybe 80 of these are
> > needed for Bioperl.
> >
> > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan,
> > etc.  From what I've seen, these typically have strange/custom
> > licenses that may not be valid for some users.  BLAT has a dual
> > licensing scheme for academic and non-academic licensees, for
> > instance.  These packages are not of general utility.  For these two
> > reasons, my stance is that they should not be included in Fedora
> > Extras.
> >
> > 3) Bioperl packages.  Several subsets here.  The Bioperl-run libraries
> > depend directly on type (2) packages, so aren't appropriate to include
> > in Fedora Extras.  Bioperl-live is not really that useful without type
> > (2) packages.  It is also sensible to keep all of the Bioperl-*
> > packages in the same repository.  For these reasons, my stance is that
> > they should not be included in Fedora Extras.
> >
> > 4) Bioinformatics / Comp. Bio. data sets.  These don't have licensing
> > problems, but they tend to be large.  Usually in the 10E7 - 10E10 byte
> > range.  RPM cannot even generate correct metadata for some of them
> > if the files are too large (overflow problems).  Probably
> > not appropriate to put in Fedora Extras because they are too large and
> > not generally useful.
> >
> > 5) Bioinformatics-specific System databases / daemons.  These
> > high-level packages depend on types (2), (3), and (4), and so are not
> > appropriate to put into Fedora Extras.  An example is a BLAT daemon,
> > which relies on the BLAT server, as well as NIB-formatted genome
> > sequence files.
> >
> > That said, there are a lot of type (1) packages in the Biopackages.net
> > repository.  If you're interested in migrating the spec files from our
> > repository to the Fedora project it would save us (the Biopackages.net
> > maintainers) a ton of build and maintenance time, so please feel free
> > to take them, just let us know.  If we can reach some agreement on
> > where the bioinformatics-specific packages should be maintained/built
> > we may be able to work together on these as well.
> >
> > -Allen
> >
> >
> > On 3/30/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> >>>>>>> "AD" == Allen Day  writes:
> >>
> >> AD> Hi Alex, The Biopackages.net project is still active, we are
> >> AD> regularly adding packages to it, mostly R packages lately.  Most
> >> AD> of the systems we use are running CentOS at this point, which is
> >> AD> why you have not seen support for FC6 yet.  There is nothing
> >> AD> preventing building FC6 packages aside from lack of time to set up
> >> AD> the FC6 build farm nodes.
> >>
> >> Hi Allen and others,
> >>
> >> Great news to hear that Biopackages.net is still active!  I would like
> >> to help out if possible.  I don't believe in "FUD" either... ;)
> >>
> >> AD> If you're interested in packaging BioPerl or other
> >> AD> bioinformatics-related software, please join the Biopackages
> >> AD> project on SourceForge.  We object to the Fedora Extras FUD
> >> AD> tactics used to discourage people from using 3rd party
> >> AD> repositories, and suspect they may not want to host some of our
> >> AD> data packages, such as the >2GB genome packages.  Biopackages
> >> AD> project is likely to partially merge with RPMForge.  We are
> >> AD> already discussing with them how best to do it.
> >>
> >> The packages that I created which are currently available in Fedora
> >> Packages are Perl dependencies which, as I said are useful for
> >> packages outside the bioinformatics purview.  I do have a (base)
> >> bioperl package in review, but it is not yet released.
> >>
> >> As for third-party repos, I don't object to them at all, and for some
> >> kinds of projects they are indeed appropriate. (e.g. for non-free
> >> stuff like Livna or Freshrpms).  However, I do have practical concerns
> >> about repository mixing.  I think that it needs to be handled
> >> carefully, but that co-operation between Fedora and third-party repos
> >> can make it work.
> >>
> >> For example, one practical concern is that as of the
> >> soon-to-be-released Fedora 7, Core+Extras will be merged, so there
> >> will be no distinction at the repository-level between formerly Extras
> >> packages and formerly Core packages (as of now there are only "Fedora
> >> Packages"), which means that it will not be possible for third-party
> >> repos to limit their dependencies to just those in a former base set
> >> (i.e. excluding Extras).
> >>
> >> I agree that a few years ago (circa 2003-2004) there was concern about
> >> the way some third party repositories were treated somewhat badly by
> >> the (then) Fedora Extras (with some people going so far as to say that
> >> third-party repos were bad in principle and should always be ignored
> >> which I disagree with too).  But it seems to me that culture has
> >> shifted since, with some notable packagers such as Matthias Saou (of
> >> Freshrpms) and Axel Thimm (of Atrpms) now contributing packages to
> >> Fedora itself.  The process of contributing has also become much
> >> simpler and reviews are conducted speedily and efficiently, I had
> >> packages in the repository in a matter of a few days from initial
> >> submission.  Freshrpms itself now enables and depends on the (old)
> >> Extras.
> >>
> >> The real question for me, then is what packages it makes sense to go
> >> in Fedora, and what packages go in third party repositories.  It seems
> >> to me that in the case of Perl packages which could be dependencies
> >> for other packages not specific to the third-party repo in question,
> >> it makes sense for them to go into Fedora itself, so I think I will
> >> continue to package them.  This lessens the load on the third-party
> >> repo, while making them available for all other third-party repos.
> >> (This is the approach that Freshrpms seems to be taking, Matthias has
> >> contributed most packages back to Fedora now other than the non-free
> >> ones).
> >>
> >> At the other end of the spectrum are packages like you mention, genome
> >> packages, which may be of concern because of their size and/or highly
> >> specialised nature, and, as you say, may make sense to go in a
> >> third-party repo like Biopackages.net.  Also packages which can't be
> >> packaged by Fedora for legal reasons like Clustal could/should go in
> >> Biopackages.net.
> >>
> >> In the middle are packages like bioperl itself which are potentially
> >> useful to perhaps a wider group of people than the genome packages but
> >> may not necessarily be dependencies for other packages.  I lean
> >> towards making them part of Fedora so that they will be available out
> >> of the box on the planned "Everything" DVD ISO, but I welcome a
> >> discussion on this.
> >>
> >> As I said, I'm glad to hear that Biopackages.net is alive and well and
> >> I welcome a discussion on how upstream Fedora can usefully interact
> >> with Biopackages.net (I guess perhaps on the Biopackages.net list).
> >>
> >> Regards,
> >> Alex
> >>
> >> PS.  As the upstream author, if you could clarify the license on
> >> perl-SVG-Graph, on CPAN (or on the mailing list) that would be great.
> >> --
> >> Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of
> Arizona
> >>
> >>
> >>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From alexl at users.sourceforge.net  Wed Apr 18 04:50:51 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 18 Apr 2007 01:50:51 -0700
Subject: [Bioperl-l] bioperl-run and Bio::Root::AccessorMaker
Message-ID: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>

In packaging bioperl-run for Fedora, I think I stumbled across a bug
in the bioperl-run package.  It appears from this edit:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl

that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
bioperl-run 1.5.2_100 still contains modules that use this module:

$ cd bioperl-run-1.5.2_100
$ grep -r AccessorMaker  *
Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar class min_version)]);
Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(input_file output_file)]);

This causes the automatic Perl dependency generator for RPM to add
Bio::Root::AccessorMaker as a requirement, which means RPM will refuse
to install perl-bioperl-run because it is looking for a module that has
been removed from core bioperl:

$ sudo rpm -Uvh --test /home/alex/rpmbuild/RPMS/noarch/perl-bioperl-run-1.5.2_100-1.noarch.rpm
error: Failed dependencies:
        perl(Bio::Root::AccessorMaker) is needed by perl-bioperl-run-1.5.2_100-1.noarch

Are the SDI and JavaRunner modules being actively developed?  What's
the best course of action for these modules?  Should I just exclude
them from the package for now, since they won't work even if you tell
RPM to ignore the dependency warning?
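
A slightly stricter version of the grep above, as a sketch: it lists only .pm files that contain a "use" statement for the exact module name, which makes it easier to decide what to exclude from a package. The function name is hypothetical, and --include is assumed to be supported by the local grep (it is in GNU grep).

```shell
# Sketch: list .pm files under a source tree that still "use" a given module.
# modules_using is a hypothetical helper name.
modules_using() {
    # $1 = module name, e.g. Bio::Root::AccessorMaker
    # $2 = top of the source tree to search
    # The trailing group requires whitespace, ";" or "(" after the name (or
    # end of line), so longer module names with the same prefix do not match.
    grep -rlE --include='*.pm' \
        "^[[:space:]]*use[[:space:]]+$1([[:space:];(]|\$)" "$2" | sort
}

# Usage (directory name taken from the listing above):
#   modules_using Bio::Root::AccessorMaker bioperl-run-1.5.2_100
```

An empty result means no module in the tree pulls in the named dependency through a plain "use" line.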

Alex


From shameer at ncbs.res.in  Wed Apr 18 06:16:07 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 18 Apr 2007 15:46:07 +0530 (IST)
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph
 with Perl]
In-Reply-To: <4624E32A.6010704@bms.com>
References: <4624E32A.6010704@bms.com>
Message-ID: <36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>

Hi,

I am also interested in using the Bio::Graphics modules for dynamic image
display, and I have a question.  I tried all the sample programs explained
on this page: http://stein.cshl.org/genome_informatics/BioGraphics/index.html
Is it possible to generate a png/jpg/gif image file from this module by
altering the same programs?  Currently they use the display option.  I know
this can be done using GD/Image::Magick in Perl, but is there a quick way
to accomplish it in BioPerl?

Thanks,


> Missed to send this to the list....
> Stefan


-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From sdavis2 at mail.nih.gov  Wed Apr 18 07:18:48 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 18 Apr 2007 07:18:48 -0400
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph
	with Perl]
In-Reply-To: <36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
References: <4624E32A.6010704@bms.com>
	<36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
Message-ID: <200704180718.48811.sdavis2@mail.nih.gov>

On Wednesday 18 April 2007 06:16, Shameer Khadar wrote:
> Hi,
>
> I am also interested in using the Bio::Graphics modules for dynamic image
> display, and I have a question.  I tried all the sample programs explained
> on this page: http://stein.cshl.org/genome_informatics/BioGraphics/index.html
> Is it possible to generate a png/jpg/gif image file from this module by
> altering the same programs?  Currently they use the display option.

You just need to print $panel->png to a file.

Sean


From bix at sendu.me.uk  Wed Apr 18 07:48:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 12:48:27 +0100
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
Message-ID: <4626058B.8090801@sendu.me.uk>

Alex Lancaster wrote:
> In packaging bioperl-run for Fedora, I think I stumbled across a bug
> in the bioperl-run package.  It appears from this edit:
> 
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
> 
> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
> bioperl-run 1.5.2_100 still contains modules that use this module:
> 
> $ cd bioperl-run-1.5.2_100
> $ grep -r AccessorMaker  *
> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar class min_version)]);
> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(input_file output_file)]);

It looks like I've implemented a similar idea to AccessorMaker and 
AbstractRunner in Bio::Root::Root->_set_from_args() and 
Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses 
AbstractRunner I propose deprecating it immediately.

Forester::SDI and JavaRunner have no tests which is why we didn't notice 
the problem. Since they've been out of use for a number of years now I 
also propose their immediate deprecation. Alternatively, it may not be 
too difficult to just update them to use _set_from_args and _setparams, 
but I've nothing to test against (and JavaRunner is self-described as 
"probably incomplete").


I can remove the modules from cvs and create bioperl-run-1.5.2_101, 
resolving the packaging issue. I plan on doing precisely this within the 
next seven days unless someone puts a hand up to stop me.


[BCC: author, Juguang Xiao]


From cjfields at uiuc.edu  Wed Apr 18 08:43:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 07:43:45 -0500
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <4626058B.8090801@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>


On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:

> It looks like I've implemented a similar idea to AccessorMaker and
> AbstractRunner in Bio::Root::Root->_set_from_args() and
> Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
> AbstractRunner I propose deprecating it immediately.

JavaRunner is-a AbstractRunner, but what you propose below takes care  
of that.

> Forester::SDI and JavaRunner have no tests which is why we didn't  
> notice
> the problem. Since they've been out of use for a number of years now I
> also propose their immediate deprecation. Alternatively, it may not be
> too difficult to just update them to use _set_from_args and  
> _setparams,
> but I've nothing to test against (and JavaRunner is self-described as
> "probably incomplete").
>
>
> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
> resolving the packaging issue. I plan on doing precisely this  
> within the
> next seven days unless someone puts a hand up to stop me.
>
>
> [BCC: author, Juguang Xiao]

I suppose you could just remove the modules from the branch for now,  
but (as you point out) the code appears largely incomplete, so might  
as well deprecate the entire lot.  The code will be in the 'attic'  
once removed if anyone's really interested in it.

You've forwarded the author and the mail list so let's see what the  
response is (if any)...

chris


From cjfields at uiuc.edu  Wed Apr 18 11:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 10:30:45 -0500
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <462634DB.2040701@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
Message-ID: <143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>


On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>>> resolving the packaging issue. I plan on doing precisely this  
>>> within the
>>> next seven days unless someone puts a hand up to stop me.
>>>
>>> [BCC: author, Juguang Xiao]
> [snip]
>> You've forwarded the author and the mail list so let's see what  
>> the response is (if any)...
>
> Unfortunately the mail was undeliverable, and I have no other  
> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few  
> more days for other responses on the list.
>
> I never made a branch for bioperl-run 1.5.2, so they'd be removed  
> from HEAD.

It might be a good idea to repost this using the module names  
affected in the subject, just in case, though the last post he made  
on the mail list was ~3 years ago using the same email:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/match=xiao

He may be MIA.

chris


From bix at sendu.me.uk  Wed Apr 18 11:10:19 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 16:10:19 +0100
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
Message-ID: <462634DB.2040701@sendu.me.uk>

Chris Fields wrote:
> 
> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
> 
>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>> resolving the packaging issue. I plan on doing precisely this within the
>> next seven days unless someone puts a hand up to stop me.
>>
>> [BCC: author, Juguang Xiao]
[snip]
> You've forwarded the author and the mail list so let's see what the 
> response is (if any)...

Unfortunately the mail was undeliverable, and I have no other address 
for Juguang (I tried juguang at tll.org.sg). I'll wait a few more days for 
other responses on the list.

I never made a branch for bioperl-run 1.5.2, so they'd be removed from HEAD.


From hlapp at gmx.net  Wed Apr 18 11:59:52 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Apr 2007 11:59:52 -0400
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
	<143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
Message-ID: <EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>

There is a Juguang Xiao at juguang.swf at gmail.com. Not sure he's  
the same, but sounds like it's a geek at least. (google and you'll  
see; has anyone here ever heard about neko??)

	-hilmar

On Apr 18, 2007, at 11:30 AM, Chris Fields wrote:

>
> On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>>>> resolving the packaging issue. I plan on doing precisely this
>>>> within the
>>>> next seven days unless someone puts a hand up to stop me.
>>>>
>>>> [BCC: author, Juguang Xiao]
>> [snip]
>>> You've forwarded the author and the mail list so let's see what
>>> the response is (if any)...
>>
>> Unfortunately the mail was undeliverable, and I have no other
>> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few
>> more days for other responses on the list.
>>
>> I never made a branch for bioperl-run 1.5.2, so they'd be removed
>> from HEAD.
>
> It might be a good idea to repost this using the module names
> affected in the subject, just in case, though the last post he made
> on the mail list was ~3 years ago using the same email:
>
> http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/match=xiao
>
> He may be MIA.
>
> chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Apr 18 12:00:49 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Apr 2007 12:00:49 -0400
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <4626058B.8090801@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <9159C9DF-41BC-46AA-8511-763AD9B7A3D0@gmx.net>

sounds good to me - the less cruft the better. -hilmar
On Apr 18, 2007, at 7:48 AM, Sendu Bala wrote:

> Alex Lancaster wrote:
>> In packaging bioperl-run for Fedora, I think I stumbled across a bug
>> in the bioperl-run package.  It appears from this edit:
>>
>> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
>>
>> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
>> bioperl-run 1.5.2_100 still contains modules that use this module:
>>
>> $ cd bioperl-run-1.5.2_100
>> $ grep -r AccessorMaker  *
>> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
>> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar class min_version)]);
>> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
>> ('$'=>[qw(input_file output_file)]);
>
> It looks like I've implemented a similar idea to AccessorMaker and
> AbstractRunner in Bio::Root::Root->_set_from_args() and
> Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
> AbstractRunner I propose deprecating it immediately.
>
> Forester::SDI and JavaRunner have no tests which is why we didn't notice
> the problem. Since they've been out of use for a number of years now I
> also propose their immediate deprecation. Alternatively, it may not be
> too difficult to just update them to use _set_from_args and _setparams,
> but I've nothing to test against (and JavaRunner is self-described as
> "probably incomplete").
>
>
> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
> resolving the packaging issue. I plan on doing precisely this within the
> next seven days unless someone puts a hand up to stop me.
>
>
> [BCC: author, Juguang Xiao]
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Apr 18 12:25:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 11:25:54 -0500
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
	<143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
	<EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>
Message-ID: <E0195EBD-731D-4915-91AD-7FFE1FA9F608@uiuc.edu>

My guess is that Hilmar's is the most current, as posts were made this
year.  I found another email: juguang at fugu-sg.org.  Looks like he
added some stuff to Ensembl a while back (sorry about the long URL).

http://www.ensembl.org/info/software/Pdoc/ensembl/modules/Bio/EnsEMBL/Utils/Converter/ens_bio_featurePair_raw.html

chris

On Apr 18, 2007, at 10:59 AM, Hilmar Lapp wrote:

> There is a Juguang Xiao at juguang.swf at gmail.com. Not sure he's
> the same, but sounds like it's a geek at least. (google and you'll
> see; has anyone here ever heard about neko??)
>
> 	-hilmar
>
> On Apr 18, 2007, at 11:30 AM, Chris Fields wrote:
>
>>
>> On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:
>>
>>> Chris Fields wrote:
>>>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>>>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>>>>> resolving the packaging issue. I plan on doing precisely this
>>>>> within the next seven days unless someone puts a hand up to stop me.
>>>>>
>>>>> [BCC: author, Juguang Xiao]
>>> [snip]
>>>> You've forwarded the author and the mail list so let's see what
>>>> the response is (if any)...
>>>
>>> Unfortunately the mail was undeliverable, and I have no other
>>> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few
>>> more days for other responses on the list.
>>>
>>> I never made a branch for bioperl-run 1.5.2, so they'd be removed
>>> from HEAD.
>>
>> It might be a good idea to repost this using the module names
>> affected in the subject, just in case, though the last post he made
>> on the mail list was ~3 years ago using the same email:
>>
>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/match=xiao
>>
>> He may be MIA.
>>
>> chris
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 18 12:37:55 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 17:37:55 +0100
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
Message-ID: <46264963.9020306@sendu.me.uk>

Hi all,

t/DB.t is currently failing tests 40 and 41:

ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
                                          '-ids' => [qw(J00522 AF303112 2981014)],
                                          -verbose => 1);

cmp_ok $query->count, '>', 0;

You can see that 
http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100 
gives no results, where presumably it used to give 3. Querying on the 3
ids individually works fine. So... what changed and how do we get around it?


From cjfields at uiuc.edu  Wed Apr 18 13:05:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 12:05:12 -0500
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <46264963.9020306@sendu.me.uk>
References: <46264963.9020306@sendu.me.uk>
Message-ID: <6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>

I can verify on this end.  Not sure why, but the same accessions are  
used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)  
with success.

chris

On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:

> Hi all,
>
> t/DB.t is currently failing tests 40 and 41:
>
> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>                                           '-ids' => [qw(J00522 AF303112 2981014)],
>                                           -verbose => 1);
>
> cmp_ok $query->count, '>', 0;
>
> You can see that
> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100
> gives no results, where presumably it used to give 3. querying on the 3
> ids individually works fine. So... what changed and how do we get around it?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Apr 18 14:07:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 13:07:22 -0500
Subject: [Bioperl-l] Skipping/Failing tests
Message-ID: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>

To the BioPerl community at large,

I have noticed a problem with some BioPerl tests when converting to  
Test::More.  When using the following:

     while ($seq = $seqin->next_seq) {
         my $acc = $seq->accession;
         ok exists $result{ $acc };
         is $seq->length, $result{ $acc };
         delete $result{$acc};
     }

if $seq is undef then the test plan is off by 2 tests for every
iteration of the loop that fails to run.  Two serious problems:

1) No specific failures are seen until the end of the test suite when  
the test plan doesn't match the number of tests (which could be  
several hundred lines away from the actual failure).
2) Worse, if one were lazy enough to not track the actual number of
tests (heh, not that that would happen) they could inadvertently change
the test plan to match the final number of tests.

There are several ways to work around this, such as using a counter  
to track the number of iterations and check to make sure they pass:

     $ct = 0;
     while ($seq = $seqin->next_seq) {
         $ct++;
         my $acc = $seq->accession;
         ok exists $result{ $acc };
         is $seq->length, $result{ $acc };
         delete $result{$acc};
     }
     is($ct, 3);

Here, if $ct is 0 you'll get an error.  However, the test count will  
still be off at the end (the test plan will be off by 6 tests).

My opinion is that we should try to match the plan, as a single fail  
doesn't reflect the severity of the bug (i.e. it should fail each  
test per iteration, as expected).  Skipping to match is an option as  
well (one I've used) but again doesn't reflect the severity of the  
problem in my opinion.  The flip side is that some consider any  
failed test significant, so there is no reason to try matching the  
tests up.
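
One way to keep the plan matched, sketched here under stated
assumptions: after the loop, explicitly fail the two planned tests for
every expected record that never arrived.  The closure $next_seq and
the accession/length values below are hypothetical stand-ins for
$seqin->next_seq and real data; only Test::More's ok, is and fail are
used.

```perl
use strict;
use warnings;
use Test::More tests => 7;    # 2 tests per expected record + 1 count check

# Expected accession => length pairs (made-up values for illustration).
my %result   = (A00001 => 120, A00002 => 85, A00003 => 300);
my $expected = scalar keys %result;

# Hypothetical stand-in for $seqin->next_seq: yields records, then undef.
my @records  = map { +{ acc => $_, len => $result{$_} } } sort keys %result;
my $next_seq = sub { shift @records };

my $ct = 0;
while (my $seq = $next_seq->()) {
    $ct++;
    my $acc = $seq->{acc};
    ok exists $result{$acc}, "accession $acc was expected";
    is $seq->{len}, $result{$acc}, "length of $acc";
    delete $result{$acc};
}

# Fail the two planned tests for every record that never arrived, so the
# number of tests run still matches the plan and each loss is reported.
for my $missing (sort keys %result) {
    fail "no record returned for $missing";
    fail "no length checked for $missing";
}
is $ct, $expected, 'stream returned the expected number of records';
```

With a complete stream all 7 planned tests pass.  If the stream came up
one record short, the same 7 tests would still run, with three of them
failing (two explicit fails plus the count check), so the failure shows
up at the loop rather than as a plan mismatch at the end.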

What I would like to do is hammer out something we can add to the  
Writing Tests HOWTO which addresses some ways to deal with the above  
for those who want to contribute code and tests to BioPerl.  I'm  
looking for some (any) additional opinions on the matter (or, if you  
have the initiative, adding some ideas to the HOWTO itself).

http://www.bioperl.org/wiki/Special:Recentchanges

Thanks!

chris


From ki.baik at roche.com  Wed Apr 18 14:32:35 2007
From: ki.baik at roche.com (Baik, Ki)
Date: Wed, 18 Apr 2007 11:32:35 -0700
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
References: <46264963.9020306@sendu.me.uk>
	<6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
Message-ID: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>

I have had similar problems in which a couple of accession numbers out
of a series were not retrieved, yet they do exist in ncbi.

Ki Baik

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Wednesday, April 18, 2007 10:05 AM
To: Sendu Bala
Cc: bioperl-l
Subject: Re: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures

I can verify on this end.  Not sure why, but the same accessions are  
used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)  
with success.

chris

On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:

> Hi all,
>
> t/DB.t is currently failing tests 40 and 41:
>
> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>                                           '-ids' => [qw(J00522 AF303112 2981014)],
>                                           -verbose => 1);
>
> cmp_ok $query->count, '>', 0;
>
> You can see that
> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100
> gives no results, where presumably it used to give 3. querying on the 3
> ids individually works fine. So... what changed and how do we get around it?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From arareko at campus.iztacala.unam.mx  Wed Apr 18 15:12:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 18 Apr 2007 14:12:29 -0500
Subject: [Bioperl-l] Skipping/Failing tests
In-Reply-To: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>
References: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>
Message-ID: <46266D9D.1050703@campus.iztacala.unam.mx>

Hey Chris,

I don't know if this helps those working on the test suite but, there's 
a recently-cooked recipe for keeping track on the number of tests (thus 
helping to update the test plan accordingly):

http://www.perl.com/pub/a/2007/04/12/lightning-four.html?page=3

My quick .2 cents :)

Cheers,
Mauricio.

Chris Fields wrote:
> To the BioPerl community at large,
> 
> I have noticed a problem with some BioPerl tests when converting to  
> Test::More.  When using the following:
> 
>      while ($seq = $seqin->next_seq) {
>          my $acc = $seq->accession;
>          ok exists $result{ $acc };
>          is $seq->length, $result{ $acc };
>          delete $result{$acc};
>      }
> 
> if $seq is undef then the test plan is off by a factor of 2 for every  
> iteration of the loop.  Two serious problems:
> 
> 1) No specific failures are seen until the end of the test suite when  
> the test plan doesn't match the number of tests (which could be  
> several hundred lines away from the actual failure).
> 2) Worse, if one were lazy enough to not track the actual number of  
> tests (heh, not that would happen) they could inadvertently change  
> the test plan to match the final number of tests.
> 
> There are several ways to work around this, such as using a counter  
> to track the number of iterations and check to make sure they pass:
> 
>      $ct = 0;
>      while ($seq = $seqin->next_seq) {
>          $ct++;
>          my $acc = $seq->accession;
>          ok exists $result{ $acc };
>          is $seq->length, $result{ $acc };
>          delete $result{$acc};
>      }
>      is($ct, 3);
> 
> Here, if $ct is 0 you'll get an error.  However, the test count will  
> still be off at the end (the test plan will be off by 6 tests).
> 
> My opinion is that we should try to match the plan, as a single fail  
> doesn't reflect the severity of the bug (i.e. it should fail each  
> test per iteration, as expected).  Skipping to match is an option as  
> well (one I've used) but again doesn't reflect the severity of the  
> problem in my opinion.  The flip side is that some consider any  
> failed test significant, so there is no reason to try matching the  
> tests up.
> 
> What I would like to do is hammer out something we can add to the  
> Writing Tests HOWTO which addresses some ways to deal with the above  
> for those who want to contribute code and tests to BioPerl.  I'm  
> looking for some (any) additional opinions on the matter (or, if you  
> have the initiative, adding some ideas to the HOWTO itself).
> 
> http://www.bioperl.org/wiki/Special:Recentchanges
> 
> Thanks!
> 
> chris
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Wed Apr 18 15:41:56 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 14:41:56 -0500
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
References: <46264963.9020306@sendu.me.uk>
	<6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
	<6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <208DCD0F-6A0B-4054-A1C7-D599D32AC344@uiuc.edu>

The problem appears to be with eutils.  Using bare accession numbers  
no longer works with esearch (which Bio::DB::Query::GenBank uses).   
Using them via efetch still works, which explains why  
Bio::DB::GenBank passes tests using the same accession/GI mix.

NCBI has added an extra field descriptor specifically for accessions  
in esearch, which means any queries with accessions must look like  
the following (the last is a GI):

'J00522[accession] OR AF303112[accession] OR 2981014'

'J00522[accession] | AF303112[accession] | 2981014' also works.

We could separate them into two groups based on presence of letters
and set up the query that way, or we can define exactly what kind of
ID is acceptable for passing to ids() (GI or accession), or have
ids() be GI and have a new method for accessions (or vice versa).
Thoughts?
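
As a rough sketch of the first option (not the actual
Bio::DB::Query::GenBank code), the term could be built by tagging
anything that is not purely numeric.  The sub name ids_to_term is
hypothetical, and treating every all-digit ID as a GI is an assumption:

```perl
use strict;
use warnings;

# Build an esearch term from a mixed list of accessions and GI numbers.
# Assumption: an ID made up only of digits is a GI; anything else is
# treated as an accession and gets the [accession] field tag.
sub ids_to_term {
    my @ids = @_;
    return join ' OR ',
        map { /^\d+$/ ? $_ : $_ . '[accession]' } @ids;
}

print ids_to_term(qw(J00522 AF303112 2981014)), "\n";
# prints: J00522[accession] OR AF303112[accession] OR 2981014
```

This keeps ids() accepting a mixed list, at the cost of guessing the ID
type from its shape.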

chris

On Apr 18, 2007, at 1:32 PM, Baik, Ki wrote:

> I have had similar problems in which a couple of accession numbers out
> of a series were not retrieved, yet they do exist in ncbi.
>
> Ki Baik
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris  
> Fields
> Sent: Wednesday, April 18, 2007 10:05 AM
> To: Sendu Bala
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
>
> I can verify on this end.  Not sure why, but the same accessions are
> used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)
> with success.
>
> chris
>
> On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:
>
>> Hi all,
>>
>> t/DB.t is currently failing tests 40 and 41:
>>
>> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>>                                           '-ids' => [qw(J00522 AF303112 2981014)],
>>                                           -verbose => 1);
>>
>> cmp_ok $query->count, '>', 0;
>>
>> You can see that
>> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100
>> gives no results, where presumably it used to give 3. querying on the 3
>> ids individually works fine. So... what changed and how do we get around it?
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From boconnor at ucla.edu  Wed Apr 18 15:00:32 2007
From: boconnor at ucla.edu (Brian O'Connor)
Date: Wed, 18 Apr 2007 12:00:32 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>
References: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>	
	<C2340DDA.D83F%bosborne11@verizon.net>
	<6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>
Message-ID: <46266AD0.3070209@ucla.edu>

Hey Lincoln,

This looks good but the configuration step is about to change for 
Biopackages.  I'm writing config RPMs today so the end user can just 
install the config RPM for their distro and they don't have to manually 
change the yum.conf file.  It will also install the biopackages gpg key 
too so we can support signed packages.  I'll update the wiki when these 
config RPMs are available.

--Brian

Lincoln Stein wrote:

> Hi,
>
> I've been updating the WIKI in anticipation of a new GBrowse release
> and have added a "stub" for the biopackages.net install. Since I don't
> use yum (I've been running Slackware for ages and have recently started
> working with Ubuntu) I'm not sure I got the details right. Could someone
> check?
>
>
>         http://www.gmod.org/wiki/index.php/GBrowse_RPM_HOWTO
>
> Also, I think some verbiage on how to use yum to install MySQL and 
> Apache would be great, since it will be consistent with the Ubuntu 
> install page.
>
> Thanks,
>
> Lincoln
>
> On 3/31/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
>     Allen et al.,
>
>     What happened to the "GMOD" package or packages? I've had some
>     conversations
>     in the past few months with you-all suggesting that a GMOD package, or
>     packages, would be useful.
>
>     Brian O.
>
>
>
>
>     On 3/30/07 8:30 PM, "Allen Day" <allenday at gmail.com> wrote:
>
>     > Hi Alex,
>     >
>     > You've aptly noted that there are several classes of packages being
>     > discussed here, and that they should not be treated
>     equally.  From my
>     > point of view and of specific relevance to the Bioperl community we
>     > have at least:
>     >
>     > 1) "regular" CPAN dependencies and their occasional C/C++/Fortran
>     > dependencies.  These should all be in Fedora Extras, as they are of
>     > general utility.  Biopackages.net currently hosts about 200 packages
>     > (.spec files, specifically) that are like this.  Maybe 80 of these are
>     > needed for Bioperl.
>     >
>     > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan,
>     > etc.  From what I've seen, these typically have strange/custom
>     > licenses that may not be valid for some users.  BLAT has a dual
>     > licensing scheme for academic and non-academic licensees, for
>     > instance.  These packages are not of general utility.  For these two
>     > reasons, my stance is that they should not be included in Fedora
>     > Extras.
>     >
>     > 3) Bioperl packages.  Several subsets here.  The Bioperl-run
>     libraries
>     > depend directly on type (2) packages, so aren't appropriate to
>     include
>     > in Fedora Extras.  Bioperl-live is not really that useful
>     without type
>     > (2) packages.  It is also sensible to all of the keep the Bioperl-*
>     > packages in the same repository.  For these reasons, my stance
>     is that
>     > they should not be included in Fedora Extras.
>     >
>     > 4) Bioinformatics / Comp. Bio. data sets.  These don't have
>     licensing
>     > problems, but they tend to be large.  Usually in the 10E7 -
>     10E10 byte
>     > range.  RPM can not even generate correct metadata for some of them
>     > correctly if the files are too large (overflow problems).  Probably
>     > not appropriate to put in Fedora Extras because they are too
>     large and
>     > not generally useful.
>     >
>     > 5) Bioinformatics-specific System databases / daemons.  These
>     > high-level packages depend on types (2), (3), and (4), and so
>     are not
>     > appropriate to put into Fedora Extras.  An example is a BLAT daemon,
>     > which relies on the BLAT server, as well as NIB-formatted genome
>     > sequence files.
>     >
>     > That said, there are a lot of type (1) packages in the
>     Biopackages.net <http://Biopackages.net>
>     > repository.  If you're interested in migrating the spec files
>     from our
>     > repository to the Fedora project it would save us (the
>     Biopackages.net <http://Biopackages.net>
>     > maintainers) a ton of build and maintenance time, so please feel
>     free
>     > to take them, just let us know.  If we can reach some agreement on
>     > where the bioinformatics-specific packages should be
>     maintained/built
>     > we may be able to work together on these as well.
>     >
>     > -Allen
>     >
>     >
>     > On 3/30/07, Alex Lancaster < alexl at users.sourceforge.net
>     <mailto:alexl at users.sourceforge.net>> wrote:
>     >>>>>>> "AD" == Allen Day  writes:
>     >>
>     >> AD> Hi Alex, The Biopackages.net <http://Biopackages.net>
>     project is still active, we are
>     >> AD> regularly adding packages to it, mostly R packages
>     lately.  Most
>     >> AD> of the systems we use are running CentOS at this point,
>     which is
>     >> AD> why you have not seen support for FC6 yet.  There is nothing
>     >> AD> preventing building FC6 packages aside from lack of time to
>     set up
>     >> AD> the FC6 build farm nodes.
>     >>
>     >> Hi Allen and other,
>     >>
>     >> Great news to hear that Biopackages.net
>     <http://Biopackages.net> is still active!  I would like
>     >> to help out if possible.  I don't believe in "FUD" either... ;)
>     >>
>     >> AD> If you're interested in packaging BioPerl or other
>     >> AD> bioinformatics-related software, please join the Biopackages
>     >> AD> project on SourceForge.  We object to the Fedora Extras FUD
>     >> AD> tactics used to discourage people from using 3rd party
>     >> AD> repositories, and suspect they may not want to host some of our
>     >> AD> data packages, such as the >2GB genome packages.  Biopackages
>     >> AD> project is likely to partially merge with RPMForge.  We are
>     >> AD> already discussing with them how best to do it.
>     >>
>     >> The packages that I created which are currently available in Fedora
>     >> Packages are Perl dependencies which, as I said are useful for
>     >> packages outside the bioinformatics purview.  I do have a (base)
>     >> bioperl package in review, but it is not yet released.
>     >>
>     >> As for third-party repos, I don't object to them at all, and
>     for some
>     >> kinds of projects they are indeed appropriate. (e.g. for non-free
>     >> stuff like Livna or Freshrpms).  However I do have practical
>     concerns
>     >> about repository mixing, but I think that it does need to be
>     handled
>     >> carefully but that co-operation between Fedora and third-party
>     repos
>     >> can make it work.
>     >>
>     >> For example, one practical concern is that as of the
>     >> soon-to-be-released Fedora 7, Core+Extras will be merged, so there
>     >> will be no distinction at the repository-level between formerly
>     Extras
>     >> packages and formerly Core packages (as of now there are only
>     "Fedora
>     >> Packages"), which means that it will not be possible for
>     third-party
>     >> repos to limit their dependencies to just those in a former
>     base set
>     >> (i.e. excluding Extras).
>     >>
>     >> I agree that a few years ago (circa 2003-2004) there was
>     concern about
>     >> the way some third party repositories were treated somewhat
>     badly by
>     >> the (then) Fedora Extras (with some people going so far as to
>     say that
>     >> third-party repos were bad in principle and should always be
>     ignored
>     >> which I disagree with too).  But it seems to me that culture has
>     >> shifted since, with some notable packagers such as Matthias
>     Saou (of
>     >> Freshrpms) and Axel Thimm (of Atrpms) now contributing packages to
>     >> Fedora itself.  The process of contributing has also become much
>     >> simpler and reviews are conducted speedily and efficiently, I had
>     >> packages in the repository in a matter of a few days from initial
>     >> submission.  Freshrpms itself now enables and depends on the (old)
>     >> Extras.
>     >>
>     >> The real question for me, then is what packages it makes sense
>     to go
>     >> in Fedora, and what packages go in third party
>     repositories.  It seems
>     >> to me that in the case of Perl packages which could be
>     dependencies
>     >> for other packages not specific to the third-party repo in
>     question,
>     >> it makes sense for them to go into Fedora itself, so I think I will
>     >> continue to package them.  This lessens the load on the
>     third-party
>     >> repo, while making them available for all other third-party repos.
>     >> (This is approach that Freshrpms seems to be taking, Matthias has
>     >> contributed most packages back to Fedora now other than the
>     non-free
>     >> ones).
>     >>
>     >> At the other end of the spectrum are packages like you mention,
>     genome
>     >> packages, which may be of concern because of their size and/or
>     highly
>     >> specialised nature, and, as you say, may make sense to go in a
>     >> third-party repo like Biopackages.net
>     <http://Biopackages.net>.  Also packages which can't be
>     >> packaged by Fedora for legal reasons like Clustal could/should
>     go in
>     >> Biopackages.net <http://Biopackages.net>.
>     >>
>     >> In the middle are packages like bioperl itself which are
>     potentially
>     >> useful to perhaps a wider group of people than the genome
>     packages but
>     >> may not necessarily be dependencies for other packages.  I lean
>     >> towards making them part of Fedora so that they will be
>     available of
>     >> out the box on the planned "Everything" DVD ISO, but I welcome a
>     >> discussion on this.
>     >>
>     >> As I said, I'm glad to hear that Biopackages.net
>     <http://Biopackages.net> is alive and well and
>     >> I welcome a discussion on how upstream Fedora can usefully interact
>     >> with Biopackages.net <http://Biopackages.net> (I guess perhaps
>     on the Biopackages.net <http://Biopackages.net> list).
>     >>
>     >> Regards,
>     >> Alex
>     >>
>     >> PS.  As the upstream author If you could clarify the license on
>     >> perl-SVG-Graph, on CPAN (or on the mailing list) that would be
>     great.
>     >> --
>     >> Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology,
>     University of Arizona
>     >>
>     >>
>     >>
>     >> _______________________________________________
>     >> Bioperl-l mailing list
>     >> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>     >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>     >>
>     > _______________________________________________
>     > Bioperl-l mailing list
>     > Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>     > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>     _______________________________________________
>     Bioperl-l mailing list
>     Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>     http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu <mailto:michelse at cshl.edu> 


From alexl at users.sourceforge.net  Wed Apr 18 21:17:34 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 18 Apr 2007 18:17:34 -0700
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <4626058B.8090801@sendu.me.uk> (Sendu Bala's message of "Wed\,
	18 Apr 2007 12\:48\:27 +0100")
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <e43b2x6u35.fsf@delpy.biol.berkeley.edu>

>>>>> "SB" == Sendu Bala  writes:

[...]

SB> I can remove the modules from cvs and create
SB> bioperl-run-1.5.2_101, resolving the packaging issue. I plan on
SB> doing precisely this within the next seven days unless someone
SB> puts a hand up to stop me.

In the meantime, until bioperl-run-1.5.2_101 is available, is it safe
just to remove these four .pm files during the packaging so they
don't get installed?  It looks like these four files are
self-contained and are only required/used by each other:

$ grep -r AccessorMaker *
Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar class min_version)]);
Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(input_file output_file)]);

$ grep -r AbstractRunner *
Tools/Run/JavaRunner.pm:use Bio::Tools::Run::AbstractRunner;
Tools/Run/JavaRunner.pm:our @ISA=qw(Bio::Tools::Run::AbstractRunner);
Tools/Run/AbstractRunner.pm:package Bio::Tools::Run::AbstractRunner;
Tools/Run/AbstractRunner.pm:Bio::Tools::Run::AbstractRunner

$ grep -r JavaRunner *
Tools/Run/Phylo/Forester/SDI.pm:use Bio::Tools::Run::JavaRunner;
Tools/Run/Phylo/Forester/SDI.pm:our @ISA=qw(Bio::Tools::Run::JavaRunner);
Tools/Run/JavaRunner.pm:package Bio::Tools::Run::JavaRunner;
Tools/Run/JavaRunner.pm: Usage   : $runner = Bio::Tools::Run::JavaRunner->new(-jar => $jar)
Tools/Run/JavaRunner.pm: Function: Builds a new Bio::Tools::Run::JavaRunner object
Tools/Run/JavaRunner.pm: Returns : Bio::Tools::Run::JavaRunner
Tools/Run/JavaRunner.pm:Bio::Tools::Run::JavaRunner - run java programs
Tools/Run/JavaRunner.pm:   my $runner = Bio::Tools::Run::JavaRunner->new(-jar => $jar);

$ grep -r Forester *
Tools/Run/Phylo/Forester/SDI.pm:package Bio::Tools::Run::Phylo::Forester::SDI;
Tools/Run/Phylo/Forester/SDI.pm:Bio::Tools::Run::Phylo::Forester::SDI
Tools/Run/Phylo/Forester/SDI.pm:    my $runner = Bio::Tools::Run::Phylo::Forester::SDI->new();
Tools/Run/Phylo/Forester/SDI.pm:This wrapper is for SDI in Forester package. 
Tools/Run/Phylo/Forester/SDI.pm:For more details on Forester, please see 

Alex


From sac at bioperl.org  Thu Apr 19 01:14:02 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 18 Apr 2007 22:14:02 -0700
Subject: [Bioperl-l] GenericHit->start/end needs tiled hsps?
In-Reply-To: <461F3FBA.2010101@sendu.me.uk>
References: <461F3FBA.2010101@sendu.me.uk>
Message-ID: <8f200b4c0704182214j77a4accy72f71b2061764d5b@mail.gmail.com>

Sendu,

Your thinking here seems correct and in fact agrees with the documentation
for those methods:

start():  If there is more than one HSP, the lowest start
           value of all HSPs is returned.

end():  If there is more than one HSP, the largest end
          value of all HSPs is returned.

It would be fine with me to change the implementation in GenericHit as you
suggest and to not tile the HSPs. Tiling is only necessary for data that is
summed across the region covered by all HSPs, as is done by these methods:
matches(), gaps(), frac_* and percent_*.
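
[Illustrative sketch, not part of the original message.] The non-tiling approach discussed here amounts to a single pass over the HSPs, keeping the smallest start and the largest end. The snippet below assumes each HSP is mocked as a plain hash ref with start and end keys; hit_range() is a hypothetical helper, not a BioPerl method, and a real implementation would call the accessors of the HSP objects instead.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Hypothetical helper: each HSP is a hash ref { start => ..., end => ... },
# standing in for real HSP objects so the sketch has no BioPerl dependency.
sub hit_range {
    my @hsps = @_;
    return unless @hsps;
    # Take min/max within each HSP too, so start > end coordinates still work.
    my $start = min( map { min( $_->{start}, $_->{end} ) } @hsps );
    my $end   = max( map { max( $_->{start}, $_->{end} ) } @hsps );
    return ( $start, $end );
}

my @hsps = (
    { start => 120, end => 180 },
    { start => 15,  end => 60 },
    { start => 400, end => 350 },    # reverse-strand style coordinates
);
my ( $start, $end ) = hit_range(@hsps);
print "start=$start end=$end\n";     # prints start=15 end=400
```

The work is linear in the number of HSPs, which is why it stays fast even for hits with tens of thousands of HSPs.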

Steve

On 4/13/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Hi all,
>
> I want to double-check my thinking regarding
> Bio::Search::Hit::GenericHit->start() and end(). Right now the docs
> claim that hsps of the hit object must be tiled before the answer can be
> produced. The code is implemented in that way
> (Bio::Search::SearchUtils::tile_hsps($self)).
>
> Yet as far as I can see, all you need to do is loop through all hsps and
> pick out the smallest start and largest end respectively in terms of
> subject and query.
>
> This comes up because I have a blast report where a single hit contains
> over 80000 hsps and the tiling takes over an hour (I gave up on it,
> don't know how long it really takes). The simple loop through hsps takes
> seconds or less.
>
> Now in this situation the answer isn't especially useful (to me). An
> alternative way of fixing the problem would be to re-write the tiling
> algorithm (again) to somehow make it hundreds of times faster, then
> provide some way in start() and end() for the user to request the start
> and end of the best contig, or other contig of choice. Easier said than
> done though!
>
>
> What do people think?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From bix at sendu.me.uk  Thu Apr 19 06:52:45 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 11:52:45 +0100
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <e43b2x6u35.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>	<4626058B.8090801@sendu.me.uk>
	<e43b2x6u35.fsf@delpy.biol.berkeley.edu>
Message-ID: <462749FD.3080603@sendu.me.uk>

Alex Lancaster wrote:
>>>>>> "SB" == Sendu Bala  writes:
> 
> [...]
> 
> SB> I can remove the modules from cvs and create
> SB> bioperl-run-1.5.2_101, resolving the packaging issue. I plan on
> SB> doing precisely this within the next seven days unless someone
> SB> puts a hand up to stop me.
> 
> In the meantime, until bioperl-run-1.5.2_101 is available, is it safe
> just to remove these four .pm files during the packaging so they
> don't get installed?

Sure, go ahead with that.


From bix at sendu.me.uk  Thu Apr 19 06:51:53 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 11:51:53 +0100
Subject: [Bioperl-l] To be deprecated: Bio::Tools::Run::AbstractRunner,
 Bio::Tools::Run::Phylo::Forester::SDI and
 Bio::Tools::Run::JavaRunner
In-Reply-To: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
Message-ID: <462749C9.1040503@sendu.me.uk>

[repost under new subject to make sure it is seen by those it may concern]

[BCC: Juguang Xiao at a variety of possible email addresses]


Alex Lancaster wrote:
> In packaging bioperl-run for Fedora, I think I stumbled across a bug
> in the bioperl-run package.  It appears from this edit:
> 
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
> 
> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
> bioperl-run 1.5.2_100 still contains modules that use this module:
> 
> $ cd bioperl-run-1.5.2_100
> $ grep -r AccessorMaker  *
> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar
> class min_version)]);
> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
> ('$'=>[qw(input_file output_file)]);

It looks like I've implemented a similar idea to AccessorMaker and
AbstractRunner in Bio::Root::Root->_set_from_args() and
Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
AbstractRunner I propose deprecating it immediately.

Forester::SDI and JavaRunner have no tests which is why we didn't notice
the problem. Since they've been out of use for a number of years now I
also propose their immediate deprecation. Alternatively, it may not be
too difficult to just update them to use _set_from_args and _setparams,
but I've nothing to test against (and JavaRunner is self-described as
"probably incomplete").


I can remove the modules from cvs and create bioperl-run-1.5.2_101,
resolving the packaging issue. I plan on doing precisely this within the
next seven days unless someone puts a hand up to stop me.


From bix at sendu.me.uk  Thu Apr 19 08:17:19 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 13:17:19 +0100
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
In-Reply-To: <200704050059.l350xNF07452@cricket.bio.indiana.edu>
References: <200704050059.l350xNF07452@cricket.bio.indiana.edu>
Message-ID: <46275DCF.6030103@sendu.me.uk>

Don Gilbert wrote:
> Dear Bioperl list,
> 
> There is a small bug in what I think is the current Bio::Tools::GFF.pm,
> that blocks output of Target attributes (in gff3 at least).  See a patch
> here
> 
> http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF

The patch was applied by Brian but is currently generating this warning:

./Build test --test_files t/GbrowseGFF.t --verbose
t/GbrowseGFF....1..5
ok 1 - use Bio::SearchIO;
ok 2 - use Bio::SearchIO::Writer::GbrowseGFF;
ok 3 - use Bio::Root::IO;
ok 4
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
[the same warning appears 14 times in total]
ok 5
ok
All tests successful.

Can this patch be looked at again and rolled-back if the problem can't 
be fixed?


Cheers,
Sendu.


From sm8 at sanger.ac.uk  Thu Apr 19 07:49:30 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 12:49:30 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>

Is there an existing method for copying a Bio::Tree::Tree object by
value?

All the best,
Stephen


From bix at sendu.me.uk  Thu Apr 19 08:43:44 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 13:43:44 +0100
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
Message-ID: <46276400.2020207@sendu.me.uk>

Stephen Montgomery wrote:
>  is there an existing method for copying a Bio::Tree::Tree object by
> value?

What do you mean? Describe in a little more detail what you're trying to do.


From sm8 at sanger.ac.uk  Thu Apr 19 09:13:44 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 14:13:44 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>

my $tree_copy = $tree;  #copies by reference a Bio::Tree::Tree object

as an example, a method like
my $tree_copy = $tree->clone; #copies by value (this method doesn't
exist) or
my $tree_copy = Storable::dclone($tree); 
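
[Illustrative sketch, not part of the original message.] The difference between the two kinds of copy can be shown on a toy nested structure. The hash below is only a hypothetical stand-in for a tree; whether Storable::dclone behaves well on a real Bio::Tree::Tree object is an open assumption here.

```perl
use strict;
use warnings;
use Storable qw(dclone);

# Toy nested structure standing in for a tree (hypothetical; a real
# Bio::Tree::Tree holds node objects rather than plain hashes).
my $tree = { id => 'root', children => [ { id => 'A' }, { id => 'B' } ] };

my $alias = $tree;            # copies the reference only
my $copy  = dclone($tree);    # deep copy with independent data

# Changing the deep copy leaves the original untouched.
$copy->{children}[0]{id} = 'A_changed';

print "original: $tree->{children}[0]{id}\n";    # prints original: A
print "copy:     $copy->{children}[0]{id}\n";    # prints copy:     A_changed
print "alias shares data: ", ( $alias == $tree ? "yes" : "no" ), "\n";
```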

Cheers,
Stephen

-----Original Message-----
From: Sendu Bala [mailto:bix at sendu.me.uk] 
Sent: 19 April 2007 13:44
To: Stephen Montgomery
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] tree copy by-value

Stephen Montgomery wrote:
>  is there an existing method for copying a Bio::Tree::Tree object by
> value?

What do you mean? Describe in a little more detail what you're trying to
do.


From jason at bioperl.org  Thu Apr 19 09:19:05 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 19 Apr 2007 06:19:05 -0700
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
Message-ID: <35813ADC-6597-46FC-8FB8-C70AA3541BEC@bioperl.org>

I don't think so, worst case you serialize to/from TreeIO and get a  
new one, but the _internal_id of the nodes will be necessarily  
different (and new).

-jason
On Apr 19, 2007, at 4:49 AM, Stephen Montgomery wrote:

>  is there an existing method for copying a Bio::Tree::Tree object by
> value?
>
> All the best,
> Stephen
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From bix at sendu.me.uk  Thu Apr 19 09:24:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 14:24:41 +0100
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>
Message-ID: <46276D99.2060108@sendu.me.uk>

Stephen Montgomery wrote:
> my $tree_copy = $tree;  #copies by reference a Bio::Tree::Tree object
> 
> as an example, a method like
> my $tree_copy = $tree->clone; #copies by value (this method doesn't
> exist) or
> my $tree_copy = Storable::dclone($tree); 

Right, sorry for being a little slow on the uptake. As a matter of fact 
I recently added _clone() to Bio::Tree::TreeFunctionsI which does a 
"safe tree clone that doesn't seg fault". It's undocumented and I thought 
it would only be needed by simplify_to_leaves_string(), but I guess I can 
document it and make it public (i.e. remove the underscore from the name) 
if this might be popular.

Oh, it's also not that well tested, so proceed with caution and provide 
feedback if you can.


Cheers,
Sendu.


From ewijaya at gmail.com  Thu Apr 19 09:27:45 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Thu, 19 Apr 2007 21:27:45 +0800
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
	Enable Connector
Message-ID: <3521d3670704190627u6aba98b1nc3892833b6a77c1c@mail.gmail.com>

Dear expert,

My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
is created with the script (down below).

How can I modify the script such that:

1. The arrow track is represented in negative form.
    I.e. instead of 1 to 300, we use -300 to 0.

I tried this, but it doesn't work:

my $flen = Bio::SeqFeature::Generic->new(
        -start => -300,
        -end => 0, );

And how can I make these numbers appear
at every grid point (not just two, as I have now)?


2. How can I enable the connector with grid just like
   I had in the first panel? (as you can see, my script
   has connector added, but still doesn't show).

All, in all, I am trying to mimic this figure:
http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg

And here is my script:

__BEGIN__
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
use List::Compare;
use List::Util qw(max);

my %nofseq = ( 0 => 300, 1 => 300, 2 => 300, 3 => 300, 4 => 300, 5 => 300 );
my @seqid = keys %nofseq;
my @lenlist = values %nofseq;
my $maxlen = max (@lenlist);
#print Dumper \@seqid ;

my $panel = Bio::Graphics::Panel->new(
    -length    => 300,
    -width     => 500,
    -pad_left  => 70,
    -pad_right => 70,
    -key_style => 'left',
    -connector => 'solid',
);

my $flen = Bio::SeqFeature::Generic->new(
        -start => 1,   # tried -300
        -end => 300, # and 0, but failed.
);

    my $track1 = $panel->add_track(
        $flen,
        -glyph   => 'arrow',
        -tick    => 2,
        -fgcolor => 'black',
        -double  => 1,
    );


my %nlist;

while ( <DATA> ) {
    chomp;
    next if /^\#/;
    my ($sqi,$pos,$str,$progname) = split /\,/;
    my $start = $pos + $nofseq{$sqi};
    my $end = $start + length($str) + 1;
    push @{$nlist{$sqi}}, $start." ".$end." ".$progname;
}

# Check which sequence has no motifs;
my @bssi = keys %nlist;

my $lc = List::Compare->new(\@seqid, \@bssi);
my @comp = $lc->get_unique;


foreach my $comp ( @comp  ) {
    push @{$nlist{$comp}}, '0'." ".'0'." "."NONE";

}

my %prog_color = ( "WEEDER" => 3000, "MEME" => 200, "NONE" => 0 );

foreach my $seqid ( sort keys %nlist ) {


    my $track = $panel->add_track(
        -glyph     => 'graded_segments',
        -key       => "SEQ ". $seqid,
        -connector => "dashed"
        -label     => 1,
        -bgcolor   => 'blue',
		-bump      =>  +1,
		-height    =>  8,
        -min_score => 0,
        -max_score => 5000
    );


    foreach my $range ( @{$nlist{$seqid}} ) {

        my ($st,$en,$progname) = split(" ", $range);
        my $dname = " ";
        if ( $st != 0 and $en !=0  ) {
           $dname = "Seq ". $seqid;
        }

        my $score;
        if ( $progname eq "WEEDER" ) {
            $score = $prog_color{$progname};

        }
        elsif ($progname eq "MEME" ) {
            $score = $prog_color{$progname};
        }

        my $feature = Bio::SeqFeature::Generic->new(
            -display_name => $dname,
            -start        => $st,
            -end          => $en,
            -score        => $score
        );

        $track->add_feature($feature);

    }

}

print $panel->png;

# The DATA is simply a list of strings and their locations in their
# respective sequences.
# The figure is just a plot of that data.
__DATA__
# sequence number,pos,binding sites,program
4,-63,AGCTTTCTCT,MEME
0,-22,AACTTTGTAC,WEEDER
1,-13,AAGTTTCTCT,WEEDER
5,-228,ACCTTTGCCA,MEME
5,-121,AAGTTTGTCT,WEEDER
5,-88,AAGTTTTTCC,SPACE
3,-148,AACTTAGTCA,MEME
0,-184,AACTTTGTCT,MEME
__END__


Thanks and hope to hear from you again.

--
Regards,
Edward WIJAYA


From sm8 at sanger.ac.uk  Thu Apr 19 09:33:18 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 14:33:18 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B038675FB@exchsrv2.internal.sanger.ac.uk>

Thanks Sendu!  That is perfect.
Cheers
Stephen

-----Original Message-----
From: Sendu Bala [mailto:bix at sendu.me.uk] 
Sent: 19 April 2007 14:25
To: Stephen Montgomery
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] tree copy by-value

Stephen Montgomery wrote:
> my $tree_copy = $tree;  #copies by reference a Bio::Tree::Tree object
> 
> as an example, a method like
> my $tree_copy = $tree->clone; #copies by value (this method doesn't
> exist) or
> my $tree_copy = Storable::dclone($tree); 

Right, sorry for being a little slow on the uptake. As a matter of fact 
I recently added _clone() to Bio::Tree::TreeFunctionsI which does a 
"safe tree clone that doesn't seg fault". It's undocumented and I thought
it would only be needed by simplify_to_leaves_string(), but I guess I can 
document it and make it public (i.e. remove the underscore from the name)
if this might be popular.

Oh, it's also not that well tested, so proceed with caution and provide 
feedback if you can.


Cheers,
Sendu.


From ewijaya at i2r.a-star.edu.sg  Thu Apr 19 09:59:05 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Thu, 19 Apr 2007 21:59:05 +0800
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
	Enable Connector
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>


Dear expert,

My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
is created with the script (down below).

How can I modify the script such that:

1. The arrow track is represented in negative form.
   I.e. instead of 1 to 300, we use -300 to 0.

I tried this, but it doesn't work:

my $flen = Bio::SeqFeature::Generic->new(
       -start => -300,
       -end => 0, );

And how can I make these numbers appear
at every grid point (not just two, as I have now)?


2. How can I enable the connector with grid just like
  I had in the first panel? (as you can see, my script
  has connector added, but still doesn't show).

All, in all, I am trying to mimic this figure:
http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg

And here is my script:

__BEGIN__
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
use List::Compare;
use List::Util qw(max);

my %nofseq = ( 0 => 300, 1 => 300, 2 => 300, 3 => 300, 4 => 300, 5 => 300 );
my @seqid = keys %nofseq;
my @lenlist = values %nofseq;
my $maxlen = max (@lenlist);
#print Dumper \@seqid ;

my $panel = Bio::Graphics::Panel->new(
   -length    => 300,
   -width     => 500,
   -pad_left  => 70,
   -pad_right => 70,
   -key_style => 'left',
   -connector => 'solid',
);

my $flen = Bio::SeqFeature::Generic->new(
       -start => 1,   # tried -300
       -end => 300, # and 0, but failed.
);

   my $track1 = $panel->add_track(
       $flen,
       -glyph   => 'arrow',
       -tick    => 2,
       -fgcolor => 'black',
       -double  => 1,
   );


my %nlist;

while ( <DATA> ) {
   chomp;
   next if /^\#/;
   my ($sqi,$pos,$str,$progname) = split /\,/;
   my $start = $pos + $nofseq{$sqi};
   my $end = $start + length($str) + 1;
   push @{$nlist{$sqi}}, $start." ".$end." ".$progname;
}

# Check which sequence has no motifs;
my @bssi = keys %nlist;

my $lc = List::Compare->new(\@seqid, \@bssi);
my @comp = $lc->get_unique;


foreach my $comp ( @comp  ) {
   push @{$nlist{$comp}}, '0'." ".'0'." "."NONE";

}

my %prog_color = ( "WEEDER" => 3000, "MEME" => 200, "NONE" => 0 );

foreach my $seqid ( sort keys %nlist ) {


   my $track = $panel->add_track(
       -glyph     => 'graded_segments',
       -key       => "SEQ ". $seqid,
       -connector => "dashed"
       -label     => 1,
       -bgcolor   => 'blue',
               -bump      =>  +1,
               -height    =>  8,
       -min_score => 0,
       -max_score => 5000
   );


   foreach my $range ( @{$nlist{$seqid}} ) {

       my ($st,$en,$progname) = split(" ", $range);
       my $dname = " ";
       if ( $st != 0 and $en !=0  ) {
          $dname = "Seq ". $seqid;
       }

       my $score;
       if ( $progname eq "WEEDER" ) {
           $score = $prog_color{$progname};

       }
       elsif ($progname eq "MEME" ) {
           $score = $prog_color{$progname};
       }

       my $feature = Bio::SeqFeature::Generic->new(
           -display_name => $dname,
           -start        => $st,
           -end          => $en,
           -score        => $score
       );

       $track->add_feature($feature);

   }

}

print $panel->png;

# The DATA is simply a list of strings and their locations in their
# respective sequences.
# The figure is just a plot of that data.
__DATA__
# sequence number,pos,binding sites,program
4,-63,AGCTTTCTCT,MEME
0,-22,AACTTTGTAC,WEEDER
1,-13,AAGTTTCTCT,WEEDER
5,-228,ACCTTTGCCA,MEME
5,-121,AAGTTTGTCT,WEEDER
5,-88,AAGTTTTTCC,SPACE
3,-148,AACTTAGTCA,MEME
0,-184,AACTTTGTCT,MEME
__END__


Thanks and hope to hear from you again.

--
Regards,

Edward WIJAYA



From ioanniskirmitzoglou at gmail.com  Thu Apr 19 10:06:06 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Thu, 19 Apr 2007 17:06:06 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
Message-ID: <b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>

I have reported it as a bug on the bugzilla but due to bugzilla problems I
was not able to attach my code and/or sample m10 files.
Nevertheless here is the code that converts an m10 fasta output to an m8
BLAST output which is parseable by the vast majority of software.

<----------- CODE BEGINS HERE ------------------->

#!/usr/bin/perl -w

=head1 NAME

fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular output

=head1 SYNOPSIS

 fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...

=head1 DESCRIPTION

Command line options:
  --header                -- boolean flag to print column header
  -o/--out                -- optional outputfile to write data,
                             otherwise will write to STDOUT
  -h/--help               -- show this documentation

Not technically a SearchIO script, as this doesn't use any Bioperl
components, but it is useful and fast.  The output is tabular output
with the standard NCBI -m8 columns.

 queryname
 hit name
 percent identity
 alignment length
 number mismatches
 number gaps
 query start  (if on rev-strand start > end)
 query end
 hit start (if on rev-strand start > end)
 hit end
 evalue
 bit score

Additionally 5 more columns are provided
 percent similar
 query length
 hit length
 query gaps
 hit gaps

=head1 AUTHOR - Ioannis Kirmitzoglou

Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org

=head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou

Headers as well as portions of code were taken
from fastam9_to_table.pl by Jason Stajich

=head1 DISCLAIMER

Copyright (c) <2007> <Ioannis Kirmitzoglou>

Permission to use, copy, modify, merge, publish and distribute
this software and its documentation, with or without modification,
for any purpose, and without fee or royalty to the copyright holder(s)
is hereby granted with no restrictions and/or prerequisites.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

=cut

use strict;
use Getopt::Long;

my %data=();

my $outfile=''; my $header='';
GetOptions(
    'header'              => \$header,
    'o|out|outfile:s'     => \$outfile,
    'h|help'              => sub { exec('perldoc',$0); exit; }
       );

my $outfh;
if( $outfile ) {
    open($outfh, '>', $outfile) || die("$outfile: $!");
} else {
    $outfh = \*STDOUT;
}


$/="\n>>>";

my @fields = qw(qname hname percid alen mmcount gapcount
        qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);

print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)), "\n" if
$header;

while (<>) {

        chomp;
        if ($_=~/^>/ || $_=~/^\#/) {next;}
        my @hits = split(/\d+>>/, $_);
        @hits= split("\n>>", $hits[0]);

        my $hit = shift @hits;

        ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));

        foreach my $align (@hits) {

            my @details= split ("\n>", $align);
           my $detail = shift @details;
            ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
            $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
            $data{'bits'}=$1;
            $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
            $data{'evalue'}=$1;

            my $term = quotemeta("; sw_score");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'score'}=$1;

            $term = quotemeta("; sw_ident:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'percid'}=$1;

            $term = quotemeta("; sw_sim:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'percsim'}=$1;

            $term = quotemeta("; sw_overlap:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'alen'}=$1;

            $detail = shift @details;

            $term = quotemeta("; al_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'qstart'}=$1;

            $term = quotemeta("; al_stop:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'qend'}=$1;

            $term = quotemeta("; al_display_start:");
            $term =~ s/\\ /\\s/;
            my $lakis ='';
            $detail=~ /$term.+\s\-*([\-\w\s]+)/g;

            $data{'qgap'}=($1 =~ tr/\-//);

            $detail = shift @details;

            $term = quotemeta("; sq_len:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hlen'}=$1;

            $term = quotemeta("; al_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hstart'}=$1;

            $term = quotemeta("; al_stop:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hend'}=$1;

            $term = quotemeta("; al_display_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
            $data{'hgap'}=($1 =~ tr/-//);
            $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
            $data{'mmcount'} = $data{'alen'} - ( int($data{'percid'} *
$data{'alen'}) + $data{'gapcount'});

for ( $data{'percid'}, $data{'percsim'} ) {
    $_ = sprintf("%.2f",$_*100);
}

            print $outfh join( "\t",map { $data{$_} } @fields),"\n"
        }

}

<----------------- CODE ENDS HERE ---------------------->

-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus
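
[Illustrative sketch, not part of the original message.] The mismatch count in the script above is derived as alignment length minus (identities plus gaps), with identities taken as int(fraction * length). Because int() truncates, a floating-point product that lands fractionally below a whole number would lose one identity. The hypothetical helper below, mismatch_count(), rounds to the nearest whole number instead; the input values are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the mismatch count from the fractional
# identity (as reported in sw_ident), the alignment length, and the
# total gap count.
sub mismatch_count {
    my ( $frac_ident, $aln_len, $gaps ) = @_;
    # Round to the nearest whole identity rather than truncating.
    my $identities = int( $frac_ident * $aln_len + 0.5 );
    return $aln_len - ( $identities + $gaps );
}

# 57 identities and 3 gaps in a 100-column alignment leave 40 mismatches.
print mismatch_count( 0.57, 100, 3 ), "\n";    # prints 40
```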


From gilbertd at cricket.bio.indiana.edu  Thu Apr 19 13:38:05 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Thu, 19 Apr 2007 12:38:05 -0500 (EST)
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
Message-ID: <200704191738.l3JHc5s10658@cricket.bio.indiana.edu>


I'm not sure what kind of test data would have bad Target strings,
but this should clear up those warnings -- insert the '+' line:

  sub _gff3_string:
    for my $tag ( @all_tags ) {
       ##dgg.patch.was# next if $tag eq 'Target';
      if ($tag eq 'Target'
         and ! $origfeat->isa('Bio::SeqFeature::FeaturePair'))
       {  
       my($target_id, $b,$e,$strand)= $feat->get_tag_values($tag); 
+       next unless(defined($e) && defined($b) && $target_id);
       ($b,$e)= ($e,$b) if(defined $strand && $strand<0);
       $target_id =~ s/([\t\n\r%&\=;,])/sprintf("%%%X",ord($1))/ge;    
       push @groups, sprintf("Target=%s %d %d", $target_id,$b,$e);
       next;
       }
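
[Illustrative sketch, not part of the original message.] The guard and the escaping step in the patch above can be tried in isolation. target_attribute() below is a hypothetical standalone helper, not a method of Bio::Tools::GFF. It also assumes two-digit percent escapes ("%%%02X"), whereas "%X" would emit a single hex digit for tab, newline and carriage return; treat that choice as this sketch's assumption rather than the patch's behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: build a GFF3 Target attribute string, or return
# nothing when the tag values are incomplete (the case that triggered
# the "uninitialized value in sprintf" warnings).
sub target_attribute {
    my ( $target_id, $start, $end, $strand ) = @_;
    return
        unless defined $target_id
        && length $target_id
        && defined $start
        && defined $end;
    # Swap the coordinates when the strand is negative, as the patch does.
    ( $start, $end ) = ( $end, $start ) if defined $strand && $strand < 0;
    # Percent-encode characters that are reserved in the attributes column.
    $target_id =~ s/([\t\n\r%&=;,])/sprintf("%%%02X", ord($1))/ge;
    return sprintf( "Target=%s %d %d", $target_id, $start, $end );
}

print target_attribute( "EST;1", 10, 50, 1 ), "\n";    # prints Target=EST%3B1 10 50
print defined( target_attribute("EST2") ) ? "written" : "skipped", "\n";    # prints skipped
```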

-- Don


From stefan.kirov at bms.com  Thu Apr 19 14:01:28 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 19 Apr 2007 14:01:28 -0400
Subject: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
In-Reply-To: <4626E1A3.4070405@i2r.a-star.edu.sg>
References: <462473B7.4070905@i2r.a-star.edu.sg> <4624D9F3.5050805@bms.com>
	<4626E1A3.4070405@i2r.a-star.edu.sg>
Message-ID: <4627AE78.200@bms.com>

I will see if I can post it or perhaps commit something to the bp 
scripts. In any case it won't be before Monday- I have deadlines to meet.
Stefan
Edward WIJAYA wrote:
>
> Hi Stefan,
>> I believe you can use Bio::Graphics for this. I have done so in the 
>> past and I find it quite straightforward.
> Do you still have that sample script? I don't find it simple to do.
> I was thinking of doing something like this:
>
> http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg 
>
>
> Appreciate if you can share it with us.
>
> -- 
> Edward
>
>
>>
>>
>> Edward WIJAYA wrote:
>>> Dear all,
>>>
>>> How do you usually construct a graph for TFBS (binding sites) position
>>> within their sequences? I was thinking to build something like this 
>>> kind of
>>> visualization tool:
>>>
>>> http://research.i2r.a-star.edu.sg/Dragon/Motif_Search/cgi-bin/tmp/29740M1.html 
>>>
>>>
>>> or
>>>
>>> http://wingless.cs.washington.edu:8080/assessment/servlet?filenameID=submission/SPACE.D9F26D506DE90E9A0A0010BB6BCCAEF3&pageType=visualizationForm&action=Visualize+It 
>>>
>>>
>>> Is there a BioPerl module to do that?
>>>
>>> -- 
>>> Edward
>>>
>>>
>>>
>>> ------------ Institute For Infocomm Research - Disclaimer -------------
>>> This email is confidential and may be privileged.  If you are not 
>>> the intended recipient, please delete it and notify us immediately. 
>>> Please do not copy or use it for any purpose, or disclose its 
>>> contents to any other person. Thank you.
>>> --------------------------------------------------------
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>   
>>
>>
>
>
>
>


From shameer at ncbs.res.in  Fri Apr 20 07:45:23 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Fri, 20 Apr 2007 17:15:23 +0530 (IST)
Subject: [Bioperl-l] Protparam using BioPerl
In-Reply-To: <200704180718.48811.sdavis2@mail.nih.gov>
References: <4624E32A.6010704@bms.com>
	<36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
	<200704180718.48811.sdavis2@mail.nih.gov>
Message-ID: <45682.192.168.1.1.1177069523.squirrel@mail.ncbs.res.in>

Hi,

I would like to know whether BioPerl has a wrapper for ProtParam from
ExPASy.
I need to calculate the instability index using the Guruprasad et al. (1990) values
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2075190&dopt=Abstract)
for 100 sequences. I did some googling but didn't find any useful
information.

Thanks,
-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From basu at pharm.sunysb.edu  Fri Apr 20 12:37:57 2007
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Fri, 20 Apr 2007 12:37:57 -0400
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
 Enable Connector
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
Message-ID: <4628EC65.7070505@pharm.sunysb.edu>

Hi,

Wijaya Edward wrote:
> Dear expert,
> 
> My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
> is created with the script (down below).
> 
> How can I modify the script such that:
> 
> 1. The arrow track is represented in negative form.
>    I.e. instead of 1 to 300, we use -300 to 0.
> 
> I tried this, but won't do:
> 
> my $flen = Bio::SeqFeature::Generic->new(
>        -start => -300,
>        -end => 0, );

It works if you pass the 'SeqFeature' object to the '-segment' option of 
  "Bio::Graphics::Panel".

  my $panel = Bio::Graphics::Panel->new(
      -length    => 300,
      -width     => 500,
      -pad_left  => 70,
      -pad_right => 70,
      -key_style => 'left',
      -connector => 'solid',
      -segment   => $flen,
  );

For more, see one of the previous postings:
http://article.gmane.org/gmane.comp.lang.perl.bio.general/1721/match=negative+seqfeature

-siddhartha

> 
> And how can I make these number to appear
> for every gridpoints (not just two as I have now).
> 
> 
> 2. How can I enable the connector with grid just like
>   I had in the first panel? (as you can see, my script
>   has connector added, but still doesn't show).
> 
> All, in all, I am trying to mimic this figure:
> http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg
> 
> And here is my script:
> 
> __BEGIN__
> #!/usr/bin/perl -w
> use strict;
> use Data::Dumper;
> use Bio::Graphics;
> use Bio::SeqFeature::Generic;
> use List::Compare;
> use List::Util qw(max);
> 
> my %nofseq = ( 0 => 300, 1 => 300, 2 => 300, 3 => 300, 4 => 300, 5 => 300 );
> my @seqid = keys %nofseq;
> my @lenlist = values %nofseq;
> my $maxlen = max (@lenlist);
> #print Dumper \@seqid ;
> 
> my $panel = Bio::Graphics::Panel->new(
>    -length    => 300,
>    -width     => 500,
>    -pad_left  => 70,
>    -pad_right => 70,
>    -key_style => 'left',
>    -connector => 'solid',
> );
> 
> my $flen = Bio::SeqFeature::Generic->new(
>        -start => 1,   # tried -300
>        -end => 300, # and 0, but failed.
> );
> 
>    my $track1 = $panel->add_track(
>        $flen,
>        -glyph   => 'arrow',
>        -tick    => 2,
>        -fgcolor => 'black',
>        -double  => 1,
>    );
> 
> 
> 
> my %nlist;
> 
> while ( <DATA> ) {
>    chomp;
>    next if /^\#/;
>    my ($sqi,$pos,$str,$progname) = split /\,/;
>    my $start = $pos + $nofseq{$sqi};
>    my $end = $start + length($str) + 1;
>    push @{$nlist{$sqi}}, $start." ".$end." ".$progname;
> }
> 
> # Check which sequence has no motifs;
> my @bssi = keys %nlist;
> 
> my $lc = List::Compare->new(\@seqid, \@bssi);
> my @comp = $lc->get_unique;
> 
> 
> foreach my $comp ( @comp  ) {
>    push @{$nlist{$comp}}, '0'." ".'0'." "."NONE";
> 
> }
> 
> my %prog_color = ( "WEEDER" => 3000, "MEME" => 200, "NONE" => 0 );
> 
> foreach my $seqid ( sort keys %nlist ) {
> 
> 
>    my $track = $panel->add_track(
>        -glyph     => 'graded_segments',
>        -key       => "SEQ ". $seqid,
>        -connector => "dashed",
>        -label     => 1,
>        -bgcolor   => 'blue',
>                -bump      =>  +1,
>                -height    =>  8,
>        -min_score => 0,
>        -max_score => 5000
>    );
> 
> 
>    foreach my $range ( @{$nlist{$seqid}} ) {
> 
>        my ($st,$en,$progname) = split(" ", $range);
>        my $dname = " ";
>        if ( $st != 0 and $en !=0  ) {
>           $dname = "Seq ". $seqid;
>        }
> 
>        my $score;
>        if ( $progname eq "WEEDER" ) {
>            $score = $prog_color{$progname};
> 
>        }
>        elsif ($progname eq "MEME" ) {
>            $score = $prog_color{$progname};
>        }
> 
>        my $feature = Bio::SeqFeature::Generic->new(
>            -display_name => $dname,
>            -start        => $st,
>            -end          => $en,
>            -score        => $score
>        );
> 
>        $track->add_feature($feature);
> 
>    }
> 
> }
> 
> print $panel->png;
> 
> # The DATA section is simply a list of strings and their locations in their
> # respective sequences. The figure is just a plot of them.
> __DATA__
> # sequence number,pos,binding sites,program
> 4,-63,AGCTTTCTCT,MEME
> 0,-22,AACTTTGTAC,WEEDER
> 1,-13,AAGTTTCTCT,WEEDER
> 5,-228,ACCTTTGCCA,MEME
> 5,-121,AAGTTTGTCT,WEEDER
> 5,-88,AAGTTTTTCC,SPACE
> 3,-148,AACTTAGTCA,MEME
> 0,-184,AACTTTGTCT,MEME
> __END__
> 
> 
> Thanks and hope to hear from you again.
> 
> --
> Regards,
> 
> Edward WIJAYA
> 


From bosborne11 at verizon.net  Fri Apr 20 15:47:30 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 20 Apr 2007 15:47:30 -0400
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
In-Reply-To: <200704191738.l3JHc5s10658@cricket.bio.indiana.edu>
Message-ID: <C24E9112.DD2B%bosborne11@verizon.net>

Applied.


On 4/19/07 1:38 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu> wrote:

> 
> I'm not sure what kind of test data would have bad Target strings,
> but this should clear up those warnings -- insert the '+' line:
> 
>   sub _gff3_string:
>     for my $tag ( @all_tags ) {
>        ##dgg.patch.was# next if $tag eq 'Target';
>       if ($tag eq 'Target'
>          and ! $origfeat->isa('Bio::SeqFeature::FeaturePair'))
>        {  
>        my($target_id, $b,$e,$strand)= $feat->get_tag_values($tag);
> +       next unless(defined($e) && defined($b) && $target_id);
>        ($b,$e)= ($e,$b) if(defined $strand && $strand<0);
>        $target_id =~ s/([\t\n\r%&\=;,])/sprintf("%%%X",ord($1))/ge;
>        push @groups, sprintf("Target=%s %d %d", $target_id,$b,$e);
>        next;
>        }
> 
> -- Don
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
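
The Target handling in the patch quoted above can be tried outside of
Bio::Tools::GFF with a small standalone sketch. This is only an
illustration under stated assumptions: format_target is a hypothetical
helper name (not a BioPerl method), it takes plain scalars instead of
reading tag values from a feature object, and the escaping expression is
copied from the patch as posted.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Bio::Tools::GFF: builds a GFF3
# "Target=..." attribute from plain values, following the quoted patch.
sub format_target {
    my ($target_id, $start, $end, $strand) = @_;

    # The '+' line from the patch: skip incomplete Target data.
    return unless defined $start && defined $end && $target_id;

    # Swap coordinates for features on the reverse strand.
    ($start, $end) = ($end, $start) if defined $strand && $strand < 0;

    # Percent-escape the reserved characters, as in the patch.
    $target_id =~ s/([\t\n\r%&=;,])/sprintf("%%%X", ord($1))/ge;

    return sprintf("Target=%s %d %d", $target_id, $start, $end);
}

print format_target('contig;1', 10, 20,  1), "\n";
print format_target('contig;1', 10, 20, -1), "\n";
```

With incomplete input, e.g. format_target('contig1', undef, 20), the
helper returns nothing, which corresponds to the "next unless" guard
that silences the warnings.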


From ewijaya at i2r.a-star.edu.sg  Sat Apr 21 10:44:08 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 21 Apr 2007 22:44:08 +0800
Subject: [Bioperl-l] Getting Gene Sequences with Bioperl
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06168D@mailbe01.teak.local.net>


Hi all,
 
Is there a BioPerl module that allows us to extract
gene sequences given a list of gene names (gene symbols)?
 
In particular, we would pass a window size for the sequence and
get back the upstream, downstream or ORF sequences for that list of genes.
We would also like to restrict the search to one specific organism or run it across all organisms.
 
Is there also a freely downloadable gene database that supports
a BioPerl module for that task?
 
Thanks and hope to hear from you again.
 
--
Edward WIJAYA
SINGAPORE



From hlapp at gmx.net  Sat Apr 21 13:14:10 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 21 Apr 2007 13:14:10 -0400
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
Message-ID: <19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>

I haven't kept track of this - did this go anywhere? Do we not have  
an -m10 fasta output parser in SearchIO? (I.e., my first thought  
would be that that would be the desired solution; am I misled in this?)

	-hilmar

On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:

> I have reported it as a bug on the bugzilla but due to bugzilla  
> problems I
> was not able to attach my code and/or sample m10 files.
> Nevertheless here is the code that converts an m10 fasta output to  
> an m8
> BLAST output which is parseable by the vast majority of software.
>
> <----------- CODE BEGINS HERE ------------------->
>
> #!/usr/bin/perl -w
>
> =head1 NAME
>
> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular  
> output
>
> =head1 SYNOPSIS
>
>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>
> =head1 DESCRIPTION
>
> Command line options:
>   --header                -- boolean flag to print column header
>   -o/--out                -- optional outputfile to write data,
>                              otherwise will write to STDOUT
>   -h/--help               -- show this documentation
>
> Not technically a SearchIO script as this doesn't use any Bioperl
> components but is useful and fast.  The output is tabular output
> with the standard NCBI -m8 columns.
>
>  queryname
>  hit name
>  percent identity
>  alignment length
>  number mismatches
>  number gaps
>  query start  (if on rev-strand start > end)
>  query end
>  hit start (if on rev-strand start > end)
>  hit end
>  evalue
>  bit score
>
> Additionally 5 more columns are provided
>  percent similar
>  query length
>  hit length
>  query gaps
>  hit gaps
>
> =head1 AUTHOR - Ioannis Kirmitzoglou
>
> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>
> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>
> Headers as well as portions of code were taken
> from fastam9_to_table.pl by Jason Stajich
>
> =head1 DISCLAIMER
>
> Copyright (c) <2007> <Ioannis Kirmitzoglou>
>
> Permission to use, copy, modify, merge, publish and distribute
> this software and its documentation, with or without modification,
> for any purpose, and without fee or royalty to the copyright holder(s)
> is hereby granted with no restrictions and/or prerequisites.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>
> =cut
>
> use strict;
> use Getopt::Long;
>
> my %data=();
>
> my $outfile=''; my $header='';
> GetOptions(
>     'header'              => \$header,
>     'o|out|outfile:s'     => \$outfile,
>     'h|help'              => sub { exec('perldoc',$0); exit; }
>        );
>
> my $outfh;
> if( $outfile ) {
>     open($outfh, ">$outfile") || die("$outfile: $!");
> } else {
>     $outfh = \*STDOUT;
> }
>
>
> $/="\n>>>";
>
> my @fields = qw(qname hname percid alen mmcount gapcount
>         qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);
>
> print $outfh "#", uc(join("", map { sprintf("%-10s", $_) } @fields)), "\n"
>     if $header;
>
> while (<>) {
>
>         chomp;
>         if ($_=~/^>/ || $_=~/^\#/) {next;}
>         my @hits = split(/\d+>>/, $_);
>         @hits= split("\n>>", $hits[0]);
>
>         my $hit = shift @hits;
>
>         ($data{'qname'}, $data{'qlen'}) = ($hit =~ /(\S+)\,\s(\d+)/);
>
>         foreach my $align (@hits) {
>
>             my @details= split ("\n>", $align);
>            my $detail = shift @details;
>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>             $data{'bits'}=$1;
>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>             $data{'evalue'}=$1;
>
>             my $term = quotemeta("; sw_score");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'score'}=$1;
>
>             $term = quotemeta("; sw_ident:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'percid'}=$1;
>
>             $term = quotemeta("; sw_sim:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'percsim'}=$1;
>
>             $term = quotemeta("; sw_overlap:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'alen'}=$1;
>
>             $detail = shift @details;
>
>             $term = quotemeta("; al_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'qstart'}=$1;
>
>             $term = quotemeta("; al_stop:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'qend'}=$1;
>
>             $term = quotemeta("; al_display_start:");
>             $term =~ s/\\ /\\s/;
>             my $lakis ='';
>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>
>             $data{'qgap'}=($1 =~ tr/\-//);
>
>             $detail = shift @details;
>
>             $term = quotemeta("; sq_len:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hlen'}=$1;
>
>             $term = quotemeta("; al_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hstart'}=$1;
>
>             $term = quotemeta("; al_stop:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hend'}=$1;
>
>             $term = quotemeta("; al_display_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>             $data{'hgap'}=($1 =~ tr/-//);
>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>             $data{'mmcount'} = $data{'alen'}
>                 - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );
>
> for ( $data{'percid'}, $data{'percsim'} ) {
>     $_ = sprintf("%.2f",$_*100);
> }
>
>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
>         }
>
> }
>
> <----------------- CODE ENDS HERE ---------------------->
>
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================
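
The mismatch column in the quoted script is derived, not read from the
report: it is the overlap length minus the identical positions and the
gaps. A minimal sketch of just that step follows, assuming the
"; key: value" line layout that the script's regular expressions match.
The sample values and the gap counts are made up for illustration and
do not come from a real FASTA run.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample of the "; key: value" lines from one hit block.
my $detail = join "\n",
    '; sw_ident: 0.750',
    '; sw_sim: 0.900',
    '; sw_overlap: 200';

# Collect every "; key: value" pair into a hash in one pass.
my %kv = $detail =~ /^;\s+(\w+):\s+(\S+)/mg;

# Assumed gap counts; the script counts '-' characters in the
# displayed query and hit alignment strings to get these.
my ($qgap, $hgap) = (2, 3);
my $gapcount = $qgap + $hgap;

# Same arithmetic as the script: overlap - (identities + gaps).
my $mmcount = $kv{sw_overlap}
    - ( int($kv{sw_ident} * $kv{sw_overlap}) + $gapcount );

# Identity is reported as a fraction and printed as a percentage.
printf "mismatches=%d percid=%.2f\n", $mmcount, $kv{sw_ident} * 100;
# prints: mismatches=45 percid=75.00
```

Note that int() truncates, so a fraction such as 0.985 can in principle
land one below the expected identity count because of floating-point
rounding; rounding instead of truncating may be safer there.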


From jason at bioperl.org  Sat Apr 21 13:44:00 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 21 Apr 2007 10:44:00 -0700
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
Message-ID: <E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>

We don't have one yet. This is a new format introduced in the most  
recent release of FASTA.  Hopefully someone can make some time to add  
some code to SearchIO::fasta for it.

I do find that when I need a fast FASTA-to-TAB converter, the  
simple script (fastam9_to_table) is more efficient than the SearchIO  
framework, so Ioannis is making a parallel one for the new m10  
output.  So I think having both is useful.

-jason
On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:

> I haven't kept track of this - did this go anywhere? Do we not have
> an -m10 fasta output parser in SearchIO? (I.e., my first thought
> would be that that would be the desired solution; am I misled in  
> this?)
>
> 	-hilmar

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From akozik at atgc.org  Sat Apr 21 13:40:47 2007
From: akozik at atgc.org (Alexander Kozik)
Date: Sat, 21 Apr 2007 10:40:47 -0700
Subject: [Bioperl-l] ncbi blast -V T option
Message-ID: <462A4C9F.8010902@atgc.org>

Hi all,

There have been many postings about problems parsing stand-alone (local) NCBI 
Blast output from version 2.2.15 or later. Recently, I (re?)-discovered 
that the Blast option '-V T' fixes the problem with the old parsers I have. 
Option '-V T' generates detailed statistics after _each_ query sequence 
in the Blast output, like:
... ... ...
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 17,650,109
Number of Sequences: 26534
Number of extensions: 430364
Number of successful extensions: 1496
Number of sequences better than 1.0e-020: 1
Number of HSP's better than  0.0 without gapping: 1400
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 1495
length of database: 11,047,616
effective HSP length: 96
effective length of database: 8,500,352
effective search space used: 1275052800
frameshift window, decay const: 40,  0.1
... ... ...

Option '-V F' (default) will generate statistics at the end of batch 
Blast output summarizing all query hits together.

Did I miss something from previous postings?
Sorry, if it was already discussed.

-Alex

-- 
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/


From gdorjee at hotmail.com  Sat Apr 21 15:14:05 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sat, 21 Apr 2007 12:14:05 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
	<10024333.post@talk.nabble.com>
	<54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>
Message-ID: <10120148.post@talk.nabble.com>


hi
how do i check whether i've installed bioperl on my system properly? i
think i installed the bioperl-1.5.2_101 version, but i can't say for sure.
although i can use some of the modules like Bio::SearchIO and
Bio::SearchIO, i can't seem to get the remote blast working for some reason.
does this have something to do with the bioperl installation? i'm using perl v5.6.1
built for sun4-solaris-64int. 
i tried to install the same bioperl version on my Linux machine, which has
perl v5.8.5 built for i386-linux-thread-multi, and it seems to give me the
same problem with the remote blast.
your help would be much appreciated.
thanks


Chris Fields wrote:
> 
> What version of bioperl are you using?  I get an error but it is b/c  
> the ID doesn't exist.
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: acc KPYK_ECOLI does not exist
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/ 
> Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /Users/cjfields/src/bioperl- 
> live/Bio/DB/WebDBSeqI.pm:181
> STACK: genpept.pl:21
> -----------------------------------------------------------
> 
> The actual accession is 'KPYK1_ECOLI'.
> 
> chris
> 
> On Apr 16, 2007, at 3:42 PM, DeeGee wrote:
> 
>>
>> hi
>> i tried the following code just to check the network, and it worked  
>> fine
>> except for the SwissProt part, for which i got the error message  
>> instead of
>> the sequence:
>>
>> ------------- EXCEPTION  -------------
>> MSG: swissprot stream with no ID. Not swissprot in my book
>> STACK Bio::SeqIO::swiss::next_seq
>> /usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
>> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
>> /usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
>> STACK toplevel bbbbb.pl:21
>> --------------------------------------
>>
>> #### check #####
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::GenBank;
>> use Bio::DB::SwissProt;
>> use Bio::DB::GenPept;
>> use Bio::SeqIO;
>>
>> my $genpeptdb = new Bio::DB::GenPept();
>> my $genbankdb = new Bio::DB::GenBank();
>> my $swissdb = new Bio::DB::SwissProt();
>>
>> my $seqio = new Bio::SeqIO(-format => 'fasta',
>>                            -fh     => \*STDOUT);
>>
>> my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
>> $seqio->write_seq($protseq);
>>
>> my $seq = $genbankdb->get_Seq_by_acc('AF303112');
>> $seqio->write_seq($seq);
>>
>> $protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
>> $seqio->write_seq($protseq);
>>
>> thanks a lot.
>>
>>
>> Chris Fields wrote:
>>>
>>> The 'verbose' setting doesn't change the way the BLAST query is sent,
>>> it just sends the raw output from the repeated attempts to retrieve
>>> the report (using the RID) to STDERR.  The error you saw won't be
>>> fixed by doing so.
>>>
>>> What I was interested in was the raw HTML output dumped to the
>>> screen.  If it is querying the NCBI server it should dump stuff that
>>> includes something like this:
>>>
>>> ...
>>> <HTML>
>>> <p></p>
>>> <!--
>>> QBlastInfoBegin
>>>          Status=WAITING
>>> QBlastInfoEnd
>>> --><p></p>
>>> <SCRIPT LANGUAGE="JavaScript"><!--
>>> ...
>>>
>>> which indicates you have a request in the BLAST queue.  If you aren't
>>> seeing anything then the problem is likely network-related on your
>>> end, so getting the latest RemoteBlast won't help.  Do any other
>>> BioPerl modules requiring network access work (Bio::DB::GenBank, for
>>> instance)?  If not it could be a proxy issue...
>>>
>>> Just in case, here's the browsable CVS location for RemoteBlast:
>>>
>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl
>>>
>>> Click on the download link and save over your local version.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
>>>
>>>>
>>>> hi Chris,
>>>> thanks for your reply. i set the RemoteBlast factory to a verbosity
>>>> of 1,
>>>> and i get the same error message. i'm new to all these. so, could
>>>> you plz
>>>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>>>
>>>> cheers!!!
>>>
>>>
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/error-while- 
>> remote-blast-against-swissprot-db-tf3577674.html#a10024333
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10120148
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Sat Apr 21 16:09:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 21 Apr 2007 15:09:48 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
Message-ID: <A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>

Ioannis's fastam10_to_table script is available in the bugzilla
enhancement request (as an attachment) if anyone's interested:

http://bugzilla.open-bio.org/show_bug.cgi?id=2278

I haven't had a chance to really look into m10 output yet, but it
looks easy enough to parse; it may not be hard to get something
SearchIO-based up and running.
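
As a rough illustration of why it looks easy to parse: judging from the
fields Ioannis's script pulls out, each alignment block carries its
statistics as lines of the form "; key: value". Here is a minimal sketch
assuming that layout. The sample text is made up for illustration, and
parse_m10_attrs is a hypothetical helper, not anything in BioPerl or
SearchIO.

```perl
use strict;
use warnings;

# Hypothetical helper: collect "; key: value" lines from one m10 block
# into a hash reference. Lines that do not match are ignored.
sub parse_m10_attrs {
    my ($block) = @_;
    my %attr;
    for my $line (split /\n/, $block) {
        # Keys may contain a hyphen, so allow word characters and "-".
        if ($line =~ /^;\s*([\w-]+):\s*(\S.*?)\s*$/) {
            $attr{$1} = $2;
        }
    }
    return \%attr;
}

# Made-up sample text, only to exercise the helper.
my $sample = join "\n",
    'hit1 some description',
    '; sw_score: 120',
    '; sw_bits: 35.2',
    '; sw_expect: 1.2e-05',
    '; sw_ident: 0.450',
    '; sw_overlap: 80';

my $attr = parse_m10_attrs($sample);
printf "%s\t%s\n", $_, $attr->{$_} for sort keys %$attr;
```

A real parser would also have to split the query and hit sections and
read the alignment lines, which this sketch leaves out.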

chris

On Apr 21, 2007, at 12:44 PM, Jason Stajich wrote:

> We don't have one yet. This is a new format introduced in the most
> recent release of FASTA.  Hopefully someone can make some time to add
> some code to SearchIO::fasta for it.
>
> I do find that when I need a fast FASTA to TAB converter, the
> simple script (fastam9_to_table) is more efficient than the SearchIO
> framework, so Ioannis is making a parallel one for the new m10
> output.  So I think having both is useful.
>
> -jason
> On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
>
>> I haven't kept track of this - did this go anywhere? Do we not have
>> an -m10 fasta output parser in SearchIO? (I.e., my first thought
>> would be that that would be the desired solution; am I misled in
>> this?)
>>
>> 	-hilmar
>>
>> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
>>
>>> I have reported it as a bug on the bugzilla but due to bugzilla
>>> problems I was not able to attach my code and/or sample m10 files.
>>> Nevertheless here is the code that converts an m10 fasta output to
>>> an m8 BLAST output which is parseable by the vast majority of software.
>>>
>>> <----------- CODE BEGINS HERE ------------------->
>>>
>>> #!/usr/bin/perl -w
>>>
>>> =head1 NAME
>>>
>>> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular output
>>>
>>> =head1 SYNOPSIS
>>>
>>>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>>>
>>> =head1 DESCRIPTION
>>>
>>> Command line options:
>>>   --header                -- boolean flag to print column header
>>>   -o/--out                -- optional outputfile to write data,
>>>                              otherwise will write to STDOUT
>>>   -h/--help               -- show this documentation
>>>
>>> Not technically a SearchIO script as this doesn't use any Bioperl
>>> components but is useful and fast.  The output is tabular output
>>> with the standard NCBI -m8 columns.
>>>
>>>  queryname
>>>  hit name
>>>  percent identity
>>>  alignment length
>>>  number mismatches
>>>  number gaps
>>>  query start  (if on rev-strand start > end)
>>>  query end
>>>  hit start (if on rev-strand start > end)
>>>  hit end
>>>  evalue
>>>  bit score
>>>
>>> Additionally 5 more columns are provided
>>>  percent similar
>>>  query length
>>>  hit length
>>>  query gaps
>>>  hit gaps
>>>
>>> =head1 AUTHOR - Ioannis Kirmitzoglou
>>>
>>> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>>>
>>> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>>>
>>> Headers as well as portions of code were taken
>>> from fastam9_to_table.pl by Jason Stajich
>>>
>>> =head1 DISCLAIMER
>>>
>>> Copyright (c) <2007> <Ioannis Kirmitzoglou>
>>>
>>> Permission to use, copy, modify, merge, publish and distribute
>>> this software and its documentation, with or without modification,
>>> for any purpose, and without fee or royalty to the copyright holder(s)
>>> is hereby granted with no restrictions and/or prerequisites.
>>>
>>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>>> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>>> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>>> NONINFRINGEMENT.
>>> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
>>> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
>>> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
>>> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>>>
>>> =cut
>>>
>>> use strict;
>>> use Getopt::Long;
>>>
>>> my %data=();
>>>
>>> my $outfile=''; my $header='';
>>> GetOptions(
>>>     'header'              => \$header,
>>>     'o|out|outfile:s'     => \$outfile,
>>>     'h|help'              => sub { exec('perldoc',$0); exit; }
>>>        );
>>>
>>> my $outfh;
>>> if( $outfile ) {
>>>     open($outfh, '>', $outfile) || die("$outfile: $!");
>>> } else {
>>>     $outfh = \*STDOUT;
>>> }
>>>
>>>
>>> $/="\n>>>";
>>>
>>> my @fields = qw(qname hname percid alen mmcount gapcount
>>>         qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);
>>>
>>> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),"\n"
>>>     if $header;
>>>
>>> while (<>) {
>>>
>>>         chomp;
>>>         if ($_=~/^>/ || $_=~/^\#/) {next;}
>>>         my @hits = split(/\d+>>/, $_);
>>>         @hits= split("\n>>", $hits[0]);
>>>
>>>         my $hit = shift @hits;
>>>
>>>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
>>>
>>>         foreach my $align (@hits) {
>>>
>>>             my @details= split ("\n>", $align);
>>>            my $detail = shift @details;
>>>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>>>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>>>             $data{'bits'}=$1;
>>>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>>>             $data{'evalue'}=$1;
>>>
>>>             my $term = quotemeta("; sw_score");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'score'}=$1;
>>>
>>>             $term = quotemeta("; sw_ident:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'percid'}=$1;
>>>
>>>             $term = quotemeta("; sw_sim:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'percsim'}=$1;
>>>
>>>             $term = quotemeta("; sw_overlap:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'alen'}=$1;
>>>
>>>             $detail = shift @details;
>>>
>>>             $term = quotemeta("; al_start:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'qstart'}=$1;
>>>
>>>             $term = quotemeta("; al_stop:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'qend'}=$1;
>>>
>>>             $term = quotemeta("; al_display_start:");
>>>             $term =~ s/\\ /\\s/;
>>>             my $lakis ='';
>>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>>>
>>>             $data{'qgap'}=($1 =~ tr/\-//);
>>>
>>>             $detail = shift @details;
>>>
>>>             $term = quotemeta("; sq_len:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'hlen'}=$1;
>>>
>>>             $term = quotemeta("; al_start:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'hstart'}=$1;
>>>
>>>             $term = quotemeta("; al_stop:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term\s+(\S+)/;
>>>             $data{'hend'}=$1;
>>>
>>>             $term = quotemeta("; al_display_start:");
>>>             $term =~ s/\\ /\\s/;
>>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>>>             $data{'hgap'}=($1 =~ tr/-//);
>>>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>>>             $data{'mmcount'} = $data{'alen'} - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'});
>>>
>>>             for ( $data{'percid'}, $data{'percsim'} ) {
>>>                 $_ = sprintf("%.2f",$_*100);
>>>             }
>>>
>>>             print $outfh join( "\t",map { $data{$_} } @fields),"\n";
>>>         }
>>>
>>> }
>>>
>>> <----------------- CODE ENDS HERE ---------------------->
>>>
>>> -- 
>>>
>>> *Ioannis Kirmitzoglou*, MSc
>>> PhD. Student,
>>> Bioinformatics Research Laboratory
>>> Department of Biological Sciences
>>> University of Cyprus
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at i2r.a-star.edu.sg  Sun Apr 22 07:59:28 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sun, 22 Apr 2007 19:59:28 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	
	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com><3ACF03E372996
	C4EACD542EA8A05E66A061684@mailbe01.teak.local.net><AAF82F3A-3C75-4D51-AFD4-
	FDE358391A03@fruitfly.org>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061690@mailbe01.teak.local.net>


Hi Chris,
 
I've downloaded the GO Database.
Which of these should we install in our MySQL database
so that it can be used for the GO::AppHandle task below?
 
-rw-rw-r--   1 ewijaya ewijaya 1.6G Apr  9 12:23 go_200704-assocdb-data
-rw-rw-r--   1 ewijaya ewijaya 483M Apr  9 12:23 go_200704-assocdb.rdf-xml
-rw-rw-r--   1 ewijaya ewijaya 3.2K Apr  9 12:23 go_200704-assocdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  7 00:41 go_200704-assocdb-tables
-rw-rw-r--   1 ewijaya ewijaya 3.3K Apr  9 12:23 go_200704-obo-xml.dtd
-rw-rw-r--   1 ewijaya ewijaya 4.5K Apr  9 12:23 go_200704-rdf.dtd
-rw-rw-r--   1 ewijaya ewijaya  29K Apr  9 12:23 go_200704-schema-mysql.sql
-rw-rw-r--   1 ewijaya ewijaya 3.1G Apr  9 12:25 go_200704-seqdb-data
-rw-rw-r--   1 ewijaya ewijaya  93M Apr  9 12:26 go_200704-seqdb.fasta
-rw-rw-r--   1 ewijaya ewijaya 3.2K Apr  9 12:25 go_200704-seqdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  8 05:38 go_200704-seqdb-tables
-rw-rw-r--   1 ewijaya ewijaya  51M Apr  9 12:26 go_200704-termdb-data
-rw-rw-r--   1 ewijaya ewijaya  18M Apr  9 12:26 go_200704-termdb.obo-xml
-rw-rw-r--   1 ewijaya ewijaya  39M Apr  9 12:26 go_200704-termdb.owl
-rw-rw-r--   1 ewijaya ewijaya  29M Apr  9 12:26 go_200704-termdb.rdf-xml
-rw-rw-r--   1 ewijaya ewijaya  749 Apr  9 12:26 go_200704-termdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  2 00:31 go_200704-termdb-tables
drwxrwxr-x  22 ewijaya ewijaya 4.0K Apr  1 23:35 go_200704-utilities-src

Or is there a way to load all of them automatically into a MySQL database?
Thanks, and I hope to hear from you again.
 
--
Edward
 

________________________________

From: Chris Mungall [mailto:cjm at fruitfly.org]
Sent: Tue 4/17/2007 2:49 AM
To: Wijaya Edward
Cc: spiros at lokku.com; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


Download:
http://search.cpan.org/~cmungall/go-db-perl

or do:

cpan GO::AppHandle

The API call you want is here:
http://search.cpan.org/~cmungall/go-db-perl/GO/AppHandle.pm#get_deep_products

Here is an example snippet:

   use GO::AppHandle;
   my $apph=GO::AppHandle->connect(@ARGV);
   my $go_acc = shift @ARGV;
   my $gps = $apph->get_deep_products({term=>{acc=>$go_acc}});
   foreach my $gp (@$gps) {
     printf "%s %s\n", $gp->xref->acc, $gp->symbol;
   }

You will need to download the GO Database.

Cheers
Chris

On Apr 16, 2007, at 8:14 AM, Wijaya Edward wrote:

>
> Hi Spiros,
>
> Thanks for your reply. I am interested in applying it to
> all the organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) 
> with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names for that id with Perl?
>>
>> Does anybody have experience with that?
>> I've looked through the GO modules on CPAN, but can't seem
>> to find any tool that facilitates that search.
>>
>> I look forward very much to your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>> ------------ Institute For Infocomm Research - Disclaimer 
>> -------------
>> This email is confidential and may be privileged.  If you are not 
>> the intended recipient, please delete it and notify us 
>> immediately. Please do not copy or use it for any purpose, or 
>> disclose its contents to any other person. Thank you.
>> --------------------------------------------------------
>>
>
>
>
>




From ioanniskirmitzoglou at gmail.com  Sun Apr 22 13:11:35 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Sun, 22 Apr 2007 20:11:35 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
Message-ID: <b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>

I agree with Jason. Both scripts (fastam9_to_table and fastam10_to_table)
are way faster and easier to use than SearchIO. Still, there are a lot
of cases where SearchIO support for m10 would be useful (e.g. when trying to
represent the alignment in a graphical way).
Nevertheless I do think that FASTA needs an output similar to the BLAST m8
one, which is really compact. Although I haven't tried it yet, I do believe
that both scripts can be piped, so one easy and rather fast way to produce
a tabular output from FASTA would be to pipe its output directly to one of
the scripts.
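
On the piping: the script reads its input with while (<>), and Perl's <>
falls back to STDIN when no file names are given, so it should work in
principle. Below is a small sketch of that reading pattern with the
record separator the script sets ("\n>>>"). It uses an in-memory
filehandle with made-up contents in place of real FASTA output, and
read_records is a hypothetical name used only here.

```perl
use strict;
use warnings;

# Read records separated by "\n>>>" from any filehandle
# (a file, STDIN, or the read end of a pipe).
sub read_records {
    my ($fh) = @_;
    local $/ = "\n>>>";          # same separator the script sets globally
    my @records;
    while (my $rec = <$fh>) {
        chomp $rec;              # chomp strips a trailing "\n>>>" if present
        next unless length $rec;
        push @records, $rec;
    }
    return @records;
}

# Made-up stand-in for FASTA -m 10 output; real output has more content.
my $text = "# header\n>>>query1, 100 aa\nbody one\n>>>query2, 50 aa\nbody two\n";
open(my $fh, '<', \$text) or die "in-memory open failed: $!";
my @recs = read_records($fh);
close $fh;
print scalar(@recs), " records\n";
```

Passing \*STDIN instead of the in-memory handle would read whatever is
piped in. I have not tested this against real FASTA output.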
-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


On 21/04/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> Ioannis's fastam10_to_table script is available in the bugzilla
> enhancement request (as an attachment) if anyone's interested:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2278
>
> I haven't had a chance to really look into m10 output yet but it
> looks easy enough to parse; may not be hard to get something
> SearchIO-based up and running.
>
> chris


From jason at bioperl.org  Sun Apr 22 16:24:23 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Apr 2007 13:24:23 -0700
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
Message-ID: <69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>

I do think that m9 is pretty compact if you don't need to see the
alignment and just want the pairwise statistics; it is analogous to
the BLAST m8/m9 format.  I typically just use that + fastam9_to_table for
input to MCL and other systems that can process tabular formats.
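
For anyone feeding the table into their own code rather than MCL, the
columns can be read back by name. This is a minimal sketch assuming the
17-column order given by @fields in the fastam10_to_table script posted
earlier in this thread; fastam9_to_table's columns may differ, so check
its documentation. parse_row and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Column order taken from @fields in the fastam10_to_table script.
my @FIELDS = qw(qname hname percid alen mmcount gapcount
                qstart qend hstart hend evalue bits
                percsim qlen hlen qgap hgap);

# Hypothetical helper: turn one tab-delimited line into a hash reference.
# Returns undef (in scalar context) for header lines and blank lines.
sub parse_row {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ or $line !~ /\S/;
    my @vals = split /\t/, $line, -1;
    my %row;
    @row{@FIELDS} = @vals;
    return \%row;
}

# Made-up row, for illustration only.
my $line = join "\t",
    qw(q1 h1 45.00 80 40 4 1 80 5 84 1.2e-05 35.2 60.00 100 120 2 2);
my $row = parse_row($line);

# Print the pair only if it passes an (arbitrary) e-value cutoff.
print join("\t", @{$row}{qw(qname hname evalue)}), "\n"
    if $row->{evalue} <= 1e-3;
```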

I cleaned up a few things in SearchIO::fasta but have not been able  
to see whether we can auto-detect m10 format and insert the necessary  
code just yet.

-jason
On Apr 22, 2007, at 10:11 AM, Ioannis Kirmitzoglou wrote:

> I agree with Jason. Both scripts (fastam9_to_table and fastam10_to_table)
> are way faster and easier to use than SearchIO. Still, there are a lot
> of cases where SearchIO support for m10 would be useful (e.g. when trying to
> represent the alignment in a graphical way).
> Nevertheless I do think that FASTA needs an output similar to the BLAST m8
> one, which is really compact. Although I haven't tried it yet, I do believe
> that both scripts can be piped, so one easy and rather fast way to produce
> a tabular output from FASTA would be to pipe its output directly to one of
> the scripts.
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
>
>
>
> On 21/04/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>
>> Ioannis's fastam10_to_table script is available in the bugzilla
>> enhancement request (as an attachment) if anyone's interested:
>>
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2278
>>
>> I haven't had a chance to really look into m10 output yet but it
>> looks easy enough to parse; may not be hard to get something
>> SearchIO-based up and running.
>>
>> chris
>>
>> On Apr 21, 2007, at 12:44 PM, Jason Stajich wrote:
>>
>> > We don't have one yet. This is a new format introduced in the most
>> > recent release of FASTA.  Hopefully someone can make some time to add
>> > some code to SearchIO::fasta for it.
>> >
>> > I do find that when I need a fast FASTA to TAB converter, the
>> > simple script (fastam9_to_table) is more efficient than the SearchIO
>> > framework, so Ioannis is making a parallel one for the new m10
>> > output.  So I think having both is useful.
>> >
>> > -jason
>> > On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
>> >
>> >> I haven't kept track of this - did this go anywhere? Do we not have
>> >> an -m10 fasta output parser in SearchIO? (I.e., my first thought
>> >> would be that that would be the desired solution; am I misled in
>> >> this?)
>> >>
>> >>      -hilmar
>> >>
>> >> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
>> >>
>> >>> I have reported it as a bug on the bugzilla but due to bugzilla
>> >>> problems I was not able to attach my code and/or sample m10 files.
>> >>> Nevertheless here is the code that converts an m10 fasta output to
>> >>> an m8 BLAST output which is parseable by the vast majority of software.
>> >>>
>> >>> <----------- CODE BEGINS HERE ------------------->
>> >>>
>> >>> #!/usr/bin/perl -w
>> >>>
>> >>> =head1 NAME
>> >>>
>> >>> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular output
>> >>>
>> >>> =head1 SYNOPSIS
>> >>>
>> >>>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>> >>>
>> >>> =head1 DESCRIPTION
>> >>>
>> >>> Command line options:
>> >>>   --header                -- boolean flag to print column header
>> >>>   -o/--out                -- optional outputfile to write data,
>> >>>                              otherwise will write to STDOUT
>> >>>   -h/--help               -- show this documentation
>> >>>
>> >>> Not technically a SearchIO script as this doesn't use any Bioperl
>> >>> components but is useful and fast.  The output is tabular output
>> >>> with the standard NCBI -m8 columns.
>> >>>
>> >>>  queryname
>> >>>  hit name
>> >>>  percent identity
>> >>>  alignment length
>> >>>  number mismatches
>> >>>  number gaps
>> >>>  query start  (if on rev-strand start > end)
>> >>>  query end
>> >>>  hit start (if on rev-strand start > end)
>> >>>  hit end
>> >>>  evalue
>> >>>  bit score
>> >>>
>> >>> Additionally 5 more columns are provided
>> >>>  percent similar
>> >>>  query length
>> >>>  hit length
>> >>>  query gaps
>> >>>  hit gaps
>> >>>
>> >>> =head1 AUTHOR - Ioannis Kirmitzoglou
>> >>>
>> >>> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>> >>>
>> >>> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>> >>>
>> >>> Headers as well as portions of code were taken
>> >>> from fastam9_to_table.pl by Jason Stajich
>> >>>
>> >>> =head1 DISCLAIMER
>> >>>
>> >>> Copyright (c) <2007> <Ioannis Kirmitzoglou>
>> >>>
>> >>> Permission to use, copy, modify, merge, publish and distribute
>> >>> this software and its documentation, with or without modification,
>> >>> for any purpose, and without fee or royalty to the copyright holder(s)
>> >>> is hereby granted with no restrictions and/or prerequisites.
>> >>>
>> >>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
>> >>> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
>> >>> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
>> >>> NONINFRINGEMENT.
>> >>> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
>> >>> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
>> >>> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
>> >>> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>> >>>
>> >>> =cut
>> >>>
>> >>> use strict;
>> >>> use Getopt::Long;
>> >>>
>> >>> my %data=();
>> >>>
>> >>> my $outfile=''; my $header='';
>> >>> GetOptions(
>> >>>     'header'              => \$header,
>> >>>     'o|out|outfile:s'     => \$outfile,
>> >>>     'h|help'              => sub { exec('perldoc',$0); exit; }
>> >>>        );
>> >>>
>> >>> my $outfh;
>> >>> if( $outfile ) {
>> >>>     open($outfh, '>', $outfile) || die("$outfile: $!");
>> >>> } else {
>> >>>     $outfh = \*STDOUT;
>> >>> }
>> >>>
>> >>>
>> >>> $/="\n>>>";
>> >>>
>> >>> my @fields = qw(qname hname percid alen mmcount gapcount
>> >>>         qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);
>> >>>
>> >>> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),"\n"
>> >>>     if $header;
>> >>>
>> >>> while (<>) {
>> >>>
>> >>>         chomp;
>> >>>         if ($_=~/^>/ || $_=~/^\#/) {next;}
>> >>>         my @hits = split(/\d+>>/, $_);
>> >>>         @hits= split("\n>>", $hits[0]);
>> >>>
>> >>>         my $hit = shift @hits;
>> >>>
>> >>>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
>> >>>
>> >>>         foreach my $align (@hits) {
>> >>>
>> >>>             my @details= split ("\n>", $align);
>> >>>            my $detail = shift @details;
>> >>>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>> >>>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>> >>>             $data{'bits'}=$1;
>> >>>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>> >>>             $data{'evalue'}=$1;
>> >>>
>> >>>             my $term = quotemeta("; sw_score");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term\s+(\S+)/;
>> >>>             $data{'score'}=$1;
>> >>>
>> >>>             $term = quotemeta("; sw_ident:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term\s+(\S+)/;
>> >>>             $data{'percid'}=$1;
>> >>>
>> >>>             $term = quotemeta("; sw_sim:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term\s+(\S+)/;
>> >>>             $data{'percsim'}=$1;
>> >>>
>> >>>             $term = quotemeta("; sw_overlap:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term\s+(\S+)/;
>> >>>             $data{'alen'}=$1;
>> >>>
>> >>>             $detail = shift @details;
>> >>>
>> >>>             $term = quotemeta("; al_start:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term\s+(\S+)/;
>> >>>             $data{'qstart'}=$1;
>> >>>
>> >>>             $term = quotemeta("; al_stop:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term\s+(\S+)/;
>> >>>             $data{'qend'}=$1;
>> >>>
>> >>>             $term = quotemeta("; al_display_start:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             my $lakis ='';
>> >>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>> >>>
>> >>>             $data{'qgap'}=($1 =~ tr/\-//);
>> >>>
>> >>>             $detail = shift @details;
>> >>>
>> >>>             $term = quotemeta("; sq_len:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term\s+(\S+)/;
>> >>>             $data{'hlen'}=$1;
>> >>>
>> >>>             $term = quotemeta("; al_start:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term\s+(\S+)/;
>> >>>             $data{'hstart'}=$1;
>> >>>
>> >>>             $term = quotemeta("; al_stop:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term\s+(\S+)/;
>> >>>             $data{'hend'}=$1;
>> >>>
>> >>>             $term = quotemeta("; al_display_start:");
>> >>>             $term =~ s/\\ /\\s/;
>> >>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>> >>>             $data{'hgap'}=($1 =~ tr/-//);
>> >>>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>> >>>             $data{'mmcount'} = $data{'alen'}
>> >>>                 - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );
>> >>>
>> >>> for ( $data{'percid'}, $data{'percsim'} ) {
>> >>>     $_ = sprintf("%.2f",$_*100);
>> >>> }
>> >>>
>> >>>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
>> >>>         }
>> >>>
>> >>> }
>> >>>
>> >>> <----------------- CODE ENDS HERE ---------------------->
>> >>>
>> >>> --
>> >>>
>> >>> *Ioannis Kirmitzoglou*, MSc
>> >>> PhD. Student,
>> >>> Bioinformatics Research Laboratory
>> >>> Department of Biological Sciences
>> >>> University of Cyprus
>> >>> _______________________________________________
>> >>> Bioperl-l mailing list
>> >>> Bioperl-l at lists.open-bio.org
>> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>
>> >> --
>> >> ===========================================================
>> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> >> ===========================================================
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> _______________________________________________
>> >> Bioperl-l mailing list
>> >> Bioperl-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>> > --
>> > Jason Stajich
>> > jason at bioperl.org
>> > http://jason.open-bio.org/
>> >
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From ioanniskirmitzoglou at gmail.com  Mon Apr 23 05:45:53 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Mon, 23 Apr 2007 12:45:53 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
References: <10034698.post@talk.nabble.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
	<69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
Message-ID: <b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>

I don't know about older versions but the latest version of FASTA starts its
output with a line similar to those:
# fasta34.exe -m9 -d0 -Q test.faa test.faa OR
# fasta34.exe -m10 -Q test.faa test.faa

This very first line is also the only one in the output that starts with
'#'.
Isn't this an easy way to determine the output type?
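
A minimal sketch of that first-line check, for illustration only. The
helper name fasta_output_mode is made up (it is not part of BioPerl), and
it assumes the command echo always carries the mode as -mN or -m N:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): inspect the first line of a
# FASTA report and return the -m output mode named in the command echo.
# Returns undef when the line is not a '#' command echo, and 0 when the
# echo is present but no -m option was given (default output).
sub fasta_output_mode {
    my ($first_line) = @_;
    return unless defined $first_line && $first_line =~ /^#\s*\S+/;
    # Accept both "-m10" and "-m 10".
    return $1 if $first_line =~ /\s-m\s*(\d+)\b/;
    return 0;
}

for my $line ('# fasta34.exe -m9 -d0 -Q test.faa test.faa',
              '# fasta34.exe -m10 -Q test.faa test.faa',
              ' FASTA searches a protein or DNA sequence data bank') {
    my $mode = fasta_output_mode($line);
    print defined($mode) ? "mode $mode\n" : "no command echo\n";
}
```

A caller would read one line of the report, pass it in, and treat an
undefined result as "format unknown" rather than as an error.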


-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


From cjfields at uiuc.edu  Mon Apr 23 08:46:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Apr 2007 07:46:40 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
	<69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
	<b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>
Message-ID: <333A7BEF-71E3-4E15-B2EC-384AEBAA13B7@uiuc.edu>

That's true, but older versions of fasta don't do this.  For  
instance, the example files in the bioperl distribution in t/data  
(HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this  
line.

 From the fasta changelog:

-------------------------------------------------------------
 >>Nov 14-22, 2002  CVS fa34t20b6

Include compile-time define (-DPGM_DOC) that causes all the fasta
programs to provide the same command line echo that is provided by the
PVM and MPI parallel programs.  Thus, if you run the program:

     fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12

the first lines of output from FASTA will be:

     # fasta34_t -q gtt1_drome.aa /slib/swissprot
      FASTA searches a protein or DNA sequence data bank
      version 3.4t20 Nov 10, 2002
     Please cite:
      W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

This has been turned on by default in most FASTA Makefiles.
-------------------------------------------------------------

We could only support newer fasta output (newer than the above  
version) since there have been several bug fixes and changes to  
output; not sure how everyone else feels about this.

chris

On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:

> I don't know about older versions but the latest version of FASTA  
> starts its
> output with a line similar to those:
> # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
> # fasta34.exe -m10 -Q test.faa test.faa
>
> This very first line is also the only one in the output that starts  
> with
> '#'.
> Isn't this an easy way to determine the output type?
>
>
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 23 09:49:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Apr 2007 08:49:45 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>
References: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>
Message-ID: <12707EA8-F245-4AE7-BFD1-EE861F431F3D@uiuc.edu>

Aaron,

I find -m 10 defined way back in fasta2 notes:

--------------------------------------------------------------
Changes with 2.0x4  (January, 1996)

The major change in with 2.0x4 is the ability to get a parseable
output from FASTA/TFASTA/SSEARCH.  This can be done using output
option -m 10.  ...
--------------------------------------------------------------

It goes on to define it in more detail (which is nice to have
around!).  It's possible it wasn't implemented until recently for
fasta3, but I find references to it in the various fasta3 notes going
back to at least 2001, so maybe it wasn't compiled by default
until recently?  The extra '#' line was added in 2002 to all output
as far as I can tell.

We could just have SearchIO::fasta fall back to default parsing if
'#' isn't present.  The default format and m10 are sufficiently
different that we probably want to separate m10 parsing into
its own parser subroutine so we don't screw with the default parsing
too much.
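
Roughly what I have in mind, as a sketch rather than real SearchIO code.
The names parse_report, parse_m10 and parse_default are placeholders I
made up, and the stand-in parsers only return a label so the dispatch can
be seen on its own:

```perl
use strict;
use warnings;

# Stand-in parser routines; names are hypothetical, and a real parser
# would build result objects instead of returning a label.
sub parse_default { my ($fh, $first) = @_; return 'default' }
sub parse_m10     { my ($fh, $first) = @_; return 'm10' }

# Read the first line and choose a parser. Reports without the '#'
# command echo (older FASTA builds) fall back to default parsing.
sub parse_report {
    my ($fh) = @_;
    my $first = <$fh>;
    $first = '' unless defined $first;
    if ($first =~ /^#/ && $first =~ /\s-m\s*10\b/) {
        return parse_m10($fh, $first);
    }
    return parse_default($fh, $first);
}

my $new_style = "# fasta34.exe -m10 -Q test.faa test.faa\n";
my $old_style = " FASTA searches a protein or DNA sequence data bank\n";
for my $text ($new_style, $old_style) {
    open(my $fh, '<', \$text) or die "cannot open in-memory handle: $!";
    print parse_report($fh), "\n";
    close $fh;
}
```

The first line is handed on to the chosen parser so that nothing read
during detection is lost.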

chris

On Apr 23, 2007, at 8:29 AM, aaron.j.mackey at gsk.com wrote:

> Since -m10 is newer than PGM_DOC, you should be fine to use the  
> first line
> as a detection for m10, when that first line exists (when it does  
> not, the
> format cannot be m10, unless someone has re-compiled FASTA with an
> undefined PGM_DOC).
>
> -Aaron
>
> bioperl-l-bounces at lists.open-bio.org wrote on 04/23/2007 08:46:40 AM:
>
>> That's true, but older versions of fasta don't do this.  For
>> instance, the example files in the bioperl distribution in t/data
>> (HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this
>> line.
>>
>>  From the fasta changelog:
>>
>> -------------------------------------------------------------
>>>> Nov 14-22, 2002  CVS fa34t20b6
>>
>> Include compile-time define (-DPGM_DOC) that causes all the fasta
>> programs to provide the same command line echo that is provided by  
>> the
>> PVM and MPI parallel programs.  Thus, if you run the program:
>>
>>      fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12
>>
>> the first lines of output from FASTA will be:
>>
>>      # fasta34_t -q gtt1_drome.aa /slib/swissprot
>>       FASTA searches a protein or DNA sequence data bank
>>       version 3.4t20 Nov 10, 2002
>>      Please cite:
>>       W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
>>
>> This has been turned on by default in most FASTA Makefiles.
>> -------------------------------------------------------------
>>
>> We could only support newer fasta output (newer than the above
>> version) since there have been several bug fixes and changes to
>> output; not sure how everyone else feels about this.
>>
>> chris
>>
>> On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:
>>
>>> I don't know about older versions but the latest version of FASTA
>>> starts its
>>> output with a line similar to those:
>>> # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
>>> # fasta34.exe -m10 -Q test.faa test.faa
>>>
>>> This very first line is also the only one in the output that starts
>>> with
>>> '#'.
>>> Isn't this an easy way to determine the output type?
>>>
>>>
>>> -- 
>>>
>>> *Ioannis Kirmitzoglou*, MSc
>>> PhD. Student,
>>> Bioinformatics Research Laboratory
>>> Department of Biological Sciences
>>> University of Cyprus
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From aaron.j.mackey at gsk.com  Mon Apr 23 09:29:39 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 23 Apr 2007 09:29:39 -0400
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <333A7BEF-71E3-4E15-B2EC-384AEBAA13B7@uiuc.edu>
Message-ID: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>

Since -m10 is newer than PGM_DOC, you should be fine to use the first line 
as a detection for m10, when that first line exists (when it does not, the 
format cannot be m10, unless someone has re-compiled FASTA with an 
undefined PGM_DOC).

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 04/23/2007 08:46:40 AM:

> That's true, but older versions of fasta don't do this.  For 
> instance, the example files in the bioperl distribution in t/data 
> (HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this 
> line.
> 
>  From the fasta changelog:
> 
> -------------------------------------------------------------
>  >>Nov 14-22, 2002  CVS fa34t20b6
> 
> Include compile-time define (-DPGM_DOC) that causes all the fasta
> programs to provide the same command line echo that is provided by the
> PVM and MPI parallel programs.  Thus, if you run the program:
> 
>      fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12
> 
> the first lines of output from FASTA will be:
> 
>      # fasta34_t -q gtt1_drome.aa /slib/swissprot
>       FASTA searches a protein or DNA sequence data bank
>       version 3.4t20 Nov 10, 2002
>      Please cite:
>       W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
> 
> This has been turned on by default in most FASTA Makefiles.
> -------------------------------------------------------------
> 
> We could only support newer fasta output (newer than the above 
> version) since there have been several bug fixes and changes to 
> output; not sure how everyone else feels about this.
> 
> chris
> 
> On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:
> 
> > I don't know about older versions but the latest version of FASTA 
> > starts its
> > output with a line similar to those:
> > # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
> > # fasta34.exe -m10 -Q test.faa test.faa
> >
> > This very first line is also the only one in the output that starts 
> > with
> > '#'.
> > Isn't this an easy way to determine the output type?
> >
> >
> > -- 
> >
> > *Ioannis Kirmitzoglou*, MSc
> > PhD. Student,
> > Bioinformatics Research Laboratory
> > Department of Biological Sciences
> > University of Cyprus
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From bix at sendu.me.uk  Tue Apr 24 06:21:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Apr 2007 11:21:29 +0100
Subject: [Bioperl-l] WrapperBase / StandAloneBlast executable() method
	confusion
Message-ID: <462DDA29.4090104@sendu.me.uk>

Hi,

I'm a little unsure of the intent for executable() in wrapper modules. 
The WrapperBase version of the method and the StandAloneBlast version 
have the same POD but different implementations.

WrapperBase takes as a first arg an 'exe' which it will blindly trust is 
the path to a working executable. (That doesn't seem sensible already.) 
It is only capable of storing one such path.

If no arg is supplied it uses program_path() (which uses program_name()) 
to find the executable. Failing that it does a further direct test on 
program_name() to see if it's executable.


StandAloneBlast takes as a first arg merely the name of your exe and 
also (undocumented) the path to the corresponding executable (which is 
tested to see if it is really executable). It can store executable paths 
for multiple different exenames (corresponding better with the docs for 
the first arg: "name of executable to set path to").

If no second arg is supplied it does something similar to WrapperBase, 
except that it uses the first arg exename (or a default if that wasn't 
supplied) in place of program_name().


I'm trying to generalize this so StandAloneBlast can just use the 
WrapperBase version (and so other wrappers can then store executable 
paths for different sub-programs). Any suggestions for a good way of 
melding these two together whilst somehow retaining backward compatibility?
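
To make the question concrete, here is one possible shape for a merged
method, as a sketch only. My::WrapperSketch is a made-up class, not
WrapperBase itself, and the rule "a lone argument containing a directory
separator is a path" is an assumed heuristic for keeping the old
one-argument call working, not existing behaviour:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

package My::WrapperSketch;   # hypothetical class, not a BioPerl module

sub new {
    my ($class, %args) = @_;
    return bless { default_name => $args{program_name}, exe_paths => {} }, $class;
}

# executable()              -> stored path for the default program name
# executable($name)         -> stored path for $name
# executable($name, $path)  -> check $path, then store it for $name
sub executable {
    my ($self, $name, $path) = @_;
    # Assumed heuristic for the legacy WrapperBase call: a lone argument
    # containing a directory separator is a path for the default program.
    if (defined $name && !defined $path && $name =~ m{[/\\]}) {
        ($name, $path) = ($self->{default_name}, $name);
    }
    $name = $self->{default_name} unless defined $name;
    if (defined $path) {
        die "'$path' is not an executable file\n" unless -f $path && -x _;
        $self->{exe_paths}{$name} = $path;
    }
    return $self->{exe_paths}{$name};
}

package main;

# A temporary file marked executable stands in for a real program.
my ($fh, $fake_exe) = tempfile(UNLINK => 1);
close $fh;
chmod 0755, $fake_exe or die "chmod failed: $!";

my $wrapper = My::WrapperSketch->new(program_name => 'blastall');
$wrapper->executable('bl2seq', $fake_exe);   # StandAloneBlast-style call
$wrapper->executable($fake_exe);             # WrapperBase-style call
print "bl2seq   -> ", $wrapper->executable('bl2seq'), "\n";
print "blastall -> ", $wrapper->executable, "\n";
```

The sketch leaves out the program_path() lookup that both current
versions fall back on, and the separator heuristic would misread a bare
file name given as a path, so it is a starting point for discussion only.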


From cjfields at uiuc.edu  Tue Apr 24 08:55:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 24 Apr 2007 07:55:43 -0500
Subject: [Bioperl-l] WrapperBase / StandAloneBlast executable() method
	confusion
In-Reply-To: <462DDA29.4090104@sendu.me.uk>
References: <462DDA29.4090104@sendu.me.uk>
Message-ID: <8F1427D6-8654-461E-B9AA-E51CC3A20318@uiuc.edu>

I'm not sure, but you might want to bring Torsten in on this as he  
took over maintaining StandAloneBlast.  Much of the confusion may  
stem from the independent evolution of StandAloneBlast and WrapperBase.

Also, (a bit unrelated), there were plans for unifying the  
Bio::Tools::Run BLAST modules described here:

http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

Seemed like there was a general consensus at the time on the need to  
refactor StandAloneBlast and RemoteBlast code, so maybe the best  
place to start is StandAloneBlast (the others could be added in from  
there).  We could just deprecate use of the older modules at some  
point in favor of the new scheme.

chris

On Apr 24, 2007, at 5:21 AM, Sendu Bala wrote:

> Hi,
>
> I'm a little unsure of the intent for executable() in wrapper modules.
> The WrapperBase version of the method and the StandAloneBlast version
> have the same POD but different implementations.
>
> WrapperBase takes as a first arg an 'exe' which it will blindly  
> trust is
> the path to a working executable. (That doesn't seem sensible  
> already.)
> It is only capable of storing one such path.
>
> If no arg is supplied it uses program_path() (which uses  
> program_name())
> to find the executable. Failing that it does a further direct test on
> program_name() to see if it's executable.
>
>
> StandAloneBlast takes as a first arg merely the name of your exe and
> also (undocumented) the path to the corresponding executable (which is
> tested to see if it is really executable). It can store executable paths
> for multiple different exenames (corresponding better with the docs  
> for
> the first arg: "name of executable to set path to").
>
> If no second arg is supplied it does something similar to WrapperBase,
> except that it uses the first arg exename (or a default if that wasn't
> supplied) in place of program_name().
>
>
> I'm trying to generalize this so StandAloneBlast can just use the
> WrapperBase version (and so other wrappers can then store executable
> paths for different sub-programs). Any suggestions for a good way of
> melding these two together whilst somehow retaining backward  
> compatibility?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From avilella at gmail.com  Tue Apr 24 12:10:19 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 24 Apr 2007 17:10:19 +0100
Subject: [Bioperl-l] lack of markers for some genotypes in some
	Bio::PopGen::Statistics methods
Message-ID: <358f4d650704240910u4c90864cqd6c4e38ecedef4c5@mail.gmail.com>

Hi,

I have some genotype data where some individuals don't have a given marker
in the population.

This means that some methods in Bio::PopGen::Statistics will fail when
trying to get them, so I've added a couple of "next unless (defined($sth));"
guards to work around this. But I am not sure whether this breaks any
assumption made when implementing the methods.

Anyone able to check this?

Thanks,

    Albert.

avilella at magneto:~$ diff -u /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm.modif /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm
--- /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm.modif	2007-04-24 15:05:51.000000000 +0100
+++ /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm	2007-04-22 16:03:24.000000000 +0100
@@ -546,7 +546,6 @@
        # separate genotypes into 'chromosomes'
        for my $marker_name( @marker_names ) {
           my ($genotype) = $ind->get_Genotypes(-marker => $marker_name);
-           next unless defined($genotype); #FIXME -- is this correct?
           my $i =0;
           for my $allele ( $genotype->get_Alleles ) {
               push @{$chromosomes[$i]},

avilella at magneto:~$ diff -u /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm.modif /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm
--- /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm.modif	2007-04-24 15:04:51.000000000 +0100
+++ /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm	2007-04-22 16:03:24.000000000 +0100
@@ -656,8 +656,6 @@
                return 0;
            }
            foreach my $m ( @marker_names ) {
-              my $genotype = $ind->get_Genotypes($m);
-              next unless defined($genotype); #FIXME -- is this correct?
                foreach my $allele (map { $_->get_Alleles}
                               $ind->get_Genotypes($m) ) {
                    $data{$m}->{$allele}++;
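
Note that both diffs were run with the modified file first, so the guard
lines that were added show up with a '-' prefix. As a self-contained
illustration of what the guard does, here is a sketch that uses small
stand-in classes (Sketch::Individual and Sketch::Genotype are made up and
are not the Bio::PopGen classes); it assumes that asking for a missing
marker returns undef:

```perl
use strict;
use warnings;

# Minimal stand-ins for individuals and genotypes; these are NOT the
# Bio::PopGen classes, only enough structure to show the guard.
package Sketch::Genotype;
sub new         { my ($class, @alleles) = @_; return bless { alleles => [@alleles] }, $class }
sub get_Alleles { return @{ $_[0]{alleles} } }

package Sketch::Individual;
sub new           { my ($class, %genotypes) = @_; return bless { g => {%genotypes} }, $class }
# Returns the genotype for one marker, or undef if the marker is missing.
sub get_Genotypes { my ($self, $marker) = @_; return $self->{g}{$marker} }

package main;

# Count alleles per marker, skipping individuals that lack a marker.
sub allele_counts {
    my ($marker_names, @individuals) = @_;
    my %data;
    for my $ind (@individuals) {
        for my $m (@$marker_names) {
            my ($genotype) = $ind->get_Genotypes($m);
            next unless defined $genotype;   # individual lacks this marker
            $data{$m}{$_}++ for $genotype->get_Alleles;
        }
    }
    return \%data;
}

my @inds = (
    Sketch::Individual->new(m1 => Sketch::Genotype->new('A', 'G'),
                            m2 => Sketch::Genotype->new('C', 'C')),
    Sketch::Individual->new(m1 => Sketch::Genotype->new('A', 'A')),  # no m2
);
my $counts = allele_counts([qw(m1 m2)], @inds);
print "m1 A=$counts->{m1}{A} G=$counts->{m1}{G}; m2 C=$counts->{m2}{C}\n";
```

With the guard, the number of alleles counted can differ from marker to
marker, which is exactly the assumption I would like someone to check for
the statistics methods.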


From MEC at stowers-institute.org  Thu Apr 26 12:48:45 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 26 Apr 2007 11:48:45 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
Message-ID: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>

Lincoln, et al,

I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
from a Bio::DB::SeqFeature::Store that were initially created with
-segments (i.e. whose location was discontiguous) does not display any
attributes in column 9 other than "Name".

What do you think of the following patch to Bio::Graphics::FeatureBase,
whose effect is to "contrive to return (duplicated) common group values"
(which otherwise get lost when "collapsing" "homogenous" parent/child
features)?

Another approach would be to copy the attributes from the parent to the
children when the -segments are first created.

Another approach would be to use Bio::SeqFeature::Generic as the db's
-seqfeature_class and save with -location being a Bio::Location::Split,
but this was fraught with other problems.

Any other suggestions?  Do you want me to commit this patch?

Cheers,

Malcolm
 
Patch follows:


Index: FeatureBase.pm
===================================================================
RCS file:
/home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
retrieving revision 1.29
diff -c -r1.29 FeatureBase.pm
*** FeatureBase.pm	16 Apr 2007 19:55:33 -0000	1.29
--- FeatureBase.pm	26 Apr 2007 16:30:23 -0000
***************
*** 581,587 ****
      foreach (@children) { 
        s/Parent=/ID=/g; 
      } # replace Parent tag with ID
!     return join "\n",@children;
    }
  
    return join("\n",$p,@children);
--- 581,589 ----
      foreach (@children) { 
        s/Parent=/ID=/g; 
      } # replace Parent tag with ID
!     #return join "\n",@children;
!     # Instead of above, additionally, contrive to return (duplicated) common group values
!     return(join("$group\n",@children) . $group);
    }
  
    return join("\n",$p,@children);
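
To show what the changed return expression does, here is a small
stand-alone sketch. The child lines and the attribute string are made up
for illustration, and append_group is a hypothetical wrapper around the
expression in the patch; what $group actually holds inside FeatureBase is
not shown here:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the patched expression: append the shared
# attribute string to every child line, including the last one.
sub append_group {
    my ($group, @children) = @_;
    return join("$group\n", @children) . $group;
}

# Made-up child feature lines and attribute string, for illustration only.
my @children = (
    "chr1\tsrc\texon\t100\t200\t.\t+\t.\tID=f1",
    "chr1\tsrc\texon\t300\t400\t.\t+\t.\tID=f1",
);
my $group = ';Name=f1;Note=example';

print append_group($group, @children), "\n";
```

One thing the sketch makes visible: with an empty child list the
expression returns $group on its own.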


From emeric.sevin at univ-rennes1.fr  Thu Apr 26 04:48:37 2007
From: emeric.sevin at univ-rennes1.fr (Emeric Sevin)
Date: Thu, 26 Apr 2007 10:48:37 +0200
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
	<7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>
Message-ID: <4ef54906af35b3cbf231303285527055@univ-rennes1.fr>

hi! sorry for the delay, took a little vacation ;-)

Indeed, I don't see any trouble in coding a supplementary test; I'm just
not at all familiar with the patch release/bioperl package update process
and would prefer to leave that to you. For that purpose I'll take care of
that bug post in the coming hours!
Thank you very much
Emeric

On 13 Apr 07, at 22:13, Jason Stajich wrote:

> I think it just needs an edit to the code in to_string which checks
> for the type of algorithm.  You'd need to add to the if/elsif cascade
> something for the RPSBLAST type that codes the query and
> target dbs and query and target sequence types properly.  This would
> be very trivial to code in; have you tried adding this to see if it
> works?
>
> If you submit a bug with an example report we'd be able to make
> appropriate changes faster.
>
> -jason
> On Apr 11, 2007, at 6:32 AM, Emeric Sevin wrote:
>
>> Hi everybody,
>>
>> I'm sorry to bug, but either I missed something so obvious nobody
>> bothered to answer, or I'm being a little boycotted here...
>> A little help would be very much appreciated
>>
>> On 22 Mar 07, at 16:07, Emeric Sevin wrote:
>>
>>> Hello,
>>>
>>> I am new to this community, and apologize if this subject has been
>>> posted before.
>>>
>>> I want to print out only selected results from a multiple blast-
>>> alignments results file. Problem is, the algorithm used is
>>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the
>>> actual writing task yields "unclean" warnings. Although an output
>>> is actually written, the writer
>>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by
>>> the fact rpsblast DBs are not labeled with
>>> "protein"/"nucleic"/"translated".
>>> Does anybody know of an easy fix to that bug, or of another way to
>>> come around it?
>>>
>>> Thank you very much
>>>
>>> Emeric SEVIN
>>> Université de Rennes 1
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
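
For the bug report, here is a sketch of the kind of extra branch Jason
describes above. It is not the actual TextResultWriter code: the function
sequence_types is made up, and the RPSBLAST row (protein query against a
protein profile database) is my assumption about what the writer should
report:

```perl
use strict;
use warnings;

# Hypothetical lookup, not the real writer code: map an algorithm name
# to (query sequence type, database sequence type).
sub sequence_types {
    my ($algorithm) = @_;
    my $alg = uc(defined $algorithm ? $algorithm : '');
    $alg =~ s/[^A-Z]//g;    # e.g. "RPS-BLAST" becomes "RPSBLAST"

    if    ($alg eq 'BLASTN')   { return ('nucleic',    'nucleic') }
    elsif ($alg eq 'BLASTP')   { return ('protein',    'protein') }
    elsif ($alg eq 'BLASTX')   { return ('translated', 'protein') }
    elsif ($alg eq 'TBLASTN')  { return ('protein',    'translated') }
    elsif ($alg eq 'TBLASTX')  { return ('translated', 'translated') }
    elsif ($alg eq 'RPSBLAST') { return ('protein',    'protein') }   # assumed new branch
    return ('unknown', 'unknown');
}

for my $name (qw(BLASTP TBLASTN RPS-BLAST)) {
    my ($query_type, $db_type) = sequence_types($name);
    print "$name: query=$query_type db=$db_type\n";
}
```

Stripping non-letters first lets the same branch match however the
report spells the program name.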


From budd at embl-heidelberg.de  Thu Apr 26 06:18:11 2007
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Thu, 26 Apr 2007 12:18:11 +0200 (CEST)
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
Message-ID: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>

Hi Bioperlers,

I'm trying to parse a FASTA search output file (see attached .out file) 
using Bioperl 1.4. My Bioperl installation has otherwise been working 
fine; however, I currently get the following error when running a simple 
script that attempts to access results from this outfile via bioperl.

Is this a problem with the parser?
Or have I executed FASTA wrongly creating output that isn't covered by the 
parser?

Any suggestions on how to deal with this much appreciated.

Best wishes,

Aidan

Script:

#!/usr/bin/perl -w
$^W=1;
use strict;
use Bio::SearchIO;

my $fasta_report = Bio::SearchIO->new('-format' => 'fasta',
                                      '-file'   => $ARGV[0]);
                                      
my $result = $fasta_report->next_result();            

Errors:

Use of uninitialized value in concatenation (.) or string at 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/GenericHSP.pm 
line 231, <GEN3> line 47.

------------- EXCEPTION  -------------
MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm 
FASTP -score 62.4 -hit_frame 0 -hsp_length 180 -hit_seq  -hit_length 0 
-query_length 128 -query_frame 0 -swscore 122 -rank 1 -query_seq 
GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS--PALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD-LYCHKSD 
-homology_seq                              
MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKRAKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RKCSL...LL.SVNL.K.ADHE.A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTMEPATLSPKSMR 
-hit_name YFL031W -bits 19.4 -query_name CREB1_MONKEY -evalue 1.1 (qs='
STACK Bio::Search::HSP::GenericHSP::new 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/GenericHSP.pm:231
STACK Bio::Search::HSP::FastaHSP::new 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/FastaHSP.pm:97
STACK Bio::Factory::ObjectFactory::create_object 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Factory/ObjectFactory.pm:150
STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/SearchResultEventBuilder.pm:275
STACK Bio::SearchIO::fasta::end_element 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/fasta.pm:872
STACK Bio::SearchIO::fasta::next_result 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/fasta.pm:403
STACK toplevel 
/Users/budd/scripts/test_scripts/test_parsing_fasta_output.pl:22

--------------------------------------

-- 
----------------------------------------------------------------------
Aidan Budd, PhD                               tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany

URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html
-------------- next part --------------
# fasta34 -m 2 creb1_human.fasta yeast_bzips_from_ensembl.fasta
FASTA searches a protein or DNA sequence data bank
 version 34.26 January 12, 2007
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query library creb1_human.fasta vs yeast_bzips_from_ensembl.fasta library
searching yeast_bzips_from_ensembl.fasta library

  1>>>CREB1_MONKEY 341 aa - 341 aa
 vs  yeast_bzips_from_ensembl.fasta library

   3683 residues in    10 sequences
 MLE_cen statistics: Lambda= 0.0338;  K=8.757e-05 (cen=0)

FASTA (3.5 Sept 2006) function [optimized, BL50 matrix (15:-5)] ktup: 2
 join: 37, opt: 25, open/ext: -10/-2, width:  16
 Scan time:  0.000
The best scores are:                                      opt bits E(10)
YFL031W                                            ( 238)  122 19.4     1.1
YEL009C                                            ( 281)  121 19.4     1.3
YIL036W                                            ( 587)  129 19.8       2
YIR017C                                            ( 187)   83 17.5     2.9
YVNL167C                                           ( 647)  119 19.3     2.9
YIR018W                                            ( 245)   67 16.7     5.3
YER045C                                            ( 489)   73 17.0     7.1
YDR259C                                            ( 383)   62 16.5     7.5
YOR028C                                            ( 296)   41 15.5     8.9
YHL009C                                            ( 330)   33 15.1     9.6

>>YFL031W                                                 (238 aa)
 initn: 107 init1: 107 opt: 122  Z-score: 62.4  bits: 19.4 E():  1.1
Smith-Waterman score: 122;  27.660% identity (63.830% similar) in 94 aa overlap (248-337:2-95)

       220       230       240       250       260       270       
CREB1_ GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS--PALP
YFL031                              MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKR

         280       290       300       310       320        330    
CREB1_ TQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD
YFL031 AKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RKCSL...LL.SVNL.K.ADHE.

           340                                                     
CREB1_ -LYCHKSD                                                    
YFL031 A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTMEPATLSPKSMR

>>YEL009C                                                 (281 aa)
 initn: 138 init1:  83 opt: 121  Z-score: 60.8  bits: 19.4 E():  1.3
Smith-Waterman score: 121;  29.412% identity (55.462% similar) in 119 aa overlap (219-335:165-277)

      190       200       210       220       230       240        
CREB1_ GAIQLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGD
YEL009 VSLADKAIESTEEVSLVPSNLEVSTTSFLP.PV.ED.KL.QTRKVKK.NS--..KKSHHV

      250       260       270         280       290       300      
CREB1_ VQTYQIRTAPTSTIAPGVVMASSPALPTQP--AEEAARKREVRLMKNREAARECRRKKKE
YEL009 GKDDES.LDHLGVV.YNRKQR.I.LS.IV.ESSDP..L..----AR.T....RS.AR.LQ

        310       320       330       340 
CREB1_ YVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
YEL009 RM.Q..DK.EE.LSK.YH.EN.VAR..K.VGER  

>>YIL036W                                                 (587 aa)
 initn: 132 init1:  70 opt: 129  Z-score: 57.2  bits: 19.8 E():    2
Smith-Waterman score: 129;  18.750% identity (55.682% similar) in 352 aa overlap (2-335:137-477)

                                            10        20           
CREB1_                              MTMESGAENQQSGDAAVTEAENQQM--TVQA
YIL036 RVVKPSANSNYQQAAYLRQQQQQDQRQQSPS.KTEE.S.LY..ILMNSGVV.D.HQNLAT

      30        40        50        60        70        80         
CREB1_ QPQIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGVIQAAQPSVIQSPQVQTVQSS
YIL036 HTNLSQ.SSTRKS.PNDSTT...-NASNIA.--.AS.NKQMYFMNMNMNNN.HALNDP.I

      90         100       110         120       130       140     
CREB1_ CKDLKRLFS--GTQISTIAESEDS--QESVDSVTDSQKRREILSRRPSYRKILNDL----
YIL036 LET.SPF.QPF.VDVAHLPMTNPPIF.S.LPGCDEPIR..R.SISNGQISQLGE.IETLE

                150       160          170        180       190    
CREB1_ ---SSDAPGVPRIEEEKSEEET---SAPAITTVTVP-TPIYQTSSGQYIAITQGGAIQLA
YIL036 NLHNTQP.PM.NFHNYNGLSQ.RNV.NKPVFNQA..VSS.P.YNAKKV.NP.KDS.--.G

          200       210       220       230       240       250    
CREB1_ NNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQI
YIL036 DQSVIYSKSQ.RNFVNAPSKNT.AES.----SDLE.MTTFA.TTGGENRGK.ALRESHSN

           260       270       280       290       300       310   
CREB1_ RT-APTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLEN
YIL036 PSFT.K.QGSHLNLA.NTQGN.I-.GT-T.W..ARL.ER..I..SK..QR..VAQLQ.QK

           320       330       340                                 
CREB1_ RVAVLENQNKTLIEELKALKDLYCHKSD                                
YIL036 EFNEIKDE.RI.LKK.NYYEK.ISKFKKFSKIHLREHEKLNKDSDNNVNGTNSSNKNESM

>>YIR017C                                                 (187 aa)
 initn:  43 init1:  43 opt:  83  Z-score: 54.0  bits: 17.5 E():  2.9
Smith-Waterman score: 84;  22.785% identity (56.962% similar) in 158 aa overlap (176-330:9-148)

         150       160       170       180       190       200     
CREB1_ PGVPRIEEEKSEEETSAPAITTVTVPTPIYQTSSGQYIAITQGGAIQLANNGTDGVQGLQ
YIR017                       MSAKQGWEKK.TNID..SRK.MNV---..LSEHL.N.I

         210       220       230       240        250       260    
CREB1_ TLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASG-DVQTYQIRTAPTS--TI
YIR017 S------SDSEL.SRL.SLLLVSS.N-----AEELISMINN.Q..SQFKKLRE.RKGKVA

            270       280       290       300       310       320  
CREB1_ APGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQN
YIR017 .TTA.VVKEEEA.VSTSN.LDKIKQE.RR..T..SQRF.IR..Q--.NF..-MNK.Q.L.

            330       340                             
CREB1_ KTLIEELKALKDLYCHKSD                            
YIR017 -.Q.NK.RDRIEQLNKENEFWKAKLNDINEIKSLKLLNDIKRRNMGR

>>YVNL167C                                                (647 aa)
 initn: 142 init1: 119 opt: 119  Z-score: 53.8  bits: 19.3 E():  2.9
Smith-Waterman score: 119;  39.623% identity (62.264% similar) in 53 aa overlap (280-332:426-478)

     250       260       270       280       290       300         
CREB1_ QTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVK
YVNL16 RKNSAVTTAPAQKDDVENNKISNNVTLDEN..QE...KEF.ER..V..SKF.KR....I.

     310       320       330       340                             
CREB1_ CLENRVAVLENQNKTLIEELKALKDLYCHKSD                            
YVNL16 KI..DLQFY.SEYDD.TQVIGK.CGIIPSSSSNSQFNVNVSTPSSSSPPSTSLIALLESS

>>YIR018W                                                 (245 aa)
 initn:  61 init1:  61 opt:  67  Z-score: 47.6  bits: 16.7 E():  5.3
Smith-Waterman score: 67;  25.455% identity (61.818% similar) in 55 aa overlap (280-334:55-109)

     250       260       270       280       290       300         
CREB1_ QTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVK
YIR018 SKNWKLPPRLPHRAAQRRKRVHRLHEDYET..NDEELQKKKRQ..D.Q.AY.ER.NNKLQ

     310       320       330       340                             
CREB1_ CLENRVAVLENQNKTLIEELKALKDLYCHKSD                            
YIR018 V..ETIES.SKVV.NYETK.NR.QNELQAKESENHALKQKLETLTLKQASVPAQDPILQN

>>YER045C                                                 (489 aa)
 initn: 111 init1:  70 opt:  73  Z-score: 43.8  bits: 17.0 E():  7.1
Smith-Waterman score: 97;  22.826% identity (67.391% similar) in 92 aa overlap (3-92:210-300)

                                           10        20         30 
CREB1_                             MTMESGAENQQSGDAAVTEAE-NQQMTVQAQP
YER045 QTGSKNIYAAMTPYDSNIKLNIPAVAATCDIP.ATPSIP...STMNQ.YI.M.LRL...M

              40        50        60         70        80        90
CREB1_ QIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGV-IQAAQPSVIQSPQVQTVQSSC
YER045 .TKAWKNAQL-NV.PCTP.SNSSVSSSSSC.NIND.NIEN.SVHS.ISHGVNHH..NN..

              100       110       120       130       140       150
CREB1_ KDLKRLFSGTQISTIAESEDSQESVDSVTDSQKRREILSRRPSYRKILNDLSSDAPGVPR
YER045 QNAELNISSSLPYESKCPDVNLTHANSKPQYKDATSALKNNINSEKDVHTAPFSSMHTTA

>>YDR259C                                                 (383 aa)
 initn:  84 init1:  52 opt:  62  Z-score: 42.8  bits: 16.5 E():  7.5
Smith-Waterman score: 81;  33.333% identity (64.583% similar) in 48 aa overlap (289-330:227-274)

      260       270       280       290       300       310        
CREB1_ TSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVL
YDR259 NDNNDNVTKPVPDKDTQLISSSGKTLRNTR.AAQ..T.QKAF.QR.EK.I.N..QKSKIF

           320        330       340                                
CREB1_ -----ENQN-KTLIEELKALKDLYCHKSD                               
YDR259 DDLLA..N.F.S.NDS.RNDNNILIAQHEAIRNAITMLRSEYDVLCNENNMLKNENSIIK

>>YOR028C                                                 (296 aa)
 initn:  35 init1:  35 opt:  41  Z-score: 39.3  bits: 15.5 E():  8.9
Smith-Waterman score: 80;  33.962% identity (66.038% similar) in 53 aa overlap (289-334:243-295)

      260       270       280       290       300       310        
CREB1_ TSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVL
YOR028 LSEQVFNEGERYNNDGQLIGKTGKPLRNTK.AAQ..S.QKAF.QRREK.I.N..EKSKLF

           320        330        340 
CREB1_ -----ENQN-KTLIEELKA-LKDLYCHKSD
YOR028 DGLMK..SEL.KM..S..SK..E*      

>>YHL009C                                                 (330 aa)
 initn:  33 init1:  33 opt:  33  Z-score: 36.4  bits: 15.1 E():  9.6
Smith-Waterman score: 91;  21.667% identity (57.500% similar) in 120 aa overlap (222-333:79-194)

             200       210       220       230             240     
CREB1_ QLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQI-LVP-----SNQVVVQAA
YHL009 EQTAPFPILEDQCPALNLDRSNNDLLLQNNISFPKGS.L.A.Q.T.ISGDY.TY.MADNN

         250         260       270       280       290       300   
CREB1_ SGDVQTYQIRT--APTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRK
YHL009 NN.NDS.SNTNYFSKNNG.S.SSRSP.VAHNENV.DDSK.K.KA----Q..A.QKAF.ER

           310       320       330       340                       
CREB1_ KKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD                      
YHL009 .EARM.E.QDKLLES.RNRQS.LK.IEE.RKANTEINAENRLLLRSGNENFSKDIEDDTN


341 residues in 1 query   sequences
3683 residues in 10 library sequences
 Scomplib [34.26]
 start: Thu Apr 26 11:52:16 2007 done: Thu Apr 26 11:52:16 2007
 Total Scan time:  0.000 Total Display time:  0.010

Function used was FASTA [version 34.26 January 12, 2007]
-------------- next part --------------
>CREB1_MONKEY
MTMESGAENQQSGDAAVTEAENQQMTVQAQPQIATLAQVSMPAAHATSSAPTVTLVQLPN
GQTVQVHGVIQAAQPSVIQSPQVQTVQSSCKDLKRLFSGTQISTIAESEDSQESVDSVTD
SQKRREILSRRPSYRKILNDLSSDAPGVPRIEEEKSEEETSAPAITTVTVPTPIYQTSSG
QYIAITQGGAIQLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQV
VVQAASGDVQTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAAREC
RRKKKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
-------------- next part --------------
>YIL036W
MFTGQEYHSVDSNSNKQKDNNKRGIDDTSKILNNKIPHSVSDTSAAATTTSTMNNSALSR
SLDPTDINYSTNMAGVVDQIHDYTTSNRNSLTPQYSIAAGNVNSHDRVVKPSANSNYQQA
AYLRQQQQQDQRQQSPSMKTEEESQLYGDILMNSGVVQDMHQNLATHTNLSQLSSTRKSA
PNDSTTAPTNASNIANTASVNKQMYFMNMNMNNNPHALNDPSILETLSPFFQPFGVDVAH
LPMTNPPIFQSSLPGCDEPIRRRRISISNGQISQLGEDIETLENLHNTQPPPMPNFHNYN
GLSQTRNVSNKPVFNQAVPVSSIPQYNAKKVINPTKDSALGDQSVIYSKSQQRNFVNAPS
KNTPAESISDLEGMTTFAPTTGGENRGKSALRESHSNPSFTPKSQGSHLNLAANTQGNPI
PGTTAWKRARLLERNRIAASKCRQRKKVAQLQLQKEFNEIKDENRILLKKLNYYEKLISK
FKKFSKIHLREHEKLNKDSDNNVNGTNSSNKNESMTVDSLKIIEELLMIDSDVTEVDKDT
GKIIAIKHEPYSQRFGSDTDDDDIDLKPVEGGKDPDNQSLPNSEKIK
>YIR017C
MSAKQGWEKKSTNIDIASRKGMNVNNLSEHLQNLISSDSELGSRLLSLLLVSSGNAEELI
SMINNGQDVSQFKKLREPRKGKVAATTAVVVKEEEAPVSTSNELDKIKQERRRKNTEASQ
RFRIRKKQKNFENMNKLQNLNTQINKLRDRIEQLNKENEFWKAKLNDINEIKSLKLLNDI
KRRNMGR
>YVNL167C
MSSEERSRQPSTVSTFDLEPNPFEQSFASSKKALSLPGTISHPSLPKELSRNNSTSTITQ
HSQRSTHSLNSIPEENGNSTVTDNSNHNDVKKDSPSFLPGQQRPTIISPPILTPGGSKRL
PPLLLSPSILYQANSTTNPSQNSHSVSVSNSNPSAIGVSSTSGSLYPNSSSPSGTSLIRQ
PRNSNVTTSNSGNGFPTNDSQMPGFLLNLSKSGLTPNESNIRTGLTPGILTQSYNYPVLP
SINKNTITGSKNVNKSVTVNGSIENHPHVNIMHPTVNGTPLTPGLSSLLNLPSTGVLANP
VFKSTPTTNTTDGTVNNSISNSNFSPNTSTKAAVKMDNPAEFNAIEHSAHNHKENENLTT
QIENNDQFNNKTRKRKRRMSSTSSTSKASRKNSISRKNSAVTTAPAQKDDVENNKISNNV
TLDENEEQERKRKEFLERNRVAASKFRKRKKEYIKKIENDLQFYESEYDDLTQVIGKLCG
IIPSSSSNSQFNVNVSTPSSSSPPSTSLIALLESSISRSDYSSAMSVLSNMKQLICETNF
YRRGGKNPRDDMDGQEDSFNKDTNVVKSENAGYPSVNSRPIILDKKYSLNSGANISKSNT
TTNNVGNSAQNIINSCYSVTNPLVINANSDTHDTNKHDVLSTLPHNN
>YER045C
MDYKHNFATSPDSFLDGRQNPLLYTDFLSSNKELIYKQPSGPGLVDSAYNFHHQNSLHDR
SVQENLGPMFQPFGVDISHLPITNPPIFQSSLPAFDQPVYKRRISISNGQISQLGEDLET
VENLYNCQPPILSSKAQQNPNPQQVANPSAAIYPSFSSNELQNVPQPHEQATVIPEAAPQ
TGSKNIYAAMTPYDSNIKLNIPAVAATCDIPSATPSIPSGDSTMNQAYINMQLRLQAQMQ
TKAWKNAQLNVHPCTPASNSSVSSSSSCQNINDHNIENQSVHSSISHGVNHHTVNNSCQN
AELNISSSLPYESKCPDVNLTHANSKPQYKDATSALKNNINSEKDVHTAPFSSMHTTATF
QIKQEARPQKIENNTAGLKDGAKAWKRARLLERNRIAASKCRQRKKMSQLQLQREFDQIS
KENTMMKKKIENYEKLVQKMKKISRLHMQECTINGGNNSYQSLQNKDSDVNGFLKMIEEM
IRSSSLYDE
>YIR018W
MALPLIKPKESEESHLALLSKIHVSKNWKLPPRLPHRAAQRRKRVHRLHEDYETEENDEE
LQKKKRQNRDAQRAYRERKNNKLQVLEETIESLSKVVKNYETKLNRLQNELQAKESENHA
LKQKLETLTLKQASVPAQDPILQNLIENFKPMKAIPIKYNTAIKRHQHSTELPSSVKCGF
CNDNTTCVCKELETDHRKSDDGVATEQKDMSMPHAECNNKDNPNGLCSNCTNIDKSCIDI
RSIIH
>YHL009C
MTPSNMDDNTSGFMKFINPQCQEEDCCIRNSLFQEDSKCIKQQPDLLSEQTAPFPILEDQ
CPALNLDRSNNDLLLQNNISFPKGSDLQAIQLTPISGDYSTYVMADNNNNDNDSYSNTNY
FSKNNGISPSSRSPSVAHNENVPDDSKAKKKAQNRAAQKAFRERKEARMKELQDKLLESE
RNRQSLLKEIEELRKANTEINAENRLLLRSGNENFSKDIEDDTNYKYSFPTKDEFFTSMV
LESKLNHKGKYSLKDNEIMKRNTQYTDEAGRHVLTVPATWEYLYKLSEERDFDVTYVMSK
LQGQECCHTHGPAYPRSLIDFLVEEATLNE
>YOR028C
MLMQIKMDNHPFNFQPILASHSMTRDSTKPKKMTDTAFVPSPPVGFIKEENKADLHTISV
VASNVTLPQIQLPKIATLEEPGYESRTGSLTDLSGRRNSVNIGALCEDVPNTAGPHIARP
VTINNLIPPSLPRLNTYQLRPQLSDTHLNCHFNSNPYTTASHAPFESSYTTASTFTSQPA
ASYFPSNSTPATRKNSATTNLPSEERRRVSVSLSEQVFNEGERYNNDGQLIGKTGKPLRN
TKRAAQNRSAQKAFRQRREKYIKNLEEKSKLFDGLMKENSELKKMIESLKSKLKE*
>YEL009C
MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
TSLFDNDIPVTTDDVSLADKAIESTEEVSLVPSNLEVSTTSFLPTPVLEDAKLTQTRKVK
KPNSVVKKSHHVGKDDESRLDHLGVVAYNRKQRSIPLSPIVPESSDPAALKRARNTEAAR
RSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
>YDR259C
MQNPPLIRPDMYNQGSSSMATYNASEKNLNEHPSPQIAQPSTSQKLPYRINPTTTNGDTD
ISVNSNPIQPPLPNLMHLSGPSDYRSMHQSPIHPSYIIPPHSNERKQSASYNRPQNAHVS
IQPSVVFPPKSYSISYAPYQINPPLPNGLPNQSISLNKEYIAEEQLSTLPSRNTSVTTAP
PSFQNSADTAKNSADNNDNNDNVTKPVPDKDTQLISSSGKTLRNTRRAAQNRTAQKAFRQ
RKEKYIKNLEQKSKIFDDLLAENNNFKSLNDSLRNDNNILIAQHEAIRNAITMLRSEYDV
LCNENNMLKNENSIIKNEHNMSRNENENLKLENKRFHAEYIRMIEDIENTKRKEQEQRDE
IEQLKKKIRSLEEIVGRHSDSAT
>YFL031W
MEMTDFELTSNSQSNLAIPTNFKSTLPPRKRAKTKEEKEQRRIERILRNRRAAHQSREKK
RLHLQYLERKCSLLENLLNSVNLEKLADHEDALTCSHDAFVASLDEYRDFQSTRGASLDT
RASSHSSSDTFTPSPLNCTMEPATLSPKSMRDSASDQETSWELQMFKTENVPESTTLPAV
DNNNLFDAVASPLADPLCDDIAGNSLPFDNSIDLDNWRNPEAQSGLNSFELNDFFITS

From jason at bioperl.org  Thu Apr 26 15:27:24 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Apr 2007 12:27:24 -0700
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
In-Reply-To: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
References: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
Message-ID: <7C782DA2-5A80-413A-9B5A-94EEBEA9EF6E@bioperl.org>

Unfortunately, the FASTA output format changed in that version of
FASTA. The latest version of Bioperl, 1.5.2, can handle it, though, so
you'll need to upgrade Bioperl.

-jason

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From dmessina at wustl.edu  Thu Apr 26 15:42:02 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 26 Apr 2007 14:42:02 -0500
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
In-Reply-To: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
References: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
Message-ID: <D41F5BDD-B992-4787-91C5-732B41683908@wustl.edu>

Hi Aidan,

Bioperl 1.4 is ~3 years old now, and FASTA output has probably  
changed since then. Your code should work if you install Bioperl  
1.5.2, the latest release.

	http://www.bioperl.org/wiki/Installing_BioPerl

Please let us know if that doesn't solve the problem.

Dave


From gopu_36 at yahoo.com  Thu Apr 26 21:29:03 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Thu, 26 Apr 2007 18:29:03 -0700 (PDT)
Subject: [Bioperl-l] check for the continous segments to extract the
	sequences
Message-ID: <10211951.post@talk.nabble.com>


As a newbie to programming, thanks for the support from this group. Please
ignore this message if it is not relevant to the group, as my
problem may be a typical computer science recursion problem (I am not sure).

I have an array like @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
6001, 7000, 7001, 8000, 12001, 13000);
The array gives the positions of sequences: the first element, '1', is a
start position, the second element, '1000', is the matching end position,
and so on. All the even indices (0, 2, 4, ...) hold the start points of the
sequences, and the odd indices (1, 3, 5, ...) hold the END positions
(1000, 2000, 5000, ...) of the sequences to be retrieved. Basically, I have
to check whether any contiguous segments lie in the list and join them
together to form one whole chunk.
For example, 1-1000 and 1001-2000 can be joined together to extract the
sequence from 1-2000. In the same way, 4001-8000 should be extracted, then
12001-13000, and so on. As I said, once the positions are checked, I will be
able to extract that part of the sequence from a whole genome. Thanks for
taking the time. Any tip or help would be greatly appreciated.

Regards
Gopu 


From jason at bioperl.org  Thu Apr 26 21:54:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Apr 2007 18:54:59 -0700
Subject: [Bioperl-l] check for the continous segments to extract the
	sequences
In-Reply-To: <10211951.post@talk.nabble.com>
References: <10211951.post@talk.nabble.com>
Message-ID: <EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>

You want a connectivity algorithm.  One can be found on perlmonks.org,
and another in Bio::Search::SearchUtils as the method collapse_nums().
You'll have to modify aspects of it to deal with ranges.
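
For what it's worth, the range version of that idea can be sketched in
plain Perl. This is only an illustration, not the collapse_nums() code:
the sub name merge_ranges is made up here, and it assumes the list is a
flat run of start/end pairs already sorted by start position.

```perl
use strict;
use warnings;

# Merge start/end pairs that abut (next start == previous end + 1) or overlap.
# Assumption: @_ is a flat list (start, end, start, end, ...) sorted by start.
# merge_ranges is a hypothetical helper name, not part of BioPerl.
sub merge_ranges {
    my @flat = @_;
    die "odd number of coordinates\n" if @flat % 2;
    my @merged;
    while (@flat) {
        my ($start, $end) = splice(@flat, 0, 2);
        if (@merged && $start <= $merged[-1][1] + 1) {
            # Contiguous or overlapping: extend the previous range.
            $merged[-1][1] = $end if $end > $merged[-1][1];
        }
        else {
            # Gap before this pair: start a new range.
            push @merged, [$start, $end];
        }
    }
    return @merged;
}

my @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
             6001, 7000, 7001, 8000, 12001, 13000);
print join(",", map { "$_->[0]-$_->[1]" } merge_ranges(@array)), "\n";
# prints 1-2000,4001-8000,12001-13000
```

If the pairs are not already sorted, sort them by start first; the single
pass above relies on that ordering.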

Good luck.
-jason

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From MEC at stowers-institute.org  Fri Apr 27 09:52:10 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 27 Apr 2007 08:52:10 -0500
Subject: [Bioperl-l] check for the continous segments to extract
	thesequences
In-Reply-To: <EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>
References: <10211951.post@talk.nabble.com>
	<EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>
Message-ID: <CED81D34E37D5043A1211565277A51E507E22F28@exchkc02.stowers-institute.org>

Gopu/Jason,

Another option is Set::IntSpan, available on CPAN at
http://search.cpan.org/~swmcd/Set-IntSpan-1.11/IntSpan.pm

Here's a perl one-liner that shows you how easy it is:

perl -MSet::IntSpan -e 'my @array = ( 1, 1000, 1001, 2000, 4001, 5000,
5001, 6000, 6001, 7000, 7001, 8000, 12001, 13000); my $is =
Set::IntSpan->new;  while (@array) {$is->U(shift(@array) . "-" .
shift(@array))}; print $is;'
1-2000,4001-8000,12001-13000

I use it all the time to great effect and have utility functions that
convert between bioperl split locations and IntSpans.

There is another module which extends it nicely, Set::IntSpan::Island,
worth a gander.
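
Once you have a run list like the one printed above, pulling the pieces
out of the genome is mostly bookkeeping. Here is a rough sketch, assuming
the genome is held as a plain Perl string and the coordinates are 1-based
and inclusive; extract_runs is a made-up name for illustration only.

```perl
use strict;
use warnings;

# Turn a run list such as "1-2000,4001-8000" into substrings of $genome.
# Assumption: coordinates are 1-based and inclusive; substr is 0-based.
# extract_runs is a hypothetical helper, not part of Set::IntSpan or BioPerl.
sub extract_runs {
    my ($genome, $run_list) = @_;
    my @chunks;
    for my $run (split /,/, $run_list) {
        my ($start, $end) = $run =~ /^(\d+)(?:-(\d+))?$/
            or die "cannot parse run '$run'\n";
        $end = $start unless defined $end;   # a lone number is a 1-base run
        push @chunks, substr($genome, $start - 1, $end - $start + 1);
    }
    return @chunks;
}

my $genome = "ACGTACGTAC";    # toy stand-in for a real genome string
print join(" ", extract_runs($genome, "1-4,7-9")), "\n";
# prints ACGT GTA
```

With a Bioperl sequence object instead of a string, $seq->subseq($start,
$end) takes the same 1-based inclusive coordinates, so the substr
arithmetic goes away.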

Cheers,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  



From lstein at cshl.edu  Fri Apr 27 13:44:59 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 27 Apr 2007 13:44:59 -0400
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>

Hi Malcolm,

This is absolutely ok and you can go ahead and commit. Thanks for figuring
this out!

Lincoln

On 4/26/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Lincoln, et al,
>
> I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
> from a Bio::DB::SeqFeature::Store that were initially created with
> -segments (i.e. whose location was discontiguous) does not display any
> attributes in column 9 other than "Name".
>
> What do you think of the following patch to Bio::Graphics::FeatureBase,
> whose effect is to "contrive to return (duplicated) common group values"
> (which otherwise get lost when "collapsing" "homogenous" parent/child
> features)?
>
> Another approach would be to copy the attributes from the parent to the
> children when the -segments are first created.
>
> Another approach would be to use Bio::SeqFeature::Generic as the db's
> -seqfeature_class and save with -location being a Bio::Location::Split,
> but this was fraught with other problems.
>
> Any other suggestions?  Do you want me to commit this patch?
>
> Cheers,
>
> Malcolm
>
> Patch follows:
>
>
>
>
> Index: FeatureBase.pm
> ===================================================================
> RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
> retrieving revision 1.29
> diff -c -r1.29 FeatureBase.pm
> *** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
> --- FeatureBase.pm      26 Apr 2007 16:30:23 -0000
> ***************
> *** 581,587 ****
>       foreach (@children) {
>         s/Parent=/ID=/g;
>       } # replace Parent tag with ID
> !     return join "\n",@children;
>     }
>
>     return join("\n",$p,@children);
> --- 581,589 ----
>       foreach (@children) {
>         s/Parent=/ID=/g;
>       } # replace Parent tag with ID
> !     #return join "\n",@children;
> !     # Instead of above, additionally, contrive to return (duplicated) common group values
> !     return(join("$group\n",@children) . $group);
>     }
>
>     return join("\n",$p,@children);
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From MEC at stowers-institute.org  Fri Apr 27 14:45:04 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 27 Apr 2007 13:45:04 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
	<6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E507E22F59@exchkc02.stowers-institute.org>

Hi Lincoln,
 
Cool.
 
The principle of what I figured out still holds, I think, but the
implementation is slightly broken.  Improved patch forthcoming next week.
 

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com] On Behalf Of Lincoln Stein
	Sent: Friday, April 27, 2007 12:45 PM
	To: Cook, Malcolm
	Cc: lstein at cshl.org; bioperl list
	Subject: Re: Handling discontiguous feature locations in Bio::DB::SeqFeature::Store -- proposed patch to Bio::Graphics::FeatureBase
	
	
	Hi Malcolm,
	
	This is absolutely ok and you can go ahead and commit. Thanks
	for figuring this out!
	
	Lincoln
	
	
	On 4/26/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:

		Lincoln, et al,
		
		I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
		from a Bio::DB::SeqFeature::Store that were initially created with
		-segments (i.e. whose location was discontiguous) does not display any
		attributes in column 9 other than "Name".
		
		What do you think of the following patch to Bio::Graphics::FeatureBase,
		whose effect is to "contrive to return (duplicated) common group values"
		(which otherwise get lost when "collapsing" "homogeneous" parent/child
		features)?
		
		Another approach would be to copy the attributes from the parent to the
		children when the -segments are first created.
		
		Another approach would be to use Bio::SeqFeature::Generic as the db's
		-seqfeature_class and save with -location being a Bio::Location::Split,
		but this was fraught with other problems.
		
		Any other suggestions?  Do you want me to commit this patch?
		
		Cheers,
		
		Malcolm
		
		Patch follows:
		
		
		Index: FeatureBase.pm
		===================================================================
		RCS file: /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
		retrieving revision 1.29
		diff -c -r1.29 FeatureBase.pm
		*** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
		--- FeatureBase.pm      26 Apr 2007 16:30:23 -0000
		***************
		*** 581,587 ****
		      foreach (@children) {
		        s/Parent=/ID=/g;
		      } # replace Parent tag with ID
		!     return join "\n",@children;
		    }
		
		    return join("\n",$p,@children);
		--- 581,589 ----
		      foreach (@children) {
		        s/Parent=/ID=/g;
		      } # replace Parent tag with ID
		!     #return join "\n",@children;
		!     # Instead of above, additionally, contrive to return (duplicated) common group values
		!     return(join("$group\n",@children) . $group);
		    }
		
		    return join("\n",$p,@children);
		

	-- 
	Lincoln D. Stein
	Cold Spring Harbor Laboratory
	1 Bungtown Road
	Cold Spring Harbor, NY 11724
	(516) 367-8380 (voice)
	(516) 367-8389 (fax)
	FOR URGENT MESSAGES & SCHEDULING, 
	PLEASE CONTACT MY ASSISTANT, 
	SANDRA MICHELSEN, AT michelse at cshl.edu 
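
The effect of the changed return statement in the patch quoted above can be sketched in isolation. This is only an illustration, not the FeatureBase.pm code: the sample GFF3 lines, the contents of $group, and the helper name collapse_children are all assumptions made up for the example, and Malcolm notes above that the implementation as posted is slightly broken.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: two child GFF3 lines and a string of shared
# column-9 attributes (assumed here to start with its own ';' separator).
my @children = (
    "chr1\ttest\texon\t100\t200\t.\t+\t.\tParent=f1",
    "chr1\ttest\texon\t300\t400\t.\t+\t.\tParent=f1",
);
my $group = ';Note=example';

# Hypothetical helper mirroring the patched return statement.
sub collapse_children {
    my ($group, @lines) = @_;
    s/Parent=/ID=/g for @lines;                 # replace Parent tag with ID
    return join("$group\n", @lines) . $group;   # append $group to every line
}

my $gff3 = collapse_children($group, @children);
print "$gff3\n";
```

Joining on "$group\n" puts the shared attributes at the end of every line except the last, which is why the patch appends $group once more after the join.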


From bernd at kirx.de  Sat Apr 28 10:36:07 2007
From: bernd at kirx.de (Bernd Mueller)
Date: Sat, 28 Apr 2007 16:36:07 +0200
Subject: [Bioperl-l] bioperl::db
Message-ID: <46335BD7.8040306@kirx.de>

Hi,

I followed the instructions on bioperl.org for installing bioperl via 
cpan, but I have been unable to install the bioperl::db module.

How does this work?

Moreover, none of these BIRNEY distributions is installable on my system. 
After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always 
fail. So I had to install the CRAFFI bundle, but the Bio::DB module does 
not seem to be included in this bundle, because my programs using that 
module do not work.

Help would be appreciated :)

Cheers,
Bernd

Appendix:

cpan[6]> d /bioperl/
Distribution    BIRNEY/bioperl-1.2.1.tar.gz
Distribution    BIRNEY/bioperl-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-1.2.3.tar.gz
Distribution    BIRNEY/bioperl-1.2.tar.gz
Distribution    BIRNEY/bioperl-1.4.tar.gz
Distribution    BIRNEY/bioperl-db-0.1.tar.gz
Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-run-1.4.tar.gz
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
12 items found


-- 
Dipl.-Inform.(FH)
Bernd Mueller
phone: +49 179 2336692
email: bernd at kirx.de


From cydeweys at gmail.com  Sun Apr 29 09:43:55 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 09:43:55 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
Message-ID: <4634A11B.6090809@umd.edu>

I'm trying to load up a table of codon usage frequencies I've downloaded
from the web using Bio::CodonUsage::IO.  My code looks like this:

    use Bio::CodonUsage::Table;
    use Bio::CodonUsage::IO;
    # ...
    my $io = Bio::CodonUsage::IO->new(-file=>$freqFile);
    my $codonTable = $io->next_data();

Unfortunately, I can't seem to find any documentation on what format the
codon usage table file is expected to be in, and all of my best guesses
seem to be invalid, yielding the following error message:

-------------------- WARNING ---------------------
MSG: probable parsing error - should be 21 entries for 20aa + stop codon
---------------------------------------------------

I've tried using both formats that are available from the Codon Usage
Database (easily the largest source of codon usage frequencies),
available here: http://www.kazusa.or.jp/codon/

The two formats I've tried without success look like this:

UUU 32.5( 45732)  UCU 15.3( 21588)  UAU 27.8( 39146)  UGU  6.3(  8796)
UUC 14.3( 20101)  UCC  3.2(  4458)  UAC  9.3( 13016)  UGC  2.1(  2971)
...


AND

AmAcid  Codon      Number    /1000     Fraction   ..

Gly     GGG     13198.00      9.38      0.14
Gly     GGA     34123.00     24.26      0.36
...


So, anyone know how to get this downloaded codon usage data loaded up
into a Bio::CodonUsage::Table object?  Bio::CodonUsage::IO doesn't seem
to like parsing the standard formats.  Thanks.


From cjfields at uiuc.edu  Sun Apr 29 10:05:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 09:05:59 -0500
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <4634A11B.6090809@umd.edu>
References: <4634A11B.6090809@umd.edu>
Message-ID: <469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>

One example file (MmCT) can be found in the test data directory in  
the bioperl distribution (t/data directory) and some tests relevant  
to codon table usage are found in DBCUTG.t.

chris

On Apr 29, 2007, at 8:43 AM, Ben McIlwain wrote:

> I'm trying to load up a table of codon usage frequencies I've  
> downloaded
> from the web using Bio::CodonUsage::IO.  My code looks like this:
>
>     use Bio::CodonUsage::Table;
>     use Bio::CodonUsage::IO;
>     # ...
>     my $io = Bio::CodonUsage::IO->new(-file=>$freqFile);
>     my $codonTable = $io->next_data();
>
> Unfortunately, I can't seem to find any documentation on what  
> format the
> codon usage table file is expected to be in, and all of my best  
> guesses
> seem to be invalid, yielding the following error message:
>
> -------------------- WARNING ---------------------
> MSG: probable parsing error - should be 21 entries for 20aa + stop  
> codon
> ---------------------------------------------------
>
> I've tried using both formats that are available from the Codon Usage
> Database (easily the largest source of codon usage frequencies),
> available here: http://www.kazusa.or.jp/codon/
>
> The two formats I've tried and failed look like this:
>
> UUU 32.5( 45732)  UCU 15.3( 21588)  UAU 27.8( 39146)  UGU  6.3(  8796)
> UUC 14.3( 20101)  UCC  3.2(  4458)  UAC  9.3( 13016)  UGC  2.1(  2971)
> ...
>
>
> AND
>
> AmAcid  Codon      Number    /1000     Fraction   ..
>
> Gly     GGG     13198.00      9.38      0.14
> Gly     GGA     34123.00     24.26      0.36
> ...
>
>
> So, anyone know how to get this downloaded codon usage data loaded up
> into a Bio::CodonUsage::Table object?  Bio::CodonUsage::IO doesn't  
> seem
> to like parsing the standard formats.  Thanks.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cydeweys at gmail.com  Sun Apr 29 10:06:12 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 10:06:12 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
Message-ID: <4634A654.7010708@gmail.com>

Chris Fields wrote:
> One example file (MmCT) can be found in the test data directory in the
> bioperl distribution (t/data directory) and some tests relevant to codon
> table usage are found in DBCUTG.t.

I still get the same warning message even when running on the given test
data?  That doesn't sound right.

-------------------- WARNING ---------------------
MSG: probable parsing error - should be 21 entries for 20aa + stop codon
---------------------------------------------------


From cjfields at uiuc.edu  Sun Apr 29 17:50:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 16:50:15 -0500
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <4634A654.7010708@gmail.com>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
	<4634A654.7010708@gmail.com>
Message-ID: <DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>

Odd, when I run 'perl -I. t/DBCUTG.t' from CVS it works fine.  Of  
course, I am assuming that you are running the latest release (1.5.2).

Could you post a bug report with a script that generates the error?

chris

On Apr 29, 2007, at 9:06 AM, Ben McIlwain wrote:

> Chris Fields wrote:
>> One example file (MmCT) can be found in the test data directory in  
>> the
>> bioperl distribution (t/data directory) and some tests relevant to  
>> codon
>> table usage are found in DBCUTG.t.
>
> I still get the same warning message even when running on the given  
> test
> data?  That doesn't sound right.
>
> -------------------- WARNING ---------------------
> MSG: probable parsing error - should be 21 entries for 20aa + stop  
> codon
> ---------------------------------------------------
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cydeweys at gmail.com  Sun Apr 29 18:15:32 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 18:15:32 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
	<4634A654.7010708@gmail.com>
	<DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>
Message-ID: <46351904.4070202@gmail.com>

Chris Fields wrote:
> Odd, when I run 'perl -I. t/DBCUTG.t' from CVS it works fine.  Of
> course, I am assuming that you are running the latest release (1.5.2).
> 
> Could you post a bug report with a script that generates the error?

Sorry, it was my mistake.  I had turned off warnings and strict earlier
for debugging purposes and then forgot to turn them back on.  It turns
out I was trying to read in the codon frequencies when the filename was
an uninitialized string variable (I typoed the name).  Whoops.  Now that
I've spelled the variable name correctly, it is working.
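
A small guard in front of the constructor call would have caught this kind of mistake even with warnings switched off. The following is a minimal sketch using core Perl only; checked_file is a hypothetical helper, not part of BioPerl, and the commented-out lines show how it might be combined with the calls from the original question.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): refuse to continue unless the
# filename is defined, non-empty, and names a readable plain file.
sub checked_file {
    my ($file) = @_;
    die "No codon usage filename given\n"
        unless defined $file && length $file;
    die "Cannot read codon usage file '$file'\n"
        unless -f $file && -r _;
    return $file;
}

# Intended use, assuming BioPerl is installed:
#   my $io = Bio::CodonUsage::IO->new(-file => checked_file($freqFile));
#   my $codonTable = $io->next_data();

# An undefined filename is reported immediately instead of surfacing later
# as a parsing warning.
my $ok = eval { checked_file(undef); 1 };
print "caught: $@" unless $ok;
```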


From bernd at kirx.de  Sun Apr 29 18:57:53 2007
From: bernd at kirx.de (Bernd Mueller)
Date: Mon, 30 Apr 2007 00:57:53 +0200
Subject: [Bioperl-l] bioperl::db
In-Reply-To: <46335BD7.8040306@kirx.de>
References: <46335BD7.8040306@kirx.de>
Message-ID: <463522F1.2010406@kirx.de>

Hello list,

I figured out my problem. It was caused by confusion over the 
versioning of bioperl. The instructions describe how to list the 
available versions of bioperl in CPAN, but then describe installing a 
much higher version which is not listed as a distribution in CPAN. So it 
works fine now. Thanks anyway. Proficiency in reading results in success ;-)

But I have another question: Does anyone know how to retrieve free 
full-text documents with EUtilities from PubMed Central? All my queries 
result in a corpus of free and non-free articles.

Thanks and regards,

Bernd


Bernd Mueller wrote:
> Hi,
> 
> I followed those instructions on bioperl.org for installing bioperl via 
> cpan. But actually it is impossible for me to install the bioperl::db 
> module.
> 
> How does this work?
> 
> Moreover none of these Birney distribution are installable on my system. 
> After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always 
> fail. So I have to install the CRAFFI bundle but it does not seem that 
> Bio::DB module is included in this bundle because my programs using that 
> module do not work.
> 
> Help would be appreciated :)
> 
> Cheers,
> Bernd
> 
> Appendix:
> 
> cpan[6]> d /bioperl/
> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
> Distribution    BIRNEY/bioperl-1.2.tar.gz
> Distribution    BIRNEY/bioperl-1.4.tar.gz
> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
> 12 items found
> 
> 

-- 
Dipl.-Inform.(FH)
Bernd Mueller
phone: +49 179 2336692
email: bernd at kirx.de


From cjfields at uiuc.edu  Sun Apr 29 20:16:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:16:11 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
Message-ID: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>

Allen (or anyone),

What is the status of this module?  It requires a module not listed  
in the dependencies (WWW::Mechanize) and has no tests.

chris


From allenday at ucla.edu  Sun Apr 29 20:21:19 2007
From: allenday at ucla.edu (Allen Day)
Date: Sun, 29 Apr 2007 17:21:19 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
Message-ID: <5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>

Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
ago.  I only implemented for a few journals, so it never worked for a
large fraction of publications.  Probably it barely works or does not
work at all now b/c of how the PDFs are scraped out of the HTML.  The
publisher sites are always modifying their HTML, presumably trying to
prevent automated download like this.

-Allen

On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Allen (or anyone),
>
> What is the status of this module?  It requires a module not listed
> in the dependencies (WWW::Mechanize) and has no tests.
>
> chris
>


From cjfields at uiuc.edu  Sun Apr 29 20:28:47 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:28:47 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
Message-ID: <29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>

Quick response!  Yep, I've run into this with a few publishers.   
Though they're supposed to have 'permanent' links for those of us who  
like to link to our pubs they frequently change (scary if that's  
their definition of permanent).

Did you want us to remove the code from CVS?

chris

On Apr 29, 2007, at 7:21 PM, Allen Day wrote:

> Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
> ago.  I only implemented for a few journals, so it never worked for a
> large fraction of publications.  Probably it barely works or does not
> work at all now b/c of how the PDF are scraped out of the HTML.  The
> publisher sites are always modifying their HTML, presumably trying to
> prevent automated download like this.
>
> -Allen
>
> On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> Allen (or anyone),
>>
>> What is the status of this module?  It requires a module not listed
>> in the dependencies (WWW::Mechanize) and has no tests.
>>
>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Sun Apr 29 20:31:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:31:15 -0500
Subject: [Bioperl-l] PMC and EUtilities, was  bioperl::db
In-Reply-To: <463522F1.2010406@kirx.de>
References: <46335BD7.8040306@kirx.de> <463522F1.2010406@kirx.de>
Message-ID: <01DC1D72-8AFA-4C11-8795-D4787506C602@uiuc.edu>

There may be a way to limit the initial query to full text docs from  
esearch, then use the history to retrieve only the XML docs you  
want.  Is that what you mean?

BioPerl-based access to PMC is limited at best.  Bio::DB::EUtilities  
only returns raw PMC XML with no post-processing of raw data (for  
good reason, as EUtilities is meant to be an intermediate step).   
Allen Day's Bio::DB::Biblio::eutils module supposedly allows PMC  
queries.  I'm also pretty sure that PubMedXML != PMC XML, in other  
words the Bio::Biblio XML format parsers may not work on PMC XML.

chris

On Apr 29, 2007, at 5:57 PM, Bernd Mueller wrote:

> Hello list,
>
> I figured out my problem. It was caused by confusion over the
> versioning of bioperl. The instructions describe how to list the
> available versions of bioperl in CPAN, but then describe installing a
> much higher version which is not listed as a distribution in CPAN. So it
> works fine now. Thanks anyway. Proficiency in reading results in
> success ;-)
>
> But I have another question: Does anyone know how to retrieve free
> full-text documents with EUtilities from PubMed Central? All my queries
> result in a corpus of free and non-free articles.
>
> Thanks and regards,
>
> Bernd
>
>
> Bernd Mueller wrote:
>> Hi,
>>
>> I followed those instructions on bioperl.org for installing  
>> bioperl via
>> cpan. But actually it is impossible for me to install the bioperl::db
>> module.
>>
>> How does this work?
>>
>> Moreover none of these Birney distribution are installable on my  
>> system.
>> After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always
>> fail. So I have to install the CRAFFI bundle but it does not seem  
>> that
>> Bio::DB module is included in this bundle because my programs  
>> using that
>> module do not work.
>>
>> Help would be appreciated :)
>>
>> Cheers,
>> Bernd
>>
>> Appendix:
>>
>> cpan[6]> d /bioperl/
>> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
>> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
>> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
>> Distribution    BIRNEY/bioperl-1.2.tar.gz
>> Distribution    BIRNEY/bioperl-1.4.tar.gz
>> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
>> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
>> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
>> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
>> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
>> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
>> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
>> 12 items found
>>
>>
>
> -- 
> Dipl.-Inform.(FH)
> Bernd Mueller
> phone: +49 179 2336692
> email: bernd at kirx.de
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From allenday at ucla.edu  Sun Apr 29 20:57:55 2007
From: allenday at ucla.edu (Allen Day)
Date: Sun, 29 Apr 2007 17:57:55 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
Message-ID: <5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>

Doesn't matter to me if it stays or not.  If you're cleaning house
feel free to get rid of it.

-Allen

On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Quick response!  Yep, I've run into this with a few publishers.
> Though they're supposed to have 'permanent' links for those of us who
> like to link to our pubs they frequently change (scary if that's
> their definition of permanent).
>
> Did you want us to remove the code from CVS?
>
> chris
>
> On Apr 29, 2007, at 7:21 PM, Allen Day wrote:
>
> > Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
> > ago.  I only implemented for a few journals, so it never worked for a
> > large fraction of publications.  Probably it barely works or does not
> > work at all now b/c of how the PDF are scraped out of the HTML.  The
> > publisher sites are always modifying their HTML, presumably trying to
> > prevent automated download like this.
> >
> > -Allen
> >
> > On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> Allen (or anyone),
> >>
> >> What is the status of this module?  It requires a module not listed
> >> in the dependencies (WWW::Mechanize) and has no tests.
> >>
> >> chris
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Mon Apr 30 11:15:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Apr 2007 10:15:16 -0500
Subject: [Bioperl-l] PMC and EUtilities, was  bioperl::db
In-Reply-To: <4635B1BD.9030402@kirx.de>
References: <46335BD7.8040306@kirx.de> <463522F1.2010406@kirx.de>
	<01DC1D72-8AFA-4C11-8795-D4787506C602@uiuc.edu>
	<4635B1BD.9030402@kirx.de>
Message-ID: <D11CE380-EDEC-4F7F-80EA-09D915EA79F0@uiuc.edu>

Bernd,

As a preface to this discussion, I am in the middle of refactoring  
EUtilities; the next incarnation should have a similar API but will  
likely set parameters via simpler methods (no need for all the  
getters/setters).

You'll likely have to parse out the tags yourself, AFAIK there is no  
BioPerl XML parser for PMC XML and a quick grep search turns up  
nothing but PubMed parsers.  If you aren't familiar with XML parsing  
you could try XML::Simple to get at what you want.  I would pass the  
XML in as small chunks (maybe by retrieving them in batches of 100 or  
less) and initially use Data::Dumper to determine the data structure  
XML::Simple returns (PMC XML has attributes and elements, so the  
structure will be more complex).  Then just iterate through articles  
and grab what you want.

I think most (if not all) of the articles in PubMed Central are  
free full-text access:

http://www.pubmedcentral.nih.gov/about/faq.html#q9

You can retrieve them via ftp:

ftp://ftp.ncbi.nlm.nih.gov/pub/pmc

which contains an index file of all articles and their dir. location  
(the readme gives more info).

chris

On Apr 30, 2007, at 4:07 AM, Bernd Mueller wrote:

> Hello,
>
> I think so. The ids from my wanted articles are retrieved by  
> Bio::DB::EUtilities::esearch. Afterwards I download the articles  
> with Bio::DB::EUtilities::efetch. It is only possible to download  
> in XML format from PMC. So post processing is actually needed  
> because I want the articles in plain format.
>
> But I don't know why my results include non-free articles, i.e.  
> abstracts where full articles should be found, when the query is  
> constrained to free full text only. In the query I limit the search  
> with the filter "AND free fulltext[filter]". Probably this does not  
> concern bioperl directly but rather the eutilities interface  
> of PMC.
>
> Regards,
> Bernd
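
The kind of query Bernd describes, an esearch against PMC with the results kept on the history server for a later efetch, can be sketched as a plain URL builder. This is an illustration in core Perl rather than the Bio::DB::EUtilities API; the helper names url_escape and esearch_url and the search word 'bioperl' are made up for the example, and the filter string is simply the one quoted above, with no claim that it selects only free full text in PMC.

```perl
use strict;
use warnings;

# Percent-encode everything except unreserved URL characters (ASCII input).
sub url_escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: build an esearch URL that stores the result set on
# the history server (usehistory=y) so efetch can refer to it afterwards.
sub esearch_url {
    my (%arg) = @_;
    my $base  = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
    my @pairs = (
        [ db         => $arg{db} ],
        [ term       => $arg{term} ],
        [ usehistory => 'y' ],
    );
    return $base . '?'
         . join('&', map { $_->[0] . '=' . url_escape($_->[1]) } @pairs);
}

print esearch_url(db   => 'pmc',
                  term => 'bioperl AND free fulltext[filter]'), "\n";
```

Printing the URL first makes it easy to paste the query into a browser and check what the filter actually returns before any articles are fetched.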


From allenday at ucla.edu  Mon Apr 30 12:44:12 2007
From: allenday at ucla.edu (Allen Day)
Date: Mon, 30 Apr 2007 09:44:12 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <4635FDD8.8030704@jouy.inra.fr>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
	<5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>
	<4635FDD8.8030704@jouy.inra.fr>
Message-ID: <5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>

DOI is definitely the right way to do this.  It wasn't implemented
widely at the time I wrote this module.

-Allen

On 4/30/07, Stéphane Téletchéa <steletch at jouy.inra.fr> wrote:
> Allen Day wrote:
> > Doesn't matter to me if it stays or not.  If you're cleaning house
> > feel free to get rid of it.
> >
> > -Allen
> >
>
> I've worked on something the other way around: getting information about
> a PDF from the DOI, if present. Most recent publications do have a DOI,
> and I use this as the target for my request.
>
> This does not solve the problem, but it may help others. Feel free to ask
> if it can help the ongoing work; the code is quite dirty ...
>
> Stéphane
>
>
> --
> Stéphane Téletchéa, PhD.                  http://www.steletch.org
> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>


From cjfields at uiuc.edu  Mon Apr 30 13:55:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Apr 2007 12:55:01 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
	<5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>
	<4635FDD8.8030704@jouy.inra.fr>
	<5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>
Message-ID: <34F19F02-1B7B-41A1-90B1-F373C49BC012@uiuc.edu>

Agreed; even some seq. records may have a DOI now.  PubMed and PMC XML  
contain this, so it is possible to parse the DOI out if one were  
inclined to incorporate this into Bio::Biblio (I added a doi()  
getter/setter to Bio::Annotation::Reference a few months back).
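
As a rough illustration of parsing the DOI out, the sketch below pulls DOIs from a chunk of XML with a regular expression. It assumes PubMed marks them as ArticleId elements with IdType="doi" and PMC as article-id elements with pub-id-type="doi"; the sample record is invented, extract_dois is a hypothetical helper, and a real XML parser would be more robust than a regex.

```perl
use strict;
use warnings;

# Hypothetical helper: return every DOI found in a string of PubMed or
# PMC XML, relying on the element and attribute names assumed above.
sub extract_dois {
    my ($xml) = @_;
    my @dois;
    while ($xml =~ m{<(?:ArticleId\s+IdType|article-id\s+pub-id-type)="doi"\s*>\s*([^<\s]+)\s*<}g) {
        push @dois, $1;
    }
    return @dois;
}

# Invented sample fragment in the PubMed style.
my $sample = <<'XML';
<ArticleIdList>
  <ArticleId IdType="pubmed">12345678</ArticleId>
  <ArticleId IdType="doi">10.1000/example.doi</ArticleId>
</ArticleIdList>
XML

print "$_\n" for extract_dois($sample);
```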

chris

On Apr 30, 2007, at 11:44 AM, Allen Day wrote:

> DOI is definitely the right way to do this.  It wasn't implemented
> widely at the time I wrote this module.
>
> -Allen
>
> On 4/30/07, Stéphane Téletchéa <steletch at jouy.inra.fr> wrote:
>> Allen Day wrote:
>>> Doesn't matter to me if it stays or not.  If you're cleaning house
>>> feel free to get rid of it.
>>>
>>> -Allen
>>>
>>
>> I've worked on something the other way around: getting information  
>> about a PDF from the DOI, if present. Most recent publications do  
>> have a DOI, and I use this as the target for my request.
>>
>> This does not solve the problem, but it may help others. Feel free  
>> to ask if it can help the ongoing work; the code is quite dirty ...
>>
>> Stéphane
>>
>>
>> --
>> Stéphane Téletchéa, PhD.                  http://www.steletch.org
>> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
>> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
>> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr 30 16:05:45 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 30 Apr 2007 13:05:45 -0700 (PDT)
Subject: [Bioperl-l] generate a fasta file from the blast report
Message-ID: <10259461.post@talk.nabble.com>


hi all,
If I have the following script working on my BLAST report, can anyone please
tell me how I can generate a FASTA-format file of just the hit (subject)
sequences?
Thanks a lot.
 
    use strict;
    use warnings;
    use Bio::SearchIO;

    # Open the BLAST report for parsing
    my $in = Bio::SearchIO->new(-format => 'blast',
                                -file   => 'report.bls');
    while( my $result = $in->next_result ) {
      while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
          # Report only HSPs longer than 100 with at least 75% identity
          if( $hsp->length('total') > 100 &&
              $hsp->percent_identity >= 75 ) {
              print "Hit= ", $hit->name,
                    ", len=", $hsp->length('total'),
                    ", percent_id=", $hsp->percent_identity, "\n";
          }
        }
      }
    }
-- 
View this message in context: http://www.nabble.com/generate-a-fasta-file-from-the-blast-report-tf3671549.html#a10259461
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From Francoise.LECOMTE at biogemma.com  Mon Apr 30 06:35:03 2007
From: Francoise.LECOMTE at biogemma.com (Francoise.LECOMTE at biogemma.com)
Date: Mon, 30 Apr 2007 12:35:03 +0200
Subject: [Bioperl-l] Pb makefile
Message-ID: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>

Hi
I am trying to install bioperl 1.4 on a Tru64 platform and I get the message 
"make: line too long" when I run the command make install.
How can I solve it? How can I disable man page installation in Makefile.PL, 
if that can solve this problem?

Best regards 

Françoise Lecomte 


From torsten.seemann at infotech.monash.edu.au  Mon Apr 30 20:22:35 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Tue, 1 May 2007 10:22:35 +1000
Subject: [Bioperl-l] generate a fasta file from the blast report
In-Reply-To: <10259461.post@talk.nabble.com>
References: <10259461.post@talk.nabble.com>
Message-ID: <a79f6a4b0704301722s6b20c216if262ea9747f7d03f@mail.gmail.com>

> if i have the following script working on my blast report, can anyone plz
> tell me how can i generate a fasta format file of just the hits (subject)
> sequence.

Do you want the WHOLE subject sequence, or just the region that hit the query?

The hit is available as $hsp->hit_string();
http://doc.bioperl.org/bioperl-live/Bio/Search/HSP/GenericHSP.html#CODE11

The whole subject sequence would require the original Fasta input file.
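
For the first case (just the region that hit the query), the aligned string can be turned into a FASTA record with a few lines of core Perl. This is a minimal sketch: fasta_record is a hypothetical helper, the 60-column default is an arbitrary choice, and the commented-out call shows how it might slot into the innermost loop of the posted script, assuming BioPerl is installed.

```perl
use strict;
use warnings;

# Hypothetical helper: format one FASTA record from an ID and an aligned
# sequence string. Alignment gap characters ('-') are removed first.
sub fasta_record {
    my ($id, $aligned, $width) = @_;
    $width ||= 60;
    (my $seq = $aligned) =~ tr/-//d;
    my $out = ">$id\n";
    $out .= "$1\n" while $seq =~ /(.{1,$width})/g;
    return $out;
}

# Possible use inside the HSP loop of the script under discussion:
#   print fasta_record($hit->name . '_' . $hsp->start('hit')
#                                 . '-' . $hsp->end('hit'),
#                      $hsp->hit_string);

print fasta_record('example_hit', 'ACGT--ACGTAC', 5);
```

Adding the hit coordinates to the ID keeps records distinct when one subject sequence has several HSPs.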

By the way, are your questions for work related issues, or is this
your homework or assignment for a study course?

--Torsten


From dmessina at wustl.edu  Mon Apr  2 02:54:58 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 1 Apr 2007 21:54:58 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <6EFFF13A-66E7-418F-8B8E-A8AA8826DE83@wustl.edu>

We need more information to be able to help you. Could you please  
show us the actual output you see when trying to install Bioperl?

Also, we need to know:

- what operating system you have
- what version of Bioperl you are trying to install

See

http://www.bioperl.org/wiki/HOWTO:Beginners#Getting_Assistance

and please read the rest of the document, too.

Dave


From aharry2001 at yahoo.com  Mon Apr  2 10:09:25 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 03:09:25 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <B04E1B58-9BE1-407A-91D2-6EA9C0BA2A38@uiuc.edu>
Message-ID: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>

Hello All,
             I have some problems parsing KEGG using bioperl: I get an out of memory error. I currently have 1 GB of RAM. Can someone tell me why this is happening and how it can be solved? Is it because the objects passed to bioperl are so big, or something else?

best regards
Ambrose

 


From cjfields at uiuc.edu  Mon Apr  2 12:43:18 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 07:43:18 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
References: <20070402100925.40498.qmail@web52001.mail.re2.yahoo.com>
Message-ID: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>

This doesn't really explain much beyond stating you are having  
problems.  You need to post some code (to the mail list!) and let us  
know what version of BioPerl you are using.

chris

On Apr 2, 2007, at 5:09 AM, Ambrose wrote:

> Hello All,
>              I have some problems parsing KEGG using bioperl. I get  
> out of memory problem.I current have 1G RAM.Can some tell me why  
> this is happening and how it can be solved.It is beacuse the  
> objects passed to bioiperl are so big or what?
>
> best regrads
> Ambrose
>
>


From aharry2001 at yahoo.com  Mon Apr  2 13:56:33 2007
From: aharry2001 at yahoo.com (Ambrose)
Date: Mon, 2 Apr 2007 06:56:33 -0700 (PDT)
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <7259B658-A58D-4F97-B90B-E23D3C924D3F@uiuc.edu>
Message-ID: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>


Hello ALL,

I have the code below, which parses my KEGG files. Many of the files are parsed and the information is inserted into my database, but unfortunately, after the program has run for some hours, it stops with an out of memory message. I assume that this happens because the bioperl object is too big. Please check the code below.

best regards Ambrose


#!/usr/local/ActivePerl/bin/perl
#
#

use strict;
use Bio::SeqIO;
use Bio::FASTASequence;
use DBI;
use Benchmark  qw(:all) ;

my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1,$pathway_id1,$pathway_name1,$ec_num);
my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,%dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
my( @kg_id);
my $db="gbdb";
my $host="localhost";
my $userid="root";
my $passwd="ubuntu";
my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/mysqld.sock";
my ($t1,$t2);
my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
my $time_used;
 
 
 eval { $dbh->do("DROP TABLE kegginfo") };
 print "Dropping kegginfo failed: $@\n" if $@;
 $dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL AUTO_INCREMENT,
                                   up_id INT UNSIGNED REFERENCES uniprotinfo(up_id),
                                                                  filename VARCHAR(50),
                                                    kegg_id VARCHAR(50),
                                   keggaccn VARCHAR(50),
                                                                  description VARCHAR(250),
                                   ec_numbers VARCHAR(250),
                                              pathway_id VARCHAR(250),
                                              pathway_name VARCHAR(250),
                                              crc64 VARCHAR(50),
                                   ko_id VARCHAR(50),
                                   pfam_id VARCHAR(50),
                                   ncbigi_id VARCHAR(50),
                                   ncbigeneid_id VARCHAR(50),
                                   uniprot_id VARCHAR(50),
                                   prosite_id VARCHAR(50),
                                   PRIMARY KEY (kg_id)
                                 )");
                                 

eval { $dbh->do("DROP TABLE keggntsequence") };
print "Dropping keggntsequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                                    keggaccn VARCHAR(50),
                                  nucleotidesequence text
                                 )");

eval { $dbh->do("DROP TABLE keggaasequence") };
print "Dropping keggaasequence failed: $@\n" if $@;
$dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED REFERENCES uniprotinfo(kg_id),
                                                    keggaccn VARCHAR(50),
                                                    crc64 VARCHAR(50),
                                  aminoacidsequence text
                                 )");
eval { $dbh->do("DROP TABLE timestable") };
print "Dropping timestable failed: $@\n" if $@;
$dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT NULL AUTO_INCREMENT,
                                   genome VARCHAR(100),
                                    totaltime_seconds int(100),
                                                                  PRIMARY KEY(aut_id))");


open (LIST, "genomes.list") || die "Cannot open input kegg genomes file genomes.list\n $! \n";
$t1=new Benchmark;
my @genomelist = ();
while (my $line=<LIST>) {
    #ignore comment lines
    if ($line !~ /^#/) {
        chomp $line;
                
        push (@genomelist, $line); #store the filename
    }
}

close LIST;
my $count=0;
foreach my $genomefile (@genomelist) {

    #in case the user fails to remove some strange files from
    #the genomes.list file.. check for the KEGG format
    my $check=checkKeggFormat($genomefile);
    if ($check==0) {
        #if file is not kegg, start with the next one...
        print "ERROR: $genomefile doesn't look like a KEGG file to me! \n";
        #<stdin>;
        next;
    }
#print $genomefile,"\n";
    my $stream = Bio::SeqIO->new(-file => $genomefile, -format => 'KEGG');

    while ( my $seq = $stream->next_seq() ) {

        my $primary_id = $seq->primary_id;
        my $display_id = $seq->display_id; #name
        my $keggaccn   = $seq->accession; #accn
        my @description = $seq->annotation->get_Annotations('description');
        
        my @dblinks     = $seq->annotation->get_Annotations('dblink');
        my @orthologs   = $seq->annotation->get_Annotations('ortholog');
        my @orthologs   = grep {$_->database eq 'KO'} $seq->annotation->get_Annotations('dblink');
        my @class       = $seq->annotation->get_Annotations('pathway');
         $ntseq{$keggaccn} = $seq->seq;
         $aaseq{$keggaccn} = $seq->translate->seq; 
         $aaseq{$keggaccn} =~s /\*$//;
                 my $fasta = ">".$count."\n".$aaseq{$keggaccn};
         my $newseq = Bio::FASTASequence->new($fasta);
         $crc64{$keggaccn}=$newseq->getCrc64();
#print $keggaccn,"crc64:$crc64{$keggaccn}\n";
        
        $count++;
        if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no keggaccn\n";
        next;}    

        if(@dblinks)
        {
                my @dblink_KO=();
                my @dblink_Pfam=();
                my @dblink_PROSITE=();
                my @dblink_NCBIGI=();
                my @dblink_NCBIGENEID=();
                my @dblink_UniProt=();
        
                foreach my $ele (@dblinks) {
                    if ($ele =~ /^KO:/){
                        $ele=~s/KO://;
                        push (@dblink_KO,$ele);
                        $dblink_KO{$keggaccn}=$ele;
                        next;
                    }
                        #parse Pfam: dblink
                    if ($ele =~ /^Pfam:/){
                        $ele=~s/Pfam://;
                        push (@dblink_Pfam,$ele);
                        $dblink_Pfam{$keggaccn}=$ele;
                        next;
                    }
                        #parse PROSITE: dblink
                    if ($ele =~ /^PROSITE:/){
                        $ele=~s/PROSITE://;
                        push (@dblink_PROSITE,$ele);
                        $dblink_PROSITE{$keggaccn}=$ele;
                        next;
                    }
                        #parse NCBI-GI: dblink
                    if ($ele =~ /^NCBI-GI:/){
                        $ele=~s/NCBI-GI://;
                        push (@dblink_NCBIGI,$ele);
                        $dblink_NCBIGI{$keggaccn}=$ele;
                        next;
                    }
                        #parse NCBI-GeneID: dblink
                    if ($ele =~ /^NCBI-GeneID:/){
                        $ele=~s/NCBI-GeneID://;
                        push (@dblink_NCBIGENEID,$ele);
                        $dblink_NCBIGENEID{$keggaccn}=$ele;
                        next;
                        }
                        #parse UniProt: dblink
                    if ($ele =~ /^UniProt:/){
                        $ele=~s/UniProt://;
                        push (@dblink_UniProt,$ele);
                        $dblink_UniProt{$keggaccn}=$ele;
                        next;
                    }
            
                }#end foreach     #finished parsing all dblinks    
        }#end if @dblinks
        if(@class)
        {
            foreach my $pathway (@class) {
    
                $pathway=~s/^\s+|\s+$//;
                my @arr = split (/\s+/,$pathway);
                my $pathway_id = $arr[0];
                shift @arr;
                my $pathway_name = join(" ",@arr);
                $pathway_name{$keggaccn}=$pathway_name;
                $pathway_id{$keggaccn}=$pathway_id;
                #print $pathway_id{$keggaccn},"\t",$pathway_name{$keggaccn},"\n";
                                    
            }
            
        }
        
        my @ecnumbers=();
        @ecnumbers = extractECnumbers(@description);
        if(@ecnumbers)
        {
                if (@ecnumbers!=0) 
                {
                    foreach my $ecn (@ecnumbers) 
                    {
                       $ecnumbers{$keggaccn}=$ecn;
                    }#end foreach
                }
                else {
                    #print "ECnumbers:\n";
                     }
        }
        
        
#         print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t",$dblink_NCBIGENEID{$keggaccn},
#                 "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers{$keggaccn}","\t",
#                  "p1:$pathway_id{$keggaccn}","\t","p2:$pathway_name{$keggaccn}","\n";
#         
                $dbh->do("INSERT INTO kegginfo VALUES (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
         undef,"NULL","NULL",$genomefile,$display_id,$keggaccn,@description,$ecnumbers{$keggaccn},
                  $pathway_id{$keggaccn},$pathway_name{$keggaccn},$crc64{$keggaccn},$dblink_KO{$keggaccn},
                 $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn},$dblink_NCBIGENEID{$keggaccn},
                 $dblink_UniProt{$keggaccn},$dblink_PROSITE{$keggaccn});
         

        $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
            undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});
                        

        $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
            undef,"",$keggaccn,$ntseq{$keggaccn});
                
               
    }
     $t2=new Benchmark;
    $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
    $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
    undef,"NULL",$genomefile,$time_used);
 
}


$dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
print "Index created on kegginfo\n";

$dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence (kg_id,keggaccn)");
print "Index created on keggaasequence\n";

$dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence (kg_id,keggaccn)");
print "Index created on keggntsequence\n";


print"Updating the tables................\n";

    
$dbh->do("update kegginfo,keggaasequence set keggaasequence.kg_id=kegginfo.kg_id 
         where 
                kegginfo.keggaccn=keggaasequence.keggaccn");
        print " keggaasequence kg_id\n";

$dbh->do("update kegginfo,keggntsequence set keggntsequence.kg_id=kegginfo.kg_id 
         where 
                kegginfo.keggaccn=keggntsequence.keggaccn");
        print " keggaasequence kg_id\n";


sub extractECnumbers ($) {
    #sample description lines
     #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]
    #ATP synthase F0 subunit c [EC:3.6.3.14]

    my @desc=shift;
    my $description = join ("",@desc);
    my @ecnumbers=();
    #print "parsing ec for $description..\n";
    #check if EC number exists
    if ($description=~/\[EC:/) {
        
        my @array = split (/\[EC:/,$description);
        $array[1]=~s/]//g;
        shift @array; #remove the annotation , only EC numbers remain
        foreach my $ele (@array) {
            $ele=~s/^\s+|\s+$//g;
            $ele= "EC:".$ele;
            push (@ecnumbers,$ele);
        }    
        return @ecnumbers;
    }
    else {
        #return an empty value
        return ;

    }

}


sub checkKeggFormat ($) {
=head2

checkKeggFormat

make sure that the file is a valid KEGG file
function checks the first two lines,
1st must start with ENTRY
2nd must start with DEFINITION

returns 0 or 1

=cut
    my $genomefile=shift;

    open (TEST,$genomefile) || die "Cannot open file $genomefile for reading \n";
    my $testline=<TEST>;
#print "$testline\n";
    if ($testline=~/^ENTRY/) {
        #continue
        #$testline=<TEST>;#double check
        #if ($testline=~/^NAME/) {
            #this looks like a valid kegg file
            return 1;
        #}
        #else {
        #    close TEST;
        #    return 0;
        #}
    }
    else {
        close TEST;    
        return 0;
    }

}

sub timeThis ($$$) 
{
    my ($start,$end,$message) = @_;
    my $td = timediff($end, $start);
    my $t = timestr($td);    
        print "$message : ",$t,"\n";
        my @array = split (/\s+/,$t);
#20 wallclock secs (14.23 usr +  0.84 sys = 15.07 CPU)
        return $array[0]; #return the no. of seconds.
}

   


From e-just at northwestern.edu  Mon Apr  2 14:12:33 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:12:33 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package
	"Bio::DB::GenBank"
Message-ID: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>

Hello,

I am getting this error while running a bioperl script that I had been using
with bioperl 1.4.  On upgrading to bioperl 1.5.2 I get the following fatal
error:

Can't locate object method "seq_start" via package "Bio::DB::GenBank"

My script is as follows:


use Bio::DB::GenBank;
use Bio::DB::Query::GenBank;

my $gb = new Bio::DB::GenBank();

my $query = Bio::DB::Query::GenBank->new(
      -query   =>'txid44689[Organism:noexp]',
      -reldate => 60,
      -db      => 'nucleotide'

);

my $in = $gb->get_Stream_by_query($query);

while ( my $seq = $in->next_seq()) {
      print "do something";
      #....
}


I noticed that seq_start is created in the BEGIN block of
Bio::DB::NCBIHelper (inherited by Bio::DB::GenBank), but I do not have
experience troubleshooting this kind of autoloaded method.  Any idea where
to start?

Thanks

Eric


From e-just at northwestern.edu  Mon Apr  2 14:15:28 2007
From: e-just at northwestern.edu (Eric Just)
Date: Mon, 2 Apr 2007 09:15:28 -0500
Subject: [Bioperl-l] Can't locate object method "seq_start" via package
	"Bio::DB::GenBank"
In-Reply-To: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>
References: <fa1fe35c0704020712tbf3c62aw1f15551fbb4afb60@mail.gmail.com>
Message-ID: <fa1fe35c0704020715u1f14f273n100d4e21f848603d@mail.gmail.com>

Sorry about that.

As soon as I sent the email I found my problem ( an old NCBIHelper in my
inheritance path ).   There is no bug here.

Eric




From cjfields at uiuc.edu  Mon Apr  2 15:32:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 10:32:59 -0500
Subject: [Bioperl-l] bioperl and kegg(out of memory problem )
In-Reply-To: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
References: <20070402135633.85882.qmail@web52002.mail.re2.yahoo.com>
Message-ID: <38475C93-FB21-4BC4-BF5D-7F48493E8EE2@uiuc.edu>

Ambrose,

Data is persisting in your hashes (in particular DBLink objects),  
which is eating away at your memory.  If I take a sample KEGG gene  
file and simply parse it:

while (my $seq = $io->next_seq) {
     print $seq->accession,"\n";
}

there are no memory issues, but if I store the data in hashes  
declared outside the loop:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,
   %dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

while (my $seq = $io->next_seq) {
     # store Bio::Seq data in hashes
}

I see problems with only one genome file with KEGG records.  You'll  
definitely run into memory issues if you are parsing many genome  
files, which you appear to be:

my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,
   %dblink_NCBIGENEID,%dblink_UniProt);
my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);

for my $genomefile (@genomelist) {
     while (my $seq = $io->next_seq) {
         # store Bio::Seq data in hashes
     }
}

Localizing the hashes to the genome or sequence loops should prevent  
the memory problem.
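
A minimal sketch of that scoping change. To stay self-contained it uses a
hypothetical in-memory stand-in (%records_in) instead of Bio::SeqIO, and the
sub name process_genome is an assumption for illustration only:

```perl
use strict;
use warnings;

# Hypothetical stand-in for parsed genome files: file name => list of records.
my %records_in = (
    'genomeA.ent' => [ { accession => 'a1', seq => 'ATGAAA' },
                       { accession => 'a2', seq => 'ATGCCC' } ],
    'genomeB.ent' => [ { accession => 'b1', seq => 'ATGGGG' } ],
);

# Process one file; returns the number of records handled.
sub process_genome {
    my ($records) = @_;
    # Declared per call, so the hash is released when the sub returns
    # instead of growing across every genome file.
    my %ntseq;
    for my $rec (@$records) {
        $ntseq{ $rec->{accession} } = $rec->{seq};
        # ... insert the row into the database here ...
    }
    return scalar keys %ntseq;
}

my %handled;
for my $genomefile (sort keys %records_in) {
    $handled{$genomefile} = process_genome($records_in{$genomefile});
    print "$genomefile: $handled{$genomefile} records\n";
}
```

In the real script the inner loop would be the while loop over
$stream->next_seq, with the hashes declared with my inside it or inside the
per-file loop.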

Note that the DBLink Annotation objects are overloaded so they act  
like a string ($ele =~ /^KO:/) but are actually  
Bio::Annotation::DBLink objects, something we will likely get rid of  
in the near future.
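
One way to be sure only plain strings end up in a hash is to copy the fields
out explicitly. The sketch below uses a small mock class (My::DBLink, a
hypothetical stand-in, not the real Bio::Annotation::DBLink) that stringifies
itself and offers database and primary_id accessors:

```perl
use strict;
use warnings;

# Hypothetical stand-in for an annotation object that acts like a string;
# this is NOT the real Bio::Annotation::DBLink class.
package My::DBLink;
use overload '""' => sub { $_[0]->{database} . ':' . $_[0]->{primary_id} },
             fallback => 1;
sub new        { my ($class, %args) = @_; return bless {%args}, $class }
sub database   { $_[0]->{database} }
sub primary_id { $_[0]->{primary_id} }

package main;

my @dblinks = (
    My::DBLink->new(database => 'KO',   primary_id => 'K00001'),
    My::DBLink->new(database => 'Pfam', primary_id => 'PF00001'),
);

# Keep only plain strings, keyed by database name; no object is stored.
my %id_for;
for my $link (@dblinks) {
    $id_for{ $link->database } = $link->primary_id;
}

print "$_ => $id_for{$_}\n" for sort keys %id_for;
```

With the accessors there is no need to match and strip the "KO:" style
prefixes with regular expressions.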

chris

On Apr 2, 2007, at 8:56 AM, Ambrose wrote:

>
>
> Hello ALL,
>
> I have the code below,which parses my kegg files.A host of the  
> files are parsed and the information is inserted into my databases  
> but unfortunate after the program runs for some hours it stops  
> showing the message out of memory.I assume that this happens  
> because the bioperl object is too big.Please just check the code below
>
> best regards Ambrose
>
>
> #!/usr/local/ActivePerl/bin/perl
> #
> #
>
> use strict;
> use Bio::SeqIO;
> use Bio::FASTASequence;
> use DBI;
> use Benchmark  qw(:all) ;
>
> my($ko,$prosite,$ncbigi,$ncbigeneid,$pfam,$uniprot,$ecn1, 
> $pathway_id1,$pathway_name1,$ec_num);
> my(%dblink_KO,%dblink_Pfam,%dblink_PROSITE,%dblink_NCBIGI,% 
> dblink_NCBIGENEID,%dblink_UniProt);
> my(%pathway_name,%pathway_id,%ecnumbers,%crc64,%ntseq,%aaseq);
> my( @kg_id);
> my $db="gbdb";
> my $host="localhost";
> my $userid="root";
> my $passwd="ubuntu";
> my $connectionInfo="dbi:mysql:$db;"."mysql_socket=/var/run/mysqld/ 
> mysqld.sock";
> my ($t1,$t2);
> my $dbh = DBI->connect($connectionInfo,$userid,$passwd);
> my $time_used;
>
>
>
>  eval { $dbh->do("DROP TABLE kegginfo") };
>  print "Dropping kegginfo failed: $@\n" if $@;
>  $dbh->do("CREATE TABLE kegginfo (kg_id BIGINT NOT NULL  
> AUTO_INCREMENT,
>                                    up_id INT UNSIGNED REFERENCES  
> uniprotinfo(up_id),
>                                                                    
> filename VARCHAR(50),
>                                                     kegg_id VARCHAR 
> (50),
>                                    keggaccn VARCHAR(50),
>                                                                    
> description VARCHAR(250),
>                                    ec_numbers VARCHAR(250),
>                                               pathway_id VARCHAR(250),
>                                               pathway_name VARCHAR 
> (250),
>                                               crc64 VARCHAR(50),
>                                    ko_id VARCHAR(50),
>                                    pfam_id VARCHAR(50),
>                                    ncbigi_id VARCHAR(50),
>                                    ncbigeneid_id VARCHAR(50),
>                                    uniprot_id VARCHAR(50),
>                                    prosite_id VARCHAR(50),
>                                    PRIMARY KEY (kg_id)
>                                  )");
>
>
> eval { $dbh->do("DROP TABLE keggntsequence") };
> print "Dropping keggntsequence failed: $@\n" if $@;
> $dbh->do("CREATE TABLE keggntsequence (kg_id BIGINT(15) UNSIGNED  
> REFERENCES uniprotinfo(kg_id),
>                                                     keggaccn VARCHAR 
> (50),
>                                   nucleotidesequence text
>                                  )");
>
> eval { $dbh->do("DROP TABLE keggaasequence") };
> print "Dropping keggaasequence failed: $@\n" if $@;
> $dbh->do("CREATE TABLE keggaasequence (kg_id BIGINT(15) UNSIGNED  
> REFERENCES uniprotinfo(kg_id),
>                                                     keggaccn VARCHAR 
> (50),
>                                                     crc64 VARCHAR(50),
>                                   aminoacidsequence text
>                                  )");
> eval { $dbh->do("DROP TABLE timestable") };
> print "Dropping timestable failed: $@\n" if $@;
> $dbh->do("CREATE TABLE timestable (aut_id BIGINT(15) UNSIGNED NOT  
> NULL AUTO_INCREMENT,
>                                    genome VARCHAR(100),
>                                     totaltime_seconds int(100),
>                                                                    
> PRIMARY KEY(aut_id))");
>
>
>
> open (LIST, "genomes.list") || die "Cannot open input kegg genomes  
> file genomes.list\n $! \n";
> $t1=new Benchmark;
> my @genomelist = ();
> while (my $line=<LIST>) {
>     #ignore comment lines
>     if ($line !~ /^#/) {
>         chomp $line;
>
>         push (@genomelist, $line); #store the filename
>     }
> }
>
> close LIST;
> my $count=0;
> foreach my $genomefile (@genomelist) {
>
>     #in case the user fails to remove some strange files from
>     #the genomes.list file.. check for the KEGG format
>     my $check=checkKeggFormat($genomefile);
>     if ($check==0) {
>         #if file is not kegg, start with the next one...
>         print "ERROR: $genomefile doesn't look like a KEGG file to  
> me! \n";
>         #<stdin>;
>         next;
>     }
> #print $genomefile,"\n";
>     my $stream = Bio::SeqIO->new(-file => $genomefile, -format =>  
> 'KEGG');
>
>     while ( my $seq = $stream->next_seq() ) {
>
>         my $primary_id = $seq->primary_id;
>         my $display_id = $seq->display_id; #name
>         my $keggaccn   = $seq->accession; #accn
>         my @description = $seq->annotation->get_Annotations 
> ('description');
>
>         my @dblinks     = $seq->annotation->get_Annotations('dblink');
>         my @orthologs   = $seq->annotation->get_Annotations 
> ('ortholog');
>         my @orthologs   = grep {$_->database eq 'KO'} $seq- 
> >annotation->get_Annotations('dblink');
>         my @class       = $seq->annotation->get_Annotations 
> ('pathway');
>          $ntseq{$keggaccn} = $seq->seq;
>          $aaseq{$keggaccn} = $seq->translate->seq;
>          $aaseq{$keggaccn} =~s /\*$//;
>                  my $fasta = ">".$count."\n".$aaseq{$keggaccn};
>          my $newseq = Bio::FASTASequence->new($fasta);
>          $crc64{$keggaccn}=$newseq->getCrc64();
> #print $keggaccn,"crc64:$crc64{$keggaccn}\n";
>
>         $count++;
>         if ($keggaccn eq "") { print "PRIMARY KEY NOT FOUND no  
> keggaccn\n";
>         next;}
>
>         if(@dblinks)
>         {
>                 my @dblink_KO=();
>                 my @dblink_Pfam=();
>                 my @dblink_PROSITE=();
>                 my @dblink_NCBIGI=();
>                 my @dblink_NCBIGENEID=();
>                 my @dblink_UniProt=();
>
>                 foreach my $ele (@dblinks) {
>                     if ($ele =~ /^KO:/){
>                         $ele=~s/KO://;
>                         push (@dblink_KO,$ele);
>                         $dblink_KO{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse Pfam: dblink
>                     if ($ele =~ /^Pfam:/){
>                         $ele=~s/Pfam://;
>                         push (@dblink_Pfam,$ele);
>                         $dblink_Pfam{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse PROSITE: dblink
>                     if ($ele =~ /^PROSITE:/){
>                         $ele=~s/PROSITE://;
>                         push (@dblink_PROSITE,$ele);
>                         $dblink_PROSITE{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse NCBI-GI: dblink
>                     if ($ele =~ /^NCBI-GI:/){
>                         $ele=~s/NCBI-GI://;
>                         push (@dblink_NCBIGI,$ele);
>                         $dblink_NCBIGI{$keggaccn}=$ele;
>                         next;
>                     }
>                         #parse NCBI-GeneID: dblink
>                     if ($ele =~ /^NCBI-GeneID:/){
>                         $ele=~s/NCBI-GeneID://;
>                         push (@dblink_NCBIGENEID,$ele);
>                         $dblink_NCBIGENEID{$keggaccn}=$ele;
>                         next;
>                         }
>                         #parse UniProt: dblink
>                     if ($ele =~ /^UniProt:/){
>                         $ele=~s/UniProt://;
>                         push (@dblink_UniProt,$ele);
>                         $dblink_UniProt{$keggaccn}=$ele;
>                         next;
>                     }
>
>                 }#end foreach     #finished parsing all dblinks
>         }#end if @dblinks
>         if(@class)
>         {
>             foreach my $pathway (@class) {
>
>                 $pathway=~s/^\s+|\s+$//;
>                 my @arr = split (/\s+/,$pathway);
>                 my $pathway_id = $arr[0];
>                 shift @arr;
>                 my $pathway_name = join(" ", at arr);
>                 $pathway_name{$keggaccn}=$pathway_name;
>                 $pathway_id{$keggaccn}=$pathway_id;
>                 #print $pathway_id{$keggaccn},"\t",$pathway_name 
> {$keggaccn},"\n";
>
>             }
>
>         }
>
>         my @ecnumbers=();
>         @ecnumbers = extractECnumbers(@description);
>         if(@ecnumbers)
>         {
>                 if (@ecnumbers!=0)
>                 {
>                     foreach my $ecn (@ecnumbers)
>                     {
>                        $ecnumbers{$keggaccn}=$ecn;
>                     }#end foreach
>                 }
>                 else {
>                     #print "ECnumbers:\n";
>                      }
>         }
>
>
> #         print $keggaccn,"\t",$dblink_UniProt{$keggaccn},"\t", 
> $dblink_NCBIGENEID{$keggaccn},
> #                 "\t",$dblink_NCBIGI{$keggaccn},"\t","ec:$ecnumbers 
> {$keggaccn}","\t",
> #                  "p1:$pathway_id{$keggaccn}","\t","p2: 
> $pathway_name{$keggaccn}","\n";
> #
>                 $dbh->do("INSERT INTO kegginfo VALUES  
> (?,?, ?, ?, ?, ?,?,?,?,?,?,?,?,?,?,?)",
>          undef,"NULL","NULL",$genomefile,$display_id, 
> $keggaccn, at description,$ecnumbers{$keggaccn},
>                   $pathway_id{$keggaccn},$pathway_name{$keggaccn}, 
> $crc64{$keggaccn},$dblink_KO{$keggaccn},
>                  $dblink_Pfam{$keggaccn},$dblink_NCBIGI{$keggaccn}, 
> $dblink_NCBIGENEID{$keggaccn},
>                  $dblink_UniProt{$keggaccn},$dblink_PROSITE 
> {$keggaccn});
>
>
>         $dbh->do("INSERT INTO keggaasequence VALUES (?,?,?,?)",
>             undef,"",$keggaccn,$crc64{$keggaccn},$aaseq{$keggaccn});
>
>
>         $dbh->do("INSERT INTO keggntsequence VALUES (?,?,?)",
>             undef,"",$keggaccn,$ntseq{$keggaccn});
>
>
>     }
>      $t2=new Benchmark;
>     $time_used=timeThis($t1,$t2,"Finished parsing file $genomefile");
>     $dbh->do("INSERT INTO timestable VALUES (?,?,?)",
>     undef,"NULL",$genomefile,$time_used);
>
> }
>
>
> $dbh->do("CREATE INDEX keggIindex ON kegginfo (kg_id,keggaccn)");
> print "Index created on kegginfo\n";
>
> $dbh->do("CREATE INDEX keggaasequence1 ON keggaasequence  
> (kg_id,keggaccn)");
> print "Index created on keggaasequence\n";
>
> $dbh->do("CREATE INDEX keggntsequence1 ON keggntsequence  
> (kg_id,keggaccn)");
> print "Index created on keggntsequence\n";
>
>
> print"Updating the tables................\n";
>
>
> $dbh->do("update kegginfo,keggaasequence set  
> keggaasequence.kg_id=kegginfo.kg_id
>          where
>                 kegginfo.keggaccn=keggaasequence.keggaccn");
>         print " keggaasequence kg_id\n";
>
> $dbh->do("update kegginfo,keggntsequence set  
> keggntsequence.kg_id=kegginfo.kg_id
>          where
>                 kegginfo.keggaccn=keggntsequence.keggaccn");
>         print " keggaasequence kg_id\n";
>
>
>
> sub extractECnumbers ($) {
>     #sample description lines
>      #riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26  
> 2.7.7.2]
>     #ATP synthase F0 subunit c [EC:3.6.3.14]
>
>     my @desc=shift;
>     my $description = join ("", at desc);
>     my @ecnumbers=();
>     #print "parsing ec for $description..\n";
>     #check if EC number exists
>     if ($description=~/\[EC:/) {
>
>         my @array = split (/\[EC:/,$description);
>         $array[1]=~s/]//g;
>         shift @array; #remove the annotation , only EC numbers remain
>         foreach my $ele (@array) {
>             $ele=~s/^\s+|\s+$//g;
>             $ele= "EC:".$ele;
>             push (@ecnumbers,$ele);
>         }
>         return @ecnumbers;
>     }
>     else {
>         #return an empty value
>         return ;
>
>     }
>
> }
>
>
>
>
>
>
>
> sub checkKeggFormat ($) {
> =head2
>
> checkKeggFormat
>
> make sure that the file is a valid KEGG file
> function checks the first two lines,
> 1st must start with ENTRY
> 2nd must start with DEFINITION
>
> returns 0 or 1
>
> =cut
>     my $genomefile=shift;
>
>     open (TEST,$genomefile) || die "Cannot open file $genomefile  
> for reading \n";
>     my $testline=<TEST>;
> #print "$testline\n";
>     if ($testline=~/^ENTRY/) {
>         #continue
>         #$testline=<TEST>;#double check
>         #if ($testline=~/^NAME/) {
>             #this looks like a valid kegg file
>             return 1;
>         #}
>         #else {
>         #    close TEST;
>         #    return 0;
>         #}
>     }
>     else {
>         close TEST;
>         return 0;
>     }
>
> }
>
> sub timeThis ($$$)
> {
>     my ($start,$end,$message) = @_;
>     my $td = timediff($end, $start);
>     my $t = timestr($td);
>         print "$message : ",$t,"\n";
>         my @array = split (/\s+/,$t);
> #20 wallclock secs (14.23 usr +  0.84 sys = 15.07 CPU)
>         return $array[0]; #return the no. of seconds.
> }
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dmessina at wustl.edu  Mon Apr  2 16:19:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 2 Apr 2007 11:19:51 -0500
Subject: [Bioperl-l] installation bioperl
Message-ID: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>

Hi Fahmi,

Please include the list on the reply so that others can comment, too.

Yes, it appears the machine you are installing on does not have an  
internet connection. You probably will want to resolve that problem  
before dealing with Bioperl. Alternatively, you could simply install  
and use Bioperl  on the machine which does have an internet connection.

If you really need to get Bioperl installed on that machine, however,  
probably the easiest way would be to find a machine that does have an  
internet connection, install CPAN::Mini, and use it to make a local  
mirror of CPAN. You could then copy that local mirror over to the  
machine without the internet connection and point that machine's cpan  
at the local mirror (read the CPAN documentation to find out how to  
do this). Also, the BioPerl install instructions list several  
external packages that you will need to use some parts of Bioperl  
(e.g. GD). Again, you can download those distributions using the  
machine with the internet connection and copy them over.
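
The mirroring step described above could look roughly like the sketch below, run on the machine that does have a connection. It assumes CPAN::Mini is already installed there; the remote URL, the local directory, the helper name mirror_options, and the RUN_MIRROR switch are placeholders chosen for illustration, not something taken from this thread.

```perl
#!/usr/bin/perl
# Sketch: build a minimal local CPAN mirror with CPAN::Mini.
# Assumptions: remote URL, local directory and the RUN_MIRROR
# environment switch are illustrative placeholders.
use strict;
use warnings;

# Collect the options handed to CPAN::Mini->update_mirror.
sub mirror_options {
    my ($remote, $local) = @_;
    return (
        remote => $remote,   # any reachable CPAN mirror
        local  => $local,    # directory to copy to the offline machine
        trace  => 1,         # print progress while mirroring
    );
}

my %opts = mirror_options('http://www.cpan.org/', '/tmp/minicpan');

# Only touch the network when explicitly asked to.
if ($ENV{RUN_MIRROR}) {
    require CPAN::Mini;
    CPAN::Mini->update_mirror(%opts);
}
```

Once the mirror directory has been copied across, the usual way to point the offline machine's cpan shell at it is "o conf urllist unshift file:///path/to/minicpan" followed by "o conf commit"; the path here is again a placeholder.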

Dave


On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:

> Thank you for your answer. I will give you as much information as I
> can to help diagnose the problem:
>
> I use Mandriva Linux 2006.
> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained
> from the URL:
> http://www.bioperl.org/wiki/Release_1.5.2
> After that I ran these commands, which I found at
> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (section
> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL')
>
> >gunzip bioperl-1.5.2_102.tar.gz
> >tar xvf bioperl-1.5.2_102.tar
> >cd bioperl-1.5.2_102
> After that I ran the command
> >perl Build.PL
> and got the text
> this package requires Module::Build v0.2805 or greater to install
> itself
> install Module::Build now from CPAN?[y]
> I pressed Enter and got many lines such as
> System call"/usr/bin/wget -0-"ftp://.perl.org/pub/CPAN/modules/ 
> modlist.data.gz">home/fahmi/.cpan/sources/modules/03modlist.data
> Not connected
> cant access URL ftp://ftp.perl.org/CPAN/modules/modlist.data.gz
> ...
> I'm trying to install BioPerl without an internet connection
> because I don't know why Linux didn't detect my Ethernet card.
> Please tell me what I should do.
> Thank you for your help.


From cjfields at uiuc.edu  Mon Apr  2 18:10:30 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 2 Apr 2007 13:10:30 -0500
Subject: [Bioperl-l] Fwd: BLAST beta, URLAPI, and BioPerl (RemoteBlast users)
References: <CD04BF03C87B6240A342461CDE1DEC0304091DB4@NIHCESMLBX8.nih.gov>
Message-ID: <002E7937-10DF-43CE-96F6-71DC743C1314@uiuc.edu>

This may be of interest to anyone using RemoteBlast.

For anyone who uses RemoteBlast, the new changes to NCBI's BLAST  
interface shouldn't affect anything (Scott tested it out).  If there  
are any abnormalities with RemoteBlast queries over the next few  
weeks let us know.

chris

Begin forwarded message:

> From: "Mcginnis, Scott \(NIH/NLM/NCBI\) [E]"  
> <mcginnis at ncbi.nlm.nih.gov>
> Date: April 2, 2007 12:53:33 PM CDT
> To: "Chris Fields" <cjfields at uiuc.edu>
> Subject: RE: BLAST beta, URLAPI, and BioPerl
>
> Hi Chris:
>
> We are ready to make the new pages the default on April 16th. An
> announcement is going out shortly. There are some very minor
> changes to the URL API, which I have listed below. They will be
> part of the announcement. Please note that we actually tested BioPerl
> and it seems to be fine with the new pages. If you have a news page on
> your site or a mailing list, you might want to pass this on.
>
> A Note About URLAPI
>
> The new BLAST pages support URLAPI, a protocol that scripts and
> programs use to run BLAST searches and retrieve results over
> HTTP. (For more on URLAPI, see
> http://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html). The following
> information only applies to you if you develop or are responsible
> for software that uses URLAPI.
>
> The new pages have been tested and produce correct results with
> the following URLAPI client programs:
>
> * the BioPERL RemoteBlast module
> * the NCBI demo script http://ncbi.nlm.nih.gov/blast/docs/web_blast.pl
> * various scripts used in-house at NCBI
>
> Users of URLAPI should be aware of the following minor
> changes. In the new interface:
>
> 1. The Request ID (RID) format will be shorter.  The new format
>     is 11 alphanumeric characters (e.g. RDEFEA5012) and will have no
>     internal structure. The previous RID format was 36 or more
>     characters long, including punctuation (e.g.,
>     1175172712-21345-42512597310.BLASTQ3).
>
> 2. BLAST reports will show masked regions as lower-case letters
>     by default (see
>     http://nar.oxfordjournals.org/cgi/content/full/34/suppl_2/W6,
>     figure 2). The current default behavior is to show masked
>     regions as N's or X's. Users may recover the current behavior
>     by adding &MASK_CHAR=0 to the query string for a URLAPI
>     request.
>
> 3. BLAST reports will show alignments for 100 database sequences
>     by default. The current reports show only 50 alignments by
>     default.
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at uiuc.edu]
> Sent: Mon 3/5/2007 11:50 AM
> To: Mcginnis, Scott (NIH/NLM/NCBI) [E]
> Subject: BLAST beta, URLAPI, and BioPerl
>
> The BioPerl project has several modules and parsers
> which currently parse XML/text/tabular BLAST output, as well as a
> module which is capable of posting BLAST queries via the URLAPI
> interface.  Will any of the BLAST changes affect these (particularly
> URLAPI)?
>
> Thanks!
>
> chris
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From steletch at jouy.inra.fr  Tue Apr  3 12:28:39 2007
From: steletch at jouy.inra.fr (Stéphane Téletchéa)
Date: Tue, 03 Apr 2007 14:28:39 +0200
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
Message-ID: <46124877.4020605@jouy.inra.fr>

Alex Lancaster wrote:
> Hello bioperl,
> 
> I'm new to the bioperl world, having just started a research position
> in which I need to manage a large bioperl-based codebase.  To this
> end, I'm working on packaging bioperl as an official Fedora Package
> (formerly "Fedora Extras") and I'm currently wading through and
> packaging the long laundry list of Perl dependencies (then I'm going
> to try and do the same for biopython).  You can see my some of my
> progress (including links to the reviews) here:
> 
> http://fedoraproject.org/wiki/AlexLancaster
> 
> Several issues have arisen during the packaging that I hope the
>

Nice, i was on my way to do it :-)
I'm a Mandriva packager and have been kindly "spushed" for maintaining 
the bioperl package for Mandriva.

You can have a look at the work already done by Mandriva at the addresses:
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl/current/
http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl-bioperl-run/current/

(Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).

Feel free to contact me if you need more input for dependencies, since 
they are quite a lot.

Cheers,
Stéphane

-- 
Stéphane Téletchéa, PhD.                  http://www.steletch.org
Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901


From cjfields at uiuc.edu  Tue Apr  3 14:58:44 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 09:58:44 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <46124877.4020605@jouy.inra.fr>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
Message-ID: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>

Once these are set up we should add a page to the bioperl wiki to  
describe them in more detail (along with Allen's Biopackages).

chris

On Apr 3, 2007, at 7:28 AM, Stéphane Téletchéa wrote:

> Alex Lancaster wrote:
>> Hello bioperl,
>>
>> I'm new to the bioperl world, having just started a research position
>> in which I need to manage a large bioperl-based codebase.  To this
>> end, I'm working on packaging bioperl as an official Fedora Package
>> (formerly "Fedora Extras") and I'm currently wading through and
>> packaging the long laundry list of Perl dependencies (then I'm going
>> to try and do the same for biopython).  You can see my some of my
>> progress (including links to the reviews) here:
>>
>> http://fedoraproject.org/wiki/AlexLancaster
>>
>> Several issues have arisen during the packaging that I hope the
>>
>
> Nice, i was on my way to do it :-)
> I'm a Mandriva packager and have been kindly "spushed" for maintaining
> the bioperl package for Mandriva.
>
> You can have a look at the work already done by Mandriva at the  
> addresses:
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl- 
> bioperl/current/
> http://svn.mandriva.com/cgi-bin/viewvc.cgi/packages/cooker/perl- 
> bioperl-run/current/
>
> (Happy users of Mandriva do 'urpmi perl-bioperl', et voilà :-).
>
> Feel free to contact me if you need more input for dependencies, since
> they are quite a lot.
>
> Cheers,
> Stéphane
>
> -- 
> Stéphane Téletchéa, PhD.                  http://www.steletch.org
> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From allenday at gmail.com  Tue Apr  3 17:54:51 2007
From: allenday at gmail.com (Allen Day)
Date: Tue, 3 Apr 2007 10:54:51 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
Message-ID: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>

You can link Biopackages now, it's been done for nearly 2 years.

-Allen

On 4/3/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Once these are set up we should add a page to the bioperl wiki to
> describe them in more detail (along with Allen's Biopackages).
>
> chris
>


From cjfields at uiuc.edu  Tue Apr  3 18:11:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 3 Apr 2007 13:11:19 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<46124877.4020605@jouy.inra.fr>
	<67AD2CBC-C1F6-4C04-B9B3-BEAB93A2A4A3@uiuc.edu>
	<5c24dcc30704031054p756bd974ucab98c7283ef7a61@mail.gmail.com>
Message-ID: <0802E2EB-5E94-42D2-9CE1-B82DC103A5D1@uiuc.edu>

I added a small piece on Biopackages to the wiki installation page:

http://www.bioperl.org/wiki/Installing_BioPerl

We can move links to RPM (or similar) installations to their own page  
or section in the INSTALL docs when we have time.

chris

On Apr 3, 2007, at 12:54 PM, Allen Day wrote:

> You can link Biopackages now, it's been done for nearly 2 years.
>
> -Allen
>
> On 4/3/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> Once these are set up we should add a page to the bioperl wiki to
>> describe them in more detail (along with Allen's Biopackages).
>>
>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Tue Apr  3 22:18:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:18:56 +0100
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>	<1175258897.2668.21.camel@localhost.localdomain>	<6d648ierkz.fsf@delpy.biol.berkeley.edu>	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
Message-ID: <4612D2D0.7030202@sendu.me.uk>

Chris Fields wrote:
> On Mar 30, 2007, at 11:02 PM, Allen Day wrote:
> 
>> The majority of the Bioperl classes are file parsers, or manipulate
>> data that comes from the file parsers.  Yes there are exceptions like
>> the Eutils and Ensembl-interfacing classes, but they are the minority.
>> The types of files that are worked with are generally either A)
>> primary data sets such as genome data, or B) derivative data, such as
>> sequence alignments that are derived from primary data using an
>> algorithm.
>>
>> If we're in agreement that the primary data sets and
>> libraries/applications for producing derivative data should not be
>> present in Fedora Extras, then it follows that the Bioperl classes for
>> manipulating these primary and derivative data  should also not be
>> present in Fedora Extras as they are of little use without data to
>> manipulate.
>
> I respectfully disagree.

Likewise, but in a slightly different way: for myself and surely many 
others the primary data used either isn't publicly released or isn't in 
some major database and therefore won't be in any kind of repository. 
That doesn't mean I wouldn't want the parser for my files to be 
somewhere convenient.


From bix at sendu.me.uk  Tue Apr  3 22:09:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 03 Apr 2007 23:09:27 +0100
Subject: [Bioperl-l] installation bioperl
In-Reply-To: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
References: <4CF82AFF-CB24-4939-9F80-9AB907BE5822@wustl.edu>
Message-ID: <4612D097.9060400@sendu.me.uk>

> On Apr 2, 2007, at 9:22 AM, fahmi derbali wrote:
> 
>> Thank you for your answer. I will give you as much information as I
>> can to help diagnose the problem:
>>
>> I use Mandriva Linux 2006.
>> I'm trying to install bioperl-1.5.2_102.tar.gz, which I obtained
>> from the URL:
>> http://www.bioperl.org/wiki/Release_1.5.2
>> After that I ran these commands, which I found at
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix (section
>> INSTALLING BIOPERL THE EASY WAY USING 'Build.PL')
[snip]
>> I'm trying to install BioPerl without an internet connection
>> because I don't know why Linux didn't detect my Ethernet card.
>> Please tell me what I should do.
>> Thank you for your help.

David's suggestion was a good one, but quite a lot (and possibly all you 
need) of BioPerl is usable just with the bioperl-1.5.2_102.tar.gz file 
you already have.

Just follow the 'hard way' instructions:
http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix#INSTALLING_BIOPERL_MODULES_THE_HARD_WAY

Actually, it's not that hard. Just extract the files from the .tar.gz and
have your perl lib point at the directory that contains the resulting
Bio directory.
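
As a minimal sketch of what pointing your perl lib at the unpacked files can look like, assuming the tarball was extracted to /home/fahmi/bioperl-1.5.2_102 (a hypothetical location, adjust to wherever you unpacked it). The directory given to 'use lib' is the one that contains Bio/, not Bio/ itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical unpack location; this is the directory that CONTAINS Bio/.
use lib '/home/fahmi/bioperl-1.5.2_102';

# Try to load a basic BioPerl class and report where it came from.
if (eval { require Bio::Seq; 1 }) {
    print "Bio::Seq loaded from $INC{'Bio/Seq.pm'}\n";
}
else {
    print "Bio::Seq not found; check the path given to 'use lib'\n";
}
```

Setting the PERL5LIB environment variable to the same directory, or running scripts with perl -I followed by that directory, should have the same effect without editing each script.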


From t.r-a_ckright1 at tiscali.co.uk  Wed Apr  4 12:00:12 2007
From: t.r-a_ckright1 at tiscali.co.uk (Michael Pain)
Date: Wed, 4 Apr 2007 13:00:12 +0100
Subject: [Bioperl-l]  Re: read it immediately
Message-ID: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>

I have received three dics but i can not access the files as no ID or pasword was included in the package,I have paid for all this! Can you sort it out.

Regards Michael Pain


From thiago.venancio at gmail.com  Wed Apr  4 18:14:04 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Wed, 4 Apr 2007 15:14:04 -0300
Subject: [Bioperl-l] read it immediately
In-Reply-To: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
References: <000501c776b0$cd5dd9b0$a7d42d54@122882420315>
Message-ID: <44255ea80704041114pc284522tef2d3a3944763b90@mail.gmail.com>

I think you emailed the wrong list...

On 4/4/07, Michael Pain <t.r-a_ckright1 at tiscali.co.uk> wrote:
>
> I have received three dics but i can not access the files as no ID or
> pasword was included in the package,I have paid for all this! Can you sort
> it out.
>
> Regards Michael Pain
>


From gdorjee at hotmail.com  Wed Apr  4 18:17:57 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 11:17:57 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
Message-ID: <9842643.post@talk.nabble.com>


Hi all,
Can anyone please help me with a problem that I've been dealing with for
quite a while now? The following part of my script is not working for
some reason. It is supposed to get the sequence from 'result/fasta.faa' and
run the BLAST search.

###my script ###########
......
my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                 'database' =>
'/export/home/database/nr',
                                                 _READMETHOD => 'Blast'
                                                  );
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

when i paste the protein sequence into the textarea of my html page and save
the same as 'result/fasta.faa', so that the above script would do the blast,
i get the following error: 

Software error:
------------- EXCEPTION  -------------
MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastpgp
/usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50
--------------------------------------
i would appreciate your help.
i would also like to add that the 'result/fasta.faa' has the sequence saved
in it.

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9842643
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From gowthaman.ramasamy at sbri.org  Wed Apr  4 18:57:09 2007
From: gowthaman.ramasamy at sbri.org (Gowthaman Ramasamy)
Date: Wed, 4 Apr 2007 11:57:09 -0700
Subject: [Bioperl-l] How to patch something in installed bioperl module
Message-ID: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>


Hi List,
I have been advised to patch the GFF.pm BioPerl module (comment out some lines and add others).
How do I go about it?
I have the latest BioPerl version, 1.5.2, installed via CPAN.

I find GFF.pm in the following location:
/root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm


Do I have to recompile it after editing?
I am completely clueless; I have not done this before.
Can anyone help me do this?

Many thanks in advance.

gowthaman


From dmessina at wustl.edu  Wed Apr  4 19:42:43 2007
From: dmessina at wustl.edu (David Messina)
Date: Wed, 4 Apr 2007 14:42:43 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>

The code snippet worked fine for me. I believe the problem is that  
'result/fasta.faa' is not getting passed to your code properly. You  
might try specifying a complete path to your input and output file --  
relative paths, especially through a web app, can be tricky.
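
One way to rule the path out is to check the input file before BLAST ever sees it. The sketch below uses only core modules; the helper names absolute_from_script and fasta_problem are made up for illustration, and it assumes the result/ directory sits next to the script, which may not match your web server layout.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use FindBin qw($Bin);
use File::Spec;

# Turn a path relative to the script's directory into an absolute one.
sub absolute_from_script {
    my ($relative) = @_;
    return File::Spec->rel2abs($relative, $Bin);
}

# Return undef if the file looks usable as FASTA input, else a reason.
sub fasta_problem {
    my ($file) = @_;
    return "does not exist"  unless -e $file;
    return "is not readable" unless -r $file;
    return "is empty"        unless -s $file;
    open my $fh, '<', $file or return "cannot be opened: $!";
    my $first = <$fh>;
    close $fh;
    return "does not start with a '>' header line"
        unless defined $first && $first =~ /^>/;
    return;
}

my $query   = absolute_from_script('result/fasta.faa');
my $problem = fasta_problem($query);
print defined $problem ? "$query $problem\n" : "$query looks OK\n";
```

If the file passes these checks when the script runs under the web server, it is also worth confirming that next_seq() returned a defined object before handing it to blastall.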

> when i paste the protein sequence into the textarea of my html page  
> and save
> the same as 'result/fasta.faa', so that the above script would do  
> the blast,

I'm not sure from what you wrote -- did you try running your script  
on the command line (having created 'result/fasta.faa' manually  
first)? If that is working for you, then the problem is with getting  
the data from the webpage into the script, not with the blasting part.

Dave

This is what I did:

  % ls test.pl testp*
test.pl       testp.fa

% formatdb -i testp.fa

% ls test.pl testp*
test.pl       testp.fa      testp.fa.phr  testp.fa.pin  testp.fa.psq

% perl test.pl testp.fa
%  head -10 out.blast
BLASTP 2.2.10 [Oct-19-2004]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|64654269|gb|AAH96193.1| HOXB1 protein [Homo sapiens]
          (235 letters)


Your code: I changed only the input filename and the input database  
name, and saved the script as test.pl
-----------------------
#!/usr/bin/perl

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::Run::StandAloneBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], '-format' =>
'Fasta');
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>  
'blastp',
                                                  'database' =>
'testp.fa',
                                                  _READMETHOD => 'Blast'
                                                   );
$factory->outfile("out.blast");
my $blastreport = $factory->blastall($queryin);
-----------------------


From gdorjee at hotmail.com  Wed Apr  4 21:44:27 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 14:44:27 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
References: <9842643.post@talk.nabble.com>
	<35EE39CF-4A25-4453-8073-48CA0E9317EB@wustl.edu>
Message-ID: <9846257.post@talk.nabble.com>


Thanks for your reply Dave. I don't think that there's anything wrong with
the open(OUTPUT,">result/fasta.faa"); line as I could get the 'fasta.faa'
file with the sequence in it. I see it. It looks like the blast is not being
able to read from the result/fasta.faa. 
^ ^* 


Dave Messina-2 wrote:
> 
> The code snippet worked fine for me. I believe the problem is that  
> 'result/fasta.faa' is not getting passed to your code properly. You  
> might try specifying a complete path to your input and output file --  
> relative paths, especially through a web app, can be tricky.
> 
>> when i paste the protein sequence into the textarea of my html page  
>> and save
>> the same as 'result/fasta.faa', so that the above script would do  
>> the blast,
> 
> I'm not sure from what you wrote -- did you try running your script  
> on the command line (having created 'result/fasta.faa' manually  
> first)? If that is working for you, then the problem is with getting  
> the data from the webpage into the script, not with the blasting part.
> 
> Dave
> 
> This is what I did:
> 
>   % ls test.pl testp*
> test.pl       testp.fa
> 
> % formatdb -i testp.fa
> 
> % ls test.pl testp*
> test.pl       testp.fa      testp.fa.phr  testp.fa.pin  testp.fa.psq
> 
> % perl test.pl testp.fa
> %  head -10 out.blast
> BLASTP 2.2.10 [Oct-19-2004]
> 
> 
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A.  
> Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs",  Nucleic Acids Res. 25:3389-3402.
> 
> Query= gi|64654269|gb|AAH96193.1| HOXB1 protein [Homo sapiens]
>           (235 letters)
> 
> 
> Your code: I changed only the input filename and the input database  
> name, and saved the script as test.pl
> -----------------------
> #!/usr/bin/perl
> 
> use strict;
> use warnings;
> use Bio::SeqIO;
> use Bio::Tools::Run::StandAloneBlast;
> 
> my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], '-format' =>
> 'Fasta');
> my $queryin = $Seq_in->next_seq();
> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  =>  
> 'blastp',
>                                                   'database' =>
> 'testp.fa',
>                                                   _READMETHOD => 'Blast'
>                                                    );
> $factory->outfile("out.blast");
> my $blastreport = $factory->blastall($queryin);
> -----------------------

-- 
View this message in context: http://www.nabble.com/blastall-problem-tf3527412.html#a9846257
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From torsten.seemann at infotech.monash.edu.au  Thu Apr  5 00:17:10 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:17:10 +1000
Subject: [Bioperl-l] How to patch something in installed bioperl module
In-Reply-To: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>
References: <A4D285B054CE4641A93F1B2046B2B3CD0762C9@mail01.sbri.org>
Message-ID: <a79f6a4b0704041717q160be28eu472d32d3cd704eba@mail.gmail.com>

> I have been advised to patch the GFF.pm BioPerl module (comment out some lines and add others).
> How do I go about it?

First, make a backup of the original file.
Then just edit the original (add/remove lines).

> I have the latest BioPerl version, 1.5.2, installed via CPAN.
> I find GFF.pm in the following location:
> /root/.cpan/build/bioperl-1.5.2_102/Bio/Tools/GFF.pm

This is not where it is installed. That is where the CPAN program
uncompressed it to before installing. It is more likely in a directory
like this:
/usr/lib/perl5/site_perl/5.8.5/Bio/Tools/GFF.pm
But it depends on how your Perl setup arranges things!
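
To see which copy Perl will actually load on a given machine, one option is to search @INC for the module file. This is a small sketch using core modules only; the helper name installed_path is hypothetical. Running "perldoc -l Bio::Tools::GFF" from the shell should give a similar answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Return the first file in @INC that provides the given module,
# or undef if no directory in @INC contains it.
sub installed_path {
    my ($module) = @_;
    my @parts = split /::/, $module;
    $parts[-1] .= '.pm';
    for my $dir (grep { !ref } @INC) {    # skip code-ref hooks in @INC
        my $file = File::Spec->catfile($dir, @parts);
        return $file if -f $file;
    }
    return;
}

my $module = @ARGV ? $ARGV[0] : 'Bio::Tools::GFF';
my $path   = installed_path($module);
print defined $path ? "$module => $path\n" : "$module not found in \@INC\n";
```

The file it reports is the one to back up and edit.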

> Do I have to recompile it after editing?

No.

--Torsten


From torsten.seemann at infotech.monash.edu.au  Thu Apr  5 00:22:37 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 10:22:37 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>

> Software error:
> ------------- EXCEPTION  -------------
> MSG:    not Bio::Seq object or array of Bio::Seq objects or file name!
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp
> /usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:611
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' => 'Fasta');

Does this still happen if you give the full path to the FASTA file?
e.g. -file => '/usr/local/apache2/htdocs/result/fasta.faa'
(I'm guessing what the full path is here)

--Torsten


From gilbertd at cricket.bio.indiana.edu  Thu Apr  5 00:59:23 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Wed, 4 Apr 2007 19:59:23 -0500 (EST)
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
Message-ID: <200704050059.l350xNF07452@cricket.bio.indiana.edu>


Dear Bioperl list,

There is a small bug in what I think is the current Bio::Tools::GFF.pm,
that blocks output of Target attributes (in gff3 at least).  See a patch
here

http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF

-- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


From torsten.seemann at infotech.monash.edu.au  Thu Apr  5 01:34:17 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 11:34:17 +1000
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
Message-ID: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>

Dear all,

I have been migrating all our BLAST infrastructure to use the XML
output mode, the "blastpgp -m 7" option, referred to as 'blastxml' format
in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML report
before, and encountered some issues I hope you can help me with:

1. When loading with Bio::SearchIO->new(-format=>'blastxml') I get back a
Bio::Search::Result::GenericResult object. This means I cannot use
the PSI-BLAST functions like iterations() and psiblast() provided by
Bio::Search::Result::BlastResult. I'm guessing this is because the
XML output reports itself as a plain BLASTP output:
<BlastOutput_program>blastp</BlastOutput_program>

How do I determine if it is a PSI-BLAST report?

2. Usually a PSI-BLAST report has multiple Iterations. The XML output
has <Iteration> tags but it took me a while to figure out that these
get mapped to Bio::Search::Result objects accessible via
Bio::SearchIO->next_result().

Is this the proper way to process the iterations?

3. I also notice that only the first result (iteration) has the
query_name set. Subsequent ones are empty:
RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP,
query=MyProtein , db=uniprot_sprot
RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=
, db=uniprot_sprot

Is this a bug or expected?

I'm guessing a lot of these problems are simply due to limitations of
the NCBI BLAST XML DTD?

--Torsten


From gdorjee at hotmail.com  Thu Apr  5 00:59:08 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Wed, 4 Apr 2007 17:59:08 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041722je9ad150gb0f0685248d728e2@mail.gmail.com>
Message-ID: <9848412.post@talk.nabble.com>


hi Torsten,
Yes, it still gives me the same error even if I give the full path to the
fasta file. Following is how I did: 

####### part of my script #######
my $Seq_in = Bio::SeqIO->new(
    -file   => '/export/home/local/apache2/htdocs/result/fasta.faa',
    -format => 'Fasta',
);
my $queryin = $Seq_in->next_seq();
my $factory = Bio::Tools::Run::StandAloneBlast->new(
    'program'   => 'blastp',
    'database'  => '/export/home/dorjee/database/nrpart',
    _READMETHOD => 'Blast',
);
$factory->outfile("/export/home/local/apache2/htdocs/result/out.blast");
my $blastreport = $factory->blastall($queryin);
....

thanks man.




From torsten.seemann at infotech.monash.edu.au  Thu Apr  5 02:57:09 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 5 Apr 2007 12:57:09 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9842643.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
Message-ID: <a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>

DeeGee,

Please add the following lines to help deduce the problem:

> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', '-format' =>
> 'Fasta');

die "could not open fasta" if not defined $Seq_in;

> my $queryin = $Seq_in->next_seq();

die "could not get seq" if not defined $queryin;

Does anything happen now?

...

Some other comments:

> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
> STACK Bio::Tools::Run::StandAloneBlast::blastpgp

I'm not sure why it is in the blastpgp() method when you chose
$factory->blastall() ?

>                                                  _READMETHOD => 'Blast'

I don't think this is required anymore in modern Bioperl. Are you
using 1.5.x or bioperl-live ?

> when i paste the protein sequence into the textarea of my html page and
> STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:50

So this is a CGI script?
Does the script run as user 'apache' or 'httpd', or as yourself via SuEXEC?
Does 'apache' have permissions to READ/WRITE the result/ directory?
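
A quick way to answer these questions from inside the CGI script itself
is to log who the process runs as, where it is running, and what it can
see of the file. This is a rough sketch in core Perl only; the helper
describe_file() is a name invented for this example, and the relative
path is the one from the script quoted above.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical helper: report what the current process can see of a file.
sub describe_file {
    my ($path) = @_;
    return 'missing'    unless -e $path;
    return 'unreadable' unless -r $path;
    return 'empty'      if -z $path;
    return 'ok';
}

my $fasta = 'result/fasta.faa';    # relative path from the script above
my $user  = getpwuid($>);          # name of the effective user, e.g. 'apache'
$user = $> unless defined $user;   # fall back to the numeric uid

# Under a web server, warn() normally ends up in the server's error log.
warn sprintf "user=%s cwd=%s file=%s status=%s\n",
    $user, getcwd(), $fasta, describe_file($fasta);
```

If the status is 'missing' while the file exists on disk, the working
directory of the CGI process is probably not what the relative path
assumes.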

--Torsten


From cjfields at uiuc.edu  Thu Apr  5 04:14:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 4 Apr 2007 23:14:46 -0500
Subject: [Bioperl-l] Help parsing PSI-BLAST XML reports
In-Reply-To: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>
References: <a79f6a4b0704041834h68fc48c4w791b2cc0434edfb3@mail.gmail.com>
Message-ID: <8EA4D933-9B99-485E-9CEA-AB39297F90B4@uiuc.edu>

On Apr 4, 2007, at 8:34 PM, Torsten Seemann wrote:

> Dear all,
>
> I have been migrating all our BLAST infrastructure to use the XML
> output mode, the "blastpgp -m 7" option, referred to 'blastxml' format
> in Bioperl. I had never used SearchIO to parse a PSI-BLAST XML report
> before, and encountered some issues I hope you can help me with:
>
> 1. When loading with Bio::SearchIO(-format=>'blastxml') I get back a
> Bio::Search::Result::GenericResult object. This means I can not use
> the PSI-BLAST functions like iterations() and psiblast() provided by
> Bio::Search::Result::BlastResult. I'm guessing this is because the the
> XML output reports itself as a plain BLASTP output:
> <BlastOutput_program>blastp</BlastOutput_program>
>
> How do I determine if it is a PSI-BLAST report?

I don't know if you can very easily, though I haven't tried myself.   
If I remember correctly there wasn't a substantial difference in the  
XML output between regular BLAST XML and PSI-BLAST XML.  We could add  
a parameter to the parser to treat the report as PSI-BLAST.

> 2. Usually a PSI-BLAST report has multiple Iterations. The XML output
> has <Iteration> tags but it took me a while to figure out that these
> get mapped to Bio::SearchIO::Result objects accessible via
> Bio::SearchIO->next_result().
>
> Is this the proper way to process the iterations?

The problem is in the way that NCBI now outputs multiple-query BLAST  
XML reports, which apparently changed sometime in the last year w/o  
notice.  This was also a problem with other Bio* parsers (I remember  
seeing something about it on the BioPython list).  Previously  
multiquery BLAST requests were output like single XML reports  
concatenated together, each with their own XML declaration, etc.  Now  
they are treated like iterations (query 1 = iteration 1, query 2 =  
iteration 2, etc) all in one long BLAST report.  There's an example  
of one in the SearchIO tests which I added to CVS in Jan-Feb,  
post-1.5.2.  The current parser handles both old and new cases.

The current behavior of the parser is to parse everything up front,  
building up the ResultI's and then returning them one-by-one upon  
next_result(), which is horrible on memory if you have tons of XML to  
wade through.  I will probably change that to carve the data up into  
report-sized chunks of XML and parse them piecemeal, but I haven't  
had time to work on it yet.
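
To illustrate the piecemeal idea (this is NOT the SearchIO code, just a
plain-Perl sketch): setting the input record separator to the closing
tag makes each read return one iteration's worth of text, so only one
chunk is held in memory at a time. The name iteration_reader() and the
toy XML are assumptions for the example; a real parser would hand each
chunk to an XML parser rather than print it.

```perl
use strict;
use warnings;

# Return a closure that yields one <Iteration>...</Iteration> chunk per
# call, reading from an open filehandle, or undef when none are left.
sub iteration_reader {
    my ($fh) = @_;
    return sub {
        local $/ = '</Iteration>';    # read up to each closing tag
        while (defined(my $chunk = <$fh>)) {
            # Drop anything before the opening tag (XML header, whitespace).
            next unless $chunk =~ /(<Iteration>.*<\/Iteration>)/s;
            return $1;
        }
        return;
    };
}

# Demo on an in-memory file holding two toy iterations.
my $xml = "<BlastOutput><Iteration>one</Iteration>\n"
        . "<Iteration>two</Iteration></BlastOutput>";
open my $fh, '<', \$xml or die "cannot open in-memory file: $!";
my $next = iteration_reader($fh);
while (defined(my $chunk = $next->())) {
    print "$chunk\n";
}
```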

> 3. I also notice that only the first result (iteration) has the
> query_name set. Subsequent ones are empty:
> RESULT 1 Bio::Search::Result::GenericResult, algorithm= BLASTP,
> query=MyProtein , db=uniprot_sprot
> RESULT 2 Bio::Search::Result::GenericResult, algorithm= BLASTP, query=
> , db=uniprot_sprot
>
> Is this a bug or expected?

If you are using 1.5.2 then there is a bug related to that which was  
fixed in CVS a few months back (related to the multiquery issue  
above).  If it isn't let me know.

> I'm guessing a lot of these problems are simply due to limitations of
> the NCBI BLAST XML DTD?
>
> --Torsten

To tell the truth I'm not sure.  One would think they could add some  
designation to the report for PSI-BLAST!

chris


From cjfields at uiuc.edu  Thu Apr  5 17:40:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 12:40:41 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
Message-ID: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>

Roy Chaudhuri has raised an interesting question in a bug report  
filed regarding 'bless'-ing objects into another (similar) class.   
The bug report on this is here:

http://bugzilla.open-bio.org/show_bug.cgi?id=2262

The following code (from the bug report) illustrates the problem.   
Note some of this is taken from the Bio::Seq::Meta::Array POD, though  
the example sequence object is a LocatableSeq (PrimarySeqI) and not a  
SeqI:

use Bio::SeqIO;
use Bio::Seq::Meta::Array;
# $seq isa Bio::SeqI
my $seq=Bio::SeqIO->new(-fh=>\*ARGV, -format=>'genbank')->next_seq;
# $seq is still a Bio::SeqI
bless $seq, 'Bio::Seq::Meta::Array';
Bio::SeqIO->new(-format=>'genbank')->write_seq($seq);

This produces sequence output missing sequence data, a definition,  
and other odds and ends.  $seq is first a Bio::Seq::RichSeq and is  
blessed into a Bio::Seq::Meta::Array; both times $seq remains  
Bio::SeqI.  However, Bio::Seq::Meta::Array has an odd inheritance  
tree which also makes it a Bio::PrimarySeqI and a Bio::Seq::MetaI (ick):

use base qw(Bio::LocatableSeq Bio::Seq Bio::Seq::MetaI);

Bio::LocatableSeq has a seq() method inherited from Bio::PrimarySeq,  
for instance, so using $seq->seq() invokes Bio::PrimarySeq::seq()  
instead of Bio::Seq::seq().  No problem in most cases as long as  
PrimarySeqI is blessed into another PrimarySeqI, but if one blesses a  
Bio::SeqI into a Bio::Seq::Meta::Array (as in the example) then  
PrimarySeq::seq() expects a raw sequence and gets none (since the  
data is stored internally as a PrimarySeq in a different location)  
and no sequence is output.  This happens similarly for other stored  
object data.
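
The effect can be reproduced without Bioperl at all. Below is a toy
sketch; Toy::Seq and Toy::PrimarySeq and their hash keys are made-up
stand-ins, not the real classes. It shows how re-blessing keeps the
same hash but changes which seq() method looks into it.

```perl
use strict;
use warnings;

# Toy stand-ins, NOT the real Bioperl classes.
package Toy::PrimarySeq;    # keeps the raw sequence directly in {seq}
sub new { my ($class, %a) = @_; return bless { seq => $a{seq} }, $class }
sub seq { return $_[0]{seq} }

package Toy::Seq;           # keeps its sequence inside a nested object
sub new {
    my ($class, %a) = @_;
    my $primary = Toy::PrimarySeq->new(seq => $a{seq});
    return bless { primary_seq => $primary }, $class;
}
sub seq { return $_[0]{primary_seq}->seq }

package main;

my $obj = Toy::Seq->new(seq => 'ACGT');
print "before: ", $obj->seq, "\n";

# Same hash, different method lookup: seq() now reads $obj->{seq},
# which was never set, while the data still sits in {primary_seq}.
bless $obj, 'Toy::PrimarySeq';
my $after = $obj->seq;
print "after:  ", (defined $after ? $after : '(undef)'), "\n";
```

The data is not lost, only unreachable through the new class's
accessors, which matches the missing-sequence symptom described above.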

I'm not sure why Bio::Seq::Meta::Array is set up this way.  Do we  
want to support using 'bless $obj, Class' with Bio::SeqI/PrimarySeqI,  
or should Bio::Seq::Meta::Array be changed so that it follows one  
interface or the other?

chris


From hlapp at gmx.net  Thu Apr  5 18:27:39 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 5 Apr 2007 14:27:39 -0400
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
In-Reply-To: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
Message-ID: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>


On Apr 5, 2007, at 1:40 PM, Chris Fields wrote:

> Do we want to support using 'bless $obj, Class'

This smacks of over-clever programming and looks like a sure way to  
obfuscate what you're doing. I'm not sure why we need to support this  
construct.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Thu Apr  5 18:44:38 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 13:44:38 -0500
Subject: [Bioperl-l] Mixed bless-ings with Bio::Seq/Bio::PrimarySeq
	(Bio::Seq::Meta::Array)
In-Reply-To: <421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>
	<421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
Message-ID: <F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>

I tend to agree on that front as it seems too prone to subtle issues  
with inheritance (as the bug demonstrates).

Related to that, do we want to have Bio::Seq::Meta::Array implement  
either PrimarySeqI or SeqI?  Having it implement both is definitely  
not working as expected.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From mkiwala at watson.wustl.edu  Thu Apr  5 19:11:22 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Thu, 05 Apr 2007 14:11:22 -0500
Subject: [Bioperl-l] Mixed bless-ings with
	Bio::Seq/Bio::PrimarySeq	(Bio::Seq::Meta::Array)
In-Reply-To: <F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>
References: <24D227C7-F6DC-47FA-AAA8-7565DD5931A6@uiuc.edu>	<421D1A5B-4F4A-46D9-8829-2DCB1D8E7DE5@gmx.net>
	<F8DA6473-7B29-4B66-BF41-28CD365894A5@uiuc.edu>
Message-ID: <461549DA.90709@watson.wustl.edu>

My vote is for SeqI.

I was using the SeqWithQuality class and more recently switched over to 
Bio::Seq::Quality as we are upgrading from 1.4 to 1.5.2. The sequences 
I'm working with are destined for GenBank and have features and quality 
values. I've written a module (that I call GenBank::Tbl2Asn) that 
accepts a Bio::Seq::Quality with features and runs tbl2asn on it to 
produce a file that we send to GenBank. I don't know of any other class 
that suits my needs better than Bio::Seq::Quality inheriting from 
Bio::SeqI.




From gdorjee at hotmail.com  Thu Apr  5 21:09:14 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 14:09:14 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
Message-ID: <9864004.post@talk.nabble.com>


Thanks again, Torsten. I tried (die "could not get seq" if not defined
$queryin;) as you suggested, and now I get the following error message:

Software error:
could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.

Does this mean that next_seq() method in 'my $queryin =
$Seq_in->next_seq();' has some problem? How can I fix it? I would appreciate
your help.
Cheers!




From cjfields at uiuc.edu  Thu Apr  5 23:32:55 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 5 Apr 2007 18:32:55 -0500
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
Message-ID: <3ED7F1E9-FE21-4796-99AC-0CD0EA418563@uiuc.edu>


On Apr 5, 2007, at 4:09 PM, DeeGee wrote:

>
> Thanks again, Torsten. I tried (die "could not get seq" if not defined
> $queryin;) as you suggested, and now I get the following error  
> message:
>
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
>
> Does this mean that next_seq() method in 'my $queryin =
> $Seq_in->next_seq();' has some problem? How can I fix it? I would  
> appreciate
> your help.
> Cheers!

This indicates there is likely some problem with your sequence file
(either it isn't FASTA or something else is wrong), but without
seeing it we can't be sure.  I don't think it is a next_seq() issue.
Also, if there were problems accessing the file the stream object
should throw an error, so I don't think it is that either...

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From torsten.seemann at infotech.monash.edu.au  Fri Apr  6 00:40:32 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Fri, 6 Apr 2007 10:40:32 +1000
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9864004.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
Message-ID: <a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>

Dorjee,

> thanks alot for your reply again. as per your suggestion (using 'die "could
> not get seq" if not defined $queryin;'), i now get the following error
> message:
> Software error:
> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
> i've attached the script. could you plz have a look at it and see where am i
> going wrong.
> cheers mate!

This strongly suggests that your FASTA file is not actually in FASTA format.
http://en.wikipedia.org/wiki/Fasta_format

Does it work if you pass it to blastall on the command line?
eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr

> Saier Lab.
> 858-534-2457

Are you working at UCSD?

--Torsten


From gdorjee at hotmail.com  Fri Apr  6 03:26:16 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Thu, 5 Apr 2007 20:26:16 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
Message-ID: <9867402.post@talk.nabble.com>


hi Torsten,  
blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works
perfectly fine on the command line, and the 'fasta.faa' is in fasta format:

>gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
AQGAVAPGPDGGGPFPPWPLG

it seems like i'm just one bloody step away from success. ^ ^* can't figure
out the prob. 
thanks for your help.




From tuco at pasteur.fr  Fri Apr  6 13:33:08 2007
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Fri, 06 Apr 2007 15:33:08 +0200
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
Message-ID: <46164C14.8040701@pasteur.fr>

Hi folks,

I am seeing strange behavior from Bio::SeqIO::embl.
When I read an EMBL file as input and write it to another one, the tags
in the output file (EMBL format) are not in the same order as in the
original file.
Is this the normal, expected result?

If anyone wants to test it as a Perl one-liner, here is the code:

perl -MBio::SeqIO -e '$i = Bio::SeqIO->new(-file => "file.embl", -format 
=> "EMBL"); $o = Bio::SeqIO->new(-file => ">new.embl", -format => 
"EMBL"); while($e = $i->next_seq()){ $o->write_seq($e);  }'

I checked the embl.pm code but was unable to find where this behavior
comes from.

Does anyone have a solution or any clue?

Thanks

Regards

Emmanuel

-- 
-------------------------
Emmanuel Quevillon
Softwares and data banks
Pasteur Insititue
tuco at_ pasteur dot fr	
-------------------------


From dmessina at wustl.edu  Fri Apr  6 15:09:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Fri, 6 Apr 2007 10:09:51 -0500
Subject: [Bioperl-l] Bio::Annotation::Collection strange behavior
In-Reply-To: <46164C14.8040701@pasteur.fr>
References: <46164C14.8040701@pasteur.fr>
Message-ID: <7C67D287-DE2A-488A-8636-01EFF468368D@wustl.edu>

> Is this the normal, expected result?

Yes, unfortunately. Due to the complexity of the parsing, it is  
surprisingly difficult to "round-trip" some sequence file formats.

http://www.bioperl.org/wiki/HOWTO:SeqIO#Caveats


Dave


From jason at bioperl.org  Fri Apr  6 15:42:41 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 08:42:41 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9867402.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
Message-ID: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>

When and how are you writing your sequences to this file (result/fasta.faa)?
Are you using SeqIO or BioPerl to write the sequence to a file?
I'm wondering if this is an I/O buffering problem.


--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From bernd.web at gmail.com  Fri Apr  6 18:00:18 2007
From: bernd.web at gmail.com (Bernd Web)
Date: Fri, 6 Apr 2007 20:00:18 +0200
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <716af09c0704061100n1555915bw18050639d25cbf89@mail.gmail.com>

Hi Dorjee,

Do you now use complete file paths everywhere (instead of the
relative paths that were in your script)?  Did you check all read and
execute permissions (turn r, x on for group and others)? And regarding
the fasta file, I'd suggest closing the filehandle after you have printed
the fasta sequence to the file.

open(OUTPUT, ">result/fasta.faa")   # don't use this relative path
    or die "cannot open fasta file for writing: $!";   # and use "die", as suggested earlier
.... your other code lines
print OUTPUT "$desc\n$seqo\n";
close(OUTPUT); # close the file
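
For what it's worth, here is a self-contained sketch of the
write-then-close pattern with a lexical filehandle. The write_fasta()
helper, the temporary directory and the toy record are assumptions for
the example only. The point is that close() flushes buffered output, so
anything that reads the file afterwards sees the complete record.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Write one FASTA record and close the handle before anyone reads the file.
sub write_fasta {
    my ($path, $desc, $seq) = @_;
    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} "$desc\n$seq\n";
    close $out or die "cannot close $path: $!";    # flushes buffered output
    return $path;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = File::Spec->catfile($dir, 'fasta.faa');
write_fasta($file, '>query1', 'HLSAQKASVG');
print "wrote ", (-s $file), " bytes to $file\n";
```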

Also check whether your complete script runs from the command line, to be
sure your problems are not related to the webserver environment.


BTW I don't think you want to parse your fasta file the way you do:
if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
$fasta_file=~s/[\n\r]//g;
if ($fasta_file =~ /([A-Z]{10}.+)/){$seqo=$1;}

$seqo can end up containing part of the description as well (if the
header has a run of ten capital letters), so your sequence would start
with the description.
BioPerl provides code for fasta file parsing too ;-) If you really
want to stick to your code you can catch the $desc and $seqo in one
RegExp, or replace this line:
if ($fasta_file =~ /^(\>.+)\s+/){$desc=$1;}
with
if ($fasta_file =~ s/^(\>.+)\s+//){$desc=$1;}
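
For the "one RegExp" variant, here is a minimal sketch using only core Perl.
It assumes the textarea holds exactly one FASTA record; split_fasta_record
and the shortened example sequence are made up for illustration and are not
part of BioPerl or of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: split one FASTA record (as pasted into a textarea)
# into its description line and its sequence, using a single regexp.
sub split_fasta_record {
    my ($text) = @_;
    # Group 1: the header line starting with '>'.
    # Group 2: everything after the first line break (/s lets '.' match newlines).
    my ($desc, $seq) = $text =~ /\A\s*(>[^\r\n]*)\r?\n(.*)\z/s
        or die "input does not look like a FASTA record\n";
    $seq =~ s/\s+//g;    # drop line breaks and stray whitespace
    return ($desc, $seq);
}

# Browsers usually submit textarea content with CRLF line endings.
my $fasta_file = ">gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]\r\n"
               . "HLSAQKASVGPESVSGLGTR\r\nTWPRVSCEVTVQ\r\n";
my ($desc, $seqo) = split_fasta_record($fasta_file);
print "$desc\n$seqo\n";
```

Because the header and the sequence are captured separately, $seqo can no
longer pick up the description.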


I hope you will get your script working now.

Regards,
Bernd

On 4/6/07, Jason Stajich <jason at bioperl.org> wrote:
> When/How are you writing your sequences to this file result.faa?  are
> you using seqIO or bioperl to write the sequence  to a file?
> I'm wondering if this is an I/O buffering problem.
>
> On Apr 5, 2007, at 8:26 PM, DeeGee wrote:
>
>> hi Torsten,
>> blastall -p blastp -i result/fasta.faa -d /export/home/database/nr works
>> perfectly fine on the command line, and the 'fasta.faa' is in fasta format:
>>
>> >gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
>> HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
>> QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
>> PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
>> DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
>> AQGAVAPGPDGGGPFPPWPLG
>>
>> it seems like i'm just one bloody step away from success. ^ ^* can't figure
>> out the prob.
>> thanks for your help.
>>
>> Torsten Seemann wrote:
>>>
>>> Dorjee,
>>>
>>>> thanks alot for your reply again. as per your suggestion (using 'die
>>>> "could not get seq" if not defined $queryin;'), i now get the following
>>>> error message:
>>>> Software error:
>>>> could not get seq at /usr/local/apache2/htdocs/remote_ncbi.pl line 50.
>>>> i've attached the script. could you plz have a look at it and see where
>>>> am i going wrong.
>>>> cheers mate!
>>>
>>> This strongly suggests that your FASTA file is not actually in FASTA
>>> format.
>>> http://en.wikipedia.org/wiki/Fasta_format
>>>
>>> Does it work if you pass it to blastall on the command line?
>>> eg. blastall -p blastp -i result/fasta.faa -d /export/home/database/nr
>>>
>>>> Saier Lab.
>>>> 858-534-2457
>>>
>>> Are you working at UCSD?
>>>
>>> --Torsten


From gdorjee at hotmail.com  Fri Apr  6 17:39:38 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 10:39:38 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
Message-ID: <9875685.post@talk.nabble.com>


Following is the part of my script, which is in the 'htdocs' directory:

####### part of my script #############
#generate a new CGI object from the input to the CGI script
my $query=new CGI;

open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");

print STDOUT $query->header();
print STDOUT $query->start_html(-title=>"Response from blast",
-BGCOLOR=>"#FFFFFF");
print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";

#gets the sequence from the html textarea with 'post' method
my $fasta_file=$query->param('sequence');
print OUTPUT $fasta_file;

#Local blast of the input sequence against nr database
my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
'Fasta');
die "could not open fasta" if not defined $Seq_in;
my $queryin = $Seq_in->next_seq();
die "could not get seq" if not defined $queryin;
my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
                                                 'database' =>
'/export/home/dorjee/database/nr',
                                                 _READMETHOD => 'Blast'
                                                   );
$factory->outfile("result/out.blast");
my $blastreport = $factory->blastall($queryin);
.....

Thank you.




From jason at bioperl.org  Fri Apr  6 18:40:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 6 Apr 2007 11:40:42 -0700
Subject: [Bioperl-l] blastall problem
In-Reply-To: <9875685.post@talk.nabble.com>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
	<9875685.post@talk.nabble.com>
Message-ID: <A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>

Looks like you need to deal with buffering:

http://perl.plover.com/FAQs/Buffering.html

So you need to add this:
close(OUTPUT);

Alternatively you can build a sequence object and pass that in to the  
BLAST factory, then you don't have to mess around with creating  
temporary files or run into this sort of problem.
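
To see the buffering effect in isolation, here is a small sketch using only
core modules. The temp-file location and the example record are made up for
illustration. A short write normally sits in Perl's output buffer until the
handle is flushed or closed, so anything that reads the file before close()
is likely to find it empty:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Work in a throwaway directory so the example does not touch real data.
my $dir    = tempdir(CLEANUP => 1);
my $file   = File::Spec->catfile($dir, 'fasta.faa');
my $record = ">example_id example description\nHLSAQKASVGPESVSG\n";

open(my $out, '>', $file) or die "Cannot open $file for writing: $!";
print {$out} $record;

# The handle is still open: the small record is normally still buffered.
my $size_before = (-s $file) || 0;

close($out) or die "Cannot close $file: $!";

# After close() the buffer has been written out to disk.
my $size_after = (-s $file) || 0;

print "bytes on disk before close: $size_before\n";
print "bytes on disk after close:  $size_after\n";
```

The three-argument open with "or die" also reports a wrong path or missing
write permission right away.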

-jason
On Apr 6, 2007, at 10:39 AM, DeeGee wrote:

>
> Following is the part of my script, which is in the 'htdocs'  
> directory:
>
> ####### part of my script #############
> #generate a new CGI object from the input to the CGI script
> my $query=new CGI;
>
> open(OUTPUT,">/export/home/local/apache2/htdocs/result/fasta.faa");
>
> print STDOUT $query->header();
> print STDOUT $query->start_html(-title=>"Response from blast",
> -BGCOLOR=>"#FFFFFF");
> print STDOUT "\n<h1><center>Results from the BLAST</center></h1>\n";
>
> #gets the sequence from the html textarea with ?post? method
> my $fasta_file=$query->param('sequence');
> print OUTPUT $fasta_file;
>
close(OUTPUT);
> #Local blast of the input sequence against nr database
> my $Seq_in = Bio::SeqIO->new (-file => 'result/fasta.faa', -format =>
> 'Fasta');
> die "could not open fasta" if not defined $Seq_in;
> my $queryin = $Seq_in->next_seq();
> die "could not get seq" if not defined $queryin;
> my $factory = Bio::Tools::Run::StandAloneBlast->new('program'  => 'blastp',
>                                                  'database' =>
> '/export/home/dorjee/database/nr',
>                                                  _READMETHOD => 'Blast'
>                                                    );
> $factory->outfile("result/out.blast");
> my $blastreport = $factory->blastall($queryin);
> .....
>
> Thank you.

--
Jason Stajich
Miller Research Fellow
University of California, Berkeley
lab: 510.642.8441
http://pmb.berkeley.edu/~taylor/people/js.html
http://fungalgenomes.org/


From MEC at stowers-institute.org  Fri Apr  6 20:34:37 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 6 Apr 2007 15:34:37 -0500
Subject: [Bioperl-l] Bio/DB/SeqFeature/Store/DBI/mysql.pm patched
Message-ID: <CED81D34E37D5043A1211565277A51E507E22BAF@exchkc02.stowers-institute.org>

Lincoln,

I just committed a patch to Bio/DB/SeqFeature/Store/DBI/mysql.pm which
avoids a potential problem which, unless fixed, can generate warnings
that look like this:

prepare_cached(SELECT f.id,f.object
  FROM feature as f, typelist AS tl
  WHERE (   tl.id=f.typeid
   AND   (tl.tag LIKE ?)
)
  
) statement handle DBI::st=HASH(0x16f61c0) still Active at
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line
1427
DBD::mysql::st fetchrow_array failed: fetch() without execute() at
/home/mec/cvs/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line
1416.

... as well as other aberrant downstream program behaviour.

I encountered what the DBI manpage suggests might happen: "The results
will certainly not be what you expect".

This can happen, for example, when you open an iterator using
Bio::DB::SeqFeature::Store->get_seq_stream, and then while iterating,
perform other queries against the store.  My understanding of the DBI
doc is that this should only occur if the 2nd iterator is for the same
sql statement identically parameterized as the 1st, but I have not
proven beyond a doubt that this is what Bio::DB::SeqFeature::Store is
doing the way I am using it.  Nonetheless, the patch fixes my pipeline.

Cheers,

Malcolm


From gdorjee at hotmail.com  Fri Apr  6 22:27:54 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Fri, 6 Apr 2007 15:27:54 -0700 (PDT)
Subject: [Bioperl-l] blastall problem
In-Reply-To: <A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>
References: <9842643.post@talk.nabble.com>
	<a79f6a4b0704041957n3736f3d2j25d92a12e1601ce6@mail.gmail.com>
	<9864004.post@talk.nabble.com>
	<a79f6a4b0704051740m53fd286ara27a6b7570515a26@mail.gmail.com>
	<9867402.post@talk.nabble.com>
	<9A06EF4D-B0CA-4EC9-91A3-E516F7F5029A@bioperl.org>
	<9875685.post@talk.nabble.com>
	<A972DB11-113A-4039-B89D-242CEC001A4D@bioperl.org>
Message-ID: <9879110.post@talk.nabble.com>


I added the line:
close(OUTPUT);
and now the following error comes up. 'out.blast' is supposed to be the
blast result file, but it is not being created.

Software error:
------------- EXCEPTION  -------------
MSG: Could not open /export/home/dorjee/result/out.blast: No such file or
directory
STACK Bio::Root::IO::_initialize_io /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new /usr/perl5/5.6.1/lib/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new /usr/perl5/5.6.1/lib/Bio/SearchIO.pm:167
STACK toplevel /usr/local/apache2/htdocs/remote_ncbi.pl:53

--------------------------------------




From gilbertd at cricket.bio.indiana.edu  Sat Apr  7 03:31:29 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Fri, 6 Apr 2007 22:31:29 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704070331.l373VTI22000@cricket.bio.indiana.edu>


Dear Bioperlers,

There is a hidden issue with Bio::DB::Fasta in that it assumes Fasta
files have fixed line widths, but that isn't a requirement of Fasta
format. The documentation notes this package requirement, but I was
bitten by this, and I'd guess not many people check their data (esp.
if from someone else) to see it meets this requirement.

Simple tools can easily produce fasta with ragged line formatting:
e.g. genome assemblers that paste together contig fasta with spacers
to make assemblies.

It would be nice if Bio::DB::Fasta would check and die when it can't handle
this ragged input.  Here is a suggested addition:

  package Bio::DB::Fasta;

=head1 DESCRIPTION
  
  Entries may have any line length up to 65,536 characters, and
  different line lengths are allowed in the same file.  However, within
  a sequence entry, all lines must be the same length except for the
  last.  
+ An error will be thrown if this is not the case.

=cut

  use constant DIE_ON_MISSMATCHED_LINES => 1; # if you want 
  
  sub calculate_offsets {
  
     my ($offset,$id,$linelength,$type,$firstline,$count,$termination_length,%offsets);
  +  my ($l3_len,$l2_len,$l_len)=(0,0,0);
  
         $self->_check_linelength($linelength);
  +      ($l3_len,$l2_len,$l_len)=(0,0,0);
       } else {
  +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
  +      if(DIE_ON_MISSMATCHED_LINES &&
  +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
  +         my $fap= substr($_,0,20)."..";
  +         $self->throw("Each line of the fasta entry must be the same length except the last.
  +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
  +         }
  
         $linelength ||= length($_);
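
The same rule can also be checked outside the indexer, before a file is
handed to Bio::DB::Fasta. What follows is only a sketch in core Perl, not the
patch above: first_ragged_line and check_text are hypothetical names, and it
assumes that blank lines may be ignored and that the first sequence line of
each entry sets the width for that entry.

```perl
use strict;
use warnings;

# Hypothetical helper: return the number of the first sequence line that
# breaks the fixed-width rule, or 0 if every entry is fine.
sub first_ragged_line {
    my ($fh) = @_;
    my ($width, $odd_line, $n) = (0, 0, 0);
    while (my $line = <$fh>) {
        $n++;
        $line =~ s/[\r\n]+\z//;
        if ($line =~ /^>/) {              # new entry: reset the bookkeeping
            ($width, $odd_line) = (0, 0);
            next;
        }
        next if $line eq '';              # assumption: blank lines are ignored
        return $odd_line if $odd_line;    # the odd-length line was not the last
        my $len = length $line;
        if    (!$width)        { $width    = $len }   # first line sets the width
        elsif ($len != $width) { $odd_line = $n   }   # allowed only as last line
    }
    return 0;
}

# Convenience wrapper that reads from a string instead of a file on disk.
sub check_text {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory file: $!";
    return first_ragged_line($fh);
}

my $ok     = ">a\nACGT\nACGT\nAC\n>b\nAAAAAA\nAA\n";
my $ragged = ">a\nACGT\nAC\nACGT\n";
print "well-formed input: ", check_text($ok),     "\n";
print "ragged input:      ", check_text($ragged), "\n";
```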
  
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


From hlapp at gmx.net  Sat Apr  7 16:42:13 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 12:42:13 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <05D43C56-8B30-41C9-8C35-2CD77419DE7F@gmx.net>

Wouldn't it be easier (and more robust) to just reformat the file to  
meet the constant line width requirement? The code required to do  
that should be fewer lines than your addition below, I think.

For example, one could do a fast first-pass through the file simply  
checking that all sequence lines not followed by a description line  
or eof have the same length, stopping at the first line that fails  
the test. If unequal lengths, use Bio::SeqIO to read and write back  
out the fasta file, then continue as for well-formatted files.
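
As a rough illustration of the reformatting step, here is a sketch in core
Perl that re-wraps every sequence to one width. It is a stand-in for the
Bio::SeqIO round trip mentioned above, not a replacement for it;
rewrap_fasta is a hypothetical name and the default width of 60 is an
arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: copy FASTA from $in to $out, re-wrapping every
# sequence to $width characters per line (the last line may be shorter).
sub rewrap_fasta {
    my ($in, $out, $width) = @_;
    $width ||= 60;
    my $buf = '';
    while (my $line = <$in>) {
        $line =~ s/\s+\z//;
        if ($line =~ /^>/) {
            # Finish the previous entry with its (possibly short) last line.
            print {$out} "$buf\n" if length $buf;
            $buf = '';
            print {$out} "$line\n";
        }
        else {
            $buf .= $line;
            # Emit full-width lines as soon as enough residues are buffered.
            print {$out} substr($buf, 0, $width, ''), "\n"
                while length($buf) >= $width;
        }
    }
    print {$out} "$buf\n" if length $buf;
    return;
}

# Demonstrate on in-memory filehandles with a deliberately ragged entry.
my $ragged = ">a desc\nACGTAC\nGT\nACGTACGTAC\n>b\nAA\n";
my $fixed  = '';
open(my $in,  '<', \$ragged) or die "Cannot read in-memory input: $!";
open(my $out, '>', \$fixed)  or die "Cannot write in-memory output: $!";
rewrap_fasta($in, $out, 4);
close($out) or die "Cannot close in-memory output: $!";
print $fixed;
```

Only one partial line is held in memory at a time, so this should also cope
with long sequences.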

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Apr  7 21:13:24 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 17:13:24 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
References: <200704071711.l37HBB823983@cricket.bio.indiana.edu>
Message-ID: <8177CF47-558F-4891-97B5-69F327EF8A4A@gmx.net>

What I was suggesting was that the indexer automatically do the
reformatting, i.e., have it touch/change the input data if necessary
(and obviously one would be able to turn this feature off when the
correctness of the input formatting is known).

Are you suggesting that this automatic reformatting isn't possible?

	-hilmar

On Apr 7, 2007, at 1:11 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> I have added reformatting where appropriate (in code that installs the
> files for indexing by Bio::DB::Fasta).  What I'm suggesting is a patch
> to Bio::DB::Fasta to warn and die when the documented fixed width
> that Bio::DB::Fasta requires isn't met.  I.e., keep other folks from
> being bitten by this hard to identify requirement.  Then when they
> see that this indexer is failing on inappropriate inputs, they also  
> can reformat
> their Fasta to meet this requirement, and not continue to use the  
> software with
> bad results.  The operation of Bio::DB::Fasta is reading a sequence  
> stream
> and it doesn't touch/change the input data, so it would be hard to  
> patch it
> to re-format the input data.
>
> - Don
>
> -- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
> -- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sun Apr  8 01:00:51 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 7 Apr 2007 21:00:51 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
References: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>
Message-ID: <B8009E72-30C5-479B-B7B9-456E859B80CB@gmx.net>

Since you'd have to reformat it though, how would you do it then  
(presumably offline)?

	-hilmar

On Apr 7, 2007, at 8:06 PM, Don Gilbert wrote:

>
>
> Hilmar,
>
> Yes, basically automatic reformatting isn't possible. If you are
> indexing a large genome of fasta data, I'd not want a bioperl script
> to rewrite that data, or create a new version, automatically.
>
> - Don

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From gilbertd at cricket.bio.indiana.edu  Sat Apr  7 17:11:11 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 12:11:11 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704071711.l37HBB823983@cricket.bio.indiana.edu>


Hilmar,

I have added reformatting where appropriate (in code that installs the 
files for indexing by Bio::DB::Fasta).  What I'm suggesting is a patch
to Bio::DB::Fasta to warn and die when the documented fixed width
that Bio::DB::Fasta requires isn't met.  I.e., keep other folks from
being bitten by this hard to identify requirement.  Then when they
see that this indexer is failing on inappropriate inputs, they also can reformat 
their Fasta to meet this requirement, and not continue to use the software with
bad results.  The operation of Bio::DB::Fasta is reading a sequence stream
and it doesn't touch/change the input data, so it would be hard to patch it
to re-format the input data.

- Don

-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/


From gilbertd at cricket.bio.indiana.edu  Sun Apr  8 00:06:34 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Sat, 7 Apr 2007 19:06:34 -0500 (EST)
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
Message-ID: <200704080006.l3806Yt25235@cricket.bio.indiana.edu>


Hilmar,

Yes, basically automatic reformatting isn't possible. If you are
indexing a large genome of fasta data, I'd not want a bioperl script
to rewrite that data, or create a new version, automatically.

- Don


From gdorjee at hotmail.com  Mon Apr  9 04:18:39 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sun, 8 Apr 2007 21:18:39 -0700 (PDT)
Subject: [Bioperl-l] parse blast report for the best evalue
Message-ID: <9898358.post@talk.nabble.com>


Hi all,
I'm trying to parse a BLAST report using Bio::SearchIO as follows. Because
the report was generated by searching many query sequences against many
database (FASTA) sequences, it contains many individual BLAST reports (one
for each sequence in the query file). I was wondering if there is a way to
get only the best hit (the one with the best e-value) from each of them.

##### part of my script ######
my $in = new Bio::SearchIO(-format => 'blast',  -file   => $blast_report);
while( my $result = $in->next_result ) {
        while( my $hit = $result->next_hit ) {
              ...........

thanks.


-- 
View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9898358
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From staffa at niehs.nih.gov  Mon Apr  9 15:43:19 2007
From: staffa at niehs.nih.gov (Staffa, Nick (NIH/NIEHS))
Date: Mon, 09 Apr 2007 11:43:19 -0400
Subject: [Bioperl-l] Retrieve mRNA from Genome
Message-ID: <C23FD757.3FAB%staffa@niehs.nih.gov>

I have been retrieving sub-sequence from Genbank genomic records by use of
Bio::SeqIO
and ->get_SeqFeatures, ->start ->end ,
but now I'm looking for a quick way to extract CDS or mRNA from
a multi-segmented annotation, e.g.
     mRNA            join(72458..72791,84573..84613,93279..94419,94481..94656,
                     94719..94992,95056..95350,95438..95553,95614..96056)

Is there such a method?
Please point me to appropriate documentation.


Nick Staffa 
Telephone: 919-316-4569  (NIEHS: 6-4569)
Scientific Computing Support Group
NIEHS Information Technology Support Services Contract
(Science Task Monitor: John D. Grovenstein (grovens1 at niehs.nih.gov)
National Institute of Environmental Health Sciences
National Institutes of Health
Research Triangle Park, North Carolina


From Kevin.M.Brown at asu.edu  Mon Apr  9 16:19:19 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 9 Apr 2007 09:19:19 -0700
Subject: [Bioperl-l] Retrieve mRNA from Genome
In-Reply-To: <C23FD757.3FAB%staffa@niehs.nih.gov>
References: <C23FD757.3FAB%staffa@niehs.nih.gov>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCAED7@EX02.asurite.ad.asu.edu>

I believe that is what the spliced_seq method is for

$feat->spliced_seq    # the "joined" sequence, when there are
                      # multiple sub-locations

http://www.bioperl.org/wiki/Bptutorial.pl 
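
A short sketch of how spliced_seq might be used for this, assuming the
record comes from Bio::SeqIO so that each feature is attached to its
sequence. The helper name spliced_by_tag and the file name in the usage
comment are made up for illustration:

```perl
use strict;
use warnings;

# Return [primary_tag, spliced sequence string] pairs for every feature of
# the requested types on a sequence object.
sub spliced_by_tag {
    my ($seq, @tags) = @_;
    my %want = map { $_ => 1 } @tags;
    my @out;
    for my $feat ($seq->get_SeqFeatures) {
        next unless $want{ $feat->primary_tag };
        # spliced_seq joins the sub-locations and returns a sequence object
        push @out, [ $feat->primary_tag, $feat->spliced_seq->seq ];
    }
    return @out;
}

# Typical use (requires BioPerl; 'genome.gb' is a hypothetical file name):
#   use Bio::SeqIO;
#   my $in = Bio::SeqIO->new(-file => 'genome.gb', -format => 'genbank');
#   while (my $seq = $in->next_seq) {
#       print ">$_->[0]\n$_->[1]\n" for spliced_by_tag($seq, 'mRNA', 'CDS');
#   }
```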

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Staffa, Nick (NIH/NIEHS)
> Sent: Monday, April 09, 2007 8:43 AM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Retrieve mRNA from Genome
> 
> I have been retrieving sub-sequence from Genbank genomic 
> records by use of Bio::SeqIO and ->get_SeqFeatures, ->start 
> ->end , but now I'm looking for a quick way to extract CDS or 
> mRNA from a multi-segmented annotation, e.g.
>      mRNA          
> join(72458..72791,84573..84613,93279..94419,94481..94656,
>                      
> 94719..94992,95056..95350,95438..95553,95614..96056)
> 
> Is there such a method?
> Please point me to appropriate documentation.


From cjfields at uiuc.edu  Mon Apr  9 16:50:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 9 Apr 2007 11:50:05 -0500
Subject: [Bioperl-l] parse blast report for the best evalue
In-Reply-To: <9898358.post@talk.nabble.com>
References: <9898358.post@talk.nabble.com>
Message-ID: <C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>

You should probably use sort_hits() with a coderef that sorts by  
evalue to ensure that you retrieve the best evalue (significance()  
for hits) (see POD for Bio::Search::Result::ResultI).  You could then  
do something like:

my $hit;

unless ($result->no_hits_found) {
    # pass coderef to sort by evalue
    $result->sort_hits(\&sort_by_evalue);
    # retrieve first (best) hit
    $hit = $result->next_hit;
}

# do whatever you want with the best Hit

If you plan on retaining data from hits over a ton of different  
reports it may be best (memory-wise) to only retain the data you want  
for each hit instead of retaining the actual object.  For instance,  
if you only care about the description and evalue set up a simple  
data structure to house what you want by the query data instead of  
retaining all the extra stuff in the Hit object you don't need (all  
the HSP data, etc).
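
The snippet above leaves sort_by_evalue undefined. As an alternative that
needs no sort callback, a minimal sketch is to scan the hits once and keep
only a small hash for the lowest e-value. The helper name best_hit_summary
is made up for illustration, and the sketch assumes significance() returns
a value Perl can compare numerically (e.g. 1e-100):

```perl
use strict;
use warnings;

# Scan all hits of one result and keep a small hash for the hit with the
# lowest e-value, instead of keeping the Hit object itself.
sub best_hit_summary {
    my ($result) = @_;
    my $best;
    while (my $hit = $result->next_hit) {
        my $evalue = $hit->significance;
        next unless defined $evalue;
        if (!defined $best || $evalue < $best->{evalue}) {
            $best = {
                name        => $hit->name,
                description => $hit->description,
                evalue      => $evalue,
            };
        }
    }
    return $best;    # undef when the result has no hits
}

# Typical use (requires BioPerl; $blast_report is the report file name):
#   use Bio::SearchIO;
#   my $in = Bio::SearchIO->new(-format => 'blast', -file => $blast_report);
#   my %best;
#   while (my $result = $in->next_result) {
#       my $summary = best_hit_summary($result) or next;
#       $best{ $result->query_name } = $summary;
#   }
```

Keyed by query name, %best then holds one lightweight entry per report and
none of the HSP data.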

chris

On Apr 8, 2007, at 11:18 PM, DeeGee wrote:

>
> hi all,
> i'm trying to parse a blast report using Bio::SearchIO as follows,  
> but since
> this blast report is generated with many against many (database) fasta
> sequences, there're many individual blast reports (one for each of the
> sequence from the query file). i was wondering if there is a way to  
> get only
> the best hit (with best evalue) from each one of them.
>
> ##### part of my script ######
> my $in = new Bio::SearchIO(-format => 'blast',  -file   =>  
> $blast_report);
> while( my $result = $in->next_result ) {
>         while( my $hit = $result->next_hit ) {
>               ...........
>
> thanks.
>
>
> -- 
> View this message in context: http://www.nabble.com/parse-blast- 
> report-for-the-best-evalue-tf3545784.html#a9898358
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr  9 19:40:02 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 9 Apr 2007 12:40:02 -0700 (PDT)
Subject: [Bioperl-l] parse blast report for the best evalue
In-Reply-To: <C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>
References: <9898358.post@talk.nabble.com>
	<C0BC1FCC-9BCA-45A5-9CDE-4BD366050AFE@uiuc.edu>
Message-ID: <9907757.post@talk.nabble.com>


thank you, Chris.
^ ^*

Chris Fields wrote:
> 
> You should probably use sort_hits() with a coderef that sorts by  
> evalue to ensure that you retrieve the best evalue (significance()  
> for hits) (see POD for Bio::Search::Result::ResultI).  You could then  
> do something like:
> 
> my $hit;
> 
> unless ($result->no_hits_found) {
>     # pass coderef to sort by evalue
>     $result->sort_hits(\&sort_by_evalue);
>     # retrieve first (best) hit
>     $hit = $result->next_hit;
> }
> 
> # do whatever you want with the best Hit
> 
> If you plan on retaining data from hits over a ton of different  
> reports it may be best (memory-wise) to only retain the data you want  
> for each hit instead of retaining the actual object.  For instance,  
> if you only care about the description and evalue set up a simple  
> data structure to house what you want by the query data instead of  
> retaining all the extra stuff in the Hit object you don't need (all  
> the HSP data, etc).
> 
> chris
> 
> On Apr 8, 2007, at 11:18 PM, DeeGee wrote:
> 
>>
>> hi all,
>> i'm trying to parse a blast report using Bio::SearchIO as follows,  
>> but since
>> this blast report is generated with many against many (database) fasta
>> sequences, there're many individual blast reports (one for each of the
>> sequence from the query file). i was wondering if there is a way to  
>> get only
>> the best hit (with best evalue) from each one of them.
>>
>> ##### part of my script ######
>> my $in = new Bio::SearchIO(-format => 'blast',  -file   =>  
>> $blast_report);
>> while( my $result = $in->next_result ) {
>>         while( my $hit = $result->next_hit ) {
>>               ...........
>>
>> thanks.
>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/parse-blast- 
>> report-for-the-best-evalue-tf3545784.html#a9898358
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/parse-blast-report-for-the-best-evalue-tf3545784.html#a9907757
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bosborne11 at verizon.net  Tue Apr 10 13:55:37 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Tue, 10 Apr 2007 09:55:37 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
Message-ID: <C2410F99.DA34%bosborne11@verizon.net>

OK, applied.


On 4/6/07 11:31 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu> wrote:

>   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
>   +      if(DIE_ON_MISSMATCHED_LINES &&
>   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
>   +         my $fap= substr($_,0,20)."..";
>   +         $self->throw("Each line of the fasta entry must be the same length except the last.
>   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
>   +         }


From MEC at stowers-institute.org  Tue Apr 10 16:21:45 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Tue, 10 Apr 2007 11:21:45 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option
Message-ID: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>

Lincoln,

In `perldoc Bio::DB::SeqFeature::Store` I read:

"Caching requires the Tie::Cacher module to be installed. If the module
is not installed, then caching will silently be disabled."

I am wondering about the design motivation for silently disabling
caching when Tie::Cacher is not installed.  Perhaps at least emitting a
warning when -cache is requested and Tie::Cacher is not present is a
good idea?
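
A minimal sketch of what such a warning could look like, in plain Perl.
This is not the code in Bio::DB::SeqFeature::Store; the helper name
load_optional and the wording of the message are assumptions:

```perl
use strict;
use warnings;

# Try to load an optional module at run time and warn, rather than stay
# silent, when it is missing. Returns true if the module loaded.
sub load_optional {
    my ($module, $feature) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return 1 if eval { require $file; 1 };
    warn "$feature requested but $module is not installed; "
       . "continuing without it\n";
    return 0;
}

# Decide once whether caching can really be enabled.
my $use_cache = load_optional('Tie::Cacher', '-cache');
```

Code that depends on caching could then test $use_cache and die early
instead of running with caching quietly switched off.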

I am writing code that depends upon caching (i.e. upon the equality of
in-memory objects).

Do you advise that I don't depend upon Tie::Cacher working?  I
understand that it will NOT work as hoped if the cache is too small for
my application.

Thanks,

Malcolm Cook
Stowers Institute for Medical Research - Kansas City, Missouri
 

From cjfields at uiuc.edu  Tue Apr 10 16:31:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 11:31:43 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
Message-ID: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>

At the moment we do not have a comprehensive list up on the wiki.  I  
have been slowly working (alphabetically!) to switch them over, so  
any help would be appreciated.

I have CC'd this to the main mail list for anyone else interested.

chris

On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:

> Hey guys,
>
> I noticed there's an open task regarding moving testing code to use
> Test::More etc and that Chris and Nathan are already on to it. Is
> there any kind of wiki page that you keep track of which modules you
> are already working on? I am new to this and want to contribute,
> having a fair amount of unit testing from work, but don't want to step
> over other people's work and avoid duplication as well.
> Any pointers where i could get started would be much appreciated :-)
>
> Thanks,
> Spiros
>
> ps. apologies if this is not the correct list to post this, just
> seemed the most intuitive choice.
> _______________________________________________
> Bioperl-guts-l mailing list
> Bioperl-guts-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From spiros at lokku.com  Tue Apr 10 16:34:49 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Tue, 10 Apr 2007 17:34:49 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
Message-ID: <bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>

Okay, awesome, thank you for the info. I'll get started and see how it goes!

Spiros

On 4/10/07, Chris Fields <cjfields at uiuc.edu> wrote:
> At the moment we do not have a comprehensive list up on the wiki.  I
> have been slowly working (alphabetically!) to switch them over, so
> any help would be appreciated.
>
> I have CC'd this to the main mail list for anyone else interested.
>
> chris
>
> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>
> > Hey guys,
> >
> > I noticed there's an open task regarding moving testing code to use
> > Test::More etc and that Chris and Nathan are already on to it. Is
> > there any kind of wiki page that you keep track of which modules you
> > are already working on? I am new to this and want to contribute,
> > having a fair amount of unit testing from work, but don't want to step
> > over other people's work and avoid duplication as well.
> > Any pointers where i could get started would be much appreciated :-)
> >
> > Thanks,
> > Spiros
> >
> > ps. apologies if this is not the correct list to post this, just
> > seemed the most intuitive choice.
> > _______________________________________________
> > Bioperl-guts-l mailing list
> > Bioperl-guts-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Tue Apr 10 16:34:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 11:34:12 -0500
Subject: [Bioperl-l] Bio::DB::SeqFeature::Store -cache option
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E22C25@exchkc02.stowers-institute.org>
Message-ID: <0D396A53-9911-4304-88FE-CCD6884A2699@uiuc.edu>


On Apr 10, 2007, at 11:21 AM, Cook, Malcolm wrote:

> Lincoln,
>
> In `perldoc Bio::DB::SeqFeature::Store` I read:
>
> "Caching requires the Tie::Cacher module to be installed. If the  
> module
> is not installed, then caching will silently be disabled."
>
> I am wondering about the design motivation for silently disabling
> caching when Tie::Cacher is not installed.  Perhaps at least  
> emitting a
> warning when -cache is requested and Tie::Cacher is not present is a
> good idea?

...

Maybe this should be added to the optional BioPerl dependencies?   
It's not listed in Build.PL in CVS...

chris


From cjfields at uiuc.edu  Tue Apr 10 17:22:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 12:22:33 -0500
Subject: [Bioperl-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<bba689ec0704100934i54e2933di82a1f2e597bc2b74@mail.gmail.com>
Message-ID: <DFAA7C75-BC52-4027-9816-5970404D1558@uiuc.edu>

When moving tests over be particularly careful of 'ok' tests which  
should be 'is'; a few older tests have display messages which make  
things tricky.  Use 'isa_ok', 'use_ok', 'require_ok', 'like', etc.  
where appropriate.
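
To illustrate the 'ok' versus 'is' pitfall: under Test::More the second
argument to ok() is only the test name, so an old two-argument comparison
silently turns into a test that passes for any true value. A small
self-contained sketch (the values are made up):

```perl
use strict;
use warnings;
use Test::More tests => 4;

my $got = 'ACGT';

# The second argument here is just the test NAME, so this passes for any
# true $got, whatever value was "expected".
ok($got, 'TTTT');

# is() compares got against expected and reports both on failure.
is($got, 'ACGT', 'sequence string matches');

# like() for pattern checks, isa_ok() for class or reference-type checks.
like($got, qr/^[ACGT]+$/, 'only unambiguous DNA letters');
isa_ok([], 'ARRAY', 'array reference');
```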

Also, we are not supporting TODO blocks at this time due to the  
upgrade needed for Test::Harness (which isn't necessary for BioPerl  
functionality).  Just use a skip block with a message if you run into  
something, like this (from RNA_SearchIO.t):

SKIP: {
     skip('Working on meta string building; TODO', 3);
     is($hsp->meta, 'blahblahblah', "HSP meta");
     # two more tests...
}

Thanks for helping out!

chris

On Apr 10, 2007, at 11:34 AM, Spiros Denaxas wrote:

> Okay, awesome, thank you for the info. I'll get started and see how  
> it goes!
>
> Spiros
...


From gopu_36 at yahoo.com  Tue Apr 10 07:42:26 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Tue, 10 Apr 2007 00:42:26 -0700 (PDT)
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole genome
Message-ID: <9915265.post@talk.nabble.com>


Hi,
I am a newbie venturing into BioPerl for my research. I have
a whole genome sequence of a pathogen. I am trying to break it into
non-overlapping 1000 bp subsequences. For example, if my whole genome
sequence is 400000 bp long, then I should break it into 4000
subsequences of 1000 bp each, and they should be non-overlapping but at the
same time continuous. To be precise, my first substring would be from 1 to
1000 bp, the second substring from 1001 to 2000, etc. Could anyone
help me?
I tried the following code, but it gives me only the first substring and
not the rest! I would appreciate it very much if someone could help me!
.........
.
.
my $start =1;
my $finish =100;
my $inseq  = Bio::SeqIO->new(-file => "$in_file");
while( my $seq = $inseq->next_seq ) {
	
	my $cleseq = $seq->seq();
	
	$seqlength = $seq->length();
	if ($finish<$seqlength){	
	print "The length of the sequence is $seqlength\n";	
	my $ordseq = $cleseq->subseq($start,$finish);
          push(@seq_array,$ordseq);
          $start=+100;
          $finish=+100;
          $counter++;
          next;          	             
       } 
}
-- 
View this message in context: http://www.nabble.com/extract-nonoverlapping-subsequences-from-a-whole-genome-tf3551560.html#a9915265
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From bix at sendu.me.uk  Tue Apr 10 20:10:35 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 10 Apr 2007 21:10:35 +0100
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
 genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <461BEF3B.3080708@sendu.me.uk>

gopu_36 wrote:
> Hi,
> I am a newbie venturing into BioPerl for my research. I have
> a whole genome sequence of a pathogen. I am trying to break it into
> non-overlapping 1000 bp subsequences.
[snip]
> I tried the following code, but it gives me only the first substring and
> not the rest! I would appreciate it very much if someone could help me!
[snip]
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	             
>        } 
> }

Unless I've misunderstood, there are a few problems here.

I'm guessing $in_file is a file containing the entire genome sequence as 
a single sequence. This means your while loop will only loop once. To do 
what you want you then need another loop that acts on the single $seq 
object you're going to get. You don't need $cleseq, and in fact your 
script ought to crash on the $cleseq->subseq line because $cleseq is a 
string which has no subseq() method. $seq->subseq is what you want.

I didn't look at the remaining code.
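
One more point about the quoted code: $start=+100 assigns +100 to $start
instead of adding 100 to it; the increment operator is +=. A minimal
sketch of generating the window coordinates in plain Perl (the helper name
chunk_ranges is made up for illustration):

```perl
use strict;
use warnings;

# Return [start, end] pairs (1-based, inclusive) that tile a sequence of
# $length residues in non-overlapping windows of $size; the last window
# may be shorter.
sub chunk_ranges {
    my ($length, $size) = @_;
    die "window size must be positive\n" unless $size > 0;
    my @ranges;
    for (my $start = 1; $start <= $length; $start += $size) {   # += not =+
        my $end = $start + $size - 1;
        $end = $length if $end > $length;
        push @ranges, [$start, $end];
    }
    return @ranges;
}

# Prints 1-1000, 1001-2000 and 2001-2500, one per line.
printf "%d-%d\n", @$_ for chunk_ranges(2500, 1000);
```

With a Bio::Seq object, each pair can be passed to
$seq->subseq($start, $end).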


Hope that helps,
Sendu.


From cjfields at uiuc.edu  Tue Apr 10 20:22:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 15:22:15 -0500
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
	genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <88E9CC63-48FD-444B-877D-12BB1D944214@uiuc.edu>

There is a script in the BioPerl scripts directory which does this,  
with optional overlaps (split_seq.PLS).  It's in /scripts/seq.

chris

On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:

>
> Hi,
> I am a newbie venturing into BioPerl for my research. I have
> a whole genome sequence of a pathogen. I am trying to break it into
> non-overlapping 1000 bp subsequences. For example, if my whole genome
> sequence is 400000 bp long, then I should break it into 4000
> subsequences of 1000 bp each, and they should be non-overlapping but at the
> same time continuous. To be precise, my first substring would be from 1 to
> 1000 bp, the second substring from 1001 to 2000, etc. Could anyone
> help me?
> I tried the following code, but it gives me only the first substring and
> not the rest! I would appreciate it very much if someone could help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	
>        }
> }
> -- 
> View this message in context: http://www.nabble.com/extract- 
> nonoverlapping-subsequences-from-a-whole-genome- 
> tf3551560.html#a9915265
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Apr 10 20:57:20 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 10 Apr 2007 15:57:20 -0500
Subject: [Bioperl-l] extract nonoverlapping subsequences from a whole
	genome
In-Reply-To: <9915265.post@talk.nabble.com>
References: <9915265.post@talk.nabble.com>
Message-ID: <18529D36-C772-474A-9CE6-A29FA0C59ABA@uiuc.edu>

Okay, I was bored!  This is a little shorter than that script:

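use strict;
use warnings;
use Bio::SeqIO;

my $seqin = Bio::SeqIO->new(-format => 'fasta',
                             -file => shift);

my $seqout = Bio::SeqIO->new(-format => 'fasta',
                             -file => '>split.fas');

while( my $seq = $seqin->next_seq ) {
     my $seqlength = $seq->length();
     print STDERR "Length is $seqlength\n";
     next unless $seqlength;           # nothing to split
     my $start = 1;
     # clamp the first window so sequences shorter than 100 are still written
     my $end = $seqlength < 100 ? $seqlength : 100;
     my $desc = $seq->description;
     $desc = '' unless defined $desc;
     CHUNK:
     while ($end <= $seqlength){
         # trunc() returns a new sequence object covering start..end
         my $ordseq = $seq->trunc($start,$end);
         $ordseq->description("$start-$end $desc");
         $seqout->write_seq($ordseq);
         last CHUNK if $end >= $seqlength;
         $start += 100;
         # the final window may be shorter than 100
         $end = ($end + 100 > $seqlength) ? $seqlength : $end + 100;
     }
}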

chris

On Apr 10, 2007, at 2:42 AM, gopu_36 wrote:

>
> Hi,
> I am a newbie venturing into BioPerl for my research. I have
> a whole genome sequence of a pathogen. I am trying to break it into
> non-overlapping 1000 bp subsequences. For example, if my whole genome
> sequence is 400000 bp long, then I should break it into 4000
> subsequences of 1000 bp each, and they should be non-overlapping but at the
> same time continuous. To be precise, my first substring would be from 1 to
> 1000 bp, the second substring from 1001 to 2000, etc. Could anyone
> help me?
> I tried the following code, but it gives me only the first substring and
> not the rest! I would appreciate it very much if someone could help me!
> .........
> .
> .
> my $start =1;
> my $finish =100;
> my $inseq  = Bio::SeqIO->new(-file => "$in_file");
> while( my $seq = $inseq->next_seq ) {
> 	
> 	my $cleseq = $seq->seq();
> 	
> 	$seqlength = $seq->length();
> 	if ($finish<$seqlength){	
> 	print "The length of the sequence is $seqlength\n";	
> 	my $ordseq = $cleseq->subseq($start,$finish);
>           push(@seq_array,$ordseq);
>           $start=+100;
>           $finish=+100;
>           $counter++;
>           next;          	
>        }
> }
> -- 
> View this message in context: http://www.nabble.com/extract- 
> nonoverlapping-subsequences-from-a-whole-genome- 
> tf3551560.html#a9915265
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lstein at cshl.edu  Tue Apr 10 22:01:37 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 10 Apr 2007 18:01:37 -0400
Subject: [Bioperl-l] Bio::DB::Fasta check for ragged line widths
In-Reply-To: <C2410F99.DA34%bosborne11@verizon.net>
References: <200704070331.l373VTI22000@cricket.bio.indiana.edu>
	<C2410F99.DA34%bosborne11@verizon.net>
Message-ID: <6dce9a0b0704101501y15b96e20w89c4b9ef4abc1b48@mail.gmail.com>

I'm happy I didn't catch this thread until just now, but my preferred course
of action was to do exactly what Brian did and accept the patch.

Lincoln

On 4/10/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
> OK, applied.
>
>
> On 4/6/07 11:31 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu>
> wrote:
>
> >   +      $l3_len= $l2_len; $l2_len= $l_len; $l_len= length($_); # need to check every line :(
> >   +      if(DIE_ON_MISSMATCHED_LINES &&
> >   +        $l3_len>0 && $l2_len>0 && $l3_len!=$l2_len) {
> >   +         my $fap= substr($_,0,20)."..";
> >   +         $self->throw("Each line of the fasta entry must be the same length except the last.
> >   +  Line above #$. '$fap' is $l2_len != $l3_len chars.");
> >   +         }
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From heikki at sanbi.ac.za  Wed Apr 11 09:14:27 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 11 Apr 2007 11:14:27 +0200
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
Message-ID: <200704111114.27839.heikki@sanbi.ac.za>

What is going on here? Can anyone remember doing this?

	-Heikki 

Please can I ask what is the purpose of the line @pos = sort @pos; in
the select_noncont subroutine of SimpleAlign.pm.

 
In previous versions this line was not present and I could use the
function to reorder the alignment, e.g. in an alignment with 5 sequences I
could reorder it to put the second sequence last using
$aln->select_noncont(1,3,4,5,2). The sort function stops this, but even
if the idea is to sort numerically this does not work, since the sort
function as written will put 10 before 2, so that
->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
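
The ordering described above is what Perl's default sort produces, because
it compares elements as strings unless given a comparison block. A small
stand-alone illustration (variable names are made up):

```perl
use strict;
use warnings;

my @pos = (1 .. 10);

my @as_strings = sort @pos;                # default: string comparison
my @as_numbers = sort { $a <=> $b } @pos;  # explicit numeric comparison

print "@as_strings\n";   # 1 10 2 3 4 5 6 7 8 9
print "@as_numbers\n";   # 1 2 3 4 5 6 7 8 9 10
```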

 
Many thanks

 
Anthony


From cjfields at uiuc.edu  Wed Apr 11 12:33:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 07:33:42 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za>
References: <200704111114.27839.heikki@sanbi.ac.za>
Message-ID: <F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>

Don't know when this was added.  Maybe we should make the sorting  
optional?  In other words, pass an optional 'nosort' string as the  
first arg, defaulting to numerical sort.

Either way the sort needs to be changed by the looks of it.  I'll  
verify the bug and commit today.

chris

On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote:

> What is going on here? Can anyone remember doing this?
>
> 	-Heikki
>
> Please can I ask what is the purpose of the line @pos = sort @pos; in
> the select_noncont subroutine of SimpleAlign.pm.
>
>
>
> In previous versions this line was not present and I could use the
> function to reorder the alignment, e.g. in an alignment with 5 sequences I
> could reorder it to put the second sequence last using
> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but even
> if the idea is to sort numerically this does not work, since the sort
> function as written will put 10 before 2, so that
> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
>
>
>
> Many thanks
>
>
>
> Anthony
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lzlgboy at gmail.com  Wed Apr 11 12:48:30 2007
From: lzlgboy at gmail.com (kenzy ken)
Date: Wed, 11 Apr 2007 20:48:30 +0800
Subject: [Bioperl-l] How to Remove root node from a tree, ???
Message-ID: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>

Hi all:
    I wrote a script that uses the Bio::Tree module. I want to remove some
nodes from the tree, so I used the "$tree->remove_Node($node_object);"
method. It works OK, but when I try to remove the root node, a problem
occurs. It seems that this method cannot remove the root node, so if you
have any idea how to remove the root, it would be much appreciated.

-- 
Chen,Kenian
===========================
School of Life Science, Sun Yat-Sen University
===========================
Xingang Xilu 135
Guangzhou, Guangdong 510275
P. R. China
===========================
Phone: (86) 20-84113677; (86) 20-34474683;
Fax: (86) 20-34022356
===========================
Email:lzlgboy at gmail.com;
chenkn at mail2.sysu.edu.cn


From cjfields at uiuc.edu  Wed Apr 11 13:13:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 08:13:40 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<F42F220E-1E2A-410F-8F54-CBB6660C29F5@uiuc.edu>
Message-ID: <9DE1A554-4F33-45D1-9043-732FEB86ECD5@uiuc.edu>

I confirmed this; it is now fixed in CVS.  I have also added the  
option to prevent sorting if needed:

$aln2 = $aln->select_noncont(6,7,8,9,10,1,2,3,4,5);

sorts numerically by default.

$aln2 = $aln->select_noncont('nosort',6,7,8,9,10,1,2,3,4,5);

prevents sorting.  I have added a few tests to SimpleAlign.t for  
these.  It doesn't change the default behavior so shouldn't break  
anything.
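
The behaviour described above can be sketched in plain Perl. This is only an illustration under stated assumptions: order_positions is a hypothetical helper, not the code committed to SimpleAlign.pm, and it only shows how an optional leading 'nosort' argument might be handled alongside a numeric default sort.

```perl
use strict;
use warnings;

# Hypothetical helper (not the SimpleAlign.pm implementation): returns the
# positions in the order select_noncont would be expected to use them.
sub order_positions {
    my @pos  = @_;
    my $sort = 1;

    # An optional leading 'nosort' string switches sorting off.
    if (@pos && $pos[0] eq 'nosort') {
        shift @pos;
        $sort = 0;
    }

    # Default: numeric sort, so 10 comes after 9 rather than before 2.
    @pos = sort { $a <=> $b } @pos if $sort;
    return @pos;
}

print join(',', order_positions(6,7,8,9,10,1,2,3,4,5)), "\n";
# prints 1,2,3,4,5,6,7,8,9,10
print join(',', order_positions('nosort',6,7,8,9,10,1,2,3,4,5)), "\n";
# prints 6,7,8,9,10,1,2,3,4,5
```

Keeping the numeric sort as the default means existing callers see no change, while 'nosort' restores the older reordering use.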

chris

On Apr 11, 2007, at 7:33 AM, Chris Fields wrote:

> Don't know when this was added.  Maybe we should make the sorting
> optional?  In other words, pass an optional 'nosort' string as the
> first arg, defaulting to numerical sort.
>
> Either way the sort needs to be changed by the looks of it.  I'll
> verify the bug and commit today.
>
> chris
>
> On Apr 11, 2007, at 4:14 AM, Heikki Lehvaslaiho wrote:
>
>> What is going on here? Can anyone remember doing this?
>>
>> 	-Heikki
>>
>> Please can I ask what is the purpose of the line @pos = sort @pos; in
>> the select_noncont subroutine of SimpleAlign.pm.
>>
>>
>>
>> In previous versions this line was not present and I could use the
>> function to reorder the alignment e.g in an alignment with 5
>> sequences I
>> could reorder it to put the second sequence last using
>> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
>> even
>> if the idea is to sort numerically this dos not work since the sort
>> function as is will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>>
>>
>>
>> Many thanks
>>
>>
>>
>> Anthony
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 11 13:21:25 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 11 Apr 2007 14:21:25 +0100
Subject: [Bioperl-l] How to Remove root node from a tree, ???
In-Reply-To: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>
References: <d78b3d40704110548q7756d236h57c490bda6be1854@mail.gmail.com>
Message-ID: <461CE0D5.9040001@sendu.me.uk>

kenzy ken wrote:
> Hi all:
>    I write a script which used the Bio::Tree module. I want to remove some
> nodes from the tree, so I used "$tree->remove_Node($node_object);method 
> . It
> works ok, but when I remove root node, problem happened. It seens that this
> method can not remove root node, so ,if you guys have any idea about how to
> remove the root ,it will be very appreciated.

You'll have to re-root the tree to some other node in the tree. See the 
reroot() method.

(I don't think Bio::Tree::Tree objects can be unrooted.)


From emeric.sevin at univ-rennes1.fr  Wed Apr 11 13:32:38 2007
From: emeric.sevin at univ-rennes1.fr (Emeric Sevin)
Date: Wed, 11 Apr 2007 15:32:38 +0200
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
Message-ID: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>

Hi everybody,

I'm sorry to bug you, but either I missed something so obvious that nobody
bothered to answer, or I'm being a little boycotted here...
A little help would be very much appreciated.

On 22 March 07, at 16:07, Emeric Sevin wrote:

> Hello,
>
> I am new to this community, and apologize if this subject has been 
> posted before.
>
> I want to print out only selected results from a multiple 
> blast-alignments results file. Problem is, the algorithm used is 
> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the actual 
> writing task yields "unclean" warnings. Although an ouput is actually 
> written, the writer (Bio::SearchIO::Writer::TextResultWriter) seems to 
> be disturbed by the fact rpsblast DBs are not labeled with 
> "protein"/"nucleic"/"translated".
> Does anybody know of an easy fix to that bug, or of another way to 
> come around it?
>
> Thank you very much
>
> Emeric SEVIN
> Université de Rennes 1
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From cjfields at uiuc.edu  Wed Apr 11 14:44:27 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 09:44:27 -0500
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
Message-ID: <D0E54B3C-A345-4A90-9571-25144622265D@uiuc.edu>

We could ignore this post... oh the irony!  ;>

It has nothing to do with ignoring you.  Read this:

http://en.wikipedia.org/wiki/Warnock's_Dilemma

Basically your question probably fell on deaf ears b/c no one has
time to look into it and post a fix.  Realize that BioPerl is, for
the most part, a volunteer effort and we all have $jobs to worry
about.  If you want, you are more than welcome to file a bug on this
(if it isn't already filed), which is the best way to make sure
something is done:

http://www.bioperl.org/wiki/Bugs
http://bugzilla.open-bio.org/

chris


On Apr 11, 2007, at 8:32 AM, Emeric Sevin wrote:

> Hi everybody,
>
> I'm sorry to bug, but either I missed something so obvious nobody  
> bothered to answer, either I'm being a little boycotted here...
> A little help would be very much appreciated
>
> On 22 March 07, at 16:07, Emeric Sevin wrote:
>
>> Hello,
>>
>> I am new to this community, and apologize if this subject has been  
>> posted before.
>>
>> I want to print out only selected results from a multiple blast- 
>> alignments results file. Problem is, the algorithm used is  
>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the  
>> actual writing task yields "unclean" warnings. Although an ouput  
>> is actually written, the writer  
>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by  
>> the fact rpsblast DBs are not labeled with  
>> "protein"/"nucleic"/"translated".
>> Does anybody know of an easy fix to that bug, or of another way to  
>> come around it?
>>
>> Thank you very much
>>
>> Emeric SEVIN
>> Université de Rennes 1
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From n.haigh at sheffield.ac.uk  Wed Apr 11 14:30:11 2007
From: n.haigh at sheffield.ac.uk (Nathan Haigh)
Date: Wed, 11 Apr 2007 15:30:11 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
Message-ID: <461CF0F3.1010708@sheffield.ac.uk>

It should be easy enough to find those t/*.t files that have "use Test;"
or "require Test;". This should provide a list of files still needing to
be converted over to Test::More. As discussed previously, it may also be
useful to use Test::Exception to test for situations where
exceptions/warnings are thrown. If you add additional tests using this
module, you should add the Test::Exception module to t/lib/.
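
The search described above could be sketched roughly as follows. This assumes it is run from the top of a checkout with tests under t/; uses_old_test is a hypothetical helper name, and the pattern is only a guess at what counts as "still using Test".

```perl
use strict;
use warnings;

# Hypothetical helper: true if the script text loads the old Test module
# ("use Test;", "require Test;", "use Test qw(...)"), but not Test::More,
# Test::Exception or other Test::* modules.
sub uses_old_test {
    my ($text) = @_;
    return $text =~ /^\s*(?:use|require)\s+Test(?!::)\b/m ? 1 : 0;
}

# List every t/*.t file that still needs converting.
for my $file (glob 't/*.t') {
    open my $fh, '<', $file or next;
    local $/;                  # slurp the whole file in one read
    my $text = <$fh>;
    close $fh;
    print "$file\n" if uses_old_test($text);
}
```

The negative lookahead (?!::) is what keeps Test::More files out of the list.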

Good luck, and feel free to mail the list with questions/comments etc.

Nath


Chris Fields wrote:
> At the moment we do not have a comprehensive list up on the wiki.  I  
> have been slowly working (alphabetically!) to switch them over, so  
> any help would be appreciated.
>
> I have CC'd this to the main mail list for anyone else interested.
>
> chris
>
> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>
>   
>> Hey guys,
>>
>> I noticed there's an open task regarding moving testing code to use
>> Test::More etc and that Chris and Nathan are already on to it. Is
>> there any kind of wiki page that you keep track of which modules you
>> are already working on? I am new to this and want to contribute,
>> having a fair amount of unit testing from work, but don't want to step
>> over other people's work and avoid duplication as well.
>> Any pointers where i could get started would be much appreciated :-)
>>
>> Thanks,
>> Spiros
>>
>> ps. apologies if this is not the correct list to post this, just
>> seemed the most intuitive choice.
>> _______________________________________________
>> Bioperl-guts-l mailing list
>> Bioperl-guts-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>>     
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>   


From spiros at lokku.com  Wed Apr 11 14:56:22 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Wed, 11 Apr 2007 15:56:22 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <461CF0F3.1010708@sheffield.ac.uk>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
Message-ID: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>

Yep! I have some rough stats at home; I will post them later
tonight. Roughly, if I remember correctly, 50% of the tests were still
using Test; all the others were using Test::More.

More to follow later on,
Spiros

On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> It should be easy enough to find those t/*.t files that have "use Test;"
> or "require Test;" This should provide a list of files still needing to
> be converted over to Test::More. As discussed previously, it may also be
> useful to use Test::Exception to test for situations where
> exceptions/warnings are thrown. If you add additional tests using this
> module, you should add the Test::Exception module to t/lib/
>
> Good luck, and feel free to mail the list with questions/comments etc.
>
> Nath
>
>
> Chris Fields wrote:
> > At the moment we do not have a comprehensive list up on the wiki.  I
> > have been slowly working (alphabetically!) to switch them over, so
> > any help would be appreciated.
> >
> > I have CC'd this to the main mail list for anyone else interested.
> >
> > chris
> >
> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> >
> >
> >> Hey guys,
> >>
> >> I noticed there's an open task regarding moving testing code to use
> >> Test::More etc and that Chris and Nathan are already on to it. Is
> >> there any kind of wiki page that you keep track of which modules you
> >> are already working on? I am new to this and want to contribute,
> >> having a fair amount of unit testing from work, but don't want to step
> >> over other people's work and avoid duplication as well.
> >> Any pointers where i could get started would be much appreciated :-)
> >>
> >> Thanks,
> >> Spiros
> >>
> >> ps. apologies if this is not the correct list to post this, just
> >> seemed the most intuitive choice.
> >> _______________________________________________
> >> Bioperl-guts-l mailing list
> >> Bioperl-guts-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
> >>
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
>
>


From Kevin.M.Brown at asu.edu  Wed Apr 11 15:14:07 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 11 Apr 2007 08:14:07 -0700
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <200704111114.27839.heikki@sanbi.ac.za>
References: <200704111114.27839.heikki@sanbi.ac.za>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>

> What is going on here? Can anyone remember doing this?
> 
> 	-Heikki 
> 
> Please can I ask what is the purpose of the line @pos = sort 
> @pos; in the select_noncont subroutine of SimpleAlign.pm.
> 
>  
> 
> In previous versions this line was not present and I could 
> use the function to reorder the alignment e.g in an alignment 
> with 5 sequences I could reorder it to put the second 
> sequence last using $aln->select_noncont(1,3,4,5,2). The sort 
> function stops this, but even if the idea is to sort 
> numerically this dos not work since the sort function as is 
> will put 10 before 2, so that
> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .

Not sure why 10 would come before 2 since perl would interpret that list
as a series of integers even if they were entered as strings and do the
sort.


From spiros at lokku.com  Wed Apr 11 15:51:27 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Wed, 11 Apr 2007 16:51:27 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <bba689ec0704110851qb1aa272m5db4e01356f28e92@mail.gmail.com>

This looks like a case of cmp vs <=>, I think!

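use strict;
use warnings;

my @array = (1,10,2,3,4,5,6,7,8,9);
print join(",", @array), "\n";
# Default sort compares as strings (cmp), so 10 sorts before 2.
my @sorted1 = sort @array;
print join(",", @sorted1), "\n";
# An explicit <=> block sorts numerically.
my @sorted2 = sort { $a <=> $b } @array;
print join(",", @sorted2), "\n";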

idaru:/tmp spiros$ perl koko.pl
1,10,2,3,4,5,6,7,8,9 # normal array
1,10,2,3,4,5,6,7,8,9 # sorted with sort
1,2,3,4,5,6,7,8,9,10 # sorted with <=>

Spiros


On 4/11/07, Kevin Brown <Kevin.M.Brown at asu.edu> wrote:
> > What is going on here? Can anyone remember doing this?
> >
> >       -Heikki
> >
> > Please can I ask what is the purpose of the line @pos = sort
> > @pos; in the select_noncont subroutine of SimpleAlign.pm.
> >
> >
> >
> > In previous versions this line was not present and I could
> > use the function to reorder the alignment e.g in an alignment
> > with 5 sequences I could reorder it to put the second
> > sequence last using $aln->select_noncont(1,3,4,5,2). The sort
> > function stops this, but even if the idea is to sort
> > numerically this dos not work since the sort function as is
> > will put 10 before 2, so that
> > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> > the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From ak at ebi.ac.uk  Wed Apr 11 15:58:52 2007
From: ak at ebi.ac.uk (Andreas Kahari)
Date: Wed, 11 Apr 2007 16:58:52 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <20070411155852.GC24537@ebi.ac.uk>

On Wed, Apr 11, 2007 at 08:14:07AM -0700, Kevin Brown wrote:
> > What is going on here? Can anyone remember doing this?
> > 
> > 	-Heikki 
> > 
> > Please can I ask what is the purpose of the line @pos = sort 
> > @pos; in the select_noncont subroutine of SimpleAlign.pm.
> > 
> >  
> > 
> > In previous versions this line was not present and I could 
> > use the function to reorder the alignment e.g in an alignment 
> > with 5 sequences I could reorder it to put the second 
> > sequence last using $aln->select_noncont(1,3,4,5,2). The sort 
> > function stops this, but even if the idea is to sort 
> > numerically this dos not work since the sort function as is 
> > will put 10 before 2, so that
> > ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
> > the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
> 
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.

Really?

$ perl -e 'print join(" ", sort(1..20)), "\n"';
1 10 11 12 13 14 15 16 17 18 19 2 20 3 4 5 6 7 8 9


-- 
Andreas Kähäri :: Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
-------------------*=<>=*-------------------


From mkiwala at watson.wustl.edu  Wed Apr 11 15:51:35 2007
From: mkiwala at watson.wustl.edu (Michael Kiwala)
Date: Wed, 11 Apr 2007 10:51:35 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <461D0407.8050105@watson.wustl.edu>

Kevin Brown wrote:
>> What is going on here? Can anyone remember doing this?
>>
>> 	-Heikki 
>>
>> Please can I ask what is the purpose of the line @pos = sort 
>> @pos; in the select_noncont subroutine of SimpleAlign.pm.
>>
>>  
>>
>> In previous versions this line was not present and I could 
>> use the function to reorder the alignment e.g in an alignment 
>> with 5 sequences I could reorder it to put the second 
>> sequence last using $aln->select_noncont(1,3,4,5,2). The sort 
>> function stops this, but even if the idea is to sort 
>> numerically this dos not work since the sort function as is 
>> will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>>     
>
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.
>
>   
Because, according to the documentation for Perl's sort function, 
sorting occurs "in standard string comparison order" unless the user 
specifies another comparison function to use.


From cjfields at uiuc.edu  Wed Apr 11 16:45:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 11:45:11 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
Message-ID: <FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>

We should probably place something on the wiki to prevent overlaps  
(i.e. make sure no two devs are working on the same tests).  I  
planned on working on the G's last night but got bogged down.

Spiros, if you haven't already, go ahead and create a list on a wiki
page for tracking.  We can lay claim to them by tagging with our sigs
and cross them off once complete.

chris

On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:

> Yep! I have some rough stats I have at home, I will post them later on
> tonight. Roughly, if i remember correctly, 50% of the tests were still
> using Test, all the others were using Test::More.
>
> More to follow later on,
> Spiros
>
> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
>> It should be easy enough to find those t/*.t files that have "use  
>> Test;"
>> or "require Test;" This should provide a list of files still  
>> needing to
>> be converted over to Test::More. As discussed previously, it may  
>> also be
>> useful to use Test::Exception to test for situations where
>> exceptions/warnings are thrown. If you add additional tests using  
>> this
>> module, you should add the Test::Exception module to t/lib/
>>
>> Good luck, and feel free to mail the list with questions/comments  
>> etc.
>>
>> Nath
>>
>>
>> Chris Fields wrote:
>> > At the moment we do not have a comprehensive list up on the  
>> wiki.  I
>> > have been slowly working (alphabetically!) to switch them over, so
>> > any help would be appreciated.
>> >
>> > I have CC'd this to the main mail list for anyone else interested.
>> >
>> > chris
>> >
>> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>> >
>> >
>> >> Hey guys,
>> >>
>> >> I noticed there's an open task regarding moving testing code to  
>> use
>> >> Test::More etc and that Chris and Nathan are already on to it. Is
>> >> there any kind of wiki page that you keep track of which  
>> modules you
>> >> are already working on? I am new to this and want to contribute,
>> >> having a fair amount of unit testing from work, but don't want  
>> to step
>> >> over other people's work and avoid duplication as well.
>> >> Any pointers where i could get started would be much  
>> appreciated :-)
>> >>
>> >> Thanks,
>> >> Spiros
>> >>
>> >> ps. apologies if this is not the correct list to post this, just
>> >> seemed the most intuitive choice.
>> >> _______________________________________________
>> >> Bioperl-guts-l mailing list
>> >> Bioperl-guts-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
>> >>
>> >
>> > Christopher Fields
>> > Postdoctoral Researcher
>> > Lab of Dr. Robert Switzer
>> > Dept of Biochemistry
>> > University of Illinois Urbana-Champaign
>> >
>> >
>> >
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >
>>
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 11 16:09:54 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 11 Apr 2007 17:09:54 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
Message-ID: <461D0852.9070802@sendu.me.uk>

Kevin Brown wrote:
>>  but even if the idea is to sort
>> numerically this dos not work since the sort function as is 
>> will put 10 before 2, so that
>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences in
>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
> 
> Not sure why 10 would come before 2 since perl would interpret that list
> as a series of integers even if they were entered as strings and do the
> sort.

The default sort for sort() is { $a cmp $b } (standard string comparison 
order): 10 comes before 2.

The fix was to explicitly say sort { $a <=> $b } for a numeric sort.


From cjfields at uiuc.edu  Wed Apr 11 16:46:46 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 11:46:46 -0500
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <461D0407.8050105@watson.wustl.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
	<461D0407.8050105@watson.wustl.edu>
Message-ID: <7001A1A4-5CF4-4C70-8EFA-94AF0D16864C@uiuc.edu>

I have confirmed the bug and fixed this in CVS.  Michael's right; sort
defaults to string comparison if no subroutine or sort block is
specified.

perldoc -f sort:

sort SUBNAME LIST
sort BLOCK LIST
sort LIST
...
If SUBNAME or BLOCK is omitted, "sort"s in standard string com-
parison order.
...

chris

On Apr 11, 2007, at 10:51 AM, Michael Kiwala wrote:

> Kevin Brown wrote:
>>> What is going on here? Can anyone remember doing this?
>>>
>>> 	-Heikki
>>>
>>> Please can I ask what is the purpose of the line @pos = sort
>>> @pos; in the select_noncont subroutine of SimpleAlign.pm.
>>>
>>>
>>>
>>> In previous versions this line was not present and I could
>>> use the function to reorder the alignment e.g in an alignment
>>> with 5 sequences I could reorder it to put the second
>>> sequence last using $aln->select_noncont(1,3,4,5,2). The sort
>>> function stops this, but even if the idea is to sort
>>> numerically this dos not work since the sort function as is
>>> will put 10 before 2, so that
>>> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the  
>>> sequences in
>>> the alignment to be 1, 10, 2, 3, 4,5, 6, 7, 8,9 .
>>>
>>
>> Not sure why 10 would come before 2 since perl would interpret  
>> that list
>> as a series of integers even if they were entered as strings and  
>> do the
>> sort.
>>
>>
> Because, according to the documentation for Perl's sort function,
> sorting occurs "in standard string comparison order" unless the user
> specifies another comparison function to use.


From heikki at sanbi.ac.za  Wed Apr 11 16:39:57 2007
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 11 Apr 2007 18:39:57 +0200
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
Message-ID: <200704111839.58940.heikki@sanbi.ac.za>

A bit more than half is still using Test:

~/src/bioperl/core/t>  perl -lne 'print $1 if /use +(Test[^\sO;]*);/' *t | 
sort | uniq -c | sort -nr
    147 Test
     97 Test::More


Feel free to add scripts and functionality to the core/maintenance directory of
bioperl-live if you want to keep track of things in modules and tests.

	-Heikki


On Wednesday 11 April 2007 16:56:22 Spiros Denaxas wrote:
> Yep! I have some rough stats I have at home, I will post them later on
> tonight. Roughly, if i remember correctly, 50% of the tests were still
> using Test, all the others were using Test::More.
>
> More to follow later on,
> Spiros
>
> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> > It should be easy enough to find those t/*.t files that have "use Test;"
> > or "require Test;" This should provide a list of files still needing to
> > be converted over to Test::More. As discussed previously, it may also be
> > useful to use Test::Exception to test for situations where
> > exceptions/warnings are thrown. If you add additional tests using this
> > module, you should add the Test::Exception module to t/lib/
> >
> > Good luck, and feel free to mail the list with questions/comments etc.
> >
> > Nath
> >
> > Chris Fields wrote:
> > > At the moment we do not have a comprehensive list up on the wiki.  I
> > > have been slowly working (alphabetically!) to switch them over, so
> > > any help would be appreciated.
> > >
> > > I have CC'd this to the main mail list for anyone else interested.
> > >
> > > chris
> > >
> > > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> > >> Hey guys,
> > >>
> > >> I noticed there's an open task regarding moving testing code to use
> > >> Test::More etc and that Chris and Nathan are already on to it. Is
> > >> there any kind of wiki page that you keep track of which modules you
> > >> are already working on? I am new to this and want to contribute,
> > >> having a fair amount of unit testing from work, but don't want to step
> > >> over other people's work and avoid duplication as well.
> > >> Any pointers where i could get started would be much appreciated :-)
> > >>
> > >> Thanks,
> > >> Spiros
> > >>
> > >> ps. apologies if this is not the correct list to post this, just
> > >> seemed the most intuitive choice.
> > >> _______________________________________________
> > >> Bioperl-guts-l mailing list
> > >> Bioperl-guts-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
> > >
> > > Christopher Fields
> > > Postdoctoral Researcher
> > > Lab of Dr. Robert Switzer
> > > Dept of Biochemistry
> > > University of Illinois Urbana-Champaign
> > >
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Associate Professor    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From stewarta at nmrc.navy.mil  Wed Apr 11 18:40:18 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Wed, 11 Apr 2007 14:40:18 -0400
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
Message-ID: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>

First of all, many kudos to those who revamped this module.  It
works really nicely.  I have a couple of thoughts:

* The .predict file from Glimmer provides frame and score information
which could be parsed and included in the generated feature prediction.

* It'd be nice to include the orfID somewhere on the feature
prediction, maybe as the seqID? (These could be post-processed into
locus_tags for those using Glimmer as a preliminary annotation tool.)

* Options to set the source and primary tags to something other than
the defaults (i.e. Glimmer_3.X and 'transcript') would be useful.  This
could always be done post-Bio::Tools::Glimmer, though, of course.

* This section..

         elsif (
                # Glimmer 2.X prediction
                (/^\s+(\d+)\s+      # gene num
                 (\d+)\s+(\d+)\s+   # start, end
                 \[([\+\-])\d{1}\s+ # strand
                 /ox ) ||
                # Glimmer 3.X prediction
                (/\w+(\d+)\s+       # orf (numeric portion)
                 (\d+)\s+(\d+)\s+   # start, end
                 ([\+\-])\d{1}\s+   # strand
                /ox)) {
	    my ($genenum,$start,$end,$strand) =
		( $1,$2,$3,$4 );

...isn't picking up more than the last digit in the orf-number.  Not
sure if that's intentional.  A sample of the feature output using
->gff_string shows up as ...

test-pseudocontig       Glimmer_3.X     transcript      1018    8       .       -       .       Group GenePrediction_1
test-pseudocontig       Glimmer_3.X     transcript      1134    1736    .       +       .       Group GenePrediction_2
test-pseudocontig       Glimmer_3.X     transcript      1832    2596    .       +       .       Group GenePrediction_4
test-pseudocontig       Glimmer_3.X     transcript      2710    3225    .       +       .       Group GenePrediction_5
test-pseudocontig       Glimmer_3.X     transcript      3246    4016    .       +       .       Group GenePrediction_6
test-pseudocontig       Glimmer_3.X     transcript      4177    5064    .       +       .       Group GenePrediction_7
test-pseudocontig       Glimmer_3.X     transcript      5083    5673    .       +       .       Group GenePrediction_8
test-pseudocontig       Glimmer_3.X     transcript      6001    7275    .       +       .       Group GenePrediction_9
test-pseudocontig       Glimmer_3.X     transcript      7530    8081    .       +       .       Group GenePrediction_0
test-pseudocontig       Glimmer_3.X     transcript      8785    8117    .       -       .       Group GenePrediction_1
test-pseudocontig       Glimmer_3.X     transcript      9423    8788    .       -       .       Group GenePrediction_2
test-pseudocontig       Glimmer_3.X     transcript      10088   9549    .       -       .       Group GenePrediction_3

...which was parsed originally from...

orf00001     1018        8  -2     2.95
orf00002     1134     1736  +3     2.91
orf00004     1832     2596  +2     2.93
orf00005     2710     3225  +1     2.90
orf00006     3246     4016  +3     2.93
orf00007     4177     5064  +1     2.94
orf00008     5083     5673  +1     2.91
orf00009     6001     7275  +1     2.96
orf00010     7530     8081  +3     2.58
orf00011     8785     8117  -2     2.92
orf00012     9423     8788  -1     2.81
orf00013    10088     9549  -3     2.90

* It'd also be nice if you could somehow set the string that is  
placed in front of the orf-number in the line...

                  '-tag'         => { 'Group' => "GenePrediction_$genenum"},

...seeing as how these tag/values can't seem to be changed manually  
anymore without getting into AnnotationCollection stuff, which is no  
longer a simple matter of changing a tag/value string.  (By the way,  
where can I find a list of AnnotationCollectionI compliant objects?)


Any thoughts on the suggestions?  (I don't mind taking a stab at  
incorporating them into the code.. I've never submitted anything to  
BioPerl before)


-Andrew


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From cjfields at uiuc.edu  Wed Apr 11 19:53:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 11 Apr 2007 14:53:54 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
Message-ID: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>

I'm posting this to the mail list in case anyone has any ideas on  
what is going on...

I have noticed an odd (read: annoying) rash of spam on the wiki.   
Jason also ran some spam reversions, so maybe he can chime in.   
Essentially it looks like the responsible spambots 'correct' the wiki  
text and links, so that '+' is being removed and URI-encoded symbols  
in links are reverted to symbols.  Unfortunately the removal occurs  
in all text, so places where '+' is intended (for instance, raw text  
for showing example record formats) are also changed.  My guess is  
we'll need to block the IP address or add to the blacklist if possible.

Between Jason and me we have blocked ~9 spambots and counting.
Couldn't find anything via Google yet...

chris


From torsten.seemann at infotech.monash.edu.au  Thu Apr 12 00:33:02 2007
From: torsten.seemann at infotech.monash.edu.au (Torsten Seemann)
Date: Thu, 12 Apr 2007 10:33:02 +1000
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
Message-ID: <a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>

Andrew,

>                 # Glimmer 3.X prediction
>                 (/\w+(\d+)\s+       # orf (numeric portion)
> ...isn't picking up more than the last digit in the orf-number.  Not
> sure if that's intentional.  A sample of the feature output using
> ->gff_string shows up as ...

I think that regexp should be \w+?(\d+)

i.e. the \w+ should be non-greedy; otherwise it will swallow up all but
the last of the digits meant for the following \d+ (as \d is a subset
of \w)
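
To make the difference concrete, here is a minimal sketch that runs both
patterns against one line of the sample .predict output from Andrew's
message. The patterns are copied from the Glimmer 3.X branch quoted above
(without the /x comments); the variable names are only for illustration
and are not taken from Bio::Tools::Glimmer.

```perl
use strict;
use warnings;

# One line of Glimmer 3 .predict output from the sample earlier in the thread.
my $line = "orf00013    10088     9549  -3     2.90";

# Greedy: \w+ also matches digits, so it takes "orf0001" and backtracks just
# far enough to leave a single digit for (\d+).
my ($greedy) = $line =~ /\w+(\d+)\s+(\d+)\s+(\d+)\s+([\+\-])\d{1}\s+/;

# Non-greedy: \w+? takes as little as possible ("orf"), so (\d+) captures
# the whole numeric portion.
my ($lazy) = $line =~ /\w+?(\d+)\s+(\d+)\s+(\d+)\s+([\+\-])\d{1}\s+/;

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Note that the non-greedy capture keeps the leading zeros, so the group tag
would presumably come out as GenePrediction_00013 unless the number is
normalised afterwards.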

I've CC:ed this to Mark Johnson who made the recent changes to this module.

Thanks for your feedback,

--Torsten Seemann


From spiros at lokku.com  Thu Apr 12 01:08:47 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 02:08:47 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
Message-ID: <bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>

Good idea Chris. Just got back home so will probably do it tomorrow
morning or so.

Spiros

On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
> We should probably place something on the wiki to prevent overlaps
> (i.e. make sure no two devs are working on the same tests).  I
> planned on working on the G's last night but got bogged down.
>
> Spiros, if you haven't already go ahead and create a list on a wiki
> page for tracking.  We can lay claim to them by tagging with our sigs
> and cross them off once complete.
>
> chris
>
> On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
>
> > Yep! I have some rough stats I have at home, I will post them later on
> > tonight. Roughly, if i remember correctly, 50% of the tests were still
> > using Test, all the others were using Test::More.
> >
> > More to follow later on,
> > Spiros
> >
> > On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> >> It should be easy enough to find those t/*.t files that have "use
> >> Test;"
> >> or "require Test;" This should provide a list of files still
> >> needing to
> >> be converted over to Test::More. As discussed previously, it may
> >> also be
> >> useful to use Test::Exception to test for situations where
> >> exceptions/warnings are thrown. If you add additional tests using
> >> this
> >> module, you should add the Test::Exception module to t/lib/
> >>
> >> Good luck, and feel free to mail the list with questions/comments
> >> etc.
> >>
> >> Nath
> >>
> >>
> >> Chris Fields wrote:
> >> > At the moment we do not have a comprehensive list up on the
> >> wiki.  I
> >> > have been slowly working (alphabetically!) to switch them over, so
> >> > any help would be appreciated.
> >> >
> >> > I have CC'd this to the main mail list for anyone else interested.
> >> >
> >> > chris
> >> >
> >> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> >> >
> >> >
> >> >> Hey guys,
> >> >>
> >> >> I noticed there's an open task regarding moving testing code to
> >> use
> >> >> Test::More etc and that Chris and Nathan are already on to it. Is
> >> >> there any kind of wiki page that you keep track of which
> >> modules you
> >> >> are already working on? I am new to this and want to contribute,
> >> >> having a fair amount of unit testing from work, but don't want
> >> to step
> >> >> over other people's work and avoid duplication as well.
> >> >> Any pointers where i could get started would be much
> >> appreciated :-)
> >> >>
> >> >> Thanks,
> >> >> Spiros
> >> >>
> >> >> ps. apologies if this is not the correct list to post this, just
> >> >> seemed the most intuitive choice.
> >> >> _______________________________________________
> >> >> Bioperl-guts-l mailing list
> >> >> Bioperl-guts-l at lists.open-bio.org
> >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
> >> >>
> >> >
> >> > Christopher Fields
> >> > Postdoctoral Researcher
> >> > Lab of Dr. Robert Switzer
> >> > Dept of Biochemistry
> >> > University of Illinois Urbana-Champaign
> >> >
> >> >
> >> >
> >> > _______________________________________________
> >> > Bioperl-l mailing list
> >> > Bioperl-l at lists.open-bio.org
> >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >> >
> >>
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From Kevin.M.Brown at asu.edu  Thu Apr 12 15:24:15 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Thu, 12 Apr 2007 08:24:15 -0700
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <461D0407.8050105@watson.wustl.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>
	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>
	<461D0407.8050105@watson.wustl.edu>
Message-ID: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>

> >> What is going on here? Can anyone remember doing this?

> >> Please can I ask what is the purpose of the line @pos = sort @pos;
> >> in the select_noncont subroutine of SimpleAlign.pm.
> >>
> >> In previous versions this line was not present and I could use the
> >> function to reorder the alignment, e.g. in an alignment with 5
> >> sequences I could reorder it to put the second sequence last using
> >> $aln->select_noncont(1,3,4,5,2). The sort function stops this, but
> >> even if the idea is to sort numerically this does not work, since
> >> the sort function as is will put 10 before 2, so that
> >> ->select_noncont(1,2,3,4,5,6,7,8,9,10) would reorder the sequences
> >> in the alignment to be 1, 10, 2, 3, 4, 5, 6, 7, 8, 9.
> >>     
> >
> > Not sure why 10 would come before 2 since perl would interpret that
> > list as a series of integers even if they were entered as strings
> > and do the sort.
> >
> Because, according to the documentation for Perl's sort function,
> sorting occurs "in standard string comparison order" unless the user
> specifies another comparison function to use.

OK, guess I never realized that since I've used just "sort @array" and
gotten things back how I expected them to be.


From bix at sendu.me.uk  Thu Apr 12 15:58:53 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 12 Apr 2007 16:58:53 +0100
Subject: [Bioperl-l] Fwd: SimpleAlign bug?
In-Reply-To: <1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>
References: <200704111114.27839.heikki@sanbi.ac.za>	<1A4207F8295607498283FE9E93B775B402FCB2C7@EX02.asurite.ad.asu.edu>	<461D0407.8050105@watson.wustl.edu>
	<1A4207F8295607498283FE9E93B775B402FCB4AE@EX02.asurite.ad.asu.edu>
Message-ID: <461E573D.1060906@sendu.me.uk>

Kevin Brown wrote:
>> Because, according to the documentation for Perl's sort 
>> function, sorting occurs "in standard string comparison 
>> order" unless the user specifies another comparison function to use.
> 
> OK, guess I never realized that since I've used just "sort @array" and
> gotten things back how I expected them to be.

If you were sorting numbers, getting the order wrong either didn't 
matter or you didn't notice the problem. Not realizing sort won't do 
what you expect in this case is a common source of bugs.

It might be worth it for you (and anyone else) to go through your old 
code to make sure you haven't been bitten.
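
For anyone checking their own code, a small sketch of the difference. The
list below is just the 1..10 example from the original report, and the
variable names are made up for illustration.

```perl
use strict;
use warnings;

my @pos = (1 .. 10);

# Default sort compares elements as strings, so "10" sorts before "2".
my @as_strings = sort @pos;

# An explicit numeric comparison block gives the expected numeric order.
my @as_numbers = sort { $a <=> $b } @pos;

print "string:  @as_strings\n";
print "numeric: @as_numbers\n";
```

Plain "sort @array" only looks right when every number has the same number
of digits, which is probably why the problem goes unnoticed so often.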


From johnsonm at gmail.com  Thu Apr 12 17:26:33 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 12:26:33 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
Message-ID: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>

    I'd call that a buggy regexp.  Sounds like a good (but minimal)
fix.  Torsten, I don't have cvs write access, I think you do, can you
fix that up?  Andrew, can you file that as a bug:

http://bugzilla.bioperl.org/

    Everything else sounds like enhancements.  I'm not necessarily
opposed, but a little discussion is probably in order before putting
any tickets in for any of that.  Also, I'm not sure when I'll be able
to spare some time to work on the module.  It was easy to justify
spending time from my day job getting the module up to where it is now,
as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
quite well for my purposes.  Again, I'm not opposed to further
enhancements, but if I'm going to work on any of them, they'll have to
fit into everything else I'm doing and it could be a while.  However,
there's no reason somebody else can't do what I did.  Discuss the
changes here, work out a plan, implement it, send along the diff(s)
attached to a bug in bugzilla.  Next thing you know, your changes are
in cvs.  8)

On 4/11/07, Torsten Seemann <torsten.seemann at infotech.monash.edu.au> wrote:
> Andrew,
>
> >                 # Glimmer 3.X prediction
> >                 (/\w+(\d+)\s+       # orf (numeric portion)
> > ...isn't picking up more than the last digit in the orf-number.  Not
> > sure if that's intentional.  A sample of the feature output using -
> >  >gff_string shows up as ...
>
> I think that regexp should be \w+?(\d+)
>
> ie. the \w+ should be non-greedy, otherwise it will swallow up all but
> one of the following \d+ (as \d is a subset of \w)
>
> I've CC:ed this to Mark Johnson who made the recent changes to this module.
>
> Thanks for your feedback,
>
> --Torsten Seemann


From cjfields at uiuc.edu  Thu Apr 12 18:11:33 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 13:11:33 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
Message-ID: <7314C1CD-8AD5-4400-A495-6C8124833D0D@uiuc.edu>

Agreed; anyone can suggest code enhancements and bug fixes and submit  
patches for these:

http://www.bioperl.org/wiki/HOWTO:SubmitPatch

You'll see a long list of unimplemented enhancement requests in  
Bugzilla.  These are the ones where no patch is given; you'll find  
that very few are willing to go through the effort to work on them  
unless there is something in it for them!  Enhancement requests that  
come with patches and tests tend to get committed fairly rapidly  
(sometimes within hours).

chris

On Apr 12, 2007, at 12:26 PM, Mark Johnson wrote:

>     I'd call that a buggy regexp.  Sounds like a good (but minimal)
> fix.  Torsten, I don't have cvs write access, I think you do, can you
> fix that up?  Andrew, can you file that as a bug:
>
> http://bugzilla.bioperl.org/
>
>     Everything else sounds like enhancements.  I'm not necessarily
> opposed, but a little discussion is probably in order before putting
> any tickets in for any of that.  Also, I'm not sure when I'll be able
> to spare some time to work on the module.  It was easy to justify
> spending time from my day job getting the module up to where is now,
> as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
> quite well for my purposes.  Again, I'm not opposed to further
> enhancements, but If I'm going to work on any of them, they'll have to
> fit into everything else I'm doing and it could be a while.  However,
> there's no reason somebody else can't do what I did.  Discuss the
> changes here, work out a plan, implement it, send along the diff(s)
> attached to a bug in bugzilla.  Next thing you know, your changes are
> in cvs.  8)
>
> On 4/11/07, Torsten Seemann  
> <torsten.seemann at infotech.monash.edu.au> wrote:
>> Andrew,
>>
>>>                 # Glimmer 3.X prediction
>>>                 (/\w+(\d+)\s+       # orf (numeric portion)
>>> ...isn't picking up more than the last digit in the orf-number.  Not
>>> sure if that's intentional.  A sample of the feature output using -
>>>> gff_string shows up as ...
>>
>> I think that regexp should be \w+?(\d+)
>>
>> ie. the \w+ should be non-greedy, otherwise it will swallow up all  
>> but
>> one of the following \d+ (as \d is a subset of \w)
>>
>> I've CC:ed this to Mark Johnson who made the recent changes to  
>> this module.
>>
>> Thanks for your feedback,
>>
>> --Torsten Seemann


From stewarta at nmrc.navy.mil  Thu Apr 12 18:35:00 2007
From: stewarta at nmrc.navy.mil (Andrew Stewart)
Date: Thu, 12 Apr 2007 14:35:00 -0400
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
Message-ID: <DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>

I'm willing to do the coding and testing, I'm just not familiar with  
the submission process yet (I see there's a HOWTO now, nice).   Let's  
discuss first.

So to reiterate, I'm suggesting that the module also parse out the  
frame and score information from Glimmer output.  I take back my  
suggestion of overriding the source / primary tags through the module  
as this can easily be done post-parser.  Other annotations can also  
be edited post-parser easily enough.

Reasons for:  Parsing everything out of the output and letting the  
user determine what's useful or not.

Reasons against:  Extra information may not be relevant to the format  
of the generated feature type?


-Andrew


On Apr 12, 2007, at 1:26 PM, Mark Johnson wrote:

>    I'd call that a buggy regexp.  Sounds like a good (but minimal)
> fix.  Torsten, I don't have cvs write access, I think you do, can you
> fix that up?  Andrew, can you file that as a bug:
>
> http://bugzilla.bioperl.org/
>
>    Everything else sounds like enhancements.  I'm not necessarily
> opposed, but a little discussion is probably in order before putting
> any tickets in for any of that.  Also, I'm not sure when I'll be able
> to spare some time to work on the module.  It was easy to justify
> spending time from my day job getting the module up to where is now,
> as I needed a BioPerl-ish glimmer2/glimmer3 parser.  It's working
> quite well for my purposes.  Again, I'm not opposed to further
> enhancements, but If I'm going to work on any of them, they'll have to
> fit into everything else I'm doing and it could be a while.  However,
> there's no reason somebody else can't do what I did.  Discuss the
> changes here, work out a plan, implement it, send along the diff(s)
> attached to a bug in bugzilla.  Next thing you know, your changes are
> in cvs.  8)
>
> On 4/11/07, Torsten Seemann  
> <torsten.seemann at infotech.monash.edu.au> wrote:
>> Andrew,
>>
>> >                 # Glimmer 3.X prediction
>> >                 (/\w+(\d+)\s+       # orf (numeric portion)
>> > ...isn't picking up more than the last digit in the orf-number.   
>> Not
>> > sure if that's intentional.  A sample of the feature output using -
>> >  >gff_string shows up as ...
>>
>> I think that regexp should be \w+?(\d+)
>>
>> ie. the \w+ should be non-greedy, otherwise it will swallow up all  
>> but
>> one of the following \d+ (as \d is a subset of \w)
>>
>> I've CC:ed this to Mark Johnson who made the recent changes to  
>> this module.
>>
>> Thanks for your feedback,
>>
>> --Torsten Seemann


--
Andrew Stewart
Research Assistant, Genomics Team
Navy Medical Research Center (NMRC)
Biological Defense Research Directorate (BDRD)
BDRD Annex
12300 Washington Avenue, 2nd Floor
Rockville, MD 20852

email: stewarta at nmrc.navy.mil
phone: 301-231-6700 Ext 270


From johnsonm at gmail.com  Thu Apr 12 19:11:18 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 14:11:18 -0500
Subject: [Bioperl-l] Thoughts on Bio::Tools::Glimmer
In-Reply-To: <DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>
References: <6B252B15-930F-44C3-A7E9-E6363522A1D0@nmrc.navy.mil>
	<a79f6a4b0704111733v703853d4jdc20a022ef2f5562@mail.gmail.com>
	<ebf5eb170704121026g43910e6fhbb46e6b8ac34b48@mail.gmail.com>
	<DFD3EE8A-C4D6-48B4-BD94-CCA41F1C8332@nmrc.navy.mil>
Message-ID: <ebf5eb170704121211s19062ac8hb9b510d440fcfe44@mail.gmail.com>

> So to reiterate, I'm suggesting that the module also parse out the frame and
> score information from Glimmer output.  I take back my suggestion of
> overriding the source / primary tags through the module as this can easily
> be done post-parser.  Other annotations can also be edited post-parser
> easily enough.

The reason the source tags are what they are for my addition(s) is
that the output from glimmer2/glimmer3 does not include a version
string.  You can figure out the major version from the output
formatting, but that's about it.  Also, being my first significant
contribution, I wasn't out to break new ground.  I did what some of
the other gene predictors seem to do, and what the existing code
already did.  Maybe there should be a method to pass in the exact
version if you know it.  Further than that, I think the Glimmer module
should stay consistent with what the other gene predictors do.  No
reason, though, that they couldn't *all* be enhanced similarly, if you
want to be able to further control the source tag.  8)

Part of the reason I didn't parse out the frame / score info for
either glimmer2 or glimmer3 was that I didn't need it.  The other part
being that my regexp kung-fu is nothing special.  This sounds like a
no-brainer to me.  Extend the regexps to capture it and tag it (and
the tests).
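
As a rough sketch of what "extend the regexps" could look like for the
Glimmer 3 case, assuming the five-column .predict layout shown in Andrew's
sample (orf id, start, end, strand plus frame, score). The helper name
parse_glimmer3_line and the hash keys are hypothetical and not part of
Bio::Tools::Glimmer; how the values would be attached to the feature is
left open.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Tools::Glimmer: parse one Glimmer 3
# .predict line into a hash, including the frame and score columns.
sub parse_glimmer3_line {
    my ($line) = @_;
    return unless $line =~ /^\s*(\w+?)(\d+)\s+  # orf prefix, orf number
                            (\d+)\s+(\d+)\s+    # start, end
                            ([+-])(\d)\s+       # strand, frame
                            (-?[\d.]+)          # raw score
                           /x;
    return {
        orf_id => "$1$2", start => $3, end => $4,
        strand => $5,     frame => $6, score => $7,
    };
}

# Sample line taken from the .predict excerpt earlier in the thread.
my $p = parse_glimmer3_line("orf00011     8785     8117  -2     2.92");
print join("\t", @{$p}{qw(orf_id start end strand frame score)}), "\n";
```

The glimmer2 branch would need its own pattern, since its strand and frame
column is written in brackets.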

As far as the ORFs go, I guess you could just use
Bio::SeqFeature::Generic to represent them.  I haven't been keeping
track of the relevant feature/annotation interfaces, but maybe there
should be some kind of relation between the ORFs and predictions?

The glimmer3 detail file is a little trickier.  The least disruptive
thing to do, interface wise, might be to specify that as a separate
input via an argument to the constructor.  Then you've got *two* input
files, and are going to have to override the automagic stuff that
expects one input file and takes care of it all.

As far as process, I just got on the list and started pestering
people, and they haven't thrown me off yet.  8)  I'm afraid that
you're going to find that while people are happy to discuss
implementation details, when it comes time to fire up the editor,
you're usually on your own, if it's an enhancement.

I'd love to work on Bioperl more, but so far, it's only been for what
I need for my job.


From spiros at lokku.com  Thu Apr 12 19:16:39 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 20:16:39 +0100
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
	<bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
Message-ID: <bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>

Hey guys,

I have added a link as per Chris's nice suggestion for keeping track
of what's going on regarding the migration:
http://www.bioperl.org/wiki/TestMoreProgress
There's also a link to this page from the project priority list.
However, adding our signature for each module etc., in my humble
opinion, seems tedious. May I suggest we just split up the list into
'starting letter sections' and each one does his part.
I volunteer to work on all tests starting with the letter R down to
the bottom of the list.

Let me know if this makes sense or not. I will work on
removing/flagging all the files that have already been migrated on
that list as well.
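
In case it helps with building that list, here is a minimal sketch of the
check Nathan described (looking for "use Test;" or "require Test;" in the
t/*.t files). The helper name uses_old_test is made up, and the match is a
plain text search, so a commented-out line would also be reported.

```perl
use strict;
use warnings;

# Hypothetical helper: true if a test script's source still loads the old
# Test module ("use Test" or "require Test") rather than Test::More etc.
sub uses_old_test {
    my ($source) = @_;
    return $source =~ /\b(?:use|require)\s+Test\b(?!::)/ ? 1 : 0;
}

# Run from the top of a BioPerl checkout; prints nothing if t/ has no .t files.
for my $file (glob 't/*.t') {
    open my $fh, '<', $file or next;
    my $source = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    print "$file\n" if uses_old_test($source);
}
```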

-spiros

On 4/12/07, Spiros Denaxas <spiros at lokku.com> wrote:
> Good idea Chris. Just got back home so will probably do it tomorrow
> morning or so.
>
> Spiros
>
> On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
> > We should probably place something on the wiki to prevent overlaps
> > (i.e. make sure no two devs are working on the same tests).  I
> > planned on working on the G's last night but got bogged down.
> >
> > Spiros, if you haven't already go ahead and create a list on a wiki
> > page for tracking.  We can lay claim to them by tagging with our sigs
> > and cross them off once complete.
> >
> > chris
> >
> > On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
> >
> > > Yep! I have some rough stats I have at home, I will post them later on
> > > tonight. Roughly, if i remember correctly, 50% of the tests were still
> > > using Test, all the others were using Test::More.
> > >
> > > More to follow later on,
> > > Spiros
> > >
> > > On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
> > >> It should be easy enough to find those t/*.t files that have "use
> > >> Test;"
> > >> or "require Test;" This should provide a list of files still
> > >> needing to
> > >> be converted over to Test::More. As discussed previously, it may
> > >> also be
> > >> useful to use Test::Exception to test for situations where
> > >> exceptions/warnings are thrown. If you add additional tests using
> > >> this
> > >> module, you should add the Test::Exception module to t/lib/
> > >>
> > >> Good luck, and feel free to mail the list with questions/comments
> > >> etc.
> > >>
> > >> Nath
> > >>
> > >>
> > >> Chris Fields wrote:
> > >> > At the moment we do not have a comprehensive list up on the
> > >> wiki.  I
> > >> > have been slowly working (alphabetically!) to switch them over, so
> > >> > any help would be appreciated.
> > >> >
> > >> > I have CC'd this to the main mail list for anyone else interested.
> > >> >
> > >> > chris
> > >> >
> > >> > On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
> > >> >
> > >> >
> > >> >> Hey guys,
> > >> >>
> > >> >> I noticed there's an open task regarding moving testing code to
> > >> use
> > >> >> Test::More etc and that Chris and Nathan are already on to it. Is
> > >> >> there any kind of wiki page that you keep track of which
> > >> modules you
> > >> >> are already working on? I am new to this and want to contribute,
> > >> >> having a fair amount of unit testing from work, but don't want
> > >> to step
> > >> >> over other people's work and avoid duplication as well.
> > >> >> Any pointers where i could get started would be much
> > >> appreciated :-)
> > >> >>
> > >> >> Thanks,
> > >> >> Spiros
> > >> >>
> > >> >> ps. apologies if this is not the correct list to post this, just
> > >> >> seemed the most intuitive choice.
> > >> >> _______________________________________________
> > >> >> Bioperl-guts-l mailing list
> > >> >> Bioperl-guts-l at lists.open-bio.org
> > >> >> http://lists.open-bio.org/mailman/listinfo/bioperl-guts-l
> > >> >>
> > >> >
> > >> > Christopher Fields
> > >> > Postdoctoral Researcher
> > >> > Lab of Dr. Robert Switzer
> > >> > Dept of Biochemistry
> > >> > University of Illinois Urbana-Champaign
> > >> >
> > >> >
> > >> >
> > >> > _______________________________________________
> > >> > Bioperl-l mailing list
> > >> > Bioperl-l at lists.open-bio.org
> > >> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >> >
> > >>
> > >>
> >
> > Christopher Fields
> > Postdoctoral Researcher
> > Lab of Dr. Robert Switzer
> > Dept of Biochemistry
> > University of Illinois Urbana-Champaign
> >
> >
> >
> >
>


From marian.thieme at lycos.de  Wed Apr 11 16:02:14 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Wed, 11 Apr 2007 16:02:14 +0000
Subject: [Bioperl-l] Affys ReseqChip
Message-ID: <188661178017404@lycos-europe.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070411/bc2eb3aa/attachment-0004.html>

From johnsonm at gmail.com  Thu Apr 12 19:35:35 2007
From: johnsonm at gmail.com (Mark Johnson)
Date: Thu, 12 Apr 2007 14:35:35 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
Message-ID: <ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>

Looks like MediaWiki has some built in functionality:

    http://meta.wikimedia.org/wiki/Anti-spam_Features
    http://www.mediawiki.org/wiki/Extension:ConfirmEdit

I'm not sure I'd call what they're doing spam, more like vandalism,
but either way, I don't see the point (though I only looked at a
couple examples via Recent Changes).

If they're indeed bots, maybe it's time to enable Captchas? Depending
on who they are and what their goals are, that may get rid of them
completely or just slow them down.

On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
> I'm posting this to the mail list in case anyone has any ideas on
> what is going on...
>
> I have noticed an odd (read: annoying) rash of spam on the wiki.
> Jason also ran some spam reversions, so maybe he can chime in.
> Essentially it looks like the responsible spambots 'correct' the wiki
> text and links, so that '+' is being removed and URI-encoded symbols
> in links are reverted to symbols.  Unfortunately the removal occurs
> in all text, so places where '+' is intended (for instance, raw text
> for showing example record formats) are also changed.  My guess is
> we'll need to block the IP address or add to the blacklist if possible.
>
> Between Jason and I we have blocked ~9 spambots and counting.
> Couldn't find anything via Google yet...
>
> chris


From cjfields at uiuc.edu  Thu Apr 12 19:44:28 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 14:44:28 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
Message-ID: <BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>


On Apr 12, 2007, at 2:35 PM, Mark Johnson wrote:

> Looks like MediaWiki has some built in functionality:
>
>    http://meta.wikimedia.org/wiki/Anti-spam_Features
>    http://www.mediawiki.org/wiki/Extension:ConfirmEdit
>
> I'm not sure I'd call what they're doing spam, more like vandalism,
> but either way, I don't see the point (though I only looked at a
> couple examples via Recent Changes).
>
> If they're indeed bots, maybe it's time to enable Captchas? Depending
> on who they are and what their goals are, that may get rid of them
> completely or just slow them down.

Already done; Mauricio installed ConfirmEdit yesterday after a bit of  
off-list discussion (thanks again Mauricio!).

If you create a new account you'll encounter a simple captcha (it
isn't configured for each edit yet).  We may implement confirmations
per edit or install picture captchas at a later point, depending on
how well the current system works.

We may start granting sysop privileges to anyone interested in
maintaining the wiki, which makes handling spam easier.  If so we'll
probably announce something along those lines here first.

chris


From cjfields at uiuc.edu  Thu Apr 12 19:48:41 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 14:48:41 -0500
Subject: [Bioperl-l] [Bioperl-guts-l] moving tests to use Test::More
In-Reply-To: <bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>
References: <bba689ec0704091123w77a5d0d9u2620800ba061f7c@mail.gmail.com>
	<5B4CD007-F8D8-48EF-9F90-72D69748BA77@uiuc.edu>
	<461CF0F3.1010708@sheffield.ac.uk>
	<bba689ec0704110756h72dd65e6l7fc03e5b886a1651@mail.gmail.com>
	<FD9A2F5C-0F0E-4FF5-A97B-46605896B500@uiuc.edu>
	<bba689ec0704111808g6cd28a52g5435b0c4de551b32@mail.gmail.com>
	<bba689ec0704121216w45e83ean2efb4b07288d7806@mail.gmail.com>
Message-ID: <3B4500DD-CAB6-4FD6-ABF9-A0160981F7E3@uiuc.edu>

Sounds good!  I'll finish up the P's (halfway through now...) and  
move on to other things; got plenty to do, believe me!

Appreciate all the help, Spiros!

chris

On Apr 12, 2007, at 2:16 PM, Spiros Denaxas wrote:

> Hey guys,
>
> I have added a link as per Chris's nice suggestion for keeping track
> on whats going on regarding the migration:
> http://www.bioperl.org/wiki/TestMoreProgress
> There's also a link to this page from the project priority list.
> However, adding our signature for each module etc , in my humble
> opinion, seems tedious. May i suggest we just split up the list in
> 'starting letter sections' and each one does his part.
> I volunteer to work on all tests starting with the letter R down to
> the bottom of the list.
>
> Let me know if this makes sense or not. I will work on
> removing/flagging all the files that have already been migrated on
> that list as well.
>
> -spiros
>
> On 4/12/07, Spiros Denaxas <spiros at lokku.com> wrote:
>> Good idea Chris. Just got back home so will probably do it tomorrow
>> morning or so.
>>
>> Spiros
>>
>> On 4/11/07, Chris Fields <cjfields at uiuc.edu> wrote:
>>> We should probably place something on the wiki to prevent overlaps
>>> (i.e. make sure no two devs are working on the same tests).  I
>>> planned on working on the G's last night but got bogged down.
>>>
>>> Spiros, if you haven't already go ahead and create a list on a wiki
>>> page for tracking.  We can lay claim to them by tagging with our  
>>> sigs
>>> and cross them off once complete.
>>>
>>> chris
>>>
>>> On Apr 11, 2007, at 9:56 AM, Spiros Denaxas wrote:
>>>
>>>> Yep! I have some rough stats I have at home, I will post them  
>>>> later on
>>>> tonight. Roughly, if i remember correctly, 50% of the tests were  
>>>> still
>>>> using Test, all the others were using Test::More.
>>>>
>>>> More to follow later on,
>>>> Spiros
>>>>
>>>> On 4/11/07, Nathan Haigh <n.haigh at sheffield.ac.uk> wrote:
>>>>> It should be easy enough to find those t/*.t files that have "use
>>>>> Test;"
>>>>> or "require Test;" This should provide a list of files still
>>>>> needing to
>>>>> be converted over to Test::More. As discussed previously, it may
>>>>> also be
>>>>> useful to use Test::Exception to test for situations where
>>>>> exceptions/warnings are thrown. If you add additional tests using
>>>>> this
>>>>> module, you should add the Test::Exception module to t/lib/
>>>>>
>>>>> Good luck, and feel free to mail the list with questions/comments
>>>>> etc.
>>>>>
>>>>> Nath
>>>>>
>>>>>
>>>>> Chris Fields wrote:
>>>>>> At the moment we do not have a comprehensive list up on the
>>>>> wiki.  I
>>>>>> have been slowly working (alphabetically!) to switch them  
>>>>>> over, so
>>>>>> any help would be appreciated.
>>>>>>
>>>>>> I have CC'd this to the main mail list for anyone else  
>>>>>> interested.
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Apr 9, 2007, at 1:23 PM, Spiros Denaxas wrote:
>>>>>>
>>>>>>
>>>>>>> Hey guys,
>>>>>>>
>>>>>>> I noticed there's an open task regarding moving testing code to
>>>>> use
>>>>>>> Test::More etc and that Chris and Nathan are already on to  
>>>>>>> it. Is
>>>>>>> there any kind of wiki page that you keep track of which
>>>>> modules you
>>>>>>> are already working on? I am new to this and want to contribute,
>>>>>>> having a fair amount of unit testing from work, but don't want
>>>>> to step
>>>>>>> over other people's work and avoid duplication as well.
>>>>>>> Any pointers where i could get started would be much
>>>>> appreciated :-)
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Spiros
>>>>>>>
>>>>>>> ps. apologies if this is not the correct list to post this, just
>>>>>>> seemed the most intuitive choice.

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From spiros at lokku.com  Thu Apr 12 20:19:18 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 21:19:18 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
Message-ID: <bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>

Nice idea, I saw it a bit earlier. However, is there any chance of
implementing whitelists so that regular and/or trusted users can skip
it each time we add something to the wiki?

Spiros



From Jonathan_Epstein at nih.gov  Thu Apr 12 20:22:40 2007
From: Jonathan_Epstein at nih.gov (Jonathan Epstein)
Date: Thu, 12 Apr 2007 16:22:40 -0400
Subject: [Bioperl-l] Affys ReseqChip
In-Reply-To: <188661178017404@lycos-europe.com>
References: <188661178017404@lycos-europe.com>
Message-ID: <6.2.3.4.2.20070412161407.04a38b60@mail.nih.gov>

This sounds great to me.

Resequencing in general (whether by Affy or by other technology such as Solexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handling this.  But I suggest that you proceed full-speed-ahead, and we can sort this out in the future.

Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.

Jonathan


At 12:02 PM 4/11/2007, Marian Thieme wrote:
>Hi,
>
>I am working on a piece of software, which is aimed to analyse the outcome of Affymetrix DNA Resequencing Arrays. (In particular Mitochip V2). The main goal of the software is to take into account for the redundant fragments. The software is able to align the redundant fragments to the entire sequence and in particular to call bases which arent called by the entire sequence and to detect insertions/deletion, depending on the design of the redundant frags.
>
>I would be glad to distribute the software to the bioperl package or otherwise, if anybody is interested I can give the code and/or further develop some features.
>
>Marian

Jonathan Epstein                                Jonathan_Epstein at nih.gov
Head, Unit on Biologic Computation              (301)402-4563
Office of the Scientific Director               Bldg 31, Room 2A47
Nat. Inst. of Child Health & Human Development  31 Center Drive
National Institutes of Health                   Bethesda, MD 20892  


From spiros at lokku.com  Thu Apr 12 21:35:43 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 22:35:43 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EA4FA.8010504@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
	<461EA4FA.8010504@campus.iztacala.unam.mx>
Message-ID: <bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>

Mauricio, thanks for your response. I actually edited a page several
times today and I got the captcha. More specifically, it was displayed
because "the page I edited contained external links", which is true
since I included a {{CPAN}} link.

Spiros

On 4/12/07, Mauricio Herrera Cuadra <arareko at campus.iztacala.unam.mx> wrote:
> The chance of having white lists exists but as far as I tested last
> night, the captcha is working only at the Create Account pages, not at
> the time of applying changes to wiki content (I tested as a regular user
> and not as a wiki admin).
>
> The idea at this moment is only to block automated methods for account
> creation (bots). Registered users who haven't been blocked and/or have
> confirmed their email wouldn't be bothered while adding/changing wiki
> content.
>
> Regards,
> Mauricio.
>


From arareko at campus.iztacala.unam.mx  Thu Apr 12 21:30:34 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 12 Apr 2007 16:30:34 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
Message-ID: <461EA4FA.8010504@campus.iztacala.unam.mx>

The chance of having white lists exists but as far as I tested last 
night, the captcha is working only at the Create Account pages, not at 
the time of applying changes to wiki content (I tested as a regular user 
and not as a wiki admin).

The idea at this moment is only to block automated methods for account 
creation (bots). Registered users who haven't been blocked and/or have 
confirmed their email wouldn't be bothered while adding/changing wiki 
content.

Regards,
Mauricio.


-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From arareko at campus.iztacala.unam.mx  Thu Apr 12 21:53:51 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Thu, 12 Apr 2007 16:53:51 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>	
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>	
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
Message-ID: <461EAA6F.1090805@campus.iztacala.unam.mx>

I've reconfigured the extension to display captchas exclusively for
account creation and disabled it when adding URLs in pages. I don't know
why this didn't happen to me while testing last night...

Please try it again to see if the change works. Thanks for pointing
this out, Spiros :)

Mauricio.


-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From spiros at lokku.com  Thu Apr 12 22:11:46 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Thu, 12 Apr 2007 23:11:46 +0100
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
	<461EAA6F.1090805@campus.iztacala.unam.mx>
Message-ID: <bba689ec0704121511y135f0da0j26d520a11dd3ffa1@mail.gmail.com>

You're welcome, Mauricio. It's all cool now; it works without the captcha
for internal edits. Thanks for changing it over :-)

-spiros



From cjfields at uiuc.edu  Thu Apr 12 22:02:51 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Thu, 12 Apr 2007 17:02:51 -0500
Subject: [Bioperl-l] Odd spamming on bioperl wiki
In-Reply-To: <461EAA6F.1090805@campus.iztacala.unam.mx>
References: <B16A789C-6F35-4BE5-9435-E7218318AC9B@uiuc.edu>	
	<ebf5eb170704121235t25e780c9wde9ce26ab85100b0@mail.gmail.com>	
	<BDE8ED5B-0464-48A7-ACDF-FE0FF6A58AB8@uiuc.edu>	
	<bba689ec0704121319y7392000apadafbe93ebb60176@mail.gmail.com>	
	<461EA4FA.8010504@campus.iztacala.unam.mx>
	<bba689ec0704121435se351761j3321d3b22ec59561@mail.gmail.com>
	<461EAA6F.1090805@campus.iztacala.unam.mx>
Message-ID: <E1139262-84C3-4282-8E9D-643BF91A3656@uiuc.edu>

You disabled yourself as sysop last night, IIRC.  Don't know; it could
be what Spiros suggested, e.g. adding external links trips it.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Fri Apr 13 08:30:50 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 13 Apr 2007 09:30:50 +0100
Subject: [Bioperl-l] GenericHit->start/end needs tiled hsps?
Message-ID: <461F3FBA.2010101@sendu.me.uk>

Hi all,

I want to double-check my thinking regarding 
Bio::Search::Hit::GenericHit->start() and end(). Right now the docs 
claim that hsps of the hit object must be tiled before the answer can be 
produced. The code is implemented in that way 
(Bio::Search::SearchUtils::tile_hsps($self)).

Yet as far as I can see, all you need to do is loop through all hsps and 
pick out the smallest start and largest end respectively in terms of 
subject and query.
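
The simple loop described above can be sketched as follows. This is a
minimal illustration in plain Python, not BioPerl code and not the
current GenericHit implementation: the Hsp tuple and the hit_extent
function are hypothetical names, and coordinates are assumed to be
1-based and inclusive.

```python
from typing import Iterable, NamedTuple, Tuple


class Hsp(NamedTuple):
    """Hypothetical stand-in for one HSP (1-based, inclusive coordinates)."""
    query_start: int
    query_end: int
    hit_start: int
    hit_end: int


def hit_extent(hsps: Iterable[Hsp]) -> Tuple[int, int, int, int]:
    """Return (query_start, query_end, hit_start, hit_end) spanning all HSPs.

    One pass over the HSPs, no tiling. Each pair of coordinates is
    normalised first, so an HSP reported with start > end (for example
    on the minus strand) is still handled.
    """
    q_lo = q_hi = h_lo = h_hi = None
    for hsp in hsps:
        qs, qe = sorted((hsp.query_start, hsp.query_end))
        hs, he = sorted((hsp.hit_start, hsp.hit_end))
        q_lo = qs if q_lo is None else min(q_lo, qs)
        q_hi = qe if q_hi is None else max(q_hi, qe)
        h_lo = hs if h_lo is None else min(h_lo, hs)
        h_hi = he if h_hi is None else max(h_hi, he)
    if q_lo is None:
        raise ValueError("hit has no HSPs")
    return q_lo, q_hi, h_lo, h_hi
```

The work grows linearly with the number of HSPs, which is why it stays
fast even for a hit with tens of thousands of them. The result is only
the outer extent; it says nothing about which HSPs form a contig.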

This comes up because I have a blast report where a single hit contains 
over 80000 hsps and the tiling takes over an hour (I gave up on it, 
don't know how long it really takes). The simple loop through hsps takes 
seconds or less.

Now in this situation the answer isn't especially useful (to me). An 
alternative way of fixing the problem would be to re-write the tiling 
algorithm (again) to somehow make it hundreds of times faster, then 
provide some way in start() and end() for the user to request the start 
and end of the best contig, or other contig of choice. Easier said than 
done though!


What do people think?


From marian.thieme at lycos.de  Fri Apr 13 10:12:51 2007
From: marian.thieme at lycos.de (Marian Thieme)
Date: Fri, 13 Apr 2007 10:12:51 +0000
Subject: [Bioperl-l] Affys ReseqChip
Message-ID: <18866117804894@lycos-europe.com>

Hi,

To give a better understanding of the matter and to help assess the approach, I will briefly present
1.) the problem and 2.) my approach.


1.)
given: fragments (strings of a certain length) with a description of their location within some reference sequence. For instance:

- redundant fragment: acgtnna--gcta (deletion: pos12, pos13)
- start position: 5
- end position: 17
- and some suitable reference sequence

Fragments are assumed to be mappable 1:1 to the reference sequence and can contain gaps and n's; the latter indicate that the base wasn't determined, perhaps because of failed hybridization or something similar.
Thus handling insertions/deletions requires nothing more than parsing an array design file (a description of all insertions and deletions in each redundant fragment) and, according to that description, inserting gaps into the reference sequence and into the fragments where required.
So from my point of view, and in the case of the Affy MitoChip v2, we only need to process the description file rather than calculate an alignment via a dynamic programming matrix.


2.)
My current approach consists of the following 5 steps:

1.) Read the reference sequence and the redundant fragments via a SeqIO object.

2.) Calculate a hash of all insertions, defined by length and position.

3.) Insert the longest insertion at each position into the appropriate fragments and into the reference sequence, and hence insert as many gaps as given by

length(max_insertion(position_x))-length(insertion(fragment_y, position_x))

into each fragment/reference sequence.
(This is done by iterating over each sequence in the SeqIO and inserting gaps according to the insertion hash.)

4.) Create a SimpleAlign object from LocatableSeq objects.

5.) Afterwards we can do some statistical analysis and calculate a consensus base for each column in the SimpleAlign. (I use a Statistics module from CPAN.)
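
The padding rule in step 3 can be sketched in plain Perl as follows. This is only an illustration under assumptions: the layout of the insertion hash ({ seq_id => { ref_pos => inserted_bases } }), the convention that an insertion follows the given reference position, and the sub names max_insertion_lengths and pad_insertions are all invented here; none of them comes from the array design file or from BioPerl.

```perl
use strict;
use warnings;
use List::Util qw(max);

# $ins: { seq_id => { ref_pos => 'inserted bases' } }   (assumed layout)
# Returns { ref_pos => length of the longest insertion at ref_pos }.
sub max_insertion_lengths {
    my ($ins) = @_;
    my %max;
    for my $by_pos (values %$ins) {
        for my $pos (keys %$by_pos) {
            $max{$pos} = max($max{$pos} || 0, length($by_pos->{$pos}));
        }
    }
    return \%max;
}

# $seq has one character per reference position; $start is the 1-based
# reference position of its first character. $own holds this sequence's
# insertions, $max the result of max_insertion_lengths().
sub pad_insertions {
    my ($seq, $start, $own, $max) = @_;
    my $end = $start + length($seq) - 1;
    # Go from right to left so that earlier offsets stay valid.
    for my $pos (sort { $b <=> $a } keys %$max) {
        next if $pos < $start || $pos > $end;
        my $bases = $own->{$pos} || '';
        my $pad   = $max->{$pos} - length($bases);
        substr($seq, $pos - $start + 1, 0, $bases . ('-' x $pad));
    }
    return $seq;
}

# Made-up example: two fragments carry an insertion after reference position 4.
my %ins = ( f1 => { 4 => 'GG' }, f2 => { 4 => 'G' } );
my $max = max_insertion_lengths(\%ins);
print pad_insertions('ACGTACGT', 1, {},       $max), "\n";   # ACGT--ACGT
print pad_insertions('GTAC',     3, $ins{f1}, $max), "\n";   # GTGGAC
print pad_insertions('GTAC',     3, $ins{f2}, $max), "\n";   # GTG-AC
```

Working from the rightmost position backwards means no offset has to be corrected after an earlier insertion.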

Unfortunately I didn't manage to find a method that gives me the set of bases (the column) for a given position in the alignment (did I overlook something? Is SimpleAlign not appropriate?), so I iterate over each position (base) of the reference sequence and over each fragment that covers that particular position.
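
The per-column iteration described above could look roughly like this in plain Perl. It is a sketch under assumptions: each row is a hash with a 1-based start column and an already gapped string, and the names column_bases and consensus_base are made up for the example. The simple majority vote stands in for whatever statistics are actually used. Bio::SimpleAlign also has a slice() method that, as far as I know, returns a sub-alignment for a column range, which may be worth checking as an alternative.

```perl
use strict;
use warnings;

# Each row: { start => 1-based alignment column of the first character,
#             seq   => gapped sequence string }            (assumed layout)
# Returns the characters of all rows that cover column $col.
sub column_bases {
    my ($col, @rows) = @_;
    my @bases;
    for my $row (@rows) {
        my $offset = $col - $row->{start};
        next if $offset < 0 || $offset >= length($row->{seq});
        push @bases, substr($row->{seq}, $offset, 1);
    }
    return @bases;
}

# Majority vote over a column, ignoring gaps and uncalled bases (n).
# Ties are broken alphabetically; 'n' is returned if nothing was called.
sub consensus_base {
    my %count;
    $count{lc $_}++ for grep { !/[-nN]/ } @_;
    my ($best) = sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count;
    return defined($best) ? $best : 'n';
}

my @rows = (
    { start => 1, seq => 'acgt--acgt' },   # padded reference
    { start => 3, seq => 'gtggac'     },   # fragment with a 2-base insertion
    { start => 3, seq => 'gtg-ac'     },   # fragment with a 1-base insertion
);
for my $col (1 .. 10) {
    my @bases = column_bases($col, @rows);
    print "$col\t", join('', @bases), "\t", consensus_base(@bases), "\n";
}
```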


Marian


Jonathan Epstein wrote:

> This sounds great to me.
>
> Resequencing in general (whether by Affy or by other technology such as Solexa) is likely to become important in the coming few years, and I wonder whether it's worth thinking about a general paradigm for handling this.  But I suggest that you proceed full-speed-ahead, and we can sort this out in the future.
>
> Perhaps one of the experts can advise you whether to use the Bio::UnivAln object, some of the Bio::Assembly objects, or some other approach.
>
> Jonathan

From thiago.venancio at gmail.com  Fri Apr 13 19:05:12 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 13 Apr 2007 16:05:12 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
Message-ID: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>

Hi all.

What is the best way to extract coding region from a nucleotide sequence
based on a BLASTX or TBLASTX comparisons ?

Thanks in advance.

Thiago
-- 
"The way to get started is to quit talking and begin doing."
      Walt Disney

========================
Thiago Motta Venancio, MSc
PhD student in Bioinformatics
University of Sao Paulo
========================


From jason at bioperl.org  Fri Apr 13 20:05:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:05:42 -0700
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
Message-ID: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>

Depends on how far away the query protein is, but I don't trust BLAST  
for the actual alignment.  Find the boundaries, add a little slop,  
and refine the alignment of protein to genome with a good alignment  
program designed for this, like genewise or exonerate or even FASTX/Y.
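
A rough sketch of the "find the boundaries, add a little slop" step in plain Perl. The sub name padded_region and the [start, end] pair layout are assumptions for the example. With SearchIO the pairs would typically come from $hsp->start('query') and $hsp->end('query') for each HSP of the hit you trust, and the region could then be cut out with $seq->subseq($start, $end) before handing it to genewise or exonerate.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# $seq_len: length of the nucleotide sequence
# $slop:    number of extra bases to keep on each side
# @hsps:    [start, end] pairs in 1-based nucleotide coordinates
# Returns (start, end) of the region covering all HSPs plus slop,
# clamped to the sequence, or an empty list if there are no HSPs.
sub padded_region {
    my ($seq_len, $slop, @hsps) = @_;
    return unless @hsps;
    my @coords = map { @$_ } @hsps;
    my $start  = max(1,        min(@coords) - $slop);
    my $end    = min($seq_len, max(@coords) + $slop);
    return ($start, $end);
}

# Made-up coordinates: two HSPs on a 1000 bp contig, 50 bp of slop.
my ($start, $end) = padded_region(1000, 50, [120, 400], [380, 700]);
print "refine alignment on $start..$end\n";   # 70..750
```

Taking min and max over all coordinates also covers minus-strand HSPs if they are reported with start greater than end.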

-jason
On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:

> Hi all.
>
> What is the best way to extract coding region from a nucleotide  
> sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
> -- 
> "The way to get started is to quit talking and begin doing."
>       Walt Disney
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From jason at bioperl.org  Fri Apr 13 20:13:07 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:13:07 -0700
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
Message-ID: <7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>

I think it just needs an edit to the code in to_string, which checks  
for the type of algorithm.  You'd need to add a branch to the if/elsif  
cascade for the RPSBLAST type that codes the query and target dbs and  
the query and target sequence types properly.  This would be trivial  
to code in; have you tried adding this to see if it works?

If you submit a bug with an example report we'd be able to make  
appropriate changes faster.
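
To illustrate the kind of cascade meant here, a standalone sketch follows. It is not the actual code from TextResultWriter: the sub name seq_types_for, the returned labels, and the way the algorithm name is matched are assumptions. The rpsblast branch assumes a protein query searched against a protein profile database.

```perl
use strict;
use warnings;

# Map an algorithm name to (query type, database type).
# Hypothetical helper, written only to show where an RPS-BLAST
# branch would slot into an if/elsif cascade.
sub seq_types_for {
    my ($algorithm) = @_;
    my $alg = uc($algorithm);
    if    ($alg =~ /^TBLASTX/)    { return ('translated', 'translated') }
    elsif ($alg =~ /^TBLASTN/)    { return ('protein',    'translated') }
    elsif ($alg =~ /^BLASTX/)     { return ('translated', 'protein')    }
    elsif ($alg =~ /^BLASTN/)     { return ('nucleic',    'nucleic')    }
    elsif ($alg =~ /^BLASTP/)     { return ('protein',    'protein')    }
    elsif ($alg =~ /^RPS-?BLAST/) { return ('protein',    'protein')    }  # added branch
    else                          { return ('unknown',    'unknown')    }
}

for my $alg (qw(BLASTX RPS-BLAST rpsblast)) {
    my ($query_type, $db_type) = seq_types_for($alg);
    print "$alg: query=$query_type db=$db_type\n";
}
```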

-jason
On Apr 11, 2007, at 6:32 AM, Emeric Sevin wrote:

> Hi everybody,
>
> I'm sorry to bug, but either I missed something so obvious nobody  
> bothered to answer, or I'm being a little boycotted here...
> A little help would be very much appreciated
>
> On 22 March 07, at 16:07, Emeric Sevin wrote:
>
>> Hello,
>>
>> I am new to this community, and apologize if this subject has been  
>> posted before.
>>
>> I want to print out only selected results from a multiple  
>> blast-alignments results file. Problem is, the algorithm used is  
>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the  
>> actual writing task yields "unclean" warnings. Although an output  
>> is actually written, the writer  
>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by  
>> the fact rpsblast DBs are not labeled with  
>> "protein"/"nucleic"/"translated".
>> Does anybody know of an easy fix to that bug, or of another way to  
>> get around it?
>>
>> Thank you very much
>>
>> Emeric SEVIN
>> Université de Rennes 1
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From thiago.venancio at gmail.com  Fri Apr 13 20:20:32 2007
From: thiago.venancio at gmail.com (Thiago Venancio)
Date: Fri, 13 Apr 2007 17:20:32 -0300
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
Message-ID: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>

Thanks Jason.

I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
comparisons and want to extract some translated coding regions for further
multiple alignment and phylogenetic analysis.

Best.

Thiago

On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>
> Depends on how far away the query protein is, but I don't trust BLAST for
> the actual alignment.  Find the boundaries, add a little slop, and refine
> the alignment of protein to genome with a good alignment program designed to
> like genewise or exonerate or even FASTX/Y.
> -jason
> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>
> Hi all.
>
> What is the best way to extract coding region from a nucleotide sequence
> based on a BLASTX or TBLASTX comparisons ?
>
> Thanks in advance.
>
> Thiago
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>


From jason at bioperl.org  Fri Apr 13 20:47:50 2007
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 13 Apr 2007 13:47:50 -0700
Subject: [Bioperl-l] extracting coding sequence from BLAST
In-Reply-To: <44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
References: <44255ea80704131205haba420dg8adf11bd0596f65e@mail.gmail.com>
	<8C7B42CC-A652-4172-A038-E9461231EE84@bioperl.org>
	<44255ea80704131320t79bc5c64kc519c5c90ebe4ed@mail.gmail.com>
Message-ID: <54F53FA0-4ED6-4DE8-A853-750AE5930FC2@bioperl.org>

Hi -

There are some tools that do this for you -- I've listed a few from a  
Google search or from what I remember reading.  It would be great if  
you (and others!) were willing to contribute a little info about  
what works for you to the wiki.   A little HOWTO would be cool - here  
or on openwetware.org.

Prot4EST http://zeldia.cap.ed.ac.uk/bioinformatics/prot4EST/index.shtml
EST-PAC:  doi: http://dx.doi.org/10.1186/1751-0473-1-2

Ewan Birney's estwise, part of the Wise package, can also help if you  
have a likely protein from BLAST that you want to align to the EST -  
estwise can handle frameshifts, but can be too slow for some people.   
Exonerate's protein2dna model may also work here, but I haven't tried  
it.

-jason
On Apr 13, 2007, at 1:20 PM, Thiago Venancio wrote:

> Thanks Jason.
>
> I have a large dataset (assembled ESTs) and several BLASTX or TBLASTX
> comparisons and want to extract some translated coding regions for  
> further
> multiple alignment and phylogenetic analysis.
>
> Best.
>
> Thiago
>
> On 4/13/07, Jason Stajich <jason at bioperl.org> wrote:
>>
>> Depends on how far away the query protein is, but I don't trust  
>> BLAST for
>> the actual alignment.  Find the boundaries, add a little slop, and  
>> refine
>> the alignment of protein to genome with a good alignment program  
>> designed to
>> like genewise or exonerate or even FASTX/Y.
>> -jason
>> On Apr 13, 2007, at 12:05 PM, Thiago Venancio wrote:
>>
>> Hi all.
>>
>> What is the best way to extract coding region from a nucleotide  
>> sequence
>> based on a BLASTX or TBLASTX comparisons ?
>>
>> Thanks in advance.
>>
>> Thiago
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> --
>> Jason Stajich
>> jason at bioperl.org
>> http://jason.open-bio.org/
>>
>>
>>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From gopu_36 at yahoo.com  Fri Apr 13 16:48:48 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Fri, 13 Apr 2007 09:48:48 -0700 (PDT)
Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score
Message-ID: <9982570.post@talk.nabble.com>


Can anyone help me collect the value of the second best hit score
(i.e. raw_score) from BLAST results which contain multiple queries? I have
used a SearchIO object to parse my BLAST report. I am only interested in the
second best hit/raw_score and not the first hit!

Thanks in advance!


From jason at bioperl.org  Sat Apr 14 17:53:42 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 14 Apr 2007 10:53:42 -0700
Subject: [Bioperl-l] How to parse blast result to get 2nd best hit score
In-Reply-To: <9982570.post@talk.nabble.com>
References: <9982570.post@talk.nabble.com>
Message-ID: <67974DCD-B1F9-4286-86A4-5E4C4DBA3914@bioperl.org>

Try reading the HOWTO.

http://bioperl.org/wiki/HOWTO:SearchIO
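
For what it's worth, a minimal sketch along the lines of the HOWTO. It assumes the hits of each result come back in report order (best first) and that the report path is given as the first command-line argument. The helper name second_hit is made up; next_result, next_hit, name, raw_score and query_name are the usual SearchIO methods.

```perl
use strict;
use warnings;

# Return (hit name, raw score) of the second hit of one result,
# or an empty list when the query has fewer than two hits.
sub second_hit {
    my ($result) = @_;
    $result->next_hit or return;                 # skip the best hit
    my $second = $result->next_hit or return;
    return ($second->name, $second->raw_score);
}

# Only runs when a report file is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $ARGV[0]);
    while (my $result = $in->next_result) {      # one result per query
        my ($name, $score) = second_hit($result);
        print join("\t", $result->query_name,
                   defined($name) ? ($name, $score) : ('no second hit')), "\n";
    }
}
```

Note that next_hit consumes hits, so the helper should be called once per result, before any other loop over its hits.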

On Apr 13, 2007, at 9:48 AM, gopu_36 wrote:

>
> Can anyone help me to collect the value of the second best hit score
> (ie)raw_score from the blast results which contains multiple  
> queries? I have
> used searchIO object to parse my blast report. I am only interested  
> in the
> second best hit/raw_score and not the first hit!
>
> Thanks in advance!
>
>
> -- 
> View this message in context: http://www.nabble.com/How-to-parse- 
> blast-result-to-get-2nd-best-hit-score-tf3572717.html#a9982570
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From gdorjee at hotmail.com  Sat Apr 14 21:39:50 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sat, 14 Apr 2007 14:39:50 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
Message-ID: <9997343.post@talk.nabble.com>


hi all, 
can anyone please tell me why the following script gives me an error like:
waiting... 5 units of time
Can't call method "database_name" on an undefined value at
test1_remote_swissblast.pl line 41, <GEN4> line 31.
cheers!

use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;

my $Seq_in = Bio::SeqIO->new (-file => $ARGV[0], 
                              -format => 'fasta');
my $query = $Seq_in->next_seq(); 

my $factory = Bio::Tools::Run::RemoteBlast->new(
                                '-prog'  => 'blastp',
                                '-data' => 'swissprot',
                                 _READMETHOD => "Blast"
                         );
my $blast_report = $factory->submit_blast($query);
my $max_number = 100;
my $trial = 0;


while ( my @rids = $factory->each_rid ) {

    print STDERR "\nSorry, maximum number of retries $max_number exceeded\n" if $trial >= $max_number;
    last if $trial >= $max_number;
    $trial++;

    print STDERR "waiting... ".(5*$trial)." units of time\n" ;

    # RID = Remote Blast ID (e.g: 1017772174-16400-6638)
    foreach my $rid ( @rids ) {
        my $rc = $factory->retrieve_blast($rid);
        if( !ref($rc) ) {
            if( $rc < 0 ) {
                # retrieve_blast returns -1 on error
                $factory->remove_rid($rid);
            }
            # retrieve_blast returns 0 on 'job not finished'
            sleep 5*$trial;
        } else {

            #---- Blast done ----
            $factory->remove_rid($rid);
            my $result = $rc->next_result;
            print "database: ", $result->database_name(), "\n";
            while( my $hit = $result->next_hit ) {
                print "hit name is: ", $hit->name, "\n";
                while( my $hsp = $hit->next_hsp ) {
                    print "score is: ", $hsp->score, "\n";
                }
            }
        }
    }
}


From dmessina at wustl.edu  Sun Apr 15 16:02:51 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 15 Apr 2007 11:02:51 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <9997343.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
Message-ID: <ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>

Hi DeeGee,

Your script worked fine for me. Perhaps the problem is in your input  
fasta file?

Dave

% perl test.pl AAC12660.fa
waiting... 5 units of time
waiting... 10 units of time
waiting... 15 units of time
database: Non-redundant SwissProt sequences
hit name is: sp|Q15750|TAB1_HUMAN
score is: 2413
hit name is: sp|Q8CF89|TAB1_MOUSE
score is: 2352
hit name is: sp|P49444|PP2C_PARTE
score is: 159
hit name is: sp|Q6ING9|PP2CK_XENLA
[...etc...]


From spiros at lokku.com  Sun Apr 15 16:12:05 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Sun, 15 Apr 2007 17:12:05 +0100
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
Message-ID: <bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>

Yep, it must be in the input file. The

$result->database_name()

method gets called on $result, the result object.

The error you get,

Can't call method "database_name" on an undefined value at
test1_remote_swissblast.pl line 41, <GEN4> line 31.

means the result object is not defined, so the call fails since
there is no data to operate on.
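
One way to make this failure visible instead of fatal is to check the result before using it. A small sketch follows; the helper name first_result_or_warn is invented for the example, and $report stands for whatever retrieve_blast() returned in your script (anything with a next_result method).

```perl
use strict;
use warnings;

# Return the first result of a report object, or warn and return
# nothing if the report contains no parsable result.
sub first_result_or_warn {
    my ($report, $rid) = @_;
    my $result = $report->next_result;
    unless (defined $result) {
        warn "RID $rid: a report came back but no result could be parsed from it\n";
        return;
    }
    return $result;
}
```

In the loop of the original script this would be used as

    my $result = first_result_or_warn($rc, $rid) or next;

so that an empty report is reported and skipped rather than killing the script.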

Spiros

On 4/15/07, David Messina <dmessina at wustl.edu> wrote:
> Hi DeeGee,
>
> Your script worked fine for me. Perhaps the problem is in your input
> fasta file?
>
> Dave
>
> % perl test.pl AAC12660.fa
> waiting... 5 units of time
> waiting... 10 units of time
> waiting... 15 units of time
> database: Non-redundant SwissProt sequences
> hit name is: sp|Q15750|TAB1_HUMAN
> score is: 2413
> hit name is: sp|Q8CF89|TAB1_MOUSE
> score is: 2352
> hit name is: sp|P49444|PP2C_PARTE
> score is: 159
> hit name is: sp|Q6ING9|PP2CK_XENLA
> [...etc...]
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From dr.hogart at gmail.com  Sun Apr 15 16:13:29 2007
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Sun, 15 Apr 2007 20:13:29 +0400
Subject: [Bioperl-l] error with blast parsing by searchIO
Message-ID: <op.tqt10r17avnppr@hogart.img.ras.ru>

Hello all,

A script (parsing a blastn report) that previously worked "tells" me today  
that:

------------- EXCEPTION  -------------
MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
: No such file or directory
STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:273
STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
STACK toplevel parse-te-lib2.pl:3

--------------------------------------

What does it mean??

ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8


From cjfields at uiuc.edu  Sun Apr 15 17:40:24 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 15 Apr 2007 12:40:24 -0500
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqt10r17avnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
Message-ID: <460926E6-0EEA-45D9-838E-70706062857C@uiuc.edu>

You have to update to bioperl 1.5.2 or CVS.  BLAST parsing is broken  
for recent BLAST versions (> v.2.2, I believe).

chris

On Apr 15, 2007, at 11:13 AM, sergei ryazansky wrote:

> Hello all,
>
> script (parsing blastn report) that previously had worked today  
> "tell" me
> that:
>
> ------------- EXCEPTION  -------------
> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
> : No such file or directory
> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm: 
> 273
> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
> STACK toplevel parse-te-lib2.pl:3
>
> --------------------------------------
>
> What does it mean??
>
> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From jason at bioperl.org  Sun Apr 15 18:24:56 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 15 Apr 2007 11:24:56 -0700
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqt10r17avnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
Message-ID: <C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>

It looks like something is broken in your script in how you are  
passing it a filename - it is trying to open a file called "BLASTN  
2.2.13 [Nov-27-2005]".
Did you already open the file, and are you perhaps passing data from  
the first line of the file to SearchIO?
Sending the relevant part of your script to the list will help us  
diagnose the problem better.
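
To illustrate the difference: SearchIO takes either a path via -file or an already opened filehandle via -fh, never the contents of the file. The helper below is a made-up convenience (searchio_source is not a BioPerl function) that picks the right argument and complains early if it is handed something that is not a readable file.

```perl
use strict;
use warnings;

# Build the source arguments for Bio::SearchIO->new from either a
# path or an already opened filehandle.
sub searchio_source {
    my ($source) = @_;
    return (-fh => $source) if ref $source;      # glob ref or IO::Handle object
    die "Not a readable file: '$source'\n" unless -r $source;
    return (-file => $source);
}

# Only runs when a report file is given on the command line.
if (@ARGV) {
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', searchio_source($ARGV[0]));
    while (my $result = $in->next_result) {
        print $result->query_name, "\n";
    }
}
```

Passing the first line of the report by mistake would then die with "Not a readable file: 'BLASTN 2.2.13 [Nov-27-2005]'", which points at the script rather than at the parser.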

-jason
On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote:

> Hello all,
>
> script (parsing blastn report) that previously had worked today  
> "tell" me
> that:
>
> ------------- EXCEPTION  -------------
> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
> : No such file or directory
> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm: 
> 273
> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
> STACK toplevel parse-te-lib2.pl:3
>
> --------------------------------------
>
> What does it mean??
>
> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From gdorjee at hotmail.com  Mon Apr 16 00:40:22 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sun, 15 Apr 2007 17:40:22 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
Message-ID: <10008507.post@talk.nabble.com>


hi guys,
thanks for your replies, but i still don't understand why it doesn't work.
my input fasta sequence looks fine. here, take a look,

>gi|18676474|dbj|BAB84889.1| FLJ00134 protein [Homo sapiens]
HLSAQKASVGPESVSGLGTRTWPRVSCEVTVQCWPGCHLKVGGFKMAPWQGVGRRPWFLTWGPLCGAASVSPSMTVASSQ
QGWDCTAGRRWLGEGEIEALAQVSEFKTVLSFQGPAASPDGSSATRVPQDVTQGPGATGGKEDSGMIPLAGTAPGAEGPA
PGDSQAVRPYKQEPSSPPLAPGLPAFLAAPGTTSCPECGKTSLKPAHLLRHRQSHSGEKPHACPECGKAFRRKEHLRRHR
DTHPGSPGSPGPALRPLPAREKPHACCECGKTFYWREHLVRHRKTHSGARPFACWECGKGFGRREHVLRHQRIHGRAAAS
AQGAVAPGPDGGGPFPPWPLG

is it possible that the script is not able to read
RemoteBlast.pm? but the thing is, i can run the standalone blast on the
command line, although i've never been able to run the same with the cgi module
(by getting the input from an html textarea). i don't understand. i've been
trying to get the standalone running for a while now, and i also mentioned
it in my previous postings....but all in vain. i haven't got past it yet. 
any help or an example would be much appreciated.


Spiros Denaxas wrote:
> 
> Yep, it must be in the input file. The
> 
> $result->database_name()
> 
> function gets called on $result the result object.
> 
> The error you get,
> 
> Can't call method "database_name" on an undefined value at
> test1_remote_swissblast.pl line 41, <GEN4> line 31.
> 
> means the result object is not defined thus the function fails since
> there are no data to operate on.
> 
> Spiros
> 
> On 4/15/07, David Messina <dmessina at wustl.edu> wrote:
>> Hi DeeGee,
>>
>> Your script worked fine for me. Perhaps the problem is in your input
>> fasta file?
>>
>> Dave
>>
>> % perl test.pl AAC12660.fa
>> waiting... 5 units of time
>> waiting... 10 units of time
>> waiting... 15 units of time
>> database: Non-redundant SwissProt sequences
>> hit name is: sp|Q15750|TAB1_HUMAN
>> score is: 2413
>> hit name is: sp|Q8CF89|TAB1_MOUSE
>> score is: 2352
>> hit name is: sp|P49444|PP2C_PARTE
>> score is: 159
>> hit name is: sp|Q6ING9|PP2CK_XENLA
>> [...etc...]
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From dmessina at wustl.edu  Mon Apr 16 02:43:06 2007
From: dmessina at wustl.edu (David Messina)
Date: Sun, 15 Apr 2007 21:43:06 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10008507.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
Message-ID: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>

You're right, it's not the input sequence. I just tried it with your  
script and it worked.


> is it possible that the script is not able to read
> RemoteBlast.pm?

I think the program wouldn't compile if that were the case, and your  
error message would be about not finding RemoteBlast.pm rather than  
the one you got.


> but the thing is, i can run the standalone blast on the
> command line, although i've never been able to run the same with the  
> cgi module
> (by getting the input from an html textarea). i don't understand.

This result really suggests that perl and Bioperl are not the issue.  
I'm not saying the following to give you the brushoff, but given the  
numerous ways in which web-based apps can fail and in which  
webservers can be installed, it might be best for you to find someone  
at your institution who can sit down with you and work through it.

Dave


From cjfields at uiuc.edu  Mon Apr 16 03:51:05 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 15 Apr 2007 22:51:05 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
Message-ID: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>

This sounds like a similar issue that popped up a few weeks ago  
related to URLAPI changes for remote BLAST access.  That was fixed on  
NCBI's end but I also added a fix to RemoteBlast in CVS that works as  
well.

Saying that, my guess is the same as Dave's, that there are  
connectivity issues.  What happens when you set the RemoteBlast  
factory to a verbosity of 1?  This will spill out debugging output  
from the repeated queries to the NCBI server (so if there are  
problems they'll show up there).

...
my $factory = Bio::Tools::Run::RemoteBlast->new(
    '-prog'       => 'blastp',
    '-data'       => 'swissprot',
    '_READMETHOD' => 'Blast',
    '-verbose'    => 1,    # debugging output
);
...

If you see the BLAST report but get the same error try using the  
RemoteBlast in CVS to see if it fixes the problem.

chris


On Apr 15, 2007, at 9:43 PM, David Messina wrote:

> You're right, it's not the input sequence. I just tried it with your
> script and it worked.
>
>
>> is it possible that the script is not being about to read the
>> RemoteBlast.pm?
>
> I think the program wouldn't compile if that were the case, and your
> error message would be about not finding RemoteBlast.pm rather than
> the one you got.
>
>
>> but the thing is, i can run the standalone blast on the
>> command line, although i've never been able the run the same with
>> cgi module
>> (by gettting the input from an html textarea). i don't understand.
>
> This result really suggests that perl and Bioperl are not the issue.
> I'm not saying the following to give you the brushoff, but given the
> numerous ways in which web-based apps can fail and in which
> webservers can be installed, it might be best for you to find someone
> at your institution who can sit down with you and work through it.
>
> Dave
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From dr.hogart at gmail.com  Mon Apr 16 07:03:46 2007
From: dr.hogart at gmail.com (sergei ryazansky)
Date: Mon, 16 Apr 2007 11:03:46 +0400
Subject: [Bioperl-l] error with blast parsing by searchIO
References: <op.tqt10r17avnppr@hogart.img.ras.ru>
	<C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>
Message-ID: <op.tqu68kvzavnppr@hogart.img.ras.ru>

The problem was resolved by giving the direct path (-file=>'d\...\input.txt')
to the input file in my script.
I think that Chris is right and I should update my bioperl to version 1.5.
By the way, bioperl-1.5 is not accessible via ppm. Where can I download it
for WinXP?

On Sun, 15 Apr 2007 22:24:56 +0400, Jason Stajich <jason at bioperl.org>  
wrote:

> It looks like something is broken in your script as to how you are
> passing it a filename - it is trying to open a file called "BLASTN
> 2.2.13 [Nov-27-2005]".
> did you already open the file and are you passing data from the first
> line of the file to SearchIO perhaps?
> Sending the relevant part of your script to the list will help us
> diagnose the problem better.
>
> -jason
> On Apr 15, 2007, at 9:13 AM, sergei ryazansky wrote:
>
>> Hello all,
>>
>> script (parsing blastn report) that previously had worked today
>> "tell" me
>> that:
>>
>> ------------- EXCEPTION  -------------
>> MSG: Could not open BLASTN 2.2.13 [Nov-27-2005]
>> : No such file or directory
>> STACK Bio::Root::IO::_initialize_io c:/Perl/site/lib/Bio/Root/IO.pm:
>> 273
>> STACK Bio::Root::IO::new c:/Perl/site/lib/Bio/Root/IO.pm:213
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:135
>> STACK Bio::SearchIO::new c:/Perl/site/lib/Bio/SearchIO.pm:167
>> STACK toplevel parse-te-lib2.pl:3
>>
>> --------------------------------------
>>
>> What does it mean??
>>
>> ps. bioperl-1.4 with ActivePerl 5.8.7&5.8.8
>>
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>




From bix at sendu.me.uk  Mon Apr 16 08:34:56 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 16 Apr 2007 09:34:56 +0100
Subject: [Bioperl-l] error with blast parsing by searchIO
In-Reply-To: <op.tqu68kvzavnppr@hogart.img.ras.ru>
References: <op.tqt10r17avnppr@hogart.img.ras.ru>	<C1C40C71-E21F-42E1-AE4E-6D51F1CB9850@bioperl.org>
	<op.tqu68kvzavnppr@hogart.img.ras.ru>
Message-ID: <46233530.1010109@sendu.me.uk>

sergei ryazansky wrote:
> The problem was resolved by the direct path (-file=>'d\...\input.txt') to  
> input file in the my script.
> I think that Chris right and i should update my bioperl to 1.5 version.
> By the way, bioperl-1.5 is not accessible via ppm. Where I can download it  
> for winXP?

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 14:36:33 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 22:36:33 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>


Dear all,
 
Given a GO ID, is there a way to extract all
the related gene names for that ID with Perl?

Does anybody have experience with that?
I've looked through the GO modules on CPAN, but can't seem
to find any tool that facilitates that search.

I look forward very much to your advice.
 
--
Edward WIJAYA
SINGAPORE

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From spiros at lokku.com  Mon Apr 16 15:10:49 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:10:49 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>

Hi Edward,

What organism are you interested in? I have some code from my PhD
based on the Saccharomyces cerevisiae genome. Basically uses the SGD
flat files and a local MySQL instance of GO. Might be worth turning
into modules if people are interested in it, although it is pretty
organism oriented and the lack of abstraction might introduce a number
of problems.

Spiros

On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>
> Dear all,
>
> Given a GO id, is there a way to extract all
> the related gene names from that id with Perl?
>
> Anybody has experience with that?
> I've looked through GO module in CPAN, but can't seem
> to find any tool that facilitated that searc
>
> Look forward very much for your advice.
>
> --
> Edward WIJAYA
> SINGAPORE
>


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 15:14:09 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:14:09 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>


Hi Spiros,
 
Thanks for your reply. I am interested in applying it to
all the organisms annotated with that particular GO ID.

Do you have a CPAN module for that?
--
Edward WIJAYA
SINGAPORE

________________________________

From: s.denaxas at gmail.com on behalf of Spiros Denaxas
Sent: Mon 4/16/2007 11:10 PM
To: Wijaya Edward
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


Hi Edward,

What organism are you interested in? I have some code from my PhD
based on the Saccharomyces cerevisiae genome. Basically uses the SGD
flat files and a local MySQL instance of GO. Might be worth turning
into modules if people are interested in it, although it is pretty
organism oriented and the lack of abstraction might introduce a number
of problems.

Spiros

On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>
> Dear all,
>
> Given a GO id, is there a way to extract all
> the related gene names from that id with Perl?
>
> Anybody has experience with that?
> I've looked through GO module in CPAN, but can't seem
> to find any tool that facilitated that searc
>
> Look forward very much for your advice.
>
> --
> Edward WIJAYA
> SINGAPORE
>




From dmessina at wustl.edu  Mon Apr 16 15:21:01 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:21:01 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>

I use BioMART for this kind of thing. If you need to do this for more  
than a couple of GO terms, BioMART has a Perl API you can use to  
connect to their data.

http://www.biomart.org/

http://www.biomart.org/install-overview.html

Dave


From spiros at lokku.com  Mon Apr 16 15:21:40 2007
From: spiros at lokku.com (Spiros Denaxas)
Date: Mon, 16 Apr 2007 16:21:40 +0100
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID: <bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>

Nope, I don't have a CPAN module for it, and to be honest, I don't
think I will release one until I actually finish my PhD. The
code is really scruffy in some parts, lacks documentation and might
not work under all setups. My plan is to take some time afterwards to
clean it up and release a proper version of it to the public.

What you are talking about, however, if I understand correctly, is a
much, much bigger project. Different genome databases have different
formats, and a potential module must take them all into consideration.
Then there is the issue of the different evidence codes GO annotators
use across genomes, and which of them you consider to be of higher or
lower quality.

If you happen to stumble upon such a module, please share it, it would
be very interesting !

spiros

On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>
> Hi Spiros,
>
> Thanks for your reply. I am interested to apply it for
> all the kind of organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
> >
> > Dear all,
> >
> > Given a GO id, is there a way to extract all
> > the related gene names from that id with Perl?
> >
> > Anybody has experience with that?
> > I've looked through GO module in CPAN, but can't seem
> > to find any tool that facilitated that searc
> >
> > Look forward very much for your advice.
> >
> > --
> > Edward WIJAYA
> > SINGAPORE
> >
>
>
>
>


From ewijaya at i2r.a-star.edu.sg  Mon Apr 16 15:33:27 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Mon, 16 Apr 2007 23:33:27 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>


Hi David, 
 
There seems to be no biomart-perl module on CPAN.

I tried their CVS:
cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login

But it requires a password. Can you suggest another way to get this module?
 
--
Edward WIJAYA

________________________________

From: David Messina [mailto:dmessina at wustl.edu]
Sent: Mon 4/16/2007 11:21 PM
To: Wijaya Edward
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


I use BioMART for this kind of thing. If you need to do this for more 
than a couple of GO terms, BioMART has a Perl API you can use to 
connect to their data.

http://www.biomart.org/

http://www.biomart.org/install-overview.html

Dave




From Kevin.M.Brown at asu.edu  Mon Apr 16 15:44:28 2007
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Mon, 16 Apr 2007 08:44:28 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net><BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
	<3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <1A4207F8295607498283FE9E93B775B4030A4914@EX02.asurite.ad.asu.edu>

Did you follow the directions listed here?

http://www.biomart.org/install-overview.html 


> There seems to be no biomart-perl module in CPAN.
>  
> I tried their cvs:
> cvs -d :pserver:cvsuser@cvs.sanger.ac.uk:/cvsroot/biomart login
>  
> But require password. Can suggest if there is another way to 
> get this module?


From dmessina at wustl.edu  Mon Apr 16 15:44:26 2007
From: dmessina at wustl.edu (David Messina)
Date: Mon, 16 Apr 2007 10:44:26 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<BDBF8338-69B2-4E60-AC56-3CD3D8852E9F@wustl.edu>
	<3ACF03E372996C4EACD542EA8A05E66A061685@mailbe01.teak.local.net>
Message-ID: <2D698B2E-49B9-411E-B1FA-C12F4A235EB2@wustl.edu>

The password you need to enter when asked is CVSUSER.

Dave


From sdavis2 at mail.nih.gov  Mon Apr 16 15:55:14 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Mon, 16 Apr 2007 11:55:14 -0400
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
	<bba689ec0704160821u7f9718d8mec40e7d3453a042c@mail.gmail.com>
Message-ID: <200704161155.14567.sdavis2@mail.nih.gov>


> > On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
> > > Dear all,
> > >
> > > Given a GO id, is there a way to extract all
> > > the related gene names from that id with Perl?

This is a pretty simple problem if you have the data in a usable format.  The
data that you need are available here:

ftp://ftp.ncbi.nih.gov/gene/DATA

The README file gives details, but the files in this directory are all
tab-delimited text.  Download the gene2go.gz file, which contains a mapping
from Entrez Gene ID to GO accession.  Then download the gene_info.gz file,
which contains the information about each Entrez Gene ID, including
description, gene symbol, etc.  If you need to link to other data, you can of
course download the respective files from NCBI.  You can either load the data
into a SQL database of some type for general queries, or you can simply read
them into Perl directly (with appropriate data structures) to do your mapping.
Since they are tab-delimited text, I would choose the database route and then
use SQL and DBI to run whatever queries you like.
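
For the read-them-into-Perl route, here is a minimal sketch. It assumes the
decompressed files keep the column order I remember from the README (gene2go:
tax_id, GeneID, GO_ID, ...; gene_info: tax_id, GeneID, Symbol, ...), so check
that against the README before relying on it. The sub names and file names are
made up for illustration.

```perl
use strict;
use warnings;

# Collect the Entrez Gene IDs annotated to one GO accession.
# Assumed gene2go columns: tax_id, GeneID, GO_ID, ...
sub gene_ids_for_go {
    my ($fh, $go_id) = @_;
    my %seen;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;    # skip the header line
        chomp $line;
        my (undef, $gene_id, $acc) = split /\t/, $line;
        $seen{$gene_id} = 1 if defined $acc && $acc eq $go_id;
    }
    return \%seen;
}

# Look up the gene symbols for a set of Gene IDs.
# Assumed gene_info columns: tax_id, GeneID, Symbol, ...
sub symbols_for_genes {
    my ($fh, $wanted) = @_;
    my %symbol;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;
        chomp $line;
        my (undef, $gene_id, $sym) = split /\t/, $line;
        $symbol{$gene_id} = $sym if defined $gene_id && $wanted->{$gene_id};
    }
    return \%symbol;
}

# Usage with the decompressed downloads (file names are assumptions):
# open my $g2g,  '<', 'gene2go'   or die $!;
# open my $info, '<', 'gene_info' or die $!;
# my $ids = gene_ids_for_go($g2g, 'GO:0009220');
# my $sym = symbols_for_genes($info, $ids);
# print "$_\t$sym->{$_}\n" for sort { $a <=> $b } keys %$sym;
```

This only picks up genes annotated directly to the given accession, and it
keeps just the matching IDs in memory, which should be fine for a handful of
lookups. For many queries the database route is still the better choice.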

Sean


From cjfields at uiuc.edu  Mon Apr 16 16:25:42 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 11:25:42 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
Message-ID: <ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>

You can limit EntrezGene searches by Gene Ontology ID using the [Gene  
Ontology] field in queries.  The following query:

'9220[Gene Ontology]'

will give 120 gene IDs.  You can get the same list using the still- 
under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm  
still working on this):

use Bio::DB::EUtilities;

# search Entrez Gene for records annotated with GO ID 9220
my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
                                        -db => 'gene',
                                        -term => '9220[Gene Ontology]',
                                        -retmax => 300);
$esearch->get_response;
my @ids = $esearch->get_ids;
print join "\n", @ids;

In my opinion, Sean's idea of using SQL is probably better if you  
have tons of searches to do.

chris

On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:

>
> Dear all,
>
> Given a GO id, is there a way to extract all
> the related gene names from that id with Perl?
>
> Anybody has experience with that?
> I've looked through GO module in CPAN, but can't seem
> to find any tool that facilitated that searc
>
> Look forward very much for your advice.
>
> --
> Edward WIJAYA
> SINGAPORE
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 16 18:34:25 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 13:34:25 -0500
Subject: [Bioperl-l] Bio::Matrix::PSM::ProtPsm
Message-ID: <CA820306-7480-478D-BD3E-A0F094943065@uiuc.edu>

I was going through tests converting to Test::More and found this  
module is largely unimplemented (relevant tests are in t/ProtPsm.t in  
CVS).  It was written by James Thompson a few years ago and the  
module docs seem to indicate some uncertainty on what this class is  
meant to accomplish.  Does anyone know the status of this code?

chris


From cjm at fruitfly.org  Mon Apr 16 18:49:23 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:49:23 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
Message-ID: <AAF82F3A-3C75-4D51-AFD4-FDE358391A03@fruitfly.org>


Download:
http://search.cpan.org/~cmungall/go-db-perl

or do:

cpan GO::AppHandle

The API call you want is here:
http://search.cpan.org/~cmungall/go-db-perl/GO/AppHandle.pm#get_deep_products

Here is an example snippet:

   use GO::AppHandle;
   my $apph=GO::AppHandle->connect(@ARGV);
   my $go_acc = shift @ARGV;
   my $gps = $apph->get_deep_products({term=>{acc=>$go_acc}});
   foreach my $gp (@$gps) {
     printf "%s %s\n", $gp->xref->acc, $gp->symbol;
   }

You will need to download the GO Database.

Cheers
Chris

On Apr 16, 2007, at 8:14 AM, Wijaya Edward wrote:

>
> Hi Spiros,
>
> Thanks for your reply. I am interested to apply it for
> all the kind of organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO)  
> with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Anybody has experience with that?
>> I've looked through GO module in CPAN, but can't seem
>> to find any tool that facilitated that searc
>>
>> Look forward very much for your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>>
>
>
>
>


From gdorjee at hotmail.com  Mon Apr 16 19:10:01 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:10:01 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022463.post@talk.nabble.com>


Hi Chris,
Thanks for your reply. I set the RemoteBlast factory to a verbosity of 1,
and I get the same error message. I'm new to all this, so could you please
tell me how I can get the RemoteBlast from CVS that you suggested?

cheers!!!
 

Chris Fields wrote:
> 
> This sounds like a similar issue that popped up a few weeks ago  
> related to URLAPI changes for remote BLAST access.  That was fixed on  
> NCBI's end but I also added a fix to RemoteBlast in CVS that works as  
> well.
> 
> Saying that, my guess is the same as Dave's, that there are  
> connectivity issues.  What happens when you set the RemoteBlast  
> factory to a verbosity of 1?  This will spill out debugging output  
> from the repeated queries to the NCBI server (so if there are  
> problems they'll show up there).
> 
> ...
> my $factory = Bio::Tools::Run::RemoteBlast->new(
>                                  '-prog'  => 'blastp',
>                                  '-data' => 'swissprot',
>                                   _READMETHOD => "Blast",
>                                   -verbose => 1    # debugging output
>                           );
> ...
> 
> If you see the BLAST report but get the same error try using the  
> RemoteBlast in CVS to see if it fixes the problem.
> 
> chris
> 
> 
> On Apr 15, 2007, at 9:43 PM, David Messina wrote:
> 
>> You're right, it's not the input sequence. I just tried it with your
>> script and it worked.
>>
>>
>>> is it possible that the script is not being about to read the
>>> RemoteBlast.pm?
>>
>> I think the program wouldn't compile if that were the case, and your
>> error message would be about not finding RemoteBlast.pm rather than
>> the one you got.
>>
>>
>>> but the thing is, i can run the standalone blast on the
>>> command line, although i've never been able the run the same with
>>> cgi module
>>> (by gettting the input from an html textarea). i don't understand.
>>
>> This result really suggests that perl and Bioperl are not the issue.
>> I'm not saying the following to give you the brushoff, but given the
>> numerous ways in which web-based apps can fail and in which
>> webservers can be installed, it might be best for you to find someone
>> at your institution who can sit down with you and work through it.
>>
>> Dave
>>
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10022463
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From gdorjee at hotmail.com  Mon Apr 16 19:11:18 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:11:18 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
Message-ID: <10022464.post@talk.nabble.com>


Thank you, David.


David Messina-2 wrote:
> 
> You're right, it's not the input sequence. I just tried it with your  
> script and it worked.
> 
> 
>> is it possible that the script is not being about to read the
>> RemoteBlast.pm?
> 
> I think the program wouldn't compile if that were the case, and your  
> error message would be about not finding RemoteBlast.pm rather than  
> the one you got.
> 
> 
>> but the thing is, i can run the standalone blast on the
>> command line, although i've never been able the run the same with  
>> cgi module
>> (by gettting the input from an html textarea). i don't understand.
> 
> This result really suggests that perl and Bioperl are not the issue.  
> I'm not saying the following to give you the brushoff, but given the  
> numerous ways in which web-based apps can fail and in which  
> webservers can be installed, it might be best for you to find someone  
> at your institution who can sit down with you and work through it.
> 
> Dave
> 
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10022464
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjm at fruitfly.org  Mon Apr 16 18:41:59 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 11:41:59 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
Message-ID: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>


Unless the Entrez interface has changed since I last looked, the
query below for "pyrimidine ribonucleotide biosynthetic process" will
NOT perform the transitive closure over the graph; this means that genes
and gene products annotated to GO:0009174 "pyrimidine ribonucleoside
monophosphate biosynthetic process", for example, will not be returned.
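
To make the closure point concrete, here is a toy sketch in plain Perl. It is
not how Entrez or any GO library does it; the gene names are hypothetical, and
the only edge in the graph is the parent/child relationship between the two
terms mentioned above. The idea is to expand a term to itself plus all of its
descendants before collecting annotations.

```perl
use strict;
use warnings;

# Toy child => [parents] edges. The single edge reflects the relationship
# described above; a real ontology would have many more.
my %parents = (
    'GO:0009174' => ['GO:0009220'],
);

# Hypothetical direct annotations: GO accession => gene symbols.
my %annotated = (
    'GO:0009220' => ['geneA'],
    'GO:0009174' => ['geneB'],
);

# Return the term plus every descendant reachable through %parents.
sub term_closure {
    my ($root) = @_;
    my %children;
    for my $child (keys %parents) {
        push @{ $children{$_} }, $child for @{ $parents{$child} };
    }
    my %seen  = ($root => 1);
    my @queue = ($root);
    while (defined(my $term = shift @queue)) {
        for my $child (@{ $children{$term} || [] }) {
            push @queue, $child unless $seen{$child}++;
        }
    }
    return sort keys %seen;
}

# Genes annotated to the term or to any of its descendants.
sub deep_genes {
    my ($root) = @_;
    my %genes;
    for my $term (term_closure($root)) {
        $genes{$_} = 1 for @{ $annotated{$term} || [] };
    }
    return sort keys %genes;
}

print join(', ', deep_genes('GO:0009220')), "\n";    # geneA, geneB
```

A lookup restricted to direct annotations would return only geneA for
GO:0009220; walking the descendants first is what brings in geneB.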

On Apr 16, 2007, at 9:25 AM, Chris Fields wrote:

> You can limit EntrezGene searches by Gene Ontology ID using the [Gene
> Ontology] field in queries.  The following query:
>
> '9220[Gene Ontology]'
>
> will give 120 gene IDs.  You can get the same list using the still-
> under-development Bio::DB::EUtilities (usual EUtilities caveat: I'm
> still working on this):
>
> my $esearch = Bio::DB::EUtilities->new(-eutil => 'esearch',
>                                         -db => 'gene',
>                                         -term => '9220[Gene Ontology]',
>                                         -retmax => 300);
> $esearch->get_response;
> my @ids = $esearch->get_ids;
> print join "\n", @ids;
>
> In my opinion, Sean's idea of using SQL is probably better if you
> have tons of searches to do.
>
> chris
>
> On Apr 16, 2007, at 9:36 AM, Wijaya Edward wrote:
>
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Anybody has experience with that?
>> I've looked through GO module in CPAN, but can't seem
>> to find any tool that facilitated that search.
>>
>> Look forward very much for your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 16 19:25:14 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:25:14 -0500
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
	<50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
Message-ID: <3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>

You are correct; it explains why the list is only 120 genes.  The
only way (currently) to get the full list would be to perform the
closure locally somehow (maybe via go-perl or similar).

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr 16 19:27:32 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 12:27:32 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
Message-ID: <10022661.post@talk.nabble.com>


hi Chris, 
sorry to bother you again, but could you plz check the following script to
see what's wrong. i've been getting errors like :

Content-type: text/html
Software error:
------------- EXCEPTION  -------------
MSG:   (0) not Bio::Seq object or array of Bio::Seq objects or file name!
STACK Bio::Tools::Run::StandAloneBlast::blastall
/usr/perl5/5.6.1/lib/Bio/Tools/Run/StandAloneBlast.pm:532
STACK toplevel /usr/local/apache2/htdocs/rmtest.pl:46
--------------------------------------

#### the script ######
#!/usr/bin/perl -w
use strict;
use warnings;

use Bio::SeqIO;
use Bio::SearchIO;
use Bio::DB::GenPept; 
use Bio::Tools::Run::StandAloneBlast;
use CGI;
use CGI::Carp qw(fatalsToBrowser);

my $cgi = new CGI;

print $cgi->header,
$cgi->start_html(-title=>'A StandAloneBlast Test'),
$cgi->h1('Blast Result'),
$cgi->start_form,
"Enter or paste an amino-acid sequence? ",
$cgi->p,
$cgi->textarea(-name=>'name', rows=>10, -columns=>60),
$cgi->p,
$cgi->submit,
$cgi->end_form,
$cgi->hr;

open(OUTPUT,">result/query.faa");

if ($cgi->param()) {
        my $seq = $cgi->param('name');
        print OUTPUT $seq;

my @params = ('program'=>'blastp', 'database' =>
'/export/home/dorjee/database/nrpart', 'outfile' => 'result/blast.out',
_READMETHOD => 'Blast');
my $factory = Bio::Tools::Run::StandAloneBlast->new(@params);

# Blast a sequence against a database:
my $str = Bio::SeqIO->new(-file => "result/query.faa", '-format' => 'Fasta'
);
my $input = $str->next_seq();
my $blast_report = $factory->blastall($input);
}


Chris Fields wrote:
> 
> This sounds like a similar issue that popped up a few weeks ago  
> related to URLAPI changes for remote BLAST access.  That was fixed on  
> NCBI's end but I also added a fix to RemoteBlast in CVS that works as  
> well.
> 
> Saying that, my guess is the same as Dave's, that there are  
> connectivity issues.  What happens when you set the RemoteBlast  
> factory to a verbosity of 1?  This will spill out debugging output  
> from the repeated queries to the NCBI server (so if there are  
> problems they'll show up there).
> 
> ...
> my $factory = Bio::Tools::Run::RemoteBlast->new(
>                                  '-prog'  => 'blastp',
>                                  '-data' => 'swissprot',
>                                   _READMETHOD => "Blast",
>                                   -verbose => 1    # debugging output
>                           );
> ...
> 
> If you see the BLAST report but get the same error try using the  
> RemoteBlast in CVS to see if it fixes the problem.
> 
> chris
> 
> 
> On Apr 15, 2007, at 9:43 PM, David Messina wrote:
> 
>> You're right, it's not the input sequence. I just tried it with your
>> script and it worked.
>>
>>
>>> is it possible that the script is not being about to read the
>>> RemoteBlast.pm?
>>
>> I think the program wouldn't compile if that were the case, and your
>> error message would be about not finding RemoteBlast.pm rather than
>> the one you got.
>>
>>
>>> but the thing is, i can run the standalone blast on the
>>> command line, although i've never been able the run the same with
>>> cgi module
>>> (by gettting the input from an html textarea). i don't understand.
>>
>> This result really suggests that perl and Bioperl are not the issue.
>> I'm not saying the following to give you the brushoff, but given the
>> numerous ways in which web-based apps can fail and in which
>> webservers can be installed, it might be best for you to find someone
>> at your institution who can sit down with you and work through it.
>>
>> Dave


From cjfields at uiuc.edu  Mon Apr 16 19:37:58 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 14:37:58 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10022463.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
Message-ID: <5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>

The 'verbose' setting doesn't change the way the BLAST query is sent,  
it just sends the raw output from the repeated attempts to retrieve  
the report (using the RID) to STDERR.  The error you saw won't be  
fixed by doing so.

What I was interested in was the raw HTML output dumped to the  
screen.  If it is querying the NCBI server it should dump stuff that  
includes something like this:

...
<HTML>
<p></p>
<!--
QBlastInfoBegin
         Status=WAITING
QBlastInfoEnd
--><p></p>
<SCRIPT LANGUAGE="JavaScript"><!--
...

which indicates you have a request in the BLAST queue.  If you aren't  
seeing anything then the problem is likely network-related on your  
end, so getting the latest RemoteBlast won't help.  Do any other  
BioPerl modules requiring network access work (Bio::DB::GenBank, for  
instance)?  If not it could be a proxy issue...
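
If it helps to check this mechanically, here is a minimal sketch that
pulls the Status value out of a QBlastInfo comment block like the one
above.  The sub name qblast_status and the sample text are only for
illustration; this is plain regex matching on the dumped HTML and is
not part of RemoteBlast itself.

```perl
use strict;
use warnings;

# Return the Status value from a QBlastInfo block, or undef if the
# text contains no such block (e.g. nothing came back from the server).
# NOTE: qblast_status is a hypothetical helper, not a BioPerl method.
sub qblast_status {
    my ($html) = @_;
    return unless defined $html;
    if ($html =~ /QBlastInfoBegin.*?Status=(\w+)/s) {
        return $1;
    }
    return;
}

# Sample text copied from the excerpt above.
my $sample = <<'END';
<HTML>
<p></p>
<!--
QBlastInfoBegin
         Status=WAITING
QBlastInfoEnd
--><p></p>
END

my $status = qblast_status($sample);
print defined $status
    ? "status: $status\n"
    : "no QBlastInfo block found\n";
```

If the sub returns undef for everything you capture from STDERR, that
would fit the network/proxy explanation rather than a RemoteBlast bug.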

Just in case, here's the browsable CVS location for RemoteBlast:

http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/Tools/Run/RemoteBlast.pm?cvsroot=bioperl

Click on the download link and save over your local version.

chris

On Apr 16, 2007, at 2:10 PM, DeeGee wrote:

>
> hi Chris,
> thanks for your reply. i set the RemoteBlast factory to a verbosity of 1,
> and i get the same error message. i'm new to all these. so, could you plz
> tell me how can i do the RemoteBlast in CVS that you've suggested.
>
> cheers!!!


From gdorjee at hotmail.com  Mon Apr 16 20:42:37 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 16 Apr 2007 13:42:37 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
Message-ID: <10024333.post@talk.nabble.com>


hi 
i tried the following code just to check the network, and it worked fine
except for the SwissProt part, for which i got the error message instead of
the sequence:

------------- EXCEPTION  -------------
MSG: swissprot stream with no ID. Not swissprot in my book
STACK Bio::SeqIO::swiss::next_seq
/usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
/usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
STACK toplevel bbbbb.pl:21
--------------------------------------

#### check #####
#!/usr/bin/perl -w
use strict;
use Bio::DB::GenBank;
use Bio::DB::SwissProt;
use Bio::DB::GenPept;
use Bio::SeqIO;

my $genpeptdb = new Bio::DB::GenPept();
my $genbankdb = new Bio::DB::GenBank();
my $swissdb = new Bio::DB::SwissProt();

my $seqio = new Bio::SeqIO(-format => 'fasta',
                           -fh     => \*STDOUT);

my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
$seqio->write_seq($protseq);

my $seq = $genbankdb->get_Seq_by_acc('AF303112');
$seqio->write_seq($seq);

$protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
$seqio->write_seq($protseq);

thanks a lot.




From cjfields at uiuc.edu  Mon Apr 16 22:24:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 17:24:11 -0500
Subject: [Bioperl-l] HOWTO:Writing BioPerl Tests
Message-ID: <547A30CD-6BAA-4C08-A935-9975634691B2@uiuc.edu>

I have posted a quickie HOWTO on writing up BioPerl tests using  
Test::More.  If anyone wants to add to it feel free (make sure to  
credit yourself in the authors section).

http://www.bioperl.org/wiki/HOWTO:Writing_BioPerl_Tests

There is space in there if we decide to add more modules for  
enhancing tests (I think Nathan suggested Test::Exception or similar).

chris


From cjfields at uiuc.edu  Mon Apr 16 23:24:32 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 16 Apr 2007 18:24:32 -0500
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <10024333.post@talk.nabble.com>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
	<10024333.post@talk.nabble.com>
Message-ID: <54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>

What version of bioperl are you using?  I get an error but it is b/c  
the ID doesn't exist.

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: acc KPYK_ECOLI does not exist
STACK: Error::throw
STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/Root/Root.pm:359
STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /Users/cjfields/src/bioperl-live/Bio/DB/WebDBSeqI.pm:181
STACK: genpept.pl:21
-----------------------------------------------------------

The actual accession is 'KPYK1_ECOLI'.

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjm at fruitfly.org  Tue Apr 17 00:59:46 2007
From: cjm at fruitfly.org (Chris Mungall)
Date: Mon, 16 Apr 2007 17:59:46 -0700
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl
In-Reply-To: <3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu>
	<50A1CCF2-4650-4F87-8386-DB0E87292023@fruitfly.org>
	<3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu>
Message-ID: <9612F3E7-F239-49C1-A5BE-E10FF2FC2063@fruitfly.org>


You could perform the closure locally and then iterate over the  
individual IDs or construct a big disjunctive query to Entrez -  
either way it's not so efficient, especially for less specific nodes  
(distributed queries with ontologies is an interesting challenge).

Soon you'll be able to do the same query over the GO Database / AmiGO  
using a REST API
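
As a rough illustration of the "closure locally, then one big
disjunctive query" route, the sketch below walks a parent-to-children
hash and ORs the resulting IDs into a single Entrez term.  The
%children hash is a made-up toy graph, not real GO structure (in
practice it would be filled from go-perl or the GO database), and the
IDs 1111111 and 2222222 are placeholders.  The '[Gene Ontology]' field
is the one used in the esearch query earlier in this thread.

```perl
use strict;
use warnings;

# Toy graph: parent id => direct children.  HYPOTHETICAL data for
# illustration only; a real graph would come from go-perl or SQL.
my %children = (
    '9220'    => [ '9174', '1111111' ],
    '9174'    => [ '2222222' ],
    '1111111' => [ '2222222' ],   # a term can have several parents
);

# Return the term itself plus all of its descendants (reflexive
# transitive closure), visiting each term only once.
sub closure {
    my ($root, $graph) = @_;
    my %seen  = ($root => 1);
    my @queue = ($root);
    while (defined(my $id = shift @queue)) {
        for my $kid (@{ $graph->{$id} || [] }) {
            push @queue, $kid unless $seen{$kid}++;
        }
    }
    return sort keys %seen;
}

# Build one disjunctive Entrez term from a list of ids.
sub entrez_term {
    return join ' OR ', map { $_ . '[Gene Ontology]' } @_;
}

my @ids = closure('9220', \%children);
print entrez_term(@ids), "\n";
```

The resulting string could be passed as the -term argument of an
esearch call.  For less specific nodes the term list gets long, which
is the efficiency problem mentioned above.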



From ewijaya at i2r.a-star.edu.sg  Tue Apr 17 03:51:18 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Tue, 17 Apr 2007 11:51:18 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<ED0EBAAF-3755-4235-B215-EBE620F8DD3C@uiuc.edu><50A1CCF2-4650-4F87-8386-DB0
	E87292023@fruitfly.org><3D7F9BDC-EB03-471B-BDC8-7B649664D320@uiuc.edu><9612
	F3E7-F239-49C1-A5BE-E10FF2FC2063@fruitfly.org>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061686@mailbe01.teak.local.net>


Thanks so much for all the suggestion.
It was really helpful to me. 
 
--
Edward WIJAYA




From hlapp at gmx.net  Tue Apr 17 04:00:55 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 17 Apr 2007 00:00:55 -0400
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>

Hi Leighton, please see below:

On Apr 16, 2007, at 11:55 AM, Leighton Pritchard wrote:

> Hi,
>
> I've been trying to upload the GO into a clean BioSQL (MySQL, 1.4.1)
> schema using the BioPerl bp_load_ontology.pl script, with the OBOv1.0,
> OBOv1.2, and the most recent flatfiles from
> http://www.geneontology.org/GO.downloads.ontology.shtml - none of my
> attempts have been successful.  The errors below are from a Linux
> installation, but the same errors are thrown on OS X, too.  I am using
> the most recent versions of BioPerl and bioperl-db, installed via  
> CPAN:
>
> [lpritc at lplinuxdev sequence_data]$ perl -MBio::Root::Version -e 'print
> $Bio::Root::Version::VERSION,"\n"'
> 1.005002102
>
> and bioperl-db 1.5.2.
>
> I have attached the traceback below (running with --safe throws a  
> number
> of equivalent errors),

Using --safe will throw the same errors, but will continue loading.  
I.e., you'd lose the one term, but keep everything else.

I do realize that especially for a graph losing an internal node can  
be quite detrimental.

> [...]
> ########
>
> [lpritc at lplinuxdev sequence_data]$ bp_load_ontology.pl --host  
> localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass
> ******** --format obo ~/Downloads/gene_ontology_edit.obo
> Loading ontology gene_ontology:
>         ... terms
>         ... relationships
>         Done with gene_ontology.
> Loading ontology biological_process:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------

This would point to a problem with the BioPerl obo parser. According
to the message, both the database name and the accession of the
db_xref for the term are - surely erroneously - empty. Apparently the
parser fails to parse out database and accession for this db_xref of
term GO:0018901.

If you can edit the obo file, you can try deleting the db_xref(s) for  
that term that look odd (or delete all if you don't need them).

I'd have to debug the obo parser to see exactly where it's going  
wrong in parsing.

> Could not store term GO:0018901, name '2,4-dichlorophenoxyacetic acid
> metabolic process':
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> [...]
> [lpritc at lplinuxdev sequence_data]$ bp_load_ontology.pl --host  
> localhost
> --dbname biosql --namespace "Gene Ontology" --dbuser lpritc --dbpass
> ******** --format goflat --fmtargs ~/Downloads/GO.defs

Note that the argument for --fmtargs here should read
"-defs_file,/path/to/Downloads/GO.defs". (Note that within the quotes  
there is no tilde expansion.)

> ~/Downloads/function.ontology
> Loading ontology Gene Ontology:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("MetaCyc","2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN","0","")  
> FKs
> ()
> Duplicate entry '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX- 
> MetaCyc-0' for
> key 2
> ---------------------------------------------------

This is one of the things for which you've got to love MySQL (and I am
correct in inferring that you're using MySQL?). The width of the
dbxref.accession column (which is what the second value in parentheses
is for) is 40 chars. The apparently pre-existing value
("2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RX-MetaCyc-0") is 50 chars,
which when loaded should have resulted in an exception. Instead, MySQL
simply and silently truncates it to 40 chars, which makes it identical
to the first 40 chars of "2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN"
(which is 41 chars in length).

It may be necessary to widen dbxref.accession here, for
example to 80 chars? Let me know if you need help with the DDL
command to do this.
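
A minimal sketch of one plausible way the duplicate arises, assuming the
loader looks a dbxref up by its full accession and only inserts when the
look-up finds nothing. The hash below merely stands in for the unique key on
(accession, dbname, version); it is a simulation in plain Perl, not a query
against MySQL. The ALTER TABLE string at the end is likewise an assumption:
the remaining column attributes should be copied from your own schema file.

```perl
use strict;
use warnings;

my $width     = 40;    # width of dbxref.accession as described above
my $accession = '2\,3-DIHYDROXYINDOLE-2\,3-DIOXYGENASE-RXN';   # 41 chars
my $dbname    = 'MetaCyc';

my %unique_key;        # stands in for the (accession, dbname, version) key
my @log;

for my $attempt ( 1, 2 ) {
    # Look-up with the full, untruncated value.
    my $wanted = join '-', $accession, $dbname, 0;
    if ( exists $unique_key{$wanted} ) {
        push @log, "attempt $attempt: found existing row";
        next;
    }

    # Insert: the value is silently cut down to the column width.
    my $stored = join '-', substr( $accession, 0, $width ), $dbname, 0;
    if ( exists $unique_key{$stored} ) {
        push @log, "attempt $attempt: Duplicate entry '$stored'";
        next;
    }
    $unique_key{$stored} = 1;
    push @log, "attempt $attempt: inserted '$stored'";
}
print "$_\n" for @log;

# Hypothetical DDL for widening the column (attributes are an assumption).
my $ddl = 'ALTER TABLE dbxref MODIFY accession VARCHAR(80) NOT NULL';
print "$ddl;\n";
```

The second attempt never finds the first row, because the stored value is
one character shorter than the value being looked up, so it tries to insert
again and collides with the truncated copy.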

Let me know how far this gets you.

	-hilmar

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From lubapardo at gmail.com  Tue Apr 17 09:16:04 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 17 Apr 2007 10:16:04 +0100
Subject: [Bioperl-l] CVS AND PAML
Message-ID: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>

Dear all,
I have two questions.
1.) I am trying to download some modules from Bioperl-run via CVS but I
cannot log in.

$ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl login

The error I get is: time out, failed to connect to the server. I have
no trouble downloading other files, and I installed the bioperl modules
via CPAN and that works.

2) The second question I have is that I am using the PAML:CODEML
module to do phylogenetic analysis.

I have used the example provided in the HOWTO:PAML (also given as
example: pairwise_ka_ks.PL). The program does not crash but it returns
an empty object. I think the problem is in the last part of the
script because I manage to get the sequences and also the alignment, but
I cannot get any Ka or Ks value. I am not sure whether there is a bug in
the last part of the script.

Does anyone have an idea?

Thank you very much

Luba Pardo

$kaks_factory->alignment($dna_aln);

# run the KaKs analysis
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();
# this gives us a mapping from the PAML order of sequences back to
# the input order (since names get truncated)
my @pos = map <http://www.perldoc.com/perl5.6/pod/func/map.html> {
    my $c= 1;
    foreach my $s ( @each ) {
      last if( $s->display_id eq $_->display_id );
      $c++;
    }
    $c;
   } @otus;

print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT join
<http://www.perldoc.com/perl5.6/pod/func/join.html>("\t", qw
<http://www.perldoc.com/perl5.6/pod/func/qw.html>(SEQ1 SEQ2 Ka Ks
Ka/Ks PROT_PERCENTID CDNA_PERCENTID)),"\n";
for( my $i = 0; $i < (scalar
<http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus -1) ;
$i++) {
  for( my $j = $i+1; $j < (scalar
<http://www.perldoc.com/perl5.6/pod/func/scalar.html> @otus); $j++ ) {
    my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
    my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
    print <http://www.perldoc.com/perl5.6/pod/func/print.html> OUT
join <http://www.perldoc.com/perl5.6/pod/func/join.html>("\t",
$otus[$i]->display_id,
                         $otus[$j]->display_id,$MLmatrix->[$i]->[$j]->{'dN'},
                         $MLmatrix->[$i]->[$j]->{'dS'},
                         $MLmatrix->[$i]->[$j]->{'omega'},
                         sprintf
<http://www.perldoc.com/perl5.6/pod/func/sprintf.html>("%.2f",$sub_aa_aln->percentage_identity),
                         sprintf
<http://www.perldoc.com/perl5.6/pod/func/sprintf.html>("%.2f",$sub_dna_aln->percentage_identity),
                         ), "\n";
  }
}


From avilella at gmail.com  Tue Apr 17 09:25:40 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 17 Apr 2007 10:25:40 +0100
Subject: [Bioperl-l] CVS AND PAML
In-Reply-To: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
References: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
Message-ID: <358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>

hmmm, there are some perldoc links around your code snippet. can you post
the code again? what is the input data you are trying this with?

On 4/17/07, Luba Pardo <lubapardo at gmail.com> wrote:
>
> Dear all,
> I have two questions.
> 1.) I am trying to download some modules from Bioperl-run via CVS but I
> can
> not login.
>
> $ cvs -d :pserver:cvs at code.open-bio.org:/home/repository/bioperl login.
>
> The error I get is: time out, failed to connect to the server. I have
> no trouble to download other files and I installed bioperl modules via
> CPAN and it works.
>
> 2) The second question I have is that I am using the PAML:CODEML
> module to do phylogenetic analysis.
>
> I have used the example provided in the HOWTO:PAML (also given as
> example: pairwise_ka_ks.PL). The program does not crash but it returns
> and empty object. I think the problem is in the last part of the
> script because I manage to get sequences and also the alignment, but I
> can not get any ka, ks value. I am not sure whether there is a bug in
> the last part of the script.
>
> Does anyone have an idea?
>
> Thank you very much
>
> Luba Pardo
>
> [...]


From IoannisKirmitzoglou at gmail.com  Tue Apr 17 13:05:37 2007
From: IoannisKirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Tue, 17 Apr 2007 06:05:37 -0700 (PDT)
Subject: [Bioperl-l] Parsing FASTA m10 output
Message-ID: <10034698.post@talk.nabble.com>


I apologize if this question has already been answered but my search came up
with no relevant results.
I am new to the FASTA program and after reading the fasta3x.doc I decided to
run it using the m10 output. The reason for this choice was the following
quote from fasta3x.doc:

     -m 10 is a new, parseable format for use with other
     programs....


I ran FASTA in batch mode and waited about 3-4 days for the results.
My problem is that today, when I started writing a perl script to parse the
output, I realized that SearchIO doesn't support the m10 format.
Seems like I should have been more careful...
Before I start coding a module that can parse the output (or
re-run FASTA with the -m9 -d0 switches, which will take 4 more days), I would
be really thankful if any of you could tell me of another way to parse those
files.
Thanks in advance...

Ioannis Kirmitzoglou, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus



From ewijaya at i2r.a-star.edu.sg  Tue Apr 17 13:10:00 2007
From: ewijaya at i2r.a-star.edu.sg (Edward WIJAYA)
Date: Tue, 17 Apr 2007 21:10:00 +0800
Subject: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
Message-ID: <462473B7.4070905@i2r.a-star.edu.sg>


Dear all,

How do you usually construct a graph of TFBS (binding site) positions
within their sequences? I was thinking of building something like these
visualization tools:

http://research.i2r.a-star.edu.sg/Dragon/Motif_Search/cgi-bin/tmp/29740M1.html

or

http://wingless.cs.washington.edu:8080/assessment/servlet?filenameID=submission/SPACE.D9F26D506DE90E9A0A0010BB6BCCAEF3&pageType=visualizationForm&action=Visualize+It

Is there a BioPerl module to do that?

--
Edward




From lubapardo at gmail.com  Tue Apr 17 14:01:57 2007
From: lubapardo at gmail.com (Luba Pardo)
Date: Tue, 17 Apr 2007 15:01:57 +0100
Subject: [Bioperl-l] CVS AND PAML
In-Reply-To: <358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>
References: <58ff33550704170216r2c780adcm53b6a2dab77580f0@mail.gmail.com>
	<358f4d650704170225h505764ccnbfa8b4e78a5ed5e@mail.gmail.com>
Message-ID: <58ff33550704170701p1207ad51r271b0aff235bfd05@mail.gmail.com>

Hi,
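Sorry. Below is the code. The part of the code that does not work is the
part that uses the codeml module.
Thanks
Luba

# for projecting alignments from protein to R/DNA space
use Bio::Align::Utilities qw(aa_to_dna_aln);
# for input of the sequence data
use Bio::SeqIO;
use Bio::AlignIO;
# wrappers for the external clustalw and codeml programs
use Bio::Tools::Run::Alignment::Clustalw;
use Bio::Tools::Run::Phylo::PAML::Codeml;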

my $aln_factory = Bio::Tools::Run::Alignment::Clustalw->new;
my $seqdata = shift || 'cds.fa';

my $seqio = Bio::SeqIO->new(-file   => $seqdata,
                            -format => 'fasta');
my %seqs;
my @prots;
# process each sequence
while ( my $seq = $seqio->next_seq ) {
    $seqs{$seq->display_id} = $seq;
    # translate them into protein
    my $protein = $seq->translate();
    my $pseq = $protein->seq();
    if( $pseq =~ /\*/ &&
        $pseq !~ /\*$/ ) {
          warn("provided a CDS sequence with a stop codon, PAML will choke!");
          exit(0);
    }
    # Tcoffee can't handle '*' even if it is trailing
    $pseq =~ s/\*//g;
    $protein->seq($pseq);
    push @prots, $protein;
}

if( @prots < 2 ) {
    warn("Need at least 2 CDS sequences to proceed");
    exit(0);
}

open(OUT, ">align_output.txt") ||  die("cannot open output align_output for writing");
# Align the sequences with clustalw
my $aa_aln = $aln_factory->align(\@prots);
# project the protein alignment back to CDS coordinates
my $dna_aln = aa_to_dna_aln($aa_aln, \%seqs);

my @each = $dna_aln->each_seq();

my $kaks_factory = Bio::Tools::Run::Phylo::PAML::Codeml->new
                   ( -params => { 'runmode' => -2,
                                  'seqtype' => 1,
                                } );

# set the alignment object
$kaks_factory->alignment($dna_aln);

# run the KaKs analysis
my ($rc,$parser) = $kaks_factory->run();
my $result = $parser->next_result;
my $MLmatrix = $result->get_MLmatrix();

my @otus = $result->get_seqs();
# this gives us a mapping from the PAML order of sequences back to
# the input order (since names get truncated)
my @pos = map {
    my $c= 1;
    foreach my $s ( @each ) {
      last if( $s->display_id eq $_->display_id );
      $c++;
    }
    $c;
   } @otus;

print OUT join("\t", qw(SEQ1 SEQ2 Ka Ks Ka/Ks PROT_PERCENTID
CDNA_PERCENTID)),"\n";
for( my $i = 0; $i < (scalar @otus -1) ; $i++) {
  for( my $j = $i+1; $j < (scalar @otus); $j++ ) {
    my $sub_aa_aln  = $aa_aln->select_noncont($pos[$i],$pos[$j]);
    my $sub_dna_aln = $dna_aln->select_noncont($pos[$i],$pos[$j]);
    print OUT join("\t", $otus[$i]->display_id,
                         $otus[$j]->display_id,
                         $MLmatrix->[$i]->[$j]->{'dN'},
                         $MLmatrix->[$i]->[$j]->{'dS'},
                         $MLmatrix->[$i]->[$j]->{'omega'},
                         sprintf("%.2f",$sub_aa_aln->percentage_identity),
                         sprintf("%.2f",$sub_dna_aln->percentage_identity),
                         ), "\n";
  }
}
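
The order-mapping step in the script above can be tried in isolation. The
following is only a sketch with made-up IDs (seqA, seqB, seqC), using plain
strings in place of sequence objects; it mirrors the map block and adds a
check, not present in the script, for IDs that were never matched, which is
what would happen if a name came back shortened.

```perl
use strict;
use warnings;

# Hypothetical IDs; plain strings stand in for the sequence objects.
my @input_order = qw(seqA seqB seqC);    # order in the alignment (@each)
my @paml_order  = qw(seqC seqA seqB);    # order reported back (@otus)

# Same logic as the map block in the script: count from 1 until the ID matches.
my @pos = map {
    my $id = $_;
    my $c  = 1;
    for my $s (@input_order) {
        last if $s eq $id;
        $c++;
    }
    $c;
} @paml_order;

# A position past the end of the input list means the ID was never matched.
my @unmatched = grep { $pos[$_] > @input_order } 0 .. $#pos;

print "positions: @pos\n";
print "unmatched: @paml_order[@unmatched]\n" if @unmatched;
```

Printing @pos and any unmatched IDs before the output loop is a cheap way
to see whether the mapping, rather than codeml itself, is where values go
missing.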

On 17/04/07, Albert Vilella <avilella at gmail.com> wrote:
>
> hmmm, there are some perldoc links around your code snippet. can you post
> the code again? what is the input data you are trying this with?
>
> [...]


From alexl at users.sourceforge.net  Tue Apr 17 13:54:13 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Tue, 17 Apr 2007 06:54:13 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu> (Chris Fields's
	message of "Fri\, 30 Mar 2007 23\:39\:15 -0500")
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
Message-ID: <5h4pnff6nu.fsf@delpy.biol.berkeley.edu>

On Mar 30, 2007, at 11:02 PM, Allen Day wrote:

[...]

>> If we're in agreement that the primary data sets and
>> libraries/applications for producing derivative data should not be
>> present in Fedora Extras, then it follows that the Bioperl classes
>> for manipulating these primary and derivative data should also not
>> be present in Fedora Extras as they are of little use without data
>> to manipulate.

Chris Fields wrote:

CF> I respectfully disagree.  BioPerl, to me, is a toolkit which helps
CF> accomplish certain tasks.  As with any toolkit, not all parts are
CF> required to do what one needs.  A good number of end-users use
CF> BioPerl for remote database queries
CF> (Bio::DB::GenBank/Taxonomy/etc), remote BLAST, seq analysis,
CF> alignment analysis, phylogenetic tree manipulation, etc, none of
CF> which require outside apps be installed.  For many a remote db is
CF> their primary source of data; not everybody sets up BioPerl for
CF> accessing local db records, running programs, etc (just the smart
CF> ones!).  As for outside apps, the docs are pretty explicit where
CF> certain outside resources (libxml2, expat, libgd) are needed for
CF> functionality.

CF> When we package up a new release we generally have ActiveState PPM
CF> archives available for Win32 users who want an easy way to install
CF> BioPerl.  I wouldn't have a problem if ActiveState wanted to post
CF> these to their repository.  Why would allowing someone to do the
CF> same for fedora extras be any different?

Hi all,

Given that there seems to be a reasonable consensus (including list
discussion here as well as in private e-mail) from bioperl folks that
including bioperl in Fedora is OK, I'm going ahead and building
bioperl for Fedora >= 6 (it's currently in the development branch).  I
thought about the issue carefully and this seems to make sense for
several reasons:

1. Biopackages.net isn't currently building packages for Fedora Core 6
   and later (as Allen said, that may happen later when more build
   resources come online).  I won't build perl-bioperl for FC-5 or
   earlier to make sure that the Fedora package doesn't disrupt
   installations with the biopackages.net version.

2. Currently I've only run the base bioperl (live) package through
   the reviewing gauntlet, but I plan to add the bioperl-run package
   as well.  Even though the bioperl-run package is intended to use
   third party packages (e.g. Clustal etc.) which may not be
   distributed with Fedora, it appears that the bioperl-run package
   contains code that can download those packages directly (albeit
   outside the RPM package system).  And some of the external tools
   could be packaged in Fedora because they have open-source licenses
   (e.g. Wise2, EMBOSS, NCBI toolkit etc.)

   Furthermore, it appears that the biopackages.net version of that
   package doesn't actually have a "Requires:" that would automatically
   install the third-party tools that are run via bioperl (e.g. Clustal)
   in any case, so when biopackages starts building for >FC-6, the
   Fedora perl-bioperl* packages can function as a drop-in replacement
   without disturbing other biopackages dependencies such as genome
   databases.

3. Third-party packages that can't be included directly in Fedora
   (such as Clustal) that can be used by bioperl-run could still be
   added via third-party repos like biopackages.net, in the same way
   that the multimedia packages gstreamer and gstreamer-plugins-good
   live in Fedora, but gstreamer-plugins-bad containing patent
   encumbered MP3 codecs will live in Livna.

Cheers,
Alex


From cjfields at uiuc.edu  Tue Apr 17 14:35:10 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 09:35:10 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
Message-ID: <0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>

On Apr 17, 2007, at 8:54 AM, Alex Lancaster wrote:

> Hi all,
>
> Given that there seems to be a reasonable consensus (including list
> discussion here as well as in private e-mail) from bioperl folks that
> including bioperl in Fedora is OK, I'm going ahead and building
> bioperl for Fedora >= 6 (it's currently in the development branch).  I
> thought about the issue carefully and this seems to make sense for
> several reasons:
>
> ...
> 2. Currently I've only run the base bioperl (live) package through
>    the reviewing gauntlet, but I plan to add the bioperl-run package
>    as well.  Even though the bioperl-run package is intended to use
>    third party packages (e.g. Clustal etc.) which may not be
>    distributed with Fedora, it appears that the bioperl-run package
>    contains code that can download those packages directly (albeit
>    outside the RPM package system).  And some of the external tools
>    could be packaged in Fedora because they have open-source licenses
>    (e.g. Wise2, EMBOSS, NCBI toolkit etc.)
...

Do you mean the bioperl core modules instead of "bioperl-live"?  We  
use the term "bioperl-live" to designate code updated regularly via  
CVS, which can be buggy depending on when it's retrieved.

I'm not sure how others feel about this, but it's probably best to  
stick with either the latest official releases (v 1.5.2 at this time)  
or even GBrowse-sponsored interim releases (which fix GBrowse-related  
bugs and normally pass tests).

chris


From hlapp at gmx.net  Tue Apr 17 15:09:45 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 17 Apr 2007 11:09:45 -0400
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>


On Apr 17, 2007, at 9:35 AM, Leighton Pritchard wrote:

> Hi Hilmar,
>
> Thanks for the very quick response.  Apologies for the long reply,  
> but I
> thought it might be useful if anyone else happens across the same
> problems that I did.

Thanks for reporting all these.

> [...]
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("","","0","") FKs ()
> Column 'dbname' cannot be null
> ---------------------------------------------------
> Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate
> lactonase activity':
> [...]
> I tracked this down to an apparently poor formatting of the GO.defs  
> file
> (note that the first and third definition_lines appear to be two  
> halves
> of the same entry):
>
> term: 2-pyrone-4,6-dicarboxylate lactonase activity
> goid: GO:0047554
> definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate +  
> H2O
> = 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
> definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN

I wonder whether this is the line that throws the parser off. It  
looks like the database part of the reference is missing - bad.

> definition_reference: EC:3.1.1.57
> definition_reference: MetaCyc:2-PYRONE-4
>
> I found 43 similar errors for other GOIDs, and it appears to result  
> from
> the occurrence of the string "\," in a dbxref - mostly MetaCyc  
> entries,
> but also some UM-BBD_pathwayID entries.

I'm not sure - although the string "\," might indeed trip up the
parser, I would have to investigate to confirm. Could it be a
coincidence with definition_references that lack the database part
before the colon?

>
> These errors appear to have followed through into the generation of  
> the
> OBO format files in each case, e.g.:
>
> def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O =
> 4-carboxy-2-hydroxyhexa-2,4-dienedioate." [:6-DICARBOXYLATE- 
> LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]

Again, the first db_xref lacks the database in front of the colon. I  
can also see why "\," will trip up the parser in this format.

>
> and so is something for the GO guys to fix, I guess.

The lack of a database for certain xrefs surely is. If the escaped  
comma does throw off the BioPerl parser then that part is for BioPerl  
to fix. It does seem to extract the parts correctly, if the error  
message is any indication, though you may argue that it should remove  
the escaping backslashes (and I'd certainly agree with that).
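
To make the escaped-comma point concrete, here is a small sketch in plain
Perl. It is not the BioPerl parser; parse_dbxrefs is a made-up helper. It
splits a bracketed dbxref list only on commas that are not preceded by a
backslash, removes the escaping backslashes, and splits each entry into
database and accession on the first colon. The "good" input is a guess at
what the entry was meant to look like, pieced together from the two halves
quoted above; the "bad" input is the list as quoted.

```perl
use strict;
use warnings;

# Split a bracketed dbxref list into [db, accession] pairs.
# Commas written as "\," are kept as part of the accession.
sub parse_dbxrefs {
    my ($list) = @_;
    $list =~ s/^\s*\[//;
    $list =~ s/\]\s*$//;
    my @pairs;
    for my $xref ( split /(?<!\\),\s*/, $list ) {
        $xref =~ s/\\,/,/g;                        # drop the escaping backslash
        my ( $db, $acc ) = split /:/, $xref, 2;    # first colon only
        $acc = '' unless defined $acc;
        push @pairs, [ $db, $acc ];
    }
    return @pairs;
}

my @good = parse_dbxrefs(
    '[EC:3.1.1.57, MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN]');
my @bad = parse_dbxrefs(
    '[:6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]');

for my $pair ( @good, @bad ) {
    my ( $db, $acc ) = @$pair;
    my $flag = length $db ? '' : '   <-- no database part';
    print "db='$db' accession='$acc'$flag\n";
}
```

With the escaped form the MetaCyc accession comes out whole, while the
already-split form yields one entry with an empty database, which is the
kind of record the DBLinkAdaptor warning complains about.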

>
>
> Another error is thrown after fixing the above, though (with the same
> command as before):
>
> Loading ontology Gene Ontology:
>         ... terms
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values  
> were
> ("GO:0006905","vesicle transport","OBSOLETE (was not defined before
> being made obsolete).","X","") FKs (1)
> Duplicate entry 'vesicle transport-1-X' for key 3
> ---------------------------------------------------
> Could not store term GO:0006905, name 'vesicle transport':
> [...]
> There are duplicate terms, identical in the term table except for  
> GOID:
> GO:0006905 and GO:0005480.  They are both "vesicle transport", and
> obsoleted:

That violates the uniqueness constraint, and this sounds more like a  
bug in the GO file. I'm also not sure what motivated them to create  
the same term multiple times only to obsolete it immediately.
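
Clashes of this kind can be spotted before loading. The sketch below assumes
the terms have already been read into simple hashes; the layout is invented
for illustration, and only the two GO IDs and the names are taken from this
thread. It groups terms by name plus obsolete flag, which is the part of the
unique key that varies within one ontology.

```perl
use strict;
use warnings;

# Illustrative records: IDs and names are from the thread, the hash
# layout is a made-up stand-in for parsed terms of one ontology.
my @terms = (
    { id => 'GO:0005480', name => 'vesicle transport', obsolete => 1 },
    { id => 'GO:0006905', name => 'vesicle transport', obsolete => 1 },
    { id => 'GO:0032933', name => 'SREBP-mediated signaling pathway',
      obsolete => 0 },
);

# Group term IDs by (name, obsolete flag).
my %ids_for;
for my $t (@terms) {
    my $key = join "\t", $t->{name}, $t->{obsolete} ? 'X' : '';
    push @{ $ids_for{$key} }, $t->{id};
}

# Keep only the groups that hold more than one ID.
my %clash;
for my $key ( keys %ids_for ) {
    $clash{$key} = $ids_for{$key} if @{ $ids_for{$key} } > 1;
}

for my $key ( sort keys %clash ) {
    my ($name) = split /\t/, $key;
    print "'$name' is used by: @{ $clash{$key} }\n";
}
```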

> [...]
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values
> were ("PMID","","0","") FKs ()
> Column 'accession' cannot be null
> ---------------------------------------------------
> Could not store term GO:0032933, name 'SREBP-mediated signaling
> pathway':
> [...]
> with the offending entry being
>
> term: SREBP-mediated signaling pathway
> goid: GO:0032933
> definition: A series of molecular signals from the endoplasmic  
> reticulum
> to the nucleus generated as a consequence of altered levels of one or
> more lipids, and resulting in the activation of transcription by  
> SREBP.
> definition_reference: GOC:mah
> definition_reference: PMID:0
>
> I commented out the definition_reference for PMID:0, which seemed  
> to fix
> matters.

Right, it seems to be a bogus reference.
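
One possible explanation for the empty accession in the warning, not
verified against the parser source: in Perl the string "0" is false, so code
that tests an accession for truth rather than for definedness or length
would treat PMID:0 as having no accession at all. The helpers below are
hypothetical and only illustrate that difference.

```perl
use strict;
use warnings;

# Split "DB:accession" on the first colon.
sub split_xref {
    my ($xref) = @_;
    my ( $db, $acc ) = split /:/, $xref, 2;
    return ( $db, $acc );
}

# Truth test: the string "0" counts as false and is dropped.
sub acc_by_truth {
    my ( undef, $acc ) = split_xref(shift);
    return $acc ? $acc : '';
}

# Definedness and length test: "0" is kept.
sub acc_by_length {
    my ( undef, $acc ) = split_xref(shift);
    return ( defined $acc && length $acc ) ? $acc : '';
}

for my $xref ( 'GOC:mah', 'PMID:0' ) {
    printf "%-8s truth test: '%s'  length test: '%s'\n",
        $xref, acc_by_truth($xref), acc_by_length($xref);
}
```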

>
> The process.ontology and component.ontology files then went into the
> database without a hitch.  Thanks again for your help,

Fantastic you got it all loaded!

Note that you also have the --computetc switch which will compute the  
transitive closure for you automatically.

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From alexl at users.sourceforge.net  Tue Apr 17 15:13:30 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Tue, 17 Apr 2007 08:13:30 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu> (Chris Fields's
	message of "Tue\, 17 Apr 2007 09\:35\:10 -0500")
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
	<0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>
Message-ID: <nwy7krdof9.fsf@delpy.biol.berkeley.edu>

>>>>> "CF" == Chris Fields  writes:

[...]

CF> Do you mean the bioperl core modules instead of "bioperl-live"?
CF> We use the term "bioperl-live" to designate code updated regularly
CF> via CVS, which can be buggy depending on when it's retrieved.

Yes, I am referring to the core package.  Called perl-bioperl in the
Fedora naming scheme.

CF> I'm not sure how others feel about this, but it's probably best to
CF> stick with either the latest official releases (v 1.5.2 at this
CF> time) or even GBrowse-sponsored interim releases (which fix
CF> GBrowse-related bugs and normally pass tests).

Yes I am sticking to the latest official release 1.5.2_102.  The
package is here:

http://download.fedora.redhat.com/pub/fedora/linux/extras/development/i386/repoview/perl-bioperl.html

and installable via yum (on the development branch) using:

$ yum install perl-bioperl

The FC-6 package will be available soon.

Alex


From cjfields at uiuc.edu  Tue Apr 17 16:18:19 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:18:19 -0500
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
	<1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
Message-ID: <146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>

On Apr 17, 2007, at 11:05 AM, Leighton Pritchard wrote:
...
>
>>> and so is something for the GO guys to fix, I guess.
>>
>> The lack of a database for certain xrefs surely is. If the escaped
>> comma does throw off the BioPerl parser then that part is for BioPerl
>> to fix.
>
> I think the problems are now all in the data I downloaded from
> http://www.geneontology.org/GO.downloads.shtml - I believe the BioPerl
> parser to be innocent of these charges ;)  I've submitted the issue at
> the GO site, and with any luck they'll handle it quite soon (if it  
> is in
> fact their problem).
>
>> Note that you also have the --computetc switch which will compute the
>> transitive closure for you automatically.
>
> :D Excellent!  Thanks for the pointer, and again for your efforts,
>
> L.
...

If you do find anything that is BioSQL- or Bioperl-related then file  
a bug report so we can track it.  I agree with Hilmar that it's  
likely the parser is partly to blame.

http://bugzilla.open-bio.org/

We really appreciate the work you're putting into this!

chris


From cjfields at uiuc.edu  Tue Apr 17 16:19:02 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:19:02 -0500
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <nwy7krdof9.fsf@delpy.biol.berkeley.edu>
References: <o5wt0z9h6p.fsf@delpy.biol.berkeley.edu>
	<1175258897.2668.21.camel@localhost.localdomain>
	<6d648ierkz.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301142r4d3f777bi681779a38559459f@mail.gmail.com>
	<1p8xdeb87r.fsf@delpy.biol.berkeley.edu>
	<5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<16153593-5B2A-43B4-9366-282C654E40E7@gmx.net>
	<5c24dcc30703302102w2f008b7bt6e7d77ec42f21011@mail.gmail.com>
	<A7F15A09-37A9-4A7E-9E1A-19E6C3A97798@uiuc.edu>
	<5h4pnff6nu.fsf@delpy.biol.berkeley.edu>
	<0E921F4A-2DC2-44B6-AAEC-6A81AA6240BE@uiuc.edu>
	<nwy7krdof9.fsf@delpy.biol.berkeley.edu>
Message-ID: <3963AFE3-68B6-43F0-8A20-82A575CA8806@uiuc.edu>


On Apr 17, 2007, at 10:13 AM, Alex Lancaster wrote:

>
> [...]
>
> CF> Do you mean the bioperl core modules instead of "bioperl-live"?
> CF> We use the term "bioperl-live" to designate code updated regularly
> CF> via CVS, which can be buggy depending on when it's retrieved.
>
> Yes, I am referring to the core package.  Called perl-bioperl in the
> Fedora naming scheme.
>
> CF> I'm not sure how others feel about this, but it's probably best to
> CF> stick with either the latest official releases (v 1.5.2 at this
> CF> time) or even GBrowse-sponsored interim releases (which fix
> CF> GBrowse-related bugs and normally pass tests).
>
> Yes I am sticking to the latest official release 1.5.2_102.  The
> package is here:
>
> http://download.fedora.redhat.com/pub/fedora/linux/extras/development/i386/repoview/perl-bioperl.html
>
> and installable via yum (on the development branch) using:
>
> $ yum install perl-bioperl
>
> The FC-6 package will be available soon.
>
> Alex

Sounds good.  Thanks Alex!

chris


From ioanniskirmitzoglou at gmail.com  Tue Apr 17 16:21:36 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Tue, 17 Apr 2007 19:21:36 +0300
Subject: [Bioperl-l]  Parsing FASTA m10 output
In-Reply-To: <b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
Message-ID: <b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>

Thanks for the prompt reply...
Seems like I will have to "quit talking and begin doing"
I will post the code here in case someone else finds himself in the same
situation...
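
As a starting point, here is a minimal sketch of such a parser (in Python
rather than Perl, just to show the idea). It assumes the layout described
for -m 10 in the FASTA documentation: a ">>>" line for each query, a ">>"
line for each hit, and "; key: value" annotation lines underneath. The
function name and the sample text are made up for illustration and are not
real FASTA output.

```python
import re


def parse_m10_hits(text):
    """Collect the '; key: value' annotations of each '>>' hit block."""
    hits = []
    current = None
    for line in text.splitlines():
        if line.startswith(">>>"):
            # Query header (or the closing '>>><<<' marker), not a hit.
            current = None
        elif line.startswith(">>"):
            parts = line[2:].split()
            current = {"id": parts[0] if parts else "", "fields": {}}
            hits.append(current)
        elif line.startswith(">"):
            # Per-sequence alignment sub-block: stop collecting so that
            # hit-level fields are not overwritten by sq_*/al_* values.
            current = None
        elif line.startswith(";") and current is not None:
            match = re.match(r";\s*([^:]+):\s*(.*)", line)
            if match:
                key, value = match.group(1).strip(), match.group(2).strip()
                current["fields"][key] = value
    return hits


# Invented sample, shaped like the assumed layout.
SAMPLE = """\
>>>query1, 107 aa vs library
; pg_name: fasta
>>hitA some description
; fa_opt: 71
; sw_score: 71
>query1 ..
; sq_len: 107
>hitA ..
; sq_len: 120
>>hitB another description
; fa_opt: 55
>>><<<
"""

hits = parse_m10_hits(SAMPLE)
for hit in hits:
    print(hit["id"], hit["fields"])
```

The values are kept as strings; converting scores to numbers and reading the
aligned sequences are left out of this sketch.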

-- 
Ioannis Kirmitzoglou, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


On 17/04/07, Thiago Venancio < thiago.venancio at gmail.com> wrote:
> I am parsing FASTA outputs these days.
>
> The -m 10 format is a recent implementation, not so popular yet. So, I have
> first tested Bio::SearchIO against the default output, and everything is
> fine.
>
> I think future releases of Bio::SearchIO will deal with the m10 output. For
> now, you can run everything again or code a little bit to parse what you
> want (not a hard task).
>
> T.
>
>
> On 4/17/07, Ioannis Kirmitzoglou < IoannisKirmitzoglou at gmail.com> wrote:
> >
> > I apologize if this question has already been answered, but my search came
> > up with no relevant results.
> > I am new to the FASTA program, and after reading fasta3x.doc I decided to
> > run it using the m10 output. The reason for this choice was
> >
> > Quote from fasta3x.doc:
> >      -m 10 is a new, parseable format for use with other
> >      programs....
> >
> >
> > I ran FASTA in batch mode and waited about 3-4 days for the results.
> > My problem is that today, when I started writing a Perl script to parse the
> > output, I realized that SearchIO doesn't support the m10 format.
> > It seems I should have been more careful...
> > Before I start coding a module that can parse the output (or
> > re-running FASTA with the -m9 -d0 switches, which will take 4 more days), I
> > would be really thankful if any of you knows of any other way to parse those
> > files.
> > Thanks in advance...
> >
> > Ioannis Kirmitzoglou, MSc
> > PhD. Student,
> > Bioinformatics Research Laboratory
> > Department of Biological Sciences
> > University of Cyprus
> >
>
>
>
> --
> "The way to get started is to quit talking and begin doing."
>       Walt Disney
>
> ========================
> Thiago Motta Venancio, MSc
> PhD student in Bioinformatics
> University of Sao Paulo
> ========================


From cjfields at uiuc.edu  Tue Apr 17 16:49:53 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 17 Apr 2007 11:49:53 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
Message-ID: <639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>

You can post here or add it to Bugzilla as an enhancement request if  
the code is particularly long.

chris

On Apr 17, 2007, at 11:21 AM, Ioannis Kirmitzoglou wrote:

> Thanks for the prompt reply...
> Seems like I will have to "quit talking and begin doing"
> I will post the code here in case someone else finds himself in the  
> same
> situation...
>
> -- 
> Ioannis Kirmitzoglou, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From lpritc at scri.ac.uk  Tue Apr 17 13:35:44 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 14:35:44 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
Message-ID: <1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>

Hi Hilmar, 

Thanks for the very quick response.  Apologies for the long reply, but I
thought it might be useful if anyone else happens across the same
problems that I did.

On Tue, 2007-04-17 at 00:00 -0400, Hilmar Lapp wrote:
> Apparently the parser fails to parse out database and accession for this
> db_xref of term GO:0018901.
> 
> If you can edit the obo file, you can try deleting the db_xref(s) for  
> that term that look odd (or delete all if you don't need them).

You're spot on - see further down for details...

> Note that the argument for --fmtargs here should read
> "-defs_file,/path/to/Downloads/GO.defs". (Note that within the quotes  
> there is no tilde expansion.)

D'oh!  Thanks for the note - my bad, there.

> This is one of the reasons you've got to love MySQL (am I correct
> in inferring that you're using MySQL?).

The 'choice' was forced upon me ;)

> It may be necessary to widen the length of dbname.accession here, for  
> example to 80 chars? Let me know if you need help with the DDL  
> command to do this.

I've fixed that now (and added it to my local biosqldb-mysql.sql
schema), but with a clean BioSQL schema and using:

[lpritc at lplinuxdev sql]$ bp_load_ontology.pl --host localhost \
    --dbname biosql --namespace "Gene Ontology" --dbuser lpritc \
    --dbpass ******** --format goflat \
    --fmtargs "-defs_file,/home/lpritc/Downloads/GO.defs" \
    /home/lpritc/Downloads/function.ontology

I was still getting errors with the GO flatfile:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("","","0","") FKs ()
Column 'dbname' cannot be null
---------------------------------------------------
Could not store term GO:0047554, name '2-pyrone-4,6-dicarboxylate lactonase activity':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term', 'Bio::Ontology::GOterm=HASH(0x88497a4)', '-db', 'Bio::DB::BioSQL::DBAdaptor=HASH(0x897f074)', '-termfactory', 'Bio::Ontology::TermFactory=HASH(0x8d64ad8)', '-throw', 'CODE(0x851abc8)', '-mergeobs', ...) called at /usr/bin/bp_load_ontology.pl line 610

I tracked this down to what appears to be poor formatting of the GO.defs
file (note that the first and third definition_reference lines appear to be
two halves of the same entry):

term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554
definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O
= 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
definition_reference: EC:3.1.1.57
definition_reference: MetaCyc:2-PYRONE-4

I found 43 similar errors for other GOIDs, and it appears to result from
the occurrence of the string "\," in a dbxref - mostly MetaCyc entries,
but also some UM-BBD_pathwayID entries.
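
For anyone wanting to find such entries before loading, here is a minimal
sketch in Python. It assumes only the "goid:" and "definition_reference:"
line prefixes visible in the entry above; the function name is my own, and
the check is a rough heuristic rather than a validator for the format.

```python
def find_bad_references(defs_text):
    """Return (goid, reference) pairs whose database or accession is empty."""
    bad = []
    goid = None
    for line in defs_text.splitlines():
        if line.startswith("goid:"):
            goid = line.split(":", 1)[1].strip()
        elif line.startswith("definition_reference:"):
            ref = line.split(":", 1)[1].strip()
            # A well-formed reference looks like "database:accession".
            db, _, acc = ref.partition(":")
            if not db or not acc:
                bad.append((goid, ref))
    return bad


# The GO:0047554 entry quoted above, with the definition on one line.
ENTRY = """\
term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554
definition: Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O = 4-carboxy-2-hydroxyhexa-2,4-dienedioate.
definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
definition_reference: EC:3.1.1.57
definition_reference: MetaCyc:2-PYRONE-4
"""

bad_refs = find_bad_references(ENTRY)
print(bad_refs)
```

Note that this only flags the fragment that lost its database part; the
truncated "MetaCyc:2-PYRONE-4" half still looks well-formed and is not
reported.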

These errors appear to have followed through into the generation of the
OBO format files in each case, e.g.:

def: "Catalysis of the reaction: 2-pyrone-4,6-dicarboxylate + H2O =
4-carboxy-2-hydroxyhexa-2,4-dienedioate." [:6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57, MetaCyc:2-PYRONE-4]

and so is something for the GO guys to fix, I guess.


Another error is thrown after fixing the above, though (with the same
command as before):

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::TermAdaptor (driver) failed, values were ("GO:0006905","vesicle transport","OBSOLETE (was not defined before being made obsolete).","X","") FKs (1)
Duplicate entry 'vesicle transport-1-X' for key 3
---------------------------------------------------
Could not store term GO:0006905, name 'vesicle transport':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Ontology::GOterm) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term', 'Bio::Ontology::GOterm=HASH(0xbcac418)', '-db', 'Bio::DB::BioSQL::DBAdaptor=HASH(0x957805c)', '-termfactory', 'Bio::Ontology::TermFactory=HASH(0x995db20)', '-throw', 'CODE(0x9113bd0)', '-mergeobs', ...) called at /usr/bin/bp_load_ontology.pl line 610

There are duplicate terms, identical in the term table except for GOID:
GO:0006905 and GO:0005480.  They are both "vesicle transport", and
obsoleted:

term: vesicle transport
goid: GO:0005480
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because it represents a biological
process and not a molecular function. To update annotations, use the
biological process term 'vesicle-mediated transport ; GO:0016192'.

term: vesicle transport
goid: GO:0006905
definition: OBSOLETE (was not defined before being made obsolete).
definition_reference: GOC:go_curators
comment: This term was made obsolete because the meaning of the term is
ambiguous. To update annotations, consider the biological process term
'vesicle-mediated transport ; GO:0016192'.
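
A quick way to list term names that occur under more than one GOID is
sketched below in Python. It assumes that each entry has a "term:" line
followed by a "goid:" line, as in the two entries above. The function name
is hypothetical, and the sketch groups by name only, whereas the key in the
error message also includes the ontology and the obsolete flag.

```python
from collections import defaultdict


def duplicate_term_names(defs_text):
    """Map each term name that has more than one GOID to its GOIDs."""
    ids_by_name = defaultdict(list)
    term = None
    for line in defs_text.splitlines():
        if line.startswith("term:"):
            term = line.split(":", 1)[1].strip()
        elif line.startswith("goid:") and term is not None:
            ids_by_name[term].append(line.split(":", 1)[1].strip())
            term = None
    return {name: ids for name, ids in ids_by_name.items() if len(ids) > 1}


# Abbreviated entries taken from this message.
ENTRIES = """\
term: vesicle transport
goid: GO:0005480
definition: OBSOLETE (was not defined before being made obsolete).

term: vesicle transport
goid: GO:0006905
definition: OBSOLETE (was not defined before being made obsolete).

term: 2-pyrone-4,6-dicarboxylate lactonase activity
goid: GO:0047554
"""

duplicates = duplicate_term_names(ENTRIES)
print(duplicates)
```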

I used the --noobsolete flag to avoid this error - reasoning that since
I'm populating the database for the first time, ignoring the obsolete
terms won't hurt - but finally this error was thrown:

Loading ontology Gene Ontology:
        ... terms

-------------------- WARNING ---------------------
MSG: insert in Bio::DB::BioSQL::DBLinkAdaptor (driver) failed, values were ("PMID","","0","") FKs ()
Column 'accession' cannot be null
---------------------------------------------------
Could not store term GO:0032933, name 'SREBP-mediated signaling pathway':

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: create: object (Bio::Annotation::DBLink) failed to insert or to be found by unique key
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:359
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:206
STACK: Bio::DB::BioSQL::TermAdaptor::store_children /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/TermAdaptor.pm:293
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::create /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:214
STACK: Bio::DB::BioSQL::BasePersistenceAdaptor::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/BioSQL/BasePersistenceAdaptor.pm:251
STACK: Bio::DB::Persistent::PersistentObject::store /usr/lib/perl5/site_perl/5.8.8/Bio/DB/Persistent/PersistentObject.pm:271
STACK: main::persist_term /usr/bin/bp_load_ontology.pl:805
STACK: /usr/bin/bp_load_ontology.pl:610
-----------------------------------------------------------

 at /usr/bin/bp_load_ontology.pl line 817
        main::persist_term('-term', 'Bio::Ontology::GOterm=HASH(0xbe18f14)', '-db', 'Bio::DB::BioSQL::DBAdaptor=HASH(0x99bbf2c)', '-termfactory', 'Bio::Ontology::TermFactory=HASH(0x9da0ad8)', '-throw', 'CODE(0x9556bb4)', '-mergeobs', ...) called at /usr/bin/bp_load_ontology.pl line 610

with the offending entry being 

term: SREBP-mediated signaling pathway
goid: GO:0032933
definition: A series of molecular signals from the endoplasmic reticulum
to the nucleus generated as a consequence of altered levels of one or
more lipids, and resulting in the activation of transcription by SREBP.
definition_reference: GOC:mah
definition_reference: PMID:0

I commented out the definition_reference for PMID:0, which seemed to fix
matters.

The process.ontology and component.ontology files then went into the
database without a hitch.  Thanks again for your help,

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

SCRI, Invergowrie, Dundee, DD2 5DA.  
The Scottish Crop Research Institute is a charitable company limited by guarantee. 
Registered in Scotland No: SC 29367.
Recognised by the Inland Revenue as a Scottish Charity No: SC 006662.


DISCLAIMER:

This email is from the Scottish Crop Research Institute, but the views 
expressed by the sender are not necessarily the views of SCRI and its 
subsidiaries.  This email and any files transmitted with it are confidential 
to the intended recipient at the e-mail address to which it has been 
addressed.  It may not be disclosed or used by any other than that addressee.
If you are not the intended recipient you are requested to preserve this 
confidentiality and you must not use, disclose, copy, print or rely on this 
e-mail in any way. Please notify postmaster at scri.ac.uk quoting the 
name of the sender and delete the email from your system.

Although SCRI has taken reasonable precautions to ensure no viruses are 
present in this email, neither the Institute nor the sender accepts any 
responsibility for any viruses, and it is your responsibility to scan the email 
and the attachments (if any).


From lpritc at scri.ac.uk  Tue Apr 17 16:05:16 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 17:05:16 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
Message-ID: <1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>

Hello again,

On Tue, 2007-04-17 at 11:09 -0400, Hilmar Lapp wrote:
> Thanks for reporting all these.

No problem at all.

> On Apr 17, 2007, at 9:35 AM, Leighton Pritchard wrote:
> > term: 2-pyrone-4,6-dicarboxylate lactonase activity
[...]
> > definition_reference: :6-DICARBOXYLATE-LACTONASE-RXN
> 
> I wonder whether this is the line that throws the parser off. It  
> looks like the database part of the reference is missing - bad.

> > definition_reference: MetaCyc:2-PYRONE-4

I don't think the parser is to blame, here.  Note that if you join the
definition_reference strings from the GO.defs file, you get:

MetaCyc:2-PYRONE-4:6-DICARBOXYLATE-LACTONASE-RXN

Then if you replace the colon by "\," you get what should (I think)
actually be the MetaCyc entry:

MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN
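
The joining step can be sketched as follows (Python; the function name is
hypothetical). It assumes the caller already knows which fragment is the
head and in which order the colon-prefixed tails belong, something the file
itself does not record.

```python
def rejoin_reference(head, tails):
    """Rejoin a split dbxref, restoring an escaped comma at each break."""
    joined = head
    for tail in tails:
        # Each tail carries a leading colon where the "\," used to be.
        fragment = tail[1:] if tail.startswith(":") else tail
        joined += "\\," + fragment
    return joined


fixed = rejoin_reference("MetaCyc:2-PYRONE-4", [":6-DICARBOXYLATE-LACTONASE-RXN"])
print(fixed)
```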

> > I found 43 similar errors for other GOIDs, and it appears to result  
> > from
> > the occurrence of the string "\," in a dbxref - mostly MetaCyc  
> > entries,
> > but also some UM-BBD_pathwayID entries.
> 
> I'm not sure - although the string "\," might indeed trip up the  
> parser, would have to investigate to confirm. Could it be a  
> coincidence with definition_references that lack the database part  
> before the colon?

Inspecting the troublesome entries by eye consistently turns up the same
problem as above: a GO term in the GO.defs file is malformed.  The term
should have a definition_reference field describing a MetaCyc entry that
matches the term field.  In that string there should be an escaped comma,
but the string ends where we would expect the comma to be.  The text that
should follow the escaped comma appears instead as the first
definition_reference.

This observation also extends to cases where there should be two
occurrences of "\," in the MetaCyc field, e.g.:

term: 2,3-dihydroxyindole 2,3-dioxygenase activity
goid: GO:0047528
definition: Catalysis of the reaction: 2,3-dihydroxyindole + O2 =
anthranilate + CO2.
definition_reference: :3-DIHYDROXYINDOLE-2
definition_reference: :3-DIOXYGENASE-RXN
definition_reference: EC:1.13.11.2
definition_reference: MetaCyc:2

It then appears as though the GO flatfiles were used automatically to
generate the OBO format files, and propagated the same error into the
square brackets in each case.
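
To illustrate the kind of splitting that could produce such fragments, here
is a small Python sketch contrasting a naive split on commas with one that
honours the "\," escape. How the GO files were actually generated is not
known from this thread, so this is an assumption about the mechanism, and
the function name is my own.

```python
import re


def split_dbxrefs(xref_list):
    """Split a dbxref list on commas that are not preceded by a backslash."""
    parts = re.split(r"(?<!\\),", xref_list)
    return [part.strip() for part in parts if part.strip()]


# The MetaCyc entry as reconstructed earlier in this message, plus the EC xref.
raw = r"MetaCyc:2-PYRONE-4\,6-DICARBOXYLATE-LACTONASE-RXN, EC:3.1.1.57"

naive = [part.strip() for part in raw.split(",")]
aware = split_dbxrefs(raw)
print(naive)
print(aware)
```

The naive split cuts the MetaCyc accession in two, much like the halves seen
in GO.defs, while the escape-aware split keeps it whole. The sketch does not
handle an escaped backslash directly before a comma.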

> > and so is something for the GO guys to fix, I guess.
> 
> The lack of a database for certain xrefs surely is. If the escaped  
> comma does throw off the BioPerl parser then that part is for BioPerl  
> to fix. 

I think the problems are now all in the data I downloaded from
http://www.geneontology.org/GO.downloads.shtml - I believe the BioPerl
parser to be innocent of these charges ;)  I've submitted the issue at
the GO site, and with any luck they'll handle it quite soon (if it is in
fact their problem).

> Note that you also have the --computetc switch which will compute the  
> transitive closure for you automatically.

:D Excellent!  Thanks for the pointer, and again for your efforts,

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C


From stefan.kirov at bms.com  Tue Apr 17 15:09:30 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Tue, 17 Apr 2007 11:09:30 -0400
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph with
	Perl]
Message-ID: <4624E32A.6010704@bms.com>

I forgot to send this to the list.
Stefan
-------------- next part --------------
An embedded message was scrubbed...
From: Stefan Kirov <stefan.kirov at bms.com>
Subject: Re: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
Date: Tue, 17 Apr 2007 10:30:11 -0400
Size: 2262
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20070417/cc49d62a/attachment-0004.mht>

From lpritc at scri.ac.uk  Tue Apr 17 16:55:38 2007
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 17 Apr 2007 17:55:38 +0100
Subject: [Bioperl-l] [BioSQL-l] Problem loading GO.
In-Reply-To: <146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>
References: <1176738922.988.26.camel@lplinuxdev.scri.sari.ac.uk>
	<B8DA7982-89F5-4D46-8736-A1D25EA7B504@gmx.net>
	<1176816944.988.83.camel@lplinuxdev.scri.sari.ac.uk>
	<5D5DDFF3-1C01-4D3D-80F8-CD777DEA38D5@gmx.net>
	<1176825916.988.121.camel@lplinuxdev.scri.sari.ac.uk>
	<146086E2-330B-4460-90AC-2632E82ED145@uiuc.edu>
Message-ID: <1176828938.988.133.camel@lplinuxdev.scri.sari.ac.uk>

Hi Chris,

On Tue, 2007-04-17 at 11:18 -0500, Chris Fields wrote:
> If you do find anything that is BioSQL- or Bioperl-related then file  
> a bug report so we can track it.  I agree with Hilmar that it's  
> likely the parser is partly to blame.
> 
> http://bugzilla.open-bio.org/

I've submitted a bug report, mostly replicating my first post in this
thread.  I added links to the appropriate point in the list archives so
that the rest of the discussion can be considered, too.

> We really appreciate the work you're putting into this!

Thanks - I'm just grateful that the Bio* repertoire is there at all so
that my problems are relatively minor (as opposed to attempting to
replicate the functionality independently).

L.

-- 
Dr Leighton Pritchard B.Sc.(Hons) MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland DD2 5DA
e:lpritc at scri.ac.uk            w:http://bioinf.scri.ac.uk/lp
gpg/pgp: 0xFEFC205C


From lstein at cshl.edu  Tue Apr 17 17:47:25 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Tue, 17 Apr 2007 13:47:25 -0400
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <C2340DDA.D83F%bosborne11@verizon.net>
References: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>
	<C2340DDA.D83F%bosborne11@verizon.net>
Message-ID: <6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>

Hi,

I've been updating the WIKI in anticipation of a new GBrowse release and
have added a "stub" for the biopackages.net install. Since I don't use yum
(I've been running Slackware for ages and have recently started working with
Ubuntu) I'm not sure I got the details right. Could someone check?


        http://www.gmod.org/wiki/index.php/GBrowse_RPM_HOWTO

Also, I think some verbiage on how to use yum to install MySQL and Apache
would be great, since it will be consistent with the Ubuntu install page.

Thanks,

Lincoln

On 3/31/07, Brian Osborne <bosborne11 at verizon.net> wrote:
>
> Allen et al.,
>
> What happened to the "GMOD" package or packages? I've had some
> conversations
> in the past few months with you-all suggesting that a GMOD package, or
> packages, would be useful.
>
> Brian O.
>
>
>
>
> On 3/30/07 8:30 PM, "Allen Day" <allenday at gmail.com> wrote:
>
> > Hi Alex,
> >
> > You've aptly noted that there are several classes of packages being
> > discussed here, and that they should not be treated equally.  From my
> > point of view and of specific relevance to the Bioperl community we
> > have at least:
> >
> > 1) "regular" CPAN dependencies and their occasional C/C++/Fortran
> > dependencies.  These should all be in Fedora Extras, as they are of
> > general utility.  Biopackages.net currently hosts about 200 packages
> > (.spec files, specifically) that are like this.  Maybe 80 of these are
> > needed for Bioperl.
> >
> > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan,
> > etc.  From what I've seen, these typically have strange/custom
> > licenses that may not be valid for some users.  BLAT has a dual
> > licensing scheme for academic and non-academic licensees, for
> > instance.  These packages are not of general utility.  For these two
> > reasons, my stance is that they should not be included in Fedora
> > Extras.
> >
> > 3) Bioperl packages.  Several subsets here.  The Bioperl-run libraries
> > depend directly on type (2) packages, so aren't appropriate to include
> > in Fedora Extras.  Bioperl-live is not really that useful without type
> > (2) packages.  It is also sensible to keep all of the Bioperl-*
> > packages in the same repository.  For these reasons, my stance is that
> > they should not be included in Fedora Extras.
> >
> > 4) Bioinformatics / Comp. Bio. data sets.  These don't have licensing
> > problems, but they tend to be large.  Usually in the 10E7 - 10E10 byte
> > range.  RPM cannot even generate correct metadata for some of them
> > if the files are too large (overflow problems).  Probably
> > not appropriate to put in Fedora Extras because they are too large and
> > not generally useful.
> >
> > 5) Bioinformatics-specific System databases / daemons.  These
> > high-level packages depend on types (2), (3), and (4), and so are not
> > appropriate to put into Fedora Extras.  An example is a BLAT daemon,
> > which relies on the BLAT server, as well as NIB-formatted genome
> > sequence files.
> >
> > That said, there are a lot of type (1) packages in the Biopackages.net
> > repository.  If you're interested in migrating the spec files from our
> > repository to the Fedora project it would save us (the Biopackages.net
> > maintainers) a ton of build and maintenance time, so please feel free
> > to take them, just let us know.  If we can reach some agreement on
> > where the bioinformatics-specific packages should be maintained/built
> > we may be able to work together on these as well.
> >
> > -Allen
> >
> >
> > On 3/30/07, Alex Lancaster <alexl at users.sourceforge.net> wrote:
> >>>>>>> "AD" == Allen Day  writes:
> >>
> >> AD> Hi Alex, The Biopackages.net project is still active, we are
> >> AD> regularly adding packages to it, mostly R packages lately.  Most
> >> AD> of the systems we use are running CentOS at this point, which is
> >> AD> why you have not seen support for FC6 yet.  There is nothing
> >> AD> preventing building FC6 packages aside from lack of time to set up
> >> AD> the FC6 build farm nodes.
> >>
> >> Hi Allen and others,
> >>
> >> Great news to hear that Biopackages.net is still active!  I would like
> >> to help out if possible.  I don't believe in "FUD" either... ;)
> >>
> >> AD> If you're interested in packaging BioPerl or other
> >> AD> bioinformatics-related software, please join the Biopackages
> >> AD> project on SourceForge.  We object to the Fedora Extras FUD
> >> AD> tactics used to discourage people from using 3rd party
> >> AD> repositories, and suspect they may not want to host some of our
> >> AD> data packages, such as the >2GB genome packages.  Biopackages
> >> AD> project is likely to partially merge with RPMForge.  We are
> >> AD> already discussing with them how best to do it.
> >>
> >> The packages that I created which are currently available in Fedora
> >> Packages are Perl dependencies which, as I said are useful for
> >> packages outside the bioinformatics purview.  I do have a (base)
> >> bioperl package in review, but it is not yet released.
> >>
> >> As for third-party repos, I don't object to them at all, and for some
> >> kinds of projects they are indeed appropriate. (e.g. for non-free
> >> stuff like Livna or Freshrpms).  However, I do have practical concerns
> >> about repository mixing: I think that it does need to be handled
> >> carefully, but that co-operation between Fedora and third-party repos
> >> can make it work.
> >>
> >> For example, one practical concern is that as of the
> >> soon-to-be-released Fedora 7, Core+Extras will be merged, so there
> >> will be no distinction at the repository-level between formerly Extras
> >> packages and formerly Core packages (as of now there are only "Fedora
> >> Packages"), which means that it will not be possible for third-party
> >> repos to limit their dependencies to just those in a former base set
> >> (i.e. excluding Extras).
> >>
> >> I agree that a few years ago (circa 2003-2004) there was concern about
> >> the way some third party repositories were treated somewhat badly by
> >> the (then) Fedora Extras (with some people going so far as to say that
> >> third-party repos were bad in principle and should always be ignored
> >> which I disagree with too).  But it seems to me that culture has
> >> shifted since, with some notable packagers such as Matthias Saou (of
> >> Freshrpms) and Axel Thimm (of Atrpms) now contributing packages to
> >> Fedora itself.  The process of contributing has also become much
> >> simpler and reviews are conducted speedily and efficiently, I had
> >> packages in the repository in a matter of a few days from initial
> >> submission.  Freshrpms itself now enables and depends on the (old)
> >> Extras.
> >>
> >> The real question for me, then is what packages it makes sense to go
> >> in Fedora, and what packages go in third party repositories.  It seems
> >> to me that in the case of Perl packages which could be dependencies
> >> for other packages not specific to the third-party repo in question,
> >> it makes sense for them to go into Fedora itself, so I think I will
> >> continue to package them.  This lessens the load on the third-party
> >> repo, while making them available for all other third-party repos.
> >> (This is the approach that Freshrpms seems to be taking, Matthias has
> >> contributed most packages back to Fedora now other than the non-free
> >> ones).
> >>
> >> At the other end of the spectrum are packages like you mention, genome
> >> packages, which may be of concern because of their size and/or highly
> >> specialised nature, and, as you say, may make sense to go in a
> >> third-party repo like Biopackages.net.  Also packages which can't be
> >> packaged by Fedora for legal reasons like Clustal could/should go in
> >> Biopackages.net.
> >>
> >> In the middle are packages like bioperl itself which are potentially
> >> useful to perhaps a wider group of people than the genome packages but
> >> may not necessarily be dependencies for other packages.  I lean
> >> towards making them part of Fedora so that they will be available out
> >> of the box on the planned "Everything" DVD ISO, but I welcome a
> >> discussion on this.
> >>
> >> As I said, I'm glad to hear that Biopackages.net is alive and well and
> >> I welcome a discussion on how upstream Fedora can usefully interact
> >> with Biopackages.net (I guess perhaps on the Biopackages.net list).
> >>
> >> Regards,
> >> Alex
> >>
> >> PS.  As the upstream author, if you could clarify the license on
> >> perl-SVG-Graph, on CPAN (or on the mailing list) that would be great.
> >> --
> >> Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology, University of Arizona
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From alexl at users.sourceforge.net  Wed Apr 18 08:50:51 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 18 Apr 2007 01:50:51 -0700
Subject: [Bioperl-l] bioperl-run and Bio::Root::AccessorMaker
Message-ID: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>

In packaging bioperl-run for Fedora, I think I stumbled across a bug
in the bioperl-run package.  It appears from this edit:

http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl

that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
bioperl-run 1.5.2_100 still contains modules that use this module:

$ cd bioperl-run-1.5.2_100
$ grep -r AccessorMaker  *
Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar class min_version)]);
Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(input_file output_file)]);

This causes the automatic Perl dependency generator for RPM to add
Bio::Root::AccessorMaker as a Requires, which means RPM will refuse to
install perl-bioperl-run because it is looking for a module that has
been removed from core bioperl:

$ sudo rpm -Uvh --test /home/alex/rpmbuild/RPMS/noarch/perl-bioperl-run-1.5.2_100-1.noarch.rpm
error: Failed dependencies:
        perl(Bio::Root::AccessorMaker) is needed by perl-bioperl-run-1.5.2_100-1.noarch

Are the SDI and JavaRunner modules being actively developed?  What's
the best course of action for these modules?  Should I just exclude
them from the package for now, since they won't work even if you
tell RPM to ignore the dependency warning?
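
For anyone who wants to repeat the grep check above from Perl (for
example inside a packaging script), here is a minimal sketch using only
core modules. The helper name find_module_users is my own invention and
is not part of BioPerl; it simply reports every line that loads a given
module with "use" or "require".

```perl
use strict;
use warnings;
use File::Find;

# Return "path:line: text" entries for every line under $root that
# loads $module via "use" or "require". find_module_users is a
# hypothetical helper name, not part of BioPerl.
sub find_module_users {
    my ($root, $module) = @_;
    my @hits;
    find({
        no_chdir => 1,    # keep $_ as the full path of each file
        wanted   => sub {
            return unless -f $_ && /\.(?:pm|pl|t)\z/;
            my $path = $_;
            open my $fh, '<', $path or return;
            while (my $line = <$fh>) {
                next unless $line =~ /^\s*(?:use|require)\s+\Q$module\E\b/;
                chomp $line;
                push @hits, "$path:$.: $line";
            }
            close $fh;
        },
    }, $root);
    my @sorted = sort @hits;
    return @sorted;
}

# Example: perl scan.pl bioperl-run-1.5.2_100
if (@ARGV) {
    print "$_\n" for find_module_users($ARGV[0], 'Bio::Root::AccessorMaker');
}
```

Any module it reports will make RPM's automatic dependency generator
emit the same perl(Bio::Root::AccessorMaker) requirement shown above.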

Alex


From shameer at ncbs.res.in  Wed Apr 18 10:16:07 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Wed, 18 Apr 2007 15:46:07 +0530 (IST)
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph
 with Perl]
In-Reply-To: <4624E32A.6010704@bms.com>
References: <4624E32A.6010704@bms.com>
Message-ID: <36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>

Hi,

I am also interested in using the Bio::Graphics modules for dynamic image
display. I have a question: I tried all the sample programs explained on
this page: http://stein.cshl.org/genome_informatics/BioGraphics/index.html.
Is it possible to generate a png/jpg/gif image from this module by
altering the same program? Currently it uses the display option. I know this
can be done by using GD/Image::Magick in Perl, but is there any quick way
to accomplish it in BioPerl?

Thanks,


> Missed to send this to the list....
> Stefan


-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From sdavis2 at mail.nih.gov  Wed Apr 18 11:18:48 2007
From: sdavis2 at mail.nih.gov (Sean Davis)
Date: Wed, 18 Apr 2007 07:18:48 -0400
Subject: [Bioperl-l] [Fwd: Re: How to Create Sequence and TFBS Graph
	with Perl]
In-Reply-To: <36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
References: <4624E32A.6010704@bms.com>
	<36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
Message-ID: <200704180718.48811.sdavis2@mail.nih.gov>

On Wednesday 18 April 2007 06:16, Shameer Khadar wrote:
> Hi,
>
> I am also interested to use the Bio::Graphics modules from dynamic image
> display. I have a doubt,  I tried all the sample programs explained in
> this page http://stein.cshl.org/genome_informatics/BioGraphics/index.html.
> Is it possible to generate a png/jpg/gif image from this module by
> altering the same program. Currently its using diplay option. 

You just need to print $panel->png to a file.
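
A minimal sketch of what that could look like, assuming $panel is the
Bio::Graphics::Panel object from the tutorial scripts. The helper name
write_image and the output file name are made up for illustration; the
part that matters is the binmode call, since PNG data is binary.

```perl
use strict;
use warnings;

# Write already-rendered image bytes to $path. write_image is a
# hypothetical helper name; in a Bio::Graphics script the bytes
# would come from $panel->png.
sub write_image {
    my ($path, $bytes) = @_;
    open my $out, '>', $path or die "Cannot write $path: $!";
    binmode $out;           # PNG data is binary; avoid newline translation
    print {$out} $bytes;
    close $out or die "Cannot close $path: $!";
    return -s $path;        # size on disk, handy as a sanity check
}

# With a panel built as in the tutorial:
#   write_image('features.png', $panel->png);
```

The tutorial scripts print the image to STDOUT, so redirecting their
output to a file from the shell should work just as well.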

Sean


From bix at sendu.me.uk  Wed Apr 18 11:48:27 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 12:48:27 +0100
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
Message-ID: <4626058B.8090801@sendu.me.uk>

Alex Lancaster wrote:
> In packaging bioperl-run for Fedora, I think I stumbled across a bug
> in the bioperl-run package.  It appears from this edit:
> 
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
> 
> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
> bioperl-run 1.5.2_100 still contains modules that use this module:
> 
> $ cd bioperl-run-1.5.2_100
> $ grep -r AccessorMaker  *
> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar
> class min_version)]);
> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
> ('$'=>[qw(input_file output_file)]);

It looks like I've implemented a similar idea to AccessorMaker and 
AbstractRunner in Bio::Root::Root->_set_from_args() and 
Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses 
AbstractRunner I propose deprecating it immediately.

Forester::SDI and JavaRunner have no tests which is why we didn't notice 
the problem. Since they've been out of use for a number of years now I 
also propose their immediate deprecation. Alternatively, it may not be 
too difficult to just update them to use _set_from_args and _setparams, 
but I've nothing to test against (and JavaRunner is self-described as 
"probably incomplete").
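
For readers who have not met the pattern: the "use
Bio::Root::AccessorMaker ('$'=>[qw(...)])" lines quoted above ask for
scalar accessors to be generated when the module is loaded. The sketch
below shows that general idea in core Perl only. It is not the
AccessorMaker, _set_from_args or _setparams code, and the package and
method names (My::Runner, make_scalar_accessors) are invented for
illustration.

```perl
use strict;
use warnings;

package My::Runner;    # hypothetical class, for illustration only

# Install one combined get/set method per name into the calling class.
sub make_scalar_accessors {
    my ($class, @names) = @_;
    for my $name (@names) {
        no strict 'refs';    # needed to assign to a symbol-table entry by name
        *{"${class}::$name"} = sub {
            my $self = shift;
            $self->{$name} = shift if @_;
            return $self->{$name};
        };
    }
    return;
}

sub new { return bless {}, shift }

__PACKAGE__->make_scalar_accessors(qw(input_file output_file));

package main;

my $runner = My::Runner->new;
$runner->input_file('in.fa');
print $runner->input_file, "\n";
```

Updating SDI and JavaRunner would mean replacing their
AccessorMaker-generated methods with whatever the current base classes
provide, which is the part I cannot test.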


I can remove the modules from cvs and create bioperl-run-1.5.2_101, 
resolving the packaging issue. I plan on doing precisely this within the 
next seven days unless someone puts a hand up to stop me.


[BCC: author, Juguang Xiao]


From cjfields at uiuc.edu  Wed Apr 18 12:43:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 07:43:45 -0500
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <4626058B.8090801@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>


On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:

> It looks like I've implemented a similar idea to AccessorMaker and
> AbstractRunner in Bio::Root::Root->_set_from_args() and
> Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
> AbstractRunner I propose deprecating it immediately.

JavaRunner is-a AbstractRunner, but what you propose below takes care  
of that.

> Forester::SDI and JavaRunner have no tests which is why we didn't  
> notice
> the problem. Since they've been out of use for a number of years now I
> also propose their immediate deprecation. Alternatively, it may not be
> too difficult to just update them to use _set_from_args and  
> _setparams,
> but I've nothing to test against (and JavaRunner is self-described as
> "probably incomplete").
>
>
> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
> resolving the packaging issue. I plan on doing precisely this  
> within the
> next seven days unless someone puts a hand up to stop me.
>
>
> [BCC: author, Juguang Xiao]

I suppose you could just remove the modules from the branch for now,  
but (as you point out) the code appears largely incomplete, so might  
as well deprecate the entire lot.  The code will be in the 'attic'  
once removed if anyone's really interested in it.

You've forwarded the author and the mail list so let's see what the  
response is (if any)...

chris


From cjfields at uiuc.edu  Wed Apr 18 15:30:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 10:30:45 -0500
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <462634DB.2040701@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
Message-ID: <143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>


On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:

> Chris Fields wrote:
>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>>> resolving the packaging issue. I plan on doing precisely this  
>>> within the
>>> next seven days unless someone puts a hand up to stop me.
>>>
>>> [BCC: author, Juguang Xiao]
> [snip]
>> You've forwarded the author and the mail list so let's see what  
>> the response is (if any)...
>
> Unfortunately the mail was undeliverable, and I have no other  
> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few  
> more days for other responses on the list.
>
> I never made a branch for bioperl-run 1.5.2, so they'd be removed  
> from HEAD.

It might be a good idea to repost this using the module names  
affected in the subject, just in case, though the last post he made  
on the mail list was ~3 years ago using the same email:

http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/match=xiao

He may be MIA.

chris


From bix at sendu.me.uk  Wed Apr 18 15:10:19 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 16:10:19 +0100
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
Message-ID: <462634DB.2040701@sendu.me.uk>

Chris Fields wrote:
> 
> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
> 
>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>> resolving the packaging issue. I plan on doing precisely this within the
>> next seven days unless someone puts a hand up to stop me.
>>
>> [BCC: author, Juguang Xiao]
[snip]
> You've forwarded the author and the mail list so let's see what the 
> response is (if any)...

Unfortunately the mail was undeliverable, and I have no other address 
for Juguang (I tried juguang at tll.org.sg). I'll wait a few more days for 
other responses on the list.

I never made a branch for bioperl-run 1.5.2, so they'd be removed from HEAD.


From hlapp at gmx.net  Wed Apr 18 15:59:52 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Apr 2007 11:59:52 -0400
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
	<143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
Message-ID: <EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>

There is a Juguang Xiao at juguang.swf at gmail.com. Not sure he's  
the same, but sounds like it's a geek at least. (google and you'll  
see; has anyone here ever heard about neko??)

	-hilmar

On Apr 18, 2007, at 11:30 AM, Chris Fields wrote:

>
> On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:
>
>> Chris Fields wrote:
>>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>>> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
>>>> resolving the packaging issue. I plan on doing precisely this
>>>> within the
>>>> next seven days unless someone puts a hand up to stop me.
>>>>
>>>> [BCC: author, Juguang Xiao]
>> [snip]
>>> You've forwarded the author and the mail list so let's see what
>>> the response is (if any)...
>>
>> Unfortunately the mail was undeliverable, and I have no other
>> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few
>> more days for other responses on the list.
>>
>> I never made a branch for bioperl-run 1.5.2, so they'd be removed
>> from HEAD.
>
> It might be a good idea to repost this using the module names
> affected in the subject, just in case, though the last post he made
> on the mail list was ~3 years ago using the same email:
>
> http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/
> match=xiao
>
> He may be MIA.
>
> chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Wed Apr 18 16:00:49 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Wed, 18 Apr 2007 12:00:49 -0400
Subject: [Bioperl-l] Immediate-effect deprecations (was: bioperl-run and
	Bio::Root::AccessorMaker)
In-Reply-To: <4626058B.8090801@sendu.me.uk>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <9159C9DF-41BC-46AA-8511-763AD9B7A3D0@gmx.net>

sounds good to me - the less cruft the better. -hilmar
On Apr 18, 2007, at 7:48 AM, Sendu Bala wrote:

> Alex Lancaster wrote:
>> In packaging bioperl-run for Fedora, I think I stumbled across a bug
>> in the bioperl-run package.  It appears from this edit:
>>
>> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/ 
>> Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
>>
>> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
>> bioperl-run 1.5.2_100 still contains modules that use this module:
>>
>> $ cd bioperl-run-1.5.2_100
>> $ grep -r AccessorMaker  *
>> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
>> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw 
>> (jar
>> class min_version)]);
>> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
>> ('$'=>[qw(input_file output_file)]);
>
> It looks like I've implemented a similar idea to AccessorMaker and
> AbstractRunner in Bio::Root::Root->_set_from_args() and
> Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
> AbstractRunner I propose deprecating it immediately.
>
> Forester::SDI and JavaRunner have no tests which is why we didn't  
> notice
> the problem. Since they've been out of use for a number of years now I
> also propose their immediate deprecation. Alternatively, it may not be
> too difficult to just update them to use _set_from_args and  
> _setparams,
> but I've nothing to test against (and JavaRunner is self-described as
> "probably incomplete").
>
>
> I can remove the modules from cvs and create bioperl-run-1.5.2_101,
> resolving the packaging issue. I plan on doing precisely this  
> within the
> next seven days unless someone puts a hand up to stop me.
>
>
> [BCC: author, Juguang Xiao]

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From cjfields at uiuc.edu  Wed Apr 18 16:25:54 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 11:25:54 -0500
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
	<E66E16E9-670B-41E8-A8AE-9CD1BC64C381@uiuc.edu>
	<462634DB.2040701@sendu.me.uk>
	<143D5493-3DA3-4227-A00D-D997EAAECEF1@uiuc.edu>
	<EF4EEF1C-89BF-4078-9D66-EF98745476A1@gmx.net>
Message-ID: <E0195EBD-731D-4915-91AD-7FFE1FA9F608@uiuc.edu>

My guess is that Hilmar's is the most current, as posts were made this
year.  I found another email: juguang at fugu-sg.org.  Looks like he
added some stuff to Ensembl a while back (sorry about the long URL).

http://www.ensembl.org/info/software/Pdoc/ensembl/modules/Bio/EnsEMBL/Utils/Converter/ens_bio_featurePair_raw.html

chris

On Apr 18, 2007, at 10:59 AM, Hilmar Lapp wrote:

> There is a Juguang Xiao at juguang.swf at gmail.com. Not sure he's
> the same, but sounds like it's a geek at least. (google and you'll
> see; has anyone here ever heard about neko??)
>
> 	-hilmar
>
> On Apr 18, 2007, at 11:30 AM, Chris Fields wrote:
>
>>
>> On Apr 18, 2007, at 10:10 AM, Sendu Bala wrote:
>>
>>> Chris Fields wrote:
>>>> On Apr 18, 2007, at 6:48 AM, Sendu Bala wrote:
>>>>> I can remove the modules from cvs and create bioperl- 
>>>>> run-1.5.2_101,
>>>>> resolving the packaging issue. I plan on doing precisely this
>>>>> within the
>>>>> next seven days unless someone puts a hand up to stop me.
>>>>>
>>>>> [BCC: author, Juguang Xiao]
>>> [snip]
>>>> You've forwarded the author and the mail list so let's see what
>>>> the response is (if any)...
>>>
>>> Unfortunately the mail was undeliverable, and I have no other
>>> address for Juguang (I tried juguang at tll.org.sg). I'll wait a few
>>> more days for other responses on the list.
>>>
>>> I never made a branch for bioperl-run 1.5.2, so they'd be removed
>>> from HEAD.
>>
>> It might be a good idea to repost this using the module names
>> affected in the subject, just in case, though the last post he made
>> on the mail list was ~3 years ago using the same email:
>>
>> http://article.gmane.org/gmane.comp.lang.perl.bio.general/4049/
>> match=xiao
>>
>> He may be MIA.
>>
>> chris
>>
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From bix at sendu.me.uk  Wed Apr 18 16:37:55 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Wed, 18 Apr 2007 17:37:55 +0100
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
Message-ID: <46264963.9020306@sendu.me.uk>

Hi all,

t/DB.t is currently failing tests 40 and 41:

ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
                                          '-ids' => [qw(J00522 AF303112 2981014)],
                                          -verbose => 1);

cmp_ok $query->count, '>', 0;

You can see that
http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%2CAF303112%2C2981014&retmax=100
gives no results, where presumably it used to give 3. Querying on the 3
ids individually works fine. So... what changed and how do we get around it?
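
To make the comparison easy to repeat, here is a small core-Perl sketch
that rebuilds the URL above from a list of ids, so the combined query
and the three single-id queries can be pasted into a browser side by
side. The function names are mine, and the parameter list is simply
copied from the URL in this message; it is not taken from the
Bio::DB::Query::GenBank source.

```perl
use strict;
use warnings;

# Percent-encode everything outside the RFC 3986 unreserved set.
sub uri_escape_min {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build the esearch URL for one or more ids, joined with commas as in
# the failing query. Parameter names and order mirror the URL above.
sub esearch_url {
    my (@ids) = @_;
    my @params = (
        [ db         => 'nucleotide' ],
        [ datetype   => 'mdat' ],
        [ usehistory => 'y' ],
        [ tool       => 'bioperl' ],
        [ term       => join(',', @ids) ],
        [ retmax     => 100 ],
    );
    my $query = join('&', map { $_->[0] . '=' . uri_escape_min($_->[1]) } @params);
    return 'http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?' . $query;
}

my @ids = qw(J00522 AF303112 2981014);
print esearch_url(@ids), "\n";          # the combined query that returns nothing
print esearch_url($_), "\n" for @ids;   # the single-id queries that do work
```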


From cjfields at uiuc.edu  Wed Apr 18 17:05:12 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 12:05:12 -0500
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <46264963.9020306@sendu.me.uk>
References: <46264963.9020306@sendu.me.uk>
Message-ID: <6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>

I can verify on this end.  Not sure why, but the same accessions are  
used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)  
with success.

chris

On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:

> Hi all,
>
> t/DB.t is currently failing tests 40 and 41:
>
> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>                                           '-ids' => [qw(J00522  
> AF303112
> 2981014)],
>                                           -verbose => 1);
>
> cmp_ok $query->count, '>', 0;
>
> You can see that
> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi? 
> db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522% 
> 2CAF303112%2C2981014&retmax=100
> gives no results, where presumably it used to give 3. querying on  
> the 3
> ids individually works fine. So... what changed and how do we get  
> around it?

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Wed Apr 18 18:07:22 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 13:07:22 -0500
Subject: [Bioperl-l] Skipping/Failing tests
Message-ID: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>

To the BioPerl community at large,

I have noticed a problem with some BioPerl tests when converting to  
Test::More.  When using the following:

     while ($seq = $seqin->next_seq) {
         my $acc = $seq->accession;
         ok exists $result{ $acc };
         is $seq->length, $result{ $acc };
         delete $result{$acc};
     }

if $seq is undef on the first call, the loop body never runs, and the
test plan comes up short by 2 tests for every iteration that should
have run.  Two serious problems:

1) No specific failures are seen until the end of the test suite when  
the test plan doesn't match the number of tests (which could be  
several hundred lines away from the actual failure).
2) Worse, if one were lazy enough to not track the actual number of  
tests (heh, not that that would happen) they could inadvertently change  
the test plan to match the final number of tests.

There are several ways to work around this, such as using a counter  
to track the number of iterations and check to make sure they pass:

     $ct = 0;
     while ($seq = $seqin->next_seq) {
         $ct++;
         my $acc = $seq->accession;
         ok exists $result{ $acc };
         is $seq->length, $result{ $acc };
         delete $result{$acc};
     }
     is($ct, 3);

Here, if $ct is 0 you'll get an error.  However, the test count will  
still be off at the end (the test plan will be off by 6 tests).

My opinion is that we should try to match the plan, as a single fail  
doesn't reflect the severity of the bug (i.e. it should fail each  
test per iteration, as expected).  Skipping to match is an option as  
well (one I've used) but again doesn't reflect the severity of the  
problem in my opinion.  The flip side is that some consider any  
failed test significant, so there is no reason to try matching the  
tests up.
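
To make the "match the plan" option concrete, here is a minimal
self-contained sketch. The accessions and lengths are made-up stand-ins
(an array plays the role of $seqin->next_seq), and the approach is only
one possibility, not an agreed BioPerl convention: every expected
sequence that never turns up is charged its two tests as explicit
failures, so the plan still adds up and the report points at the loop.

```perl
use strict;
use warnings;
use Test::More tests => 7;    # 2 tests x 3 expected sequences + 1 count check

# Stand-in data; in a real test these would come from $seqin->next_seq.
my %result   = (A1 => 10, B2 => 20, C3 => 30);
my @records  = ([A1 => 10], [B2 => 20], [C3 => 30]);
my $expected = scalar keys %result;

my $ct = 0;
while (my $rec = shift @records) {
    $ct++;
    my ($acc, $length) = @$rec;
    ok exists $result{$acc}, "accession $acc expected";
    is $length, $result{$acc}, "length of $acc";
    delete $result{$acc};
}
is $ct, $expected, 'saw every expected sequence';

# Charge two failures for each sequence that never arrived, so the
# number of tests run always equals the plan.
for my $acc (sort keys %result) {
    fail "no sequence returned for $acc" for 1 .. 2;
}
```

Emptying @records before the loop gives seven failures against a plan
of seven, rather than one failure plus a plan mismatch at the end of
the file.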

What I would like to do is hammer out something we can add to the  
Writing Tests HOWTO which addresses some ways to deal with the above  
for those who want to contribute code and tests to BioPerl.  I'm  
looking for some (any) additional opinions on the matter (or, if you  
have the initiative, adding some ideas to the HOWTO itself).

http://www.bioperl.org/wiki/Special:Recentchanges

Thanks!

chris


From ki.baik at roche.com  Wed Apr 18 18:32:35 2007
From: ki.baik at roche.com (Baik, Ki)
Date: Wed, 18 Apr 2007 11:32:35 -0700
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
References: <46264963.9020306@sendu.me.uk>
	<6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
Message-ID: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>

I have had similar problems in which a couple of accession numbers out
of a series were not retrieved, yet they do exist at NCBI.

Ki Baik

-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris Fields
Sent: Wednesday, April 18, 2007 10:05 AM
To: Sendu Bala
Cc: bioperl-l
Subject: Re: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures

I can verify on this end.  Not sure why, but the same accessions are  
used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)  
with success.

chris

On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:

> Hi all,
>
> t/DB.t is currently failing tests 40 and 41:
>
> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>                                           '-ids' => [qw(J00522  
> AF303112
> 2981014)],
>                                           -verbose => 1);
>
> cmp_ok $query->count, '>', 0;
>
> You can see that
> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi? 
> db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522% 
> 2CAF303112%2C2981014&retmax=100
> gives no results, where presumably it used to give 3. querying on  
> the 3
> ids individually works fine. So... what changed and how do we get  
> around it?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From arareko at campus.iztacala.unam.mx  Wed Apr 18 19:12:29 2007
From: arareko at campus.iztacala.unam.mx (Mauricio Herrera Cuadra)
Date: Wed, 18 Apr 2007 14:12:29 -0500
Subject: [Bioperl-l] Skipping/Failing tests
In-Reply-To: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>
References: <D686FF00-BEEE-40A3-90E8-CAE4756E2E33@uiuc.edu>
Message-ID: <46266D9D.1050703@campus.iztacala.unam.mx>

Hey Chris,

I don't know if this helps those working on the test suite, but there's 
a recently cooked recipe for keeping track of the number of tests (thus 
helping to update the test plan accordingly):

http://www.perl.com/pub/a/2007/04/12/lightning-four.html?page=3

My quick .2 cents :)

Cheers,
Mauricio.

Chris Fields wrote:
> To the BioPerl community at large,
> 
> I have noticed a problem with some BioPerl tests when converting to  
> Test::More.  When using the following:
> 
>      while ($seq = $seqin->next_seq) {
>          my $acc = $seq->accession;
>          ok exists $result{ $acc };
>          is $seq->length, $result{ $acc };
>          delete $result{$acc};
>      }
> 
> if $seq is undef then the test plan is off by a factor of 2 for every  
> iteration of the loop.  Two serious problems:
> 
> 1) No specific failures are seen until the end of the test suite when  
> the test plan doesn't match the number of tests (which could be  
> several hundred lines away from the actual failure).
> 2) Worse, if one were lazy enough to not track the actual number of  
> tests (heh, not that that would happen) they could inadvertently change  
> the test plan to match the final number of tests.
> 
> There are several ways to work around this, such as using a counter  
> to track the number of iterations and check to make sure they pass:
> 
>      $ct = 0;
>      while ($seq = $seqin->next_seq) {
>          $ct++;
>          my $acc = $seq->accession;
>          ok exists $result{ $acc };
>          is $seq->length, $result{ $acc };
>          delete $result{$acc};
>      }
>      is($ct, 3);
> 
> Here, if $ct is 0 you'll get an error.  However, the test count will  
> still be off at the end (the test plan will be off by 6 tests).
> 
> My opinion is that we should try to match the plan, as a single fail  
> doesn't reflect the severity of the bug (i.e. it should fail each  
> test per iteration, as expected).  Skipping to match is an option as  
> well (one I've used) but again doesn't reflect the severity of the  
> problem in my opinion.  The flip side is that some consider any  
> failed test significant, so there is no reason to try matching the  
> tests up.
> 
> What I would like to do is hammer out something we can add to the  
> Writing Tests HOWTO which addresses some ways to deal with the above  
> for those who want to contribute code and tests to BioPerl.  I'm  
> looking for some (any) additional opinions on the matter (or, if you  
> have the initiative, adding some ideas to the HOWTO itself).
> 
> http://www.bioperl.org/wiki/Special:Recentchanges
> 
> Thanks!
> 
> chris
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 

-- 
MAURICIO HERRERA CUADRA
arareko at campus.iztacala.unam.mx
Laboratorio de Genética
Unidad de Morfofisiología y Función
Facultad de Estudios Superiores Iztacala, UNAM


From cjfields at uiuc.edu  Wed Apr 18 19:41:56 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Wed, 18 Apr 2007 14:41:56 -0500
Subject: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
In-Reply-To: <6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
References: <46264963.9020306@sendu.me.uk>
	<6F311497-E1D2-42E1-9E9E-54E2A38343D5@uiuc.edu>
	<6D5431B47E46BD45AAA453432AD3B803027551D4@rpbmsem01.nala.roche.com>
Message-ID: <208DCD0F-6A0B-4054-A1C7-D599D32AC344@uiuc.edu>

The problem appears to be with eutils.  Using bare accession numbers  
no longer works with esearch (which Bio::DB::Query::GenBank uses).   
Using them via efetch still works, which explains why  
Bio::DB::GenBank passes tests using the same accession/GI mix.

NCBI has added an extra field descriptor specifically for accessions  
in esearch, which means any queries with accessions must look like  
the following (the last is a GI):

'J00522[accession] OR AF303112[accession] OR 2981014'

'J00522[accession] | AF303112[accession] | 2981014' also works.

We could separate them into two groups based on presence of letters  
and set up the query that way, or we can define exactly what kind of  
ID is acceptable for passing to ids() (GI or accession), or have  
ids() be GI and have a new method for accessions (or vice versa).  
Thoughts?
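
The first option (splitting on the presence of letters) could look roughly like this. It is a sketch only: ids_to_term is a hypothetical helper, not an existing Bio::DB::Query::GenBank method, and it assumes that every ID containing a letter is an accession and every all-digit ID is a GI.

```perl
use strict;
use warnings;

# Hypothetical helper: tag IDs that contain a letter with the [accession]
# field and leave all-digit IDs (assumed to be GIs) bare, then OR them.
sub ids_to_term {
    my @ids = @_;
    return join ' OR ', map { /[A-Za-z]/ ? $_ . '[accession]' : $_ } @ids;
}

print ids_to_term(qw(J00522 AF303112 2981014)), "\n";
# J00522[accession] OR AF303112[accession] OR 2981014
```

The resulting string would still need URL-escaping before it goes into the esearch term parameter.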

chris

On Apr 18, 2007, at 1:32 PM, Baik, Ki wrote:

> I have had similar problems in which a couple of accession numbers out
> of a series were not retrieved, yet they do exist in ncbi.
>
> Ki Baik
>
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Chris  
> Fields
> Sent: Wednesday, April 18, 2007 10:05 AM
> To: Sendu Bala
> Cc: bioperl-l
> Subject: Re: [Bioperl-l] DB.t (Bio::DB::Query::GenBank) failures
>
> I can verify on this end.  Not sure why, but the same accessions are
> used earlier in DB.t tests (Bio::DB::GenBank and get_Stream_by_acc)
> with success.
>
> chris
>
> On Apr 18, 2007, at 11:37 AM, Sendu Bala wrote:
>
>> Hi all,
>>
>> t/DB.t is currently failing tests 40 and 41:
>>
>> ok $query = Bio::DB::Query::GenBank->new('-db'  => 'nucleotide',
>>                                           '-ids' => [qw(J00522
>> AF303112
>> 2981014)],
>>                                           -verbose => 1);
>>
>> cmp_ok $query->count, '>', 0;
>>
>> You can see that
>> http://www.ncbi.nih.gov/entrez/eutils/esearch.fcgi?
>> db=nucleotide&datetype=mdat&usehistory=y&tool=bioperl&term=J00522%
>> 2CAF303112%2C2981014&retmax=100
>> gives no results, where presumably it used to give 3. querying on
>> the 3
>> ids individually works fine. So... what changed and how do we get
>> around it?
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From boconnor at ucla.edu  Wed Apr 18 19:00:32 2007
From: boconnor at ucla.edu (Brian O'Connor)
Date: Wed, 18 Apr 2007 12:00:32 -0700
Subject: [Bioperl-l] Packaging bioperl for Fedora
In-Reply-To: <6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>
References: <5c24dcc30703301730v987b749q27c917a17bfd5905@mail.gmail.com>	
	<C2340DDA.D83F%bosborne11@verizon.net>
	<6dce9a0b0704171047u6c0d46e8sfefaf8437e413ec5@mail.gmail.com>
Message-ID: <46266AD0.3070209@ucla.edu>

Hey Lincoln,

This looks good but the configuration step is about to change for 
Biopackages.  I'm writing config RPMs today so the end user can just 
install the config RPM for their distro and they don't have to manually 
change the yum.conf file.  It will also install the biopackages gpg key 
too so we can support signed packages.  I'll update the wiki when these 
config RPMs are available.

--Brian

Lincoln Stein wrote:

> Hi,
>
> I've been updating the WIKI in anticipation of a new GBrowse release 
> and have added a "stub" for the biopackages.net install. Since I don't 
> use yum (I've been running Slackware for ages and have recently started 
> working with Ubuntu) I'm not sure I got the details right. Could 
> someone check?
>
>
>         http://www.gmod.org/wiki/index.php/GBrowse_RPM_HOWTO
>
> Also, I think some verbiage on how to use yum to install MySQL and 
> Apache would be great, since it will be consistent with the Ubuntu 
> install page.
>
> Thanks,
>
> Lincoln
>
> On 3/31/07, *Brian Osborne* <bosborne11 at verizon.net 
> <mailto:bosborne11 at verizon.net>> wrote:
>
>     Allen et al.,
>
>     What happened to the "GMOD" package or packages? I've had some
>     conversations
>     in the past few months with you-all suggesting that a GMOD package, or
>     packages, would be useful.
>
>     Brian O.
>
>
>
>
>     On 3/30/07 8:30 PM, "Allen Day" <allenday at gmail.com
>     <mailto:allenday at gmail.com>> wrote:
>
>     > Hi Alex,
>     >
>     > You've aptly noted that there are several classes of packages being
>     > discussed here, and that they should not be treated equally.  From my
>     > point of view and of specific relevance to the Bioperl community we
>     > have at least:
>     >
>     > 1) "regular" CPAN dependencies and their occasional C/C++/Fortran
>     > dependencies.  These should all be in Fedora Extras, as they are of
>     > general utility.  Biopackages.net currently hosts about 200 packages
>     > (.spec files, specifically) that are like this.  Maybe 80 of these
>     > are needed for Bioperl.
>     >
>     > 2) academic packages, such as BLAT, NCBI Toolkit, CLUSTAL, genscan,
>     > etc.  From what I've seen, these typically have strange/custom
>     > licenses that may not be valid for some users.  BLAT has a dual
>     > licensing scheme for academic and non-academic licensees, for
>     > instance.  These packages are not of general utility.  For these two
>     > reasons, my stance is that they should not be included in Fedora
>     > Extras.
>     >
>     > 3) Bioperl packages.  Several subsets here.  The Bioperl-run libraries
>     > depend directly on type (2) packages, so aren't appropriate to include
>     > in Fedora Extras.  Bioperl-live is not really that useful without type
>     > (2) packages.  It is also sensible to keep all of the Bioperl-*
>     > packages in the same repository.  For these reasons, my stance is that
>     > they should not be included in Fedora Extras.
>     >
>     > 4) Bioinformatics / Comp. Bio. data sets.  These don't have licensing
>     > problems, but they tend to be large.  Usually in the 10E7 - 10E10 byte
>     > range.  RPM cannot even generate correct metadata for some of them
>     > if the files are too large (overflow problems).  Probably
>     > not appropriate to put in Fedora Extras because they are too large and
>     > not generally useful.
>     >
>     > 5) Bioinformatics-specific System databases / daemons.  These
>     > high-level packages depend on types (2), (3), and (4), and so are not
>     > appropriate to put into Fedora Extras.  An example is a BLAT daemon,
>     > which relies on the BLAT server, as well as NIB-formatted genome
>     > sequence files.
>     >
>     > That said, there are a lot of type (1) packages in the Biopackages.net
>     > repository.  If you're interested in migrating the spec files from our
>     > repository to the Fedora project it would save us (the Biopackages.net
>     > maintainers) a ton of build and maintenance time, so please feel free
>     > to take them, just let us know.  If we can reach some agreement on
>     > where the bioinformatics-specific packages should be maintained/built
>     > we may be able to work together on these as well.
>     >
>     > -Allen
>     >
>     >
>     > On 3/30/07, Alex Lancaster < alexl at users.sourceforge.net
>     <mailto:alexl at users.sourceforge.net>> wrote:
>     >>>>>>> "AD" == Allen Day  writes:
>     >>
>     >> AD> Hi Alex, The Biopackages.net <http://Biopackages.net>
>     project is still active, we are
>     >> AD> regularly adding packages to it, mostly R packages
>     lately.  Most
>     >> AD> of the systems we use are running CentOS at this point,
>     which is
>     >> AD> why you have not seen support for FC6 yet.  There is nothing
>     >> AD> preventing building FC6 packages aside from lack of time to
>     set up
>     >> AD> the FC6 build farm nodes.
>     >>
>     >> Hi Allen and others,
>     >>
>     >> Great news to hear that Biopackages.net
>     <http://Biopackages.net> is still active!  I would like
>     >> to help out if possible.  I don't believe in "FUD" either... ;)
>     >>
>     >> AD> If you're interested in packaging BioPerl or other
>     >> AD> bioinformatics-related software, please join the Biopackages
>     >> AD> project on SourceForge.  We object to the Fedora Extras FUD
>     >> AD> tactics used to discourage people from using 3rd party
>     >> AD> repositories, and suspect they may not want to host some of our
>     >> AD> data packages, such as the >2GB genome packages.  Biopackages
>     >> AD> project is likely to partially merge with RPMForge.  We are
>     >> AD> already discussing with them how best to do it.
>     >>
>     >> The packages that I created which are currently available in Fedora
>     >> Packages are Perl dependencies which, as I said are useful for
>     >> packages outside the bioinformatics purview.  I do have a (base)
>     >> bioperl package in review, but it is not yet released.
>     >>
>     >> As for third-party repos, I don't object to them at all, and
>     for some
>     >> kinds of projects they are indeed appropriate. (e.g. for non-free
>     >> stuff like Livna or Freshrpms).  However I do have practical
>     concerns
>     >> about repository mixing, but I think that it does need to be
>     handled
>     >> carefully but that co-operation between Fedora and third-party
>     repos
>     >> can make it work.
>     >>
>     >> For example, one practical concern is that as of the
>     >> soon-to-be-released Fedora 7, Core+Extras will be merged, so there
>     >> will be no distinction at the repository-level between formerly
>     Extras
>     >> packages and formerly Core packages (as of now there are only
>     "Fedora
>     >> Packages"), which means that it will not be possible for
>     third-party
>     >> repos to limit their dependencies to just those in a former
>     base set
>     >> (i.e. excluding Extras).
>     >>
>     >> I agree that a few years ago (circa 2003-2004) there was
>     concern about
>     >> the way some third party repositories were treated somewhat
>     badly by
>     >> the (then) Fedora Extras (with some people going so far as to
>     say that
>     >> third-party repos were bad in principle and should always be
>     ignored
>     >> which I disagree with too).  But it seems to me that culture has
>     >> shifted since, with some notable packagers such as Matthias
>     Saou (of
>     >> Freshrpms) and Axel Thimm (of Atrpms) now contributing packages to
>     >> Fedora itself.  The process of contributing has also become much
>     >> simpler and reviews are conducted speedily and efficiently, I had
>     >> packages in the repository in a matter of a few days from initial
>     >> submission.  Freshrpms itself now enables and depends on the (old)
>     >> Extras.
>     >>
>     >> The real question for me, then is what packages it makes sense
>     to go
>     >> in Fedora, and what packages go in third party
>     repositories.  It seems
>     >> to me that in the case of Perl packages which could be
>     dependencies
>     >> for other packages not specific to the third-party repo in
>     question,
>     >> it makes sense for them to go into Fedora itself, so I think I will
>     >> continue to package them.  This lessens the load on the
>     third-party
>     >> repo, while making them available for all other third-party repos.
>     >> (This is approach that Freshrpms seems to be taking, Matthias has
>     >> contributed most packages back to Fedora now other than the
>     non-free
>     >> ones).
>     >>
>     >> At the other end of the spectrum are packages like you mention,
>     genome
>     >> packages, which may be of concern because of their size and/or
>     highly
>     >> specialised nature, and, as you say, may make sense to go in a
>     >> third-party repo like Biopackages.net
>     <http://Biopackages.net>.  Also packages which can't be
>     >> packaged by Fedora for legal reasons like Clustal could/should
>     go in
>     >> Biopackages.net <http://Biopackages.net>.
>     >>
>     >> In the middle are packages like bioperl itself which are
>     potentially
>     >> useful to perhaps a wider group of people than the genome
>     packages but
>     >> may not necessarily be dependencies for other packages.  I lean
>     >> towards making them part of Fedora so that they will be available
>     >> out of the box on the planned "Everything" DVD ISO, but I welcome a
>     >> discussion on this.
>     >>
>     >> As I said, I'm glad to hear that Biopackages.net
>     <http://Biopackages.net> is alive and well and
>     >> I welcome a discussion on how upstream Fedora can usefully interact
>     >> with Biopackages.net <http://Biopackages.net> (I guess perhaps
>     on the Biopackages.net <http://Biopackages.net> list).
>     >>
>     >> Regards,
>     >> Alex
>     >>
>     >> PS.  As the upstream author If you could clarify the license on
>     >> perl-SVG-Graph, on CPAN (or on the mailing list) that would be
>     great.
>     >> --
>     >> Alex Lancaster, Ph.D. | Ecology & Evolutionary Biology,
>     University of Arizona
>     >>
>     >>
>     >>
>     >> _______________________________________________
>     >> Bioperl-l mailing list
>     >> Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>     >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>     >>
>     > _______________________________________________
>     > Bioperl-l mailing list
>     > Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>     > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>     _______________________________________________
>     Bioperl-l mailing list
>     Bioperl-l at lists.open-bio.org <mailto:Bioperl-l at lists.open-bio.org>
>     http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
>
>
> -- 
> Lincoln D. Stein
> Cold Spring Harbor Laboratory
> 1 Bungtown Road
> Cold Spring Harbor, NY 11724
> (516) 367-8380 (voice)
> (516) 367-8389 (fax)
> FOR URGENT MESSAGES & SCHEDULING,
> PLEASE CONTACT MY ASSISTANT,
> SANDRA MICHELSEN, AT michelse at cshl.edu <mailto:michelse at cshl.edu> 


From alexl at users.sourceforge.net  Thu Apr 19 01:17:34 2007
From: alexl at users.sourceforge.net (Alex Lancaster)
Date: Wed, 18 Apr 2007 18:17:34 -0700
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <4626058B.8090801@sendu.me.uk> (Sendu Bala's message of "Wed\,
	18 Apr 2007 12\:48\:27 +0100")
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
	<4626058B.8090801@sendu.me.uk>
Message-ID: <e43b2x6u35.fsf@delpy.biol.berkeley.edu>

>>>>> "SB" == Sendu Bala  writes:

[...]

SB> I can remove the modules from cvs and create
SB> bioperl-run-1.5.2_101, resolving the packaging issue. I plan on
SB> doing precisely this within the next seven days unless someone
SB> puts a hand up to stop me.

In the meantime, until bioperl-run-1.5.2_101 is available, is it safe
just to remove these four .pm files during the packaging so they
don't get installed?  It looks like these four files are
self-contained and are only required/used by each other:

$ grep -r AccessorMaker *
Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar class min_version)]);
Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(input_file output_file)]);

$ grep -r AbstractRunner *
Tools/Run/JavaRunner.pm:use Bio::Tools::Run::AbstractRunner;
Tools/Run/JavaRunner.pm:our @ISA=qw(Bio::Tools::Run::AbstractRunner);
Tools/Run/AbstractRunner.pm:package Bio::Tools::Run::AbstractRunner;
Tools/Run/AbstractRunner.pm:Bio::Tools::Run::AbstractRunner

$ grep -r JavaRunner *
Tools/Run/Phylo/Forester/SDI.pm:use Bio::Tools::Run::JavaRunner;
Tools/Run/Phylo/Forester/SDI.pm:our @ISA=qw(Bio::Tools::Run::JavaRunner);
Tools/Run/JavaRunner.pm:package Bio::Tools::Run::JavaRunner;
Tools/Run/JavaRunner.pm: Usage   : $runner = Bio::Tools::Run::JavaRunner->new(-jar => $jar)
Tools/Run/JavaRunner.pm: Function: Builds a new Bio::Tools::Run::JavaRunner object
Tools/Run/JavaRunner.pm: Returns : Bio::Tools::Run::JavaRunner
Tools/Run/JavaRunner.pm:Bio::Tools::Run::JavaRunner - run java programs
Tools/Run/JavaRunner.pm:   my $runner = Bio::Tools::Run::JavaRunner->new(-jar => $jar);

$ grep -r Forester *
Tools/Run/Phylo/Forester/SDI.pm:package Bio::Tools::Run::Phylo::Forester::SDI;
Tools/Run/Phylo/Forester/SDI.pm:Bio::Tools::Run::Phylo::Forester::SDI
Tools/Run/Phylo/Forester/SDI.pm:    my $runner = Bio::Tools::Run::Phylo::Forester::SDI->new();
Tools/Run/Phylo/Forester/SDI.pm:This wrapper is for SDI in Forester package. 
Tools/Run/Phylo/Forester/SDI.pm:For more details on Forester, please see 

Alex


From sac at bioperl.org  Thu Apr 19 05:14:02 2007
From: sac at bioperl.org (Steve Chervitz)
Date: Wed, 18 Apr 2007 22:14:02 -0700
Subject: [Bioperl-l] GenericHit->start/end needs tiled hsps?
In-Reply-To: <461F3FBA.2010101@sendu.me.uk>
References: <461F3FBA.2010101@sendu.me.uk>
Message-ID: <8f200b4c0704182214j77a4accy72f71b2061764d5b@mail.gmail.com>

Sendu,

Your thinking here seems correct and in fact agrees with the documentation
for those methods:

start():  If there is more than one HSP, the lowest start
           value of all HSPs is returned.

end():  If there is more than one HSP, the largest end
          value of all HSPs is returned.

It would be fine with me to change the implementation in GenericHit as you
suggest and to not tile the HSPs. Tiling is only necessary for data that is
summed across the region covered by all HSPs, as is done by these methods:
matches(), gaps(), frac_* and percent_*.
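
For illustration, the untiled version amounts to a single pass over the HSPs. This is a sketch, not the GenericHit implementation: hit_range is a hypothetical name, and the HSPs are stand-in [start, end] pairs rather than real HSP objects.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Stand-in HSPs as [start, end] pairs; real code would read the
# coordinates from each HSP object instead.
sub hit_range {
    my @hsps = @_;
    return unless @hsps;                      # no HSPs, no range
    my $start = min map { $_->[0] } @hsps;    # lowest start of all HSPs
    my $end   = max map { $_->[1] } @hsps;    # largest end of all HSPs
    return ($start, $end);
}

my ($start, $end) = hit_range([120, 180], [15, 60], [300, 342]);
print "start=$start end=$end\n";   # start=15 end=342
```

The cost is linear in the number of HSPs, which is why a hit with tens of thousands of HSPs is no problem here.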

Steve

On 4/13/07, Sendu Bala <bix at sendu.me.uk> wrote:
>
> Hi all,
>
> I want to double-check my thinking regarding
> Bio::Search::Hit::GenericHit->start() and end(). Right now the docs
> claim that hsps of the hit object must be tiled before the answer can be
> produced. The code is implemented in that way
> (Bio::Search::SearchUtils::tile_hsps($self)).
>
> Yet as far as I can see, all you need to do is loop through all hsps and
> pick out the smallest start and largest end respectively in terms of
> subject and query.
>
> This comes up because I have a blast report where a single hit contains
> over 80000 hsps and the tiling takes over an hour (I gave up on it,
> don't know how long it really takes). The simple loop through hsps takes
> seconds or less.
>
> Now in this situation the answer isn't especially useful (to me). An
> alternative way of fixing the problem would be to re-write the tiling
> algorithm (again) to somehow make it hundreds of times faster, then
> provide some way in start() and end() for the user to request the start
> and end of the best contig, or other contig of choice. Easier said than
> done though!
>
>
> What do people think?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From bix at sendu.me.uk  Thu Apr 19 10:52:45 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 11:52:45 +0100
Subject: [Bioperl-l] Immediate-effect deprecations
In-Reply-To: <e43b2x6u35.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>	<4626058B.8090801@sendu.me.uk>
	<e43b2x6u35.fsf@delpy.biol.berkeley.edu>
Message-ID: <462749FD.3080603@sendu.me.uk>

Alex Lancaster wrote:
>>>>>> "SB" == Sendu Bala  writes:
> 
> [...]
> 
> SB> I can remove the modules from cvs and create
> SB> bioperl-run-1.5.2_101, resolving the packaging issue. I plan on
> SB> doing precisely this within the next seven days unless someone
> SB> puts a hand up to stop me.
> 
> In the meantime, until bioperl-run-1.5.2_101 is available, is it safe
> just to remove these four .pm files during the packaging so they
> don't get installed?

Sure, go ahead with that.


From bix at sendu.me.uk  Thu Apr 19 10:51:53 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 11:51:53 +0100
Subject: [Bioperl-l] To be deprecated: Bio::Tools::Run::AbstractRunner,
 Bio::Tools::Run::Phylo::Forester::SDI and
 Bio::Tools::Run::JavaRunner
In-Reply-To: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
References: <av8xcqdq1g.fsf@delpy.biol.berkeley.edu>
Message-ID: <462749C9.1040503@sendu.me.uk>

[repost under new subject to make sure it is seen by those it may concern]

[BCC: Juguang Xiao at a variety of possible email addresses]


Alex Lancaster wrote:
> In packaging bioperl-run for Fedora, I think I stumbled across a bug
> in the bioperl-run package.  It appears from this edit:
> 
> http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/Bio/Root/Attic/AccessorMaker.pm?hideattic=0&cvsroot=bioperl
> 
> that Bio::Root::AccessorMaker was removed in bioperl 1.5.x, but
> bioperl-run 1.5.2_100 still contains modules that use this module:
> 
> $ cd bioperl-run-1.5.2_100
> $ grep -r AccessorMaker  *
> Bio/Tools/Run/Phylo/Forester/SDI.pm:use Bio::Root::AccessorMaker (
> Bio/Tools/Run/JavaRunner.pm:use Bio::Root::AccessorMaker ('$'=>[qw(jar
> class min_version)]);
> Bio/Tools/Run/AbstractRunner.pm:use Bio::Root::AccessorMaker
> ('$'=>[qw(input_file output_file)]);

It looks like I've implemented a similar idea to AccessorMaker and
AbstractRunner in Bio::Root::Root->_set_from_args() and
Bio::Tools::Run::WrapperBase->_setparams(). Since nothing uses
AbstractRunner I propose deprecating it immediately.

Forester::SDI and JavaRunner have no tests which is why we didn't notice
the problem. Since they've been out of use for a number of years now I
also propose their immediate deprecation. Alternatively, it may not be
too difficult to just update them to use _set_from_args and _setparams,
but I've nothing to test against (and JavaRunner is self-described as
"probably incomplete").


I can remove the modules from cvs and create bioperl-run-1.5.2_101,
resolving the packaging issue. I plan on doing precisely this within the
next seven days unless someone puts a hand up to stop me.


From bix at sendu.me.uk  Thu Apr 19 12:17:19 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 13:17:19 +0100
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
In-Reply-To: <200704050059.l350xNF07452@cricket.bio.indiana.edu>
References: <200704050059.l350xNF07452@cricket.bio.indiana.edu>
Message-ID: <46275DCF.6030103@sendu.me.uk>

Don Gilbert wrote:
> Dear Bioperl list,
> 
> There is a small bug in what I think is the current Bio::Tools::GFF.pm,
> that blocks output of Target attributes (in gff3 at least).  See a patch
> here
> 
> http://wiki.gmod.org/index.php/Load_BLAST_Into_Chado#Convert_BLAST_analysis_to_GFF

The patch was applied by Brian but is currently generating this warning:

./Build test --test_files t/GbrowseGFF.t --verbose
t/GbrowseGFF....1..5
ok 1 - use Bio::SearchIO;
ok 2 - use Bio::SearchIO::Writer::GbrowseGFF;
ok 3 - use Bio::Root::IO;
ok 4
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
Use of uninitialized value in sprintf at 
/.../core/blib/lib/Bio/Tools/GFF.pm line 1020, <GEN1> line 193.
ok 5
ok
All tests successful.

Can this patch be looked at again and rolled-back if the problem can't 
be fixed?
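
I haven't checked what line 1020 actually does, so this is only a guess at the kind of fix that might apply: if the warning comes from an optional field (such as the Target strand) being undef, the field could be dropped before formatting. The function name and arguments below are hypothetical, not the real GFF.pm code.

```perl
use strict;
use warnings;

# Hypothetical formatter, NOT the code at GFF.pm line 1020: build a GFF3
# Target attribute and skip any field that is undef (the strand is
# optional in GFF3), so nothing undefined ever reaches sprintf.
sub target_attr {
    my ($id, $start, $end, $strand) = @_;
    my @fields = grep { defined } ($id, $start, $end, $strand);
    return sprintf 'Target=%s', join ' ', @fields;
}

print target_attr('EST1', 1, 250, undef), "\n";   # Target=EST1 1 250
print target_attr('EST1', 1, 250, '+'),   "\n";   # Target=EST1 1 250 +
```

If the undefined value turns out to be a required field (ID, start or end), silently dropping it would hide a real problem, and a rollback would be the better option.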


Cheers,
Sendu.


From sm8 at sanger.ac.uk  Thu Apr 19 11:49:30 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 12:49:30 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>

Is there an existing method for copying a Bio::Tree::Tree object by
value?

All the best,
Stephen


From bix at sendu.me.uk  Thu Apr 19 12:43:44 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 13:43:44 +0100
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
Message-ID: <46276400.2020207@sendu.me.uk>

Stephen Montgomery wrote:
>  is there an existing method for copying a Bio::Tree::Tree object by
> value?

What do you mean? Describe in a little more detail what you're trying to do.


From sm8 at sanger.ac.uk  Thu Apr 19 13:13:44 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 14:13:44 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>

my $tree_copy = $tree;  #copies by reference a Bio::Tree::Tree object

as an example, a method like
my $tree_copy = $tree->clone; #copies by value (this method doesn't
exist) or
my $tree_copy = Storable::dclone($tree); 
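
To make the distinction concrete, here is a small sketch with a plain nested hash standing in for the tree (it is not a Bio::Tree::Tree, and whether dclone is safe on real tree objects is a separate question):

```perl
use strict;
use warnings;
use Storable qw(dclone);

# A plain nested hash standing in for a tree (made-up structure).
my $tree = { id => 'root', children => [ { id => 'A', children => [] } ] };

my $alias = $tree;           # copy by reference: same underlying data
my $copy  = dclone($tree);   # copy by value: independent deep copy

# Changing the deep copy leaves the original untouched.
$copy->{children}[0]{id} = 'changed';

print $tree->{children}[0]{id}, "\n";               # A
print $alias == $tree ? "alias\n" : "distinct\n";   # alias
print $copy  == $tree ? "alias\n" : "distinct\n";   # distinct
```

Objects that hold code references or open filehandles generally do not survive dclone unchanged, so this is an illustration of what I mean rather than a tested recipe for trees.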

Cheers,
Stephen

-----Original Message-----
From: Sendu Bala [mailto:bix at sendu.me.uk] 
Sent: 19 April 2007 13:44
To: Stephen Montgomery
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] tree copy by-value

Stephen Montgomery wrote:
>  is there an existing method for copying a Bio::Tree::Tree object by
> value?

What do you mean? Describe in a little more detail what you're trying to
do.


From jason at bioperl.org  Thu Apr 19 13:19:05 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 19 Apr 2007 06:19:05 -0700
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B03867599@exchsrv2.internal.sanger.ac.uk>
Message-ID: <35813ADC-6597-46FC-8FB8-C70AA3541BEC@bioperl.org>

I don't think so. Worst case, you serialize to/from TreeIO and get a
new one, but the _internal_id of the nodes will necessarily be
different (and new).

-jason
On Apr 19, 2007, at 4:49 AM, Stephen Montgomery wrote:

>  is there an existing method for copying a Bio::Tree::Tree object by
> value?
>
> All the best,
> Stephen

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From bix at sendu.me.uk  Thu Apr 19 13:24:41 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 19 Apr 2007 14:24:41 +0100
Subject: [Bioperl-l] tree copy by-value
In-Reply-To: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>
References: <A8AB69F227E96F4DBED773D3D70A295B038675E3@exchsrv2.internal.sanger.ac.uk>
Message-ID: <46276D99.2060108@sendu.me.uk>

Stephen Montgomery wrote:
> my $tree_copy = $tree;  #copies by reference a Bio::Tree::Tree object
> 
> as an example, a method like
> my $tree_copy = $tree->clone; #copies by value (this method doesn't
> exist) or
> my $tree_copy = Storable::dclone($tree); 

Right, sorry for being a little slow on the uptake. As a matter of fact 
I recently added _clone() to Bio::Tree::TreeFunctionsI which does a 
"safe tree clone that doesn't seg fault". It's undocumented and I thought 
it would only be needed by simplify_to_leaves_string(), but I guess I can 
document it and make it public (i.e. remove the underscore from the name) 
if this might be popular.

Oh, it's also not that well tested, so proceed with caution and provide 
feedback if you can.


Cheers,
Sendu.


From ewijaya at gmail.com  Thu Apr 19 13:27:45 2007
From: ewijaya at gmail.com (Edward Wijaya)
Date: Thu, 19 Apr 2007 21:27:45 +0800
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
	Enable Connector
Message-ID: <3521d3670704190627u6aba98b1nc3892833b6a77c1c@mail.gmail.com>

Dear expert,

My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
is created with the script (down below).

How can I modify the script such that:

1. The arrow track is represented in negative form.
    I.e. instead of 1 to 300, we use -300 to 0.

I tried this, but it doesn't work:

my $flen = Bio::SeqFeature::Generic->new(
        -start => -300,
        -end => 0, );

And how can I make these numbers appear
at every grid point (not just two, as I have now)?


2. How can I enable the connector with grid just like
   I had in the first panel? (as you can see, my script
   has connector added, but still doesn't show).

All in all, I am trying to mimic this figure:
http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg

And here is my script:

__BEGIN__
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
use List::Compare;
use List::Util qw(max);

my %nofseq = ( 0 => 300, 1 => 300, 2 => 300, 3 => 300, 4 => 300, 5 => 300 );
my @seqid = keys %nofseq;
my @lenlist = values %nofseq;
my $maxlen = max (@lenlist);
#print Dumper \@seqid ;

my $panel = Bio::Graphics::Panel->new(
    -length    => 300,
    -width     => 500,
    -pad_left  => 70,
    -pad_right => 70,
    -key_style => 'left',
    -connector => 'solid',
);

my $flen = Bio::SeqFeature::Generic->new(
        -start => 1,   # tried -300
        -end => 300, # and 0, but failed.
);

    my $track1 = $panel->add_track(
        $flen,
        -glyph   => 'arrow',
        -tick    => 2,
        -fgcolor => 'black',
        -double  => 1,
    );


my %nlist;

while ( <DATA> ) {
    chomp;
    next if /^\#/;
    my ($sqi,$pos,$str,$progname) = split /\,/;
    my $start = $pos + $nofseq{$sqi};
    my $end = $start + length($str) + 1;
    push @{$nlist{$sqi}}, $start." ".$end." ".$progname;
}

# Check which sequence has no motifs;
my @bssi = keys %nlist;

my $lc = List::Compare->new(\@seqid, \@bssi);
my @comp = $lc->get_unique;


foreach my $comp ( @comp  ) {
    push @{$nlist{$comp}}, '0'." ".'0'." "."NONE";

}

my %prog_color = ( "WEEDER" => 3000, "MEME" => 200, "NONE" => 0 );

foreach my $seqid ( sort keys %nlist ) {


    my $track = $panel->add_track(
        -glyph     => 'graded_segments',
        -key       => "SEQ ". $seqid,
        -connector => "dashed",
        -label     => 1,
        -bgcolor   => 'blue',
        -bump      =>  +1,
        -height    =>  8,
        -min_score => 0,
        -max_score => 5000
    );


    foreach my $range ( @{$nlist{$seqid}} ) {

        my ($st,$en,$progname) = split(" ", $range);
        my $dname = " ";
        if ( $st != 0 and $en !=0  ) {
           $dname = "Seq ". $seqid;
        }

        my $score;
        if ( $progname eq "WEEDER" ) {
            $score = $prog_color{$progname};

        }
        elsif ($progname eq "MEME" ) {
            $score = $prog_color{$progname};
        }

        my $feature = Bio::SeqFeature::Generic->new(
            -display_name => $dname,
            -start        => $st,
            -end          => $en,
            -score        => $score
        );

        $track->add_feature($feature);

    }

}

print $panel->png;

# The DATA is simply a list of strings and their locations in their
# respective sequences.
# The figure is just a plot of it.
__DATA__
# sequence number,pos,binding sites,program
4,-63,AGCTTTCTCT,MEME
0,-22,AACTTTGTAC,WEEDER
1,-13,AAGTTTCTCT,WEEDER
5,-228,ACCTTTGCCA,MEME
5,-121,AAGTTTGTCT,WEEDER
5,-88,AAGTTTTTCC,SPACE
3,-148,AACTTAGTCA,MEME
0,-184,AACTTTGTCT,MEME
__END__


Thanks and hope to hear from you again.

--
Regards,
Edward WIJAYA


From sm8 at sanger.ac.uk  Thu Apr 19 13:33:18 2007
From: sm8 at sanger.ac.uk (Stephen Montgomery)
Date: Thu, 19 Apr 2007 14:33:18 +0100
Subject: [Bioperl-l] tree copy by-value
Message-ID: <A8AB69F227E96F4DBED773D3D70A295B038675FB@exchsrv2.internal.sanger.ac.uk>

Thanks Sendu!  That is perfect.
Cheers
Stephen



From ewijaya at i2r.a-star.edu.sg  Thu Apr 19 13:59:05 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Thu, 19 Apr 2007 21:59:05 +0800
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
	Enable Connector
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>


Dear expert,

My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
is created with the script (down below).

How can I modify the script such that:

1. The arrow track is represented in negative form.
   I.e. instead of 1 to 300, we use -300 to 0.

I tried this, but it doesn't work:

my $flen = Bio::SeqFeature::Generic->new(
       -start => -300,
       -end => 0, );

And how can I make these numbers appear
at every grid point (not just two, as I have now)?


2. How can I enable the connector with grid just like
  I had in the first panel? (as you can see, my script
  has connector added, but still doesn't show).

All in all, I am trying to mimic this figure:
http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg

And here is my script:

__BEGIN__
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
use Bio::Graphics;
use Bio::SeqFeature::Generic;
use List::Compare;
use List::Util qw(max);

my %nofseq = ( 0 => 300, 1 => 300, 2 => 300, 3 => 300, 4 => 300, 5 => 300 );
my @seqid = keys %nofseq;
my @lenlist = values %nofseq;
my $maxlen = max (@lenlist);
#print Dumper \@seqid ;

my $panel = Bio::Graphics::Panel->new(
   -length    => 300,
   -width     => 500,
   -pad_left  => 70,
   -pad_right => 70,
   -key_style => 'left',
   -connector => 'solid',
);

my $flen = Bio::SeqFeature::Generic->new(
       -start => 1,   # tried -300
       -end => 300, # and 0, but failed.
);

   my $track1 = $panel->add_track(
       $flen,
       -glyph   => 'arrow',
       -tick    => 2,
       -fgcolor => 'black',
       -double  => 1,
   );


my %nlist;

while ( <DATA> ) {
   chomp;
   next if /^\#/;
   my ($sqi,$pos,$str,$progname) = split /\,/;
   my $start = $pos + $nofseq{$sqi};
   my $end = $start + length($str) + 1;
   push @{$nlist{$sqi}}, $start." ".$end." ".$progname;
}

# Check which sequence has no motifs;
my @bssi = keys %nlist;

my $lc = List::Compare->new(\@seqid, \@bssi);
my @comp = $lc->get_unique;


foreach my $comp ( @comp  ) {
   push @{$nlist{$comp}}, '0'." ".'0'." "."NONE";

}

my %prog_color = ( "WEEDER" => 3000, "MEME" => 200, "NONE" => 0 );

foreach my $seqid ( sort keys %nlist ) {


   my $track = $panel->add_track(
       -glyph     => 'graded_segments',
       -key       => "SEQ ". $seqid,
       -connector => "dashed",
       -label     => 1,
       -bgcolor   => 'blue',
       -bump      =>  +1,
       -height    =>  8,
       -min_score => 0,
       -max_score => 5000
   );


   foreach my $range ( @{$nlist{$seqid}} ) {

       my ($st,$en,$progname) = split(" ", $range);
       my $dname = " ";
       if ( $st != 0 and $en !=0  ) {
          $dname = "Seq ". $seqid;
       }

       my $score;
       if ( $progname eq "WEEDER" ) {
           $score = $prog_color{$progname};

       }
       elsif ($progname eq "MEME" ) {
           $score = $prog_color{$progname};
       }

       my $feature = Bio::SeqFeature::Generic->new(
           -display_name => $dname,
           -start        => $st,
           -end          => $en,
           -score        => $score
       );

       $track->add_feature($feature);

   }

}

print $panel->png;

# The DATA is simply a list of strings and their locations in their
# respective sequences.
# The figure is just a plot of it.
__DATA__
# sequence number,pos,binding sites,program
4,-63,AGCTTTCTCT,MEME
0,-22,AACTTTGTAC,WEEDER
1,-13,AAGTTTCTCT,WEEDER
5,-228,ACCTTTGCCA,MEME
5,-121,AAGTTTGTCT,WEEDER
5,-88,AAGTTTTTCC,SPACE
3,-148,AACTTAGTCA,MEME
0,-184,AACTTTGTCT,MEME
__END__


Thanks and hope to hear from you again.

--
Regards,

Edward WIJAYA

------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From ioanniskirmitzoglou at gmail.com  Thu Apr 19 14:06:06 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Thu, 19 Apr 2007 17:06:06 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
Message-ID: <b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>

I have reported it as a bug on the bugzilla but due to bugzilla problems I
was not able to attach my code and/or sample m10 files.
Nevertheless here is the code that converts an m10 fasta output to an m8
BLAST output which is parseable by the vast majority of software.

<----------- CODE BEGINS HERE ------------------->

#!/usr/bin/perl -w

=head1 NAME

fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular output

=head1 SYNOPSIS

 fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...

=head1 DESCRIPTION

Command line options:
  --header                -- boolean flag to print column header
  -o/--out                -- optional outputfile to write data,
                             otherwise will write to STDOUT
  -h/--help               -- show this documentation

Not technically a SearchIO script, as this doesn't use any Bioperl
components, but it is useful and fast.  The output is tabular output
with the standard NCBI -m8 columns.

 queryname
 hit name
 percent identity
 alignment length
 number mismatches
 number gaps
 query start  (if on rev-strand start > end)
 query end
 hit start (if on rev-strand start > end)
 hit end
 evalue
 bit score

Additionally, 5 more columns are provided:
 percent similar
 query length
 hit length
 query gaps
 hit gaps

=head1 AUTHOR - Ioannis Kirmitzoglou

Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org

=head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou

Headers as well as portions of code were taken
from fastam9_to_table.pl by Jason Stajich

=head1 DISCLAIMER

Copyright (c) <2007> <Ioannis Kirmitzoglou>

Permission to use, copy, modify, merge, publish and distribute
this software and its documentation, with or without modification,
for any purpose, and without fee or royalty to the copyright holder(s)
is hereby granted with no restrictions and/or prerequisites.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

=cut

use strict;
use Getopt::Long;

my %data=();

my $outfile=''; my $header='';
GetOptions(
    'header'              => \$header,
    'o|out|outfile:s'     => \$outfile,
    'h|help'              => sub { exec('perldoc',$0); exit; }
       );

my $outfh;
if( $outfile ) {
    open($outfh, '>', $outfile) || die("$outfile: $!");
} else {
    $outfh = \*STDOUT;
}


$/="\n>>>";

my @fields = qw(qname hname percid alen mmcount gapcount
        qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);

print $outfh "#", uc(join("", map { sprintf("%-10s", $_) } @fields)), "\n"
    if $header;

while (<>) {

        chomp;
        if ($_=~/^>/ || $_=~/^\#/) {next;}
        my @hits = split(/\d+>>/, $_);
        @hits= split("\n>>", $hits[0]);

        my $hit = shift @hits;

        ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));

        foreach my $align (@hits) {

            my @details= split ("\n>", $align);
           my $detail = shift @details;
            ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
            $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
            $data{'bits'}=$1;
            $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
            $data{'evalue'}=$1;

            my $term = quotemeta("; sw_score");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'score'}=$1;

            $term = quotemeta("; sw_ident:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'percid'}=$1;

            $term = quotemeta("; sw_sim:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'percsim'}=$1;

            $term = quotemeta("; sw_overlap:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'alen'}=$1;

            $detail = shift @details;

            $term = quotemeta("; al_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'qstart'}=$1;

            $term = quotemeta("; al_stop:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'qend'}=$1;

            $term = quotemeta("; al_display_start:");
            $term =~ s/\\ /\\s/;
            my $lakis ='';
            $detail=~ /$term.+\s\-*([\-\w\s]+)/g;

            $data{'qgap'}=($1 =~ tr/\-//);

            $detail = shift @details;

            $term = quotemeta("; sq_len:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hlen'}=$1;

            $term = quotemeta("; al_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hstart'}=$1;

            $term = quotemeta("; al_stop:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term\s+(\S+)/;
            $data{'hend'}=$1;

            $term = quotemeta("; al_display_start:");
            $term =~ s/\\ /\\s/;
            $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
            $data{'hgap'}=($1 =~ tr/-//);
            $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
            $data{'mmcount'} = $data{'alen'}
                - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );

            for ( $data{'percid'}, $data{'percsim'} ) {
                $_ = sprintf("%.2f", $_ * 100);
            }

            print $outfh join( "\t", map { $data{$_} } @fields ), "\n";
        }

}

<----------------- CODE ENDS HERE ---------------------->
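
The script above relies on setting the input record separator so that each read returns one whole query block rather than one line. The sketch below shows just that technique on a made-up, heavily simplified report; the report text, the variable names and the two-field output are assumptions for illustration only, and real FASTA -m 10 output carries many more fields.

```perl
use strict;
use warnings;

# Hypothetical, heavily simplified stand-in for a FASTA -m 10 report.
my $report = <<'END';
# simplified report
>>>query1, 120 aa
>>hitA
; fa_bits: 55.1
>>hitB
; fa_bits: 40.2
>>>query2, 80 aa
>>hitC
; fa_bits: 33.0
END

open my $fh, '<', \$report or die "cannot read in-memory report: $!";

my @rows;
{
    # Each read now returns everything up to the next "\n>>>",
    # i.e. one query together with all of its hits.
    local $/ = "\n>>>";
    while ( my $record = <$fh> ) {
        chomp $record;                 # strips the trailing "\n>>>"
        next if $record =~ /^#/;       # skip the leading comment block
        my ( $head, @hits ) = split /\n>>/, $record;
        my ( $qname, $qlen ) = $head =~ /^(\S+),\s+(\d+)/;
        for my $hit (@hits) {
            my ($hname) = $hit =~ /^(\S+)/;
            my ($bits)  = $hit =~ /;\s(?:fa|sw)_bits:\s+(\S+)/;
            push @rows, [ $qname, $qlen, $hname, $bits ];
        }
    }
}
close $fh;

print join( "\t", @$_ ), "\n" for @rows;
```

Localizing $/ inside a block keeps the changed separator from leaking into any other file reads in the same program.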

-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


From gilbertd at cricket.bio.indiana.edu  Thu Apr 19 17:38:05 2007
From: gilbertd at cricket.bio.indiana.edu (Don Gilbert)
Date: Thu, 19 Apr 2007 12:38:05 -0500 (EST)
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
Message-ID: <200704191738.l3JHc5s10658@cricket.bio.indiana.edu>


I'm not sure what kind of test data would have bad Target strings,
but this should clear up those warnings -- insert the '+' line:

  sub _gff3_string:
    for my $tag ( @all_tags ) {
       ##dgg.patch.was# next if $tag eq 'Target';
      if ($tag eq 'Target'
         and ! $origfeat->isa('Bio::SeqFeature::FeaturePair'))
       {  
       my($target_id, $b,$e,$strand)= $feat->get_tag_values($tag); 
+       next unless(defined($e) && defined($b) && $target_id);
       ($b,$e)= ($e,$b) if(defined $strand && $strand<0);
       $target_id =~ s/([\t\n\r%&\=;,])/sprintf("%%%X",ord($1))/ge;    
       push @groups, sprintf("Target=%s %d %d", $target_id,$b,$e);
       next;
       }
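
For anyone who wants to try the guard in isolation, here is a small standalone sketch of the same logic. The function name target_attribute and the sample values are made up for illustration and are not part of Bio::Tools::GFF; only the guard, the strand swap and the escaping mirror the patch above.

```perl
use strict;
use warnings;

# Build a GFF3 "Target=" attribute from the values of a Target tag.
# Returns undef when the tag is incomplete, mirroring the added guard.
# (Hypothetical helper; not a Bio::Tools::GFF method.)
sub target_attribute {
    my ( $target_id, $start, $end, $strand ) = @_;
    return undef unless defined $end && defined $start && $target_id;
    ( $start, $end ) = ( $end, $start ) if defined $strand && $strand < 0;
    # Percent-encode characters that would break the attribute column.
    $target_id =~ s/([\t\n\r%&=;,])/sprintf("%%%X", ord($1))/ge;
    return sprintf( "Target=%s %d %d", $target_id, $start, $end );
}

for my $values ( [ 'EST1', 10, 20, 1 ], [ 'EST;2', 5, 9 ], ['EST3'] ) {
    my $attr = target_attribute(@$values);
    print defined $attr ? "$attr\n" : "(skipped incomplete Target)\n";
}
```

Without the guard, the incomplete third case would reach sprintf with undefined coordinates, which is where the warnings come from.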

-- Don


From stefan.kirov at bms.com  Thu Apr 19 18:01:28 2007
From: stefan.kirov at bms.com (Stefan Kirov)
Date: Thu, 19 Apr 2007 14:01:28 -0400
Subject: [Bioperl-l] How to Create Sequence and TFBS Graph with Perl
In-Reply-To: <4626E1A3.4070405@i2r.a-star.edu.sg>
References: <462473B7.4070905@i2r.a-star.edu.sg> <4624D9F3.5050805@bms.com>
	<4626E1A3.4070405@i2r.a-star.edu.sg>
Message-ID: <4627AE78.200@bms.com>

I will see if I can post it or perhaps commit something to the bp 
scripts. In any case, it won't be before Monday; I have deadlines to meet.
Stefan
Edward WIJAYA wrote:
>
> Hi Stefan,
>> I believe you can use Bio::Graphics for this. I have done so in the 
>> past and I find it quite straightforward.
> Do you still have that sample script? I don't find it simple to do.
> I was thinking of doing something like this:
>
> http://nar.oxfordjournals.org/content/vol31/issue13/images/large/gkg56702.jpeg 
>
>
> Appreciate if you can share it with us.
>
> -- 
> Edward
>
>
>>
>>
>> Edward WIJAYA wrote:
>>> Dear all,
>>>
>>> How do you usually construct a graph for TFBS (binding sites) position
>>> within their sequences? I was thinking to build something like this 
>>> kind of
>>> visualization tool:
>>>
>>> http://research.i2r.a-star.edu.sg/Dragon/Motif_Search/cgi-bin/tmp/29740M1.html 
>>>
>>>
>>> or
>>>
>>> http://wingless.cs.washington.edu:8080/assessment/servlet?filenameID=submission/SPACE.D9F26D506DE90E9A0A0010BB6BCCAEF3&pageType=visualizationForm&action=Visualize+It 
>>>
>>>
>>> Is there a BioPerl module to do that?
>>>
>>> -- 
>>> Edward
>>>
>>>
>>>
>>>
>>>   
>>
>>
>
>
>
>


From shameer at ncbs.res.in  Fri Apr 20 11:45:23 2007
From: shameer at ncbs.res.in (Shameer Khadar)
Date: Fri, 20 Apr 2007 17:15:23 +0530 (IST)
Subject: [Bioperl-l] Protparam using BioPerl
In-Reply-To: <200704180718.48811.sdavis2@mail.nih.gov>
References: <4624E32A.6010704@bms.com>
	<36480.192.168.1.186.1176891367.squirrel@mail.ncbs.res.in>
	<200704180718.48811.sdavis2@mail.nih.gov>
Message-ID: <45682.192.168.1.1.1177069523.squirrel@mail.ncbs.res.in>

Hi,

I would like to know whether Bioperl has a wrapper for ProtParam from
ExPASy.
I need to calculate the Instability Index using the Guruprasad et al. (1990) values
(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=PubMed&list_uids=2075190&dopt=Abstract)
for 100 sequences. I did some googling and didn't find any useful
information.

Thanks,
-- 
Shameer Khadar
Prof. R. Sowdhamini's Lab (# 25) The Computational Biology Group
National Centre for Biological Sciences (TIFR)
GKVK Campus, Bellary Road, Bangalore - 65, Karnataka - India
T - 91-080-23666001 EXT - 6251
W - http://www.ncbs.res.in


From basu at pharm.sunysb.edu  Fri Apr 20 16:37:57 2007
From: basu at pharm.sunysb.edu (Siddhartha Basu)
Date: Fri, 20 Apr 2007 12:37:57 -0400
Subject: [Bioperl-l] Bio::Graphics - Howto Show Negative Start-End and
 Enable Connector
In-Reply-To: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
References: <3ACF03E372996C4EACD542EA8A05E66A06168A@mailbe01.teak.local.net>
Message-ID: <4628EC65.7070505@pharm.sunysb.edu>

Hi,

Wijaya Edward wrote:
> Dear expert,
> 
> My figure here: http://defiant.i2r.a-star.edu.sg/~ewijaya/misc/foo2.png
> is created with the script (down below).
> 
> How can I modify the script such that:
> 
> 1. The arrow track is represented in negative form.
>    I.e. instead of 1 to 300, we use -300 to 0.
> 
> I tried this, but won't do:
> 
> my $flen = Bio::SeqFeature::Generic->new(
>        -start => -300,
>        -end => 0, );

It works if you pass the 'SeqFeature' object to the '-segment' option of 
  "Bio::Graphics::Panel".

my $panel = Bio::Graphics::Panel->new(
    -length    => 300,
    -width     => 500,
    -pad_left  => 70,
    -pad_right => 70,
    -key_style => 'left',
    -connector => 'solid',
    -segment   => $flen,
);

For more, see one of the previous postings:
http://article.gmane.org/gmane.comp.lang.perl.bio.general/1721/match=negative+seqfeature

-siddhartha



From bosborne11 at verizon.net  Fri Apr 20 19:47:30 2007
From: bosborne11 at verizon.net (Brian Osborne)
Date: Fri, 20 Apr 2007 15:47:30 -0400
Subject: [Bioperl-l] Small bug in Bio::Tools::GFF.pm - Target output
In-Reply-To: <200704191738.l3JHc5s10658@cricket.bio.indiana.edu>
Message-ID: <C24E9112.DD2B%bosborne11@verizon.net>

Applied.


On 4/19/07 1:38 PM, "Don Gilbert" <gilbertd at cricket.bio.indiana.edu> wrote:

> 
> I'm not sure what kind of test data would have bad Target strings,
> but this should clear up those warnings -- insert the '+' line:
> 
>   sub _gff3_string:
>     for my $tag ( @all_tags ) {
>        ##dgg.patch.was# next if $tag eq 'Target';
>       if ($tag eq 'Target'
>          and ! $origfeat->isa('Bio::SeqFeature::FeaturePair'))
>        {  
>        my($target_id, $b,$e,$strand)= $feat->get_tag_values($tag);
> +       next unless(defined($e) && defined($b) && $target_id);
>        ($b,$e)= ($e,$b) if(defined $strand && $strand<0);
>        $target_id =~ s/([\t\n\r%&\=;,])/sprintf("%%%X",ord($1))/ge;
>        push @groups, sprintf("Target=%s %d %d", $target_id,$b,$e);
>        next;
>        }
> 
> -- Don


From ewijaya at i2r.a-star.edu.sg  Sat Apr 21 14:44:08 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sat, 21 Apr 2007 22:44:08 +0800
Subject: [Bioperl-l] Getting Gene Sequences with Bioperl
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A06168D@mailbe01.teak.local.net>


Hi all,
 
Is there a BioPerl module that allows us to extract
gene sequences given a list of gene names (gene symbols)?
 
In particular, we would pass the window size of the sequence and get back
the upstream, downstream or ORF sequences for that list of genes.
We may also want to specify one particular organism or all organisms.
 
Is there also a freely downloadable gene database that supports
a BioPerl module for that task?
 
Thanks and hope to hear from you again.
 
--
Edward WIJAYA
SINGAPORE



From hlapp at gmx.net  Sat Apr 21 17:14:10 2007
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 21 Apr 2007 13:14:10 -0400
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
Message-ID: <19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>

I haven't kept track of this - did this go anywhere? Do we not have  
an -m10 fasta output parser in SearchIO? (I.e., my first thought  
would be that that would be the desired solution; am I misled in this?)

	-hilmar

On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:

> I have reported it as a bug on the bugzilla but due to bugzilla  
> problems I
> was not able to attach my code and/or sample m10 files.
> Nevertheless here is the code that converts an m10 fasta output to  
> an m8
> BLAST output which is parseable by the vast majority of software.
>
> <----------- CODE BEGINS HERE ------------------->
>
> #!/usr/bin/perl -w
>
> =head1 NAME
>
> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular  
> output
>
> =head1 SYNOPSIS
>
>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
>
> =head1 DESCRIPTION
>
> Command line options:
>   --header                -- boolean flag to print column header
>   -o/--out                -- optional outputfile to write data,
>                              otherwise will write to STDOUT
>   -h/--help               -- show this documentation
>
> Not technically a SearchIO script, as this doesn't use any Bioperl
> components, but it is useful and fast.  The output is tabular output
> with the standard NCBI -m8 columns.
>
>  queryname
>  hit name
>  percent identity
>  alignment length
>  number mismatches
>  number gaps
>  query start  (if on rev-strand start > end)
>  query end
>  hit start (if on rev-strand start > end)
>  hit end
>  evalue
>  bit score
>
> Additionally 5 more columns are provided
>  percent similar
>  query length
>  hit length
>  query gaps
>  hit gaps
>
> =head1 AUTHOR - Ioannis Kirmitzoglou
>
> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
>
> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
>
> Headers as well as portions of code were taken
> from fastam9_to_table.pl by Jason Stajich
>
> =head1 DISCLAIMER
>
> Copyright (c) <2007> <Ioannis Kirmitzoglou>
>
> Permission to use, copy, modify, merge, publish and distribute
> this software and its documentation, with or without modification,
> for any purpose, and without fee or royalty to the copyright holder(s)
> is hereby granted with no restrictions and/or prerequisites.
>
> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
>
> =cut
>
> use strict;
> use Getopt::Long;
>
> my %data=();
>
> my $outfile=''; my $header='';
> GetOptions(
>     'header'              => \$header,
>     'o|out|outfile:s'     => \$outfile,
>     'h|help'              => sub { exec('perldoc',$0); exit; }
>        );
>
> my $outfh;
> if( $outfile ) {
>     open($outfh, '>', $outfile) || die("$outfile: $!");
> } else {
>     $outfh = \*STDOUT;
> }
>
>
> $/="\n>>>";
>
> my @fields = qw(qname hname percid alen mmcount gapcount
>         qstart qend hstart hend evalue bits percsim qlen hlen qgap hgap);
>
> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)), "\n"
>     if $header;
>
> while (<>) {
>
>         chomp;
>         if ($_=~/^>/ || $_=~/^\#/) {next;}
>         my @hits = split(/\d+>>/, $_);
>         @hits= split("\n>>", $hits[0]);
>
>         my $hit = shift @hits;
>
>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
>
>         foreach my $align (@hits) {
>
>             my @details= split ("\n>", $align);
>            my $detail = shift @details;
>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
>             $data{'bits'}=$1;
>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
>             $data{'evalue'}=$1;
>
>             my $term = quotemeta("; sw_score:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'score'}=$1;
>
>             $term = quotemeta("; sw_ident:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'percid'}=$1;
>
>             $term = quotemeta("; sw_sim:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'percsim'}=$1;
>
>             $term = quotemeta("; sw_overlap:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'alen'}=$1;
>
>             $detail = shift @details;
>
>             $term = quotemeta("; al_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'qstart'}=$1;
>
>             $term = quotemeta("; al_stop:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'qend'}=$1;
>
>             $term = quotemeta("; al_display_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>
>             $data{'qgap'}=($1 =~ tr/\-//);
>
>             $detail = shift @details;
>
>             $term = quotemeta("; sq_len:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hlen'}=$1;
>
>             $term = quotemeta("; al_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hstart'}=$1;
>
>             $term = quotemeta("; al_stop:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term\s+(\S+)/;
>             $data{'hend'}=$1;
>
>             $term = quotemeta("; al_display_start:");
>             $term =~ s/\\ /\\s/;
>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
>             $data{'hgap'}=($1 =~ tr/-//);
>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
>             $data{'mmcount'} = $data{'alen'}
>                 - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );
>
>             # report identity and similarity as percentages
>             for ( $data{'percid'}, $data{'percsim'} ) {
>                 $_ = sprintf("%.2f", $_ * 100);
>             }
>
>             print $outfh join( "\t", map { $data{$_} } @fields ), "\n";
>         }
>
> }
>
> <----------------- CODE ENDS HERE ---------------------->
>
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From jason at bioperl.org  Sat Apr 21 17:44:00 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 21 Apr 2007 10:44:00 -0700
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
Message-ID: <E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>

We don't have one yet. This is a new format introduced in the most  
recent release of FASTA.  Hopefully someone can make some time to add  
some code to SearchIO::fasta for it.

When I need a fast FASTA-to-tab converter, I find that the simple
script (fastam9_to_table) is more efficient than the SearchIO
framework, so Ioannis is making a parallel one for the new m10
output.  I think having both is useful.
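
For anyone consuming the converter's output downstream, here is a
minimal sketch of reading one tab-delimited row back into a hash. The
column order is the @fields list from Ioannis's script; parse_row and
the sample row are made up for illustration and are not part of his
code.

```perl
use strict;
use warnings;

# Column order taken from the @fields list in the fastam10_to_table script.
my @fields = qw(qname hname percid alen mmcount gapcount
                qstart qend hstart hend evalue bits
                percsim qlen hlen qgap hgap);

# Turn one tab-delimited row into a hash keyed by column name.
# (parse_row is a hypothetical helper name.)
sub parse_row {
    my ($line) = @_;
    chomp $line;
    my @values = split /\t/, $line, -1;    # -1 keeps trailing empty columns
    die "expected " . scalar(@fields) . " columns, got " . scalar(@values) . "\n"
        unless @values == @fields;
    my %row;
    @row{@fields} = @values;
    return \%row;
}

# Hypothetical row, invented for this example only.
my $sample = join("\t", qw(query1 hit1 85.00 100 13 2 1 100 5 104 1e-30 120.5
                           90.00 100 110 1 1)) . "\n";
my $row = parse_row($sample);
print "$row->{qname} vs $row->{hname}: $row->{percid}% identity, E=$row->{evalue}\n";
```

Dying on a wrong column count is a deliberate choice here: a silently
shifted column is harder to notice than an early failure.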

-jason
On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:

> I haven't kept track of this - did this go anywhere? Do we not have
> an -m10 fasta output parser in SearchIO? (I.e., my first thought
> would be that that would be the desired solution; am I misled in  
> this?)
>
> 	-hilmar
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From akozik at atgc.org  Sat Apr 21 17:40:47 2007
From: akozik at atgc.org (Alexander Kozik)
Date: Sat, 21 Apr 2007 10:40:47 -0700
Subject: [Bioperl-l] ncbi blast -V T option
Message-ID: <462A4C9F.8010902@atgc.org>

Hi all,

There have been many postings about problems parsing stand-alone (local)
NCBI BLAST output from version 2.2.15 or later. Recently, I (re?)-discovered
that the BLAST option '-V T' fixes the problem with the old parsers I have.
Option '-V T' generates detailed statistics after _each_ query sequence
in the BLAST output, like:
... ... ...
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Hits to DB: 17,650,109
Number of Sequences: 26534
Number of extensions: 430364
Number of successful extensions: 1496
Number of sequences better than 1.0e-020: 1
Number of HSP's better than  0.0 without gapping: 1400
Number of HSP's successfully gapped in prelim test: 0
Number of HSP's that attempted gapping in prelim test: 0
Number of HSP's gapped (non-prelim): 1495
length of database: 11,047,616
effective HSP length: 96
effective length of database: 8,500,352
effective search space used: 1275052800
frameshift window, decay const: 40,  0.1
... ... ...

Option '-V F' (default) generates the statistics once, at the end of the
batch BLAST output, summarizing all query hits together.
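
A minimal sketch of adding the option to a batch run from Perl. The
program, database and file names are placeholders, and
blastall_command is a hypothetical helper; the command is only
assembled and printed here, so nothing is executed unless you pass the
list to system() on a machine with a legacy blastall binary.

```perl
use strict;
use warnings;

# Build a blastall argument list, optionally adding '-V T' so that the
# statistics block is written after each query (as described above).
sub blastall_command {
    my (%opt) = @_;
    my @cmd = ('blastall',
               '-p', $opt{program},
               '-d', $opt{database},
               '-i', $opt{input},
               '-o', $opt{output});
    push @cmd, '-V', 'T' if $opt{per_query_stats};
    return @cmd;
}

# Placeholder names, not taken from a real run.
my @cmd = blastall_command(
    program         => 'blastp',
    database        => 'mydb',
    input           => 'queries.fasta',
    output          => 'queries.blast',
    per_query_stats => 1,
);
print join(' ', @cmd), "\n";

# To actually run it:
# system(@cmd) == 0 or die "blastall failed: $?";
```

Keeping the command as a list rather than one string means system()
bypasses the shell, so file names with spaces need no extra quoting.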

Did I miss something from previous postings?
Sorry, if it was already discussed.

-Alex

-- 
Alexander Kozik
Bioinformatics Specialist
Genome and Biomedical Sciences Facility
451 East Health Sciences Drive
University of California
Davis, CA 95616-8816
Phone: (530) 754-9127
email#1: akozik at atgc.org
email#2: akozik at gmail.com
web: http://www.atgc.org/


From gdorjee at hotmail.com  Sat Apr 21 19:14:05 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Sat, 21 Apr 2007 12:14:05 -0700 (PDT)
Subject: [Bioperl-l] error while remote blast against swissprot db
In-Reply-To: <54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>
References: <9997343.post@talk.nabble.com>
	<ABDA8ACF-BE80-494C-882F-3C2C22A5FE0E@wustl.edu>
	<bba689ec0704150912k7f56f252xc4bbc36bd6b989df@mail.gmail.com>
	<10008507.post@talk.nabble.com>
	<2A974EF5-F46F-46A7-8B0C-A6C0FCF4AE70@wustl.edu>
	<5685E65E-BA7E-40B3-873D-7B882D6EB24F@uiuc.edu>
	<10022463.post@talk.nabble.com>
	<5E36D7FB-5BA1-4D7E-88E3-D64A7EB9A6B1@uiuc.edu>
	<10024333.post@talk.nabble.com>
	<54A71CCC-F75A-4A40-92C9-B7F84FA9B9E5@uiuc.edu>
Message-ID: <10120148.post@talk.nabble.com>


hi
how do i check whether i've installed bioperl on my system properly? i
think i installed the bioperl-1.5.2_101 version, but i can't say for sure.
although i can use some of the modules like Bio::SearchIO and
Bio::SearchIO, i can't seem to get the remote blast working for some reason.
is this something to do with the bioperl installation? i'm using perl v5.6.1
built for sun4-solaris-64int.
i tried to install the same bioperl version on my Linux machine, which has
perl v5.8.5 built for i386-linux-thread-multi, and it seems to give me the
same problem with the remote blast.
your help would be much appreciated.
thanks


Chris Fields wrote:
> 
> What version of bioperl are you using?  I get an error but it is b/c  
> the ID doesn't exist.
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: acc KPYK_ECOLI does not exist
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /Users/cjfields/src/bioperl-live/Bio/ 
> Root/Root.pm:359
> STACK: Bio::DB::WebDBSeqI::get_Seq_by_acc /Users/cjfields/src/bioperl- 
> live/Bio/DB/WebDBSeqI.pm:181
> STACK: genpept.pl:21
> -----------------------------------------------------------
> 
> The actual accession is 'KPYK1_ECOLI'.
> 
> chris
> 
> On Apr 16, 2007, at 3:42 PM, DeeGee wrote:
> 
>>
>> hi
>> i tried the following code just to check the network, and it worked  
>> fine
>> except for the SwissProt part, for which i got the error message  
>> instead of
>> the sequence:
>>
>> ------------- EXCEPTION  -------------
>> MSG: swissprot stream with no ID. Not swissprot in my book
>> STACK Bio::SeqIO::swiss::next_seq
>> /usr/perl5/5.6.1/lib/Bio/SeqIO/swiss.pm:179
>> STACK Bio::DB::WebDBSeqI::get_Seq_by_acc
>> /usr/perl5/5.6.1/lib/Bio/DB/WebDBSeqI.pm:187
>> STACK toplevel bbbbb.pl:21
>> --------------------------------------
>>
>> #### check #####
>> #!/usr/bin/perl -w
>> use strict;
>> use Bio::DB::GenBank;
>> use Bio::DB::SwissProt;
>> use Bio::DB::GenPept;
>> use Bio::SeqIO;
>>
>> my $genpeptdb = new Bio::DB::GenPept();
>> my $genbankdb = new Bio::DB::GenBank();
>> my $swissdb = new Bio::DB::SwissProt();
>>
>> my $seqio = new Bio::SeqIO(-format => 'fasta',
>>                            -fh     => \*STDOUT);
>>
>> my $protseq = $genpeptdb->get_Seq_by_acc('O26717');
>> $seqio->write_seq($protseq);
>>
>> my $seq = $genbankdb->get_Seq_by_acc('AF303112');
>> $seqio->write_seq($seq);
>>
>> $protseq = $swissdb->get_Seq_by_acc('KPY1_ECOLI');
>> $seqio->write_seq($protseq);
>>
>> thanks a lot.
>>
>>
>> Chris Fields wrote:
>>>
>>> The 'verbose' setting doesn't change the way the BLAST query is sent,
>>> it just sends the raw output from the repeated attempts to retrieve
>>> the report (using the RID) to STDERR.  The error you saw won't be
>>> fixed by doing so.
>>>
>>> What I was interested in was the raw HTML output dumped to the
>>> screen.  If it is querying the NCBI server it should dump stuff that
>>> includes something like this:
>>>
>>> ...
>>> <HTML>
>>> <p></p>
>>> <!--
>>> QBlastInfoBegin
>>>          Status=WAITING
>>> QBlastInfoEnd
>>> --><p></p>
>>> <SCRIPT LANGUAGE="JavaScript"><!--
>>> ...
>>>
>>> which indicates you have a request in the BLAST queue.  If you aren't
>>> seeing anything then the problem is likely network-related on your
>>> end, so getting the latest RemoteBlast won't help.  Do any other
>>> BioPerl modules requiring network access work (Bio::DB::GenBank, for
>>> instance)?  If not it could be a proxy issue...
>>>
>>> Just in case, here's the browsable CVS location for RemoteBlast:
>>>
>>> http://cvs.bioperl.org/cgi-bin/viewcvs/viewcvs.cgi/bioperl-live/Bio/
>>> Tools/Run/RemoteBlast.pm?cvsroot=bioperl
>>>
>>> Click on the download link and save over your local version.
>>>
>>> chris
>>>
>>> On Apr 16, 2007, at 2:10 PM, DeeGee wrote:
>>>
>>>>
>>>> hi Chris,
>>>> thanks for your reply. i set the RemoteBlast factory to a verbosity
>>>> of 1,
>>>> and i get the same error message. i'm new to all these. so, could
>>>> you plz
>>>> tell me how can i do the RemoteBlast in CVS that you've suggested.
>>>>
>>>> cheers!!!
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>
>> -- 
>> View this message in context: http://www.nabble.com/error-while- 
>> remote-blast-against-swissprot-db-tf3577674.html#a10024333
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 

-- 
View this message in context: http://www.nabble.com/error-while-remote-blast-against-swissprot-db-tf3577674.html#a10120148
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Sat Apr 21 20:09:48 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sat, 21 Apr 2007 15:09:48 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
Message-ID: <A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>

Ioannis's fastam10_to_table script is available in the bugzilla  
enhancement request (as an attachment) if anyone's interested:

http://bugzilla.open-bio.org/show_bug.cgi?id=2278

I haven't had a chance to really look into m10 output yet, but it
looks easy enough to parse; it may not be hard to get something
SearchIO-based up and running.
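
As a rough illustration of why the format looks easy to parse: judging
from the patterns in Ioannis's script, each alignment block carries
its values on lines of the form "; key: value". Here is a minimal
sketch, assuming that layout. parse_m10_attributes is a hypothetical
helper, and the sample block is invented for the example; it is not
real FASTA output.

```perl
use strict;
use warnings;

# Collect "; key: value" lines from one block of -m 10 style text into a hash.
sub parse_m10_attributes {
    my ($block) = @_;
    my %attr;
    for my $line (split /\n/, $block) {
        # e.g. "; sw_ident: 0.850" gives key "sw_ident", value "0.850"
        if ($line =~ /^;\s+(\S+):\s+(.*?)\s*$/) {
            $attr{$1} = $2;
        }
    }
    return \%attr;
}

# Hypothetical block; the key names are ones that Ioannis's script matches.
my $block = <<'END';
; fa_bits: 120.5
; fa_expect: 1e-30
; sw_ident: 0.850
; sw_overlap: 100
END

my $attr = parse_m10_attributes($block);
printf "%s\t%s\n", $_, $attr->{$_} for sort keys %$attr;
```

A SearchIO-based parser would still need to split the report into
queries, hits and the query/hit sub-blocks before applying something
like this to each piece.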

chris

On Apr 21, 2007, at 12:44 PM, Jason Stajich wrote:

> We don't have one yet. This is a new format introduced in the most
> recent release of FASTA.  Hopefully someone can make some time to add
> some code to SearchIO::fasta for it.
>
> When I need a fast FASTA-to-tab converter, I find that the simple
> script (fastam9_to_table) is more efficient than the SearchIO
> framework, so Ioannis is making a parallel one for the new m10
> output.  I think having both is useful.
>
> -jason
> On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
>
>> I haven't kept track of this - did this go anywhere? Do we not have
>> an -m10 fasta output parser in SearchIO? (I.e., my first thought
>> would be that that would be the desired solution; am I misled in
>> this?)
>>
>> 	-hilmar
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From ewijaya at i2r.a-star.edu.sg  Sun Apr 22 11:59:28 2007
From: ewijaya at i2r.a-star.edu.sg (Wijaya Edward)
Date: Sun, 22 Apr 2007 19:59:28 +0800
Subject: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with	
	Perl
References: <3ACF03E372996C4EACD542EA8A05E66A061682@mailbe01.teak.local.net>
	<bba689ec0704160810y63a754c4g68544923ce4fd244@mail.gmail.com>
	<3ACF03E372996C4EACD542EA8A05E66A061684@mailbe01.teak.local.net>
	<AAF82F3A-3C75-4D51-AFD4-FDE358391A03@fruitfly.org>
Message-ID: <3ACF03E372996C4EACD542EA8A05E66A061690@mailbe01.teak.local.net>


Hi Chris,
 
I've downloaded the GO Database.
Which of these should we install in our MySQL database
so that it can be used for the GO::AppHandle task below?
 
-rw-rw-r--   1 ewijaya ewijaya 1.6G Apr  9 12:23 go_200704-assocdb-data
-rw-rw-r--   1 ewijaya ewijaya 483M Apr  9 12:23 go_200704-assocdb.rdf-xml
-rw-rw-r--   1 ewijaya ewijaya 3.2K Apr  9 12:23 go_200704-assocdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  7 00:41 go_200704-assocdb-tables
-rw-rw-r--   1 ewijaya ewijaya 3.3K Apr  9 12:23 go_200704-obo-xml.dtd
-rw-rw-r--   1 ewijaya ewijaya 4.5K Apr  9 12:23 go_200704-rdf.dtd
-rw-rw-r--   1 ewijaya ewijaya  29K Apr  9 12:23 go_200704-schema-mysql.sql
-rw-rw-r--   1 ewijaya ewijaya 3.1G Apr  9 12:25 go_200704-seqdb-data
-rw-rw-r--   1 ewijaya ewijaya  93M Apr  9 12:26 go_200704-seqdb.fasta
-rw-rw-r--   1 ewijaya ewijaya 3.2K Apr  9 12:25 go_200704-seqdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  8 05:38 go_200704-seqdb-tables
-rw-rw-r--   1 ewijaya ewijaya  51M Apr  9 12:26 go_200704-termdb-data
-rw-rw-r--   1 ewijaya ewijaya  18M Apr  9 12:26 go_200704-termdb.obo-xml
-rw-rw-r--   1 ewijaya ewijaya  39M Apr  9 12:26 go_200704-termdb.owl
-rw-rw-r--   1 ewijaya ewijaya  29M Apr  9 12:26 go_200704-termdb.rdf-xml
-rw-rw-r--   1 ewijaya ewijaya  749 Apr  9 12:26 go_200704-termdb-summary.txt
drwxrwxr-x   2 ewijaya ewijaya 4.0K Apr  2 00:31 go_200704-termdb-tables
drwxrwxr-x  22 ewijaya ewijaya 4.0K Apr  1 23:35 go_200704-utilities-src

Or is there a way to load all of them into a MySQL database automatically?
Thanks and hope to hear from you again.
 
--
Edward
 

________________________________

From: Chris Mungall [mailto:cjm at fruitfly.org]
Sent: Tue 4/17/2007 2:49 AM
To: Wijaya Edward
Cc: spiros at lokku.com; bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) with Perl


Download:
http://search.cpan.org/~cmungall/go-db-perl

or do:

cpan GO::AppHandle

The API call you want is here:
http://search.cpan.org/~cmungall/go-db-perl/GO/
AppHandle.pm#get_deep_products

Here is an example snippet:

   use GO::AppHandle;
   my $apph=GO::AppHandle->connect(@ARGV);
   my $go_acc = shift @ARGV;
   my $gps = $apph->get_deep_products({term=>{acc=>$go_acc}});
   foreach my $gp (@$gps) {
     printf "%s %s\n", $gp->xref->acc, $gp->symbol;
   }

You will need to download the GO Database.

Cheers
Chris

On Apr 16, 2007, at 8:14 AM, Wijaya Edward wrote:

>
> Hi Spiros,
>
> Thanks for your reply. I am interested in applying it to
> all organisms related to that particular GO ID.
>
> Do you have a CPAN module for that?
> --
> Edward WIJAYA
> SINGAPORE
>
> ________________________________
>
> From: s.denaxas at gmail.com on behalf of Spiros Denaxas
> Sent: Mon 4/16/2007 11:10 PM
> To: Wijaya Edward
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Extracting Gene Names Genome Ontology (GO) 
> with Perl
>
>
>
> Hi Edward,
>
> What organism are you interested in? I have some code from my PhD
> based on the Saccharomyces cerevisiae genome. Basically uses the SGD
> flat files and a local MySQL instance of GO. Might be worth turning
> into modules if people are interested in it, although it is pretty
> organism oriented and the lack of abstraction might introduce a number
> of problems.
>
> Spiros
>
> On 4/16/07, Wijaya Edward <ewijaya at i2r.a-star.edu.sg> wrote:
>>
>> Dear all,
>>
>> Given a GO id, is there a way to extract all
>> the related gene names from that id with Perl?
>>
>> Does anybody have experience with that?
>> I've looked through the GO modules on CPAN, but can't seem
>> to find any tool that facilitates that search.
>>
>> I look forward to your advice.
>>
>> --
>> Edward WIJAYA
>> SINGAPORE
>>
>>
>
>
>
>


------------ Institute For Infocomm Research - Disclaimer -------------
This email is confidential and may be privileged.  If you are not the intended recipient, please delete it and notify us immediately. Please do not copy or use it for any purpose, or disclose its contents to any other person. Thank you.
--------------------------------------------------------


From ioanniskirmitzoglou at gmail.com  Sun Apr 22 17:11:35 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Sun, 22 Apr 2007 20:11:35 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
Message-ID: <b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>

I agree with Jason. Both scripts (fastam9_to_table and fastam10_to_table)
are much faster and easier to use than SearchIO. Still, there are many
cases where SearchIO support for m10 would be useful (e.g. when trying to
represent the alignment graphically).
Nevertheless, I do think that FASTA needs an output format similar to BLAST
m8, which is really compact. Although I haven't tried it yet, I believe
that both scripts can read from a pipe, so one easy and rather fast way to
produce tabular output from FASTA would be to pipe its output directly to
one of the scripts.
-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


On 21/04/07, Chris Fields <cjfields at uiuc.edu> wrote:
>
> Ioannis's fastam10_to_table script is available in the bugzilla
> enhancement request (as an attachment) if anyone's interested:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2278
>
> I haven't had a chance to really look into m10 output yet but it
> looks easy enough to parse; may not be hard to get something SearchIO-
> based up and running.
>
> chris
>
> On Apr 21, 2007, at 12:44 PM, Jason Stajich wrote:
>
> > We don't have one yet. This is a new format introduced in the most
> > recent release of FASTA.  Hopefully someone can make some time to add
> > some code to SearchIO::fasta for it.
> >
> > I do find that when I need a fast FASTA to TAB converter, the
> > simple script (fastam9_to_table) is more efficient than the SearchIO
> > framework, so Ioannis is making a parallel one for the new m10
> > output.  So I think having both is useful.
> >
> > -jason
> > On Apr 21, 2007, at 10:14 AM, Hilmar Lapp wrote:
> >
> >> I haven't kept track of this - did this go anywhere? Do we not have
> >> an -m10 fasta output parser in SearchIO? (I.e., my first thought
> >> would be that that would be the desired solution; am I misled in
> >> this?)
> >>
> >>      -hilmar
> >>
> >> On Apr 19, 2007, at 10:06 AM, Ioannis Kirmitzoglou wrote:
> >>
> >>> I have reported it as a bug on the bugzilla but due to bugzilla
> >>> problems I
> >>> was not able to attach my code and/or sample m10 files.
> >>> Nevertheless here is the code that converts an m10 fasta output to
> >>> an m8
> >>> BLAST output which is parseable by the vast majority of software.
> >>>
> >>> <----------- CODE BEGINS HERE ------------------->
> >>>
> >>> #!/usr/bin/perl -w
> >>>
> >>> =head1 NAME
> >>>
> >>> fastam10_to_table  - turn FASTA -m 10 output into NCBI -m 8 tabular
> >>> output
> >>>
> >>> =head1 SYNOPSIS
> >>>
> >>>  fastam10_to_table [--header] [-o outfile] inputfile1 inputfile2 ...
> >>>
> >>> =head1 DESCRIPTION
> >>>
> >>> Command line options:
> >>>   --header                -- boolean flag to print column header
> >>>   -o/--out                -- optional outputfile to write data,
> >>>                              otherwise will write to STDOUT
> >>>   -h/--help               -- show this documentation
> >>>
> >>> Not technically a SearchIO script, as this doesn't use any Bioperl
> >>> components, but it is useful and fast.  The output is tabular output
> >>> with the standard NCBI -m8 columns.
> >>>
> >>>  queryname
> >>>  hit name
> >>>  percent identity
> >>>  alignment length
> >>>  number mismatches
> >>>  number gaps
> >>>  query start  (if on rev-strand start > end)
> >>>  query end
> >>>  hit start (if on rev-strand start > end)
> >>>  hit end
> >>>  evalue
> >>>  bit score
> >>>
> >>> Additionally 5 more columns are provided
> >>>  percent similar
> >>>  query length
> >>>  hit length
> >>>  query gaps
> >>>  hit gaps
> >>>
> >>> =head1 AUTHOR - Ioannis Kirmitzoglou
> >>>
> >>> Ioannis Kirmitzoglou IoannisKirmitzoglou_at_gmail-dot-org
> >>>
> >>> =head1 ACKNOWLEDGMENTS - Ioannis Kirmitzoglou
> >>>
> >>> Headers as well as portions of code were taken
> >>> from fastam9_to_table.pl by Jason Stajich
> >>>
> >>> =head1 DISCLAIMER
> >>>
> >>> Copyright (c) <2007> <Ioannis Kirmitzoglou>
> >>>
> >>> Permission to use, copy, modify, merge, publish and distribute
> >>> this software and its documentation, with or without modification,
> >>> for any purpose, and without fee or royalty to the copyright holder(s)
> >>> is hereby granted with no restrictions and/or prerequisites.
> >>>
> >>> THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> >>> EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> >>> MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
> >>> NONINFRINGEMENT.
> >>> IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
> >>> CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
> >>> TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
> >>> SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
> >>>
> >>> =cut
> >>>
> >>> use strict;
> >>> use Getopt::Long;
> >>>
> >>> my %data=();
> >>>
> >>> my $outfile=''; my $header='';
> >>> GetOptions(
> >>>     'header'              => \$header,
> >>>     'o|out|outfile:s'     => \$outfile,
> >>>     'h|help'              => sub { exec('perldoc',$0); exit; }
> >>>        );
> >>>
> >>> my $outfh;
> >>> if( $outfile ) {
> >>>     open($outfh, '>', $outfile) || die("$outfile: $!");
> >>> } else {
> >>>     $outfh = \*STDOUT;
> >>> }
> >>>
> >>>
> >>> $/="\n>>>";
> >>>
> >>> my @fields = qw(qname hname percid alen mmcount gapcount
> >>>         qstart qend hstart hend evalue bits percsim qlen hlen qgap
> >>>         hgap);
> >>>
> >>> print $outfh "#",uc(join("", map{ sprintf("%-10s",$_) } @fields)),
> >>>     "\n" if $header;
> >>>
> >>> while (<>) {
> >>>
> >>>         chomp;
> >>>         if ($_=~/^>/ || $_=~/^\#/) {next;}
> >>>         my @hits = split(/\d+>>/, $_);
> >>>         @hits= split("\n>>", $hits[0]);
> >>>
> >>>         my $hit = shift @hits;
> >>>
> >>>         ($data{'qname'}, $data{'qlen'} ) = ($hit=~ (/(\S+)\,\s(\d+)/));
> >>>
> >>>         foreach my $align (@hits) {
> >>>
> >>>             my @details= split ("\n>", $align);
> >>>            my $detail = shift @details;
> >>>             ($data{'hname'}) = ($detail =~ (/^(\S+)\s/));
> >>>             $detail=~ /\;\s(?:fa|sw)\_bits\:\s+(\S+)/;
> >>>             $data{'bits'}=$1;
> >>>             $detail=~ /\;\s(?:fa|sw)\_expect\:\s+(\S+)/;
> >>>             $data{'evalue'}=$1;
> >>>
> >>>             my $term = quotemeta("; sw_score");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'score'}=$1;
> >>>
> >>>             $term = quotemeta("; sw_ident:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'percid'}=$1;
> >>>
> >>>             $term = quotemeta("; sw_sim:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'percsim'}=$1;
> >>>
> >>>             $term = quotemeta("; sw_overlap:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'alen'}=$1;
> >>>
> >>>             $detail = shift @details;
> >>>
> >>>             $term = quotemeta("; al_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'qstart'}=$1;
> >>>
> >>>             $term = quotemeta("; al_stop:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'qend'}=$1;
> >>>
> >>>             $term = quotemeta("; al_display_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             my $lakis ='';
> >>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
> >>>
> >>>             $data{'qgap'}=($1 =~ tr/\-//);
> >>>
> >>>             $detail = shift @details;
> >>>
> >>>             $term = quotemeta("; sq_len:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'hlen'}=$1;
> >>>
> >>>             $term = quotemeta("; al_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'hstart'}=$1;
> >>>
> >>>             $term = quotemeta("; al_stop:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term\s+(\S+)/;
> >>>             $data{'hend'}=$1;
> >>>
> >>>             $term = quotemeta("; al_display_start:");
> >>>             $term =~ s/\\ /\\s/;
> >>>             $detail=~ /$term.+\s\-*([\-\w\s]+)/g;
> >>>             $data{'hgap'}=($1 =~ tr/-//);
> >>>             $data{'gapcount'} = $data{'qgap'} + $data{'hgap'};
> >>>             $data{'mmcount'} = $data{'alen'}
> >>>                 - ( int($data{'percid'} * $data{'alen'}) + $data{'gapcount'} );
> >>>
> >>>             for ( $data{'percid'}, $data{'percsim'} ) {
> >>>                 $_ = sprintf("%.2f",$_*100);
> >>>             }
> >>>
> >>>             print $outfh join( "\t",map { $data{$_} } @fields),"\n"
> >>>         }
> >>>
> >>> }
> >>>
> >>> <----------------- CODE ENDS HERE ---------------------->
> >>>
> >>> --
> >>>
> >>> *Ioannis Kirmitzoglou*, MSc
> >>> PhD. Student,
> >>> Bioinformatics Research Laboratory
> >>> Department of Biological Sciences
> >>> University of Cyprus
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >> --
> >> ===========================================================
> >> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> >> ===========================================================
> >>
> >>
> >>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> > --
> > Jason Stajich
> > jason at bioperl.org
> > http://jason.open-bio.org/
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From jason at bioperl.org  Sun Apr 22 20:24:23 2007
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 22 Apr 2007 13:24:23 -0700
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<44255ea80704170710k4972e50bw53b5df53274b8e4c@mail.gmail.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
Message-ID: <69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>

I do think that m9 is pretty compact if you don't need to see the
alignment and just want the pairwise statistics; it is analogous to the
BLAST m8/9 format.  I typically just use that + fastam9_to_table for
input to MCL and other systems that can process tabular formats.
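
As a rough illustration of consuming that tabular output downstream, here
is a minimal sketch. It assumes the first 12 columns follow the order
listed in the fastam10_to_table POD quoted in this thread, and that
fastam9_to_table emits the same leading columns. parse_table_row is a
hypothetical helper, not part of Bioperl, and the example row is made up.

```perl
use strict;
use warnings;

# Column order as listed in the fastam10_to_table POD (assumed to match
# the leading columns written by fastam9_to_table as well).
my @fields = qw(qname hname percid alen mmcount gapcount
                qstart qend hstart hend evalue bits);

# Hypothetical helper: turn one tab-separated row into a hash reference
# keyed by column name. Returns nothing for header, blank or short rows.
sub parse_table_row {
    my ($row) = @_;
    chomp $row;
    return if $row =~ /^#/ || $row !~ /\S/;    # skip header and blank lines
    my @values = split /\t/, $row;
    return if @values < @fields;               # not enough columns
    my %hit;
    @hit{@fields} = @values[ 0 .. $#fields ];
    return \%hit;
}

# Made-up example row, for illustration only.
my $row = join "\t", qw(seqA seqB 87.50 120 13 2 1 120 5 124 1e-30 115.2);
if ( my $hit = parse_table_row($row) ) {
    # Emit a query / hit / evalue triple, one per line.
    print join( "\t", @{$hit}{qw(qname hname evalue)} ), "\n";
}
```

Keeping the column names in one array means any extra trailing columns are
simply ignored by this reader.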

I cleaned up a few things in SearchIO::fasta but have not been able  
to see whether we can auto-detect m10 format and insert the necessary  
code just yet.

-jason
On Apr 22, 2007, at 10:11 AM, Ioannis Kirmitzoglou wrote:

> I agree with Jason. Both scripts (fastam9_to_table and  
> fastam10_to_table)
> are way faster and easier to use than the searchIO. Still, there  
> are a lot
> of cases where searchIO support for m10 would be useful (e.g when  
> trying to
> represent the alignment in a graphical way).
> Nevertheless I do think that FASTA needs an output similar to the  
> BLAST m8
> one which is really compact. Although I haven't tried it yet I do  
> believe
> that both scripts can be piped, so one easy and rather fast way to  
> produce
> an tabular output from FASTA would be to pipe its output directly  
> to one of
> the scripts.
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
>

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From ioanniskirmitzoglou at gmail.com  Mon Apr 23 09:45:53 2007
From: ioanniskirmitzoglou at gmail.com (Ioannis Kirmitzoglou)
Date: Mon, 23 Apr 2007 12:45:53 +0300
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
References: <10034698.post@talk.nabble.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
	<69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
Message-ID: <b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>

I don't know about older versions, but the latest version of FASTA starts its
output with a line similar to one of these:
# fasta34.exe -m9 -d0 -Q test.faa test.faa OR
# fasta34.exe -m10 -Q test.faa test.faa

This very first line is also the only one in the output that starts with
'#'.
Isn't this an easy way to determine the output type?
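
A minimal sketch of such a check, assuming the command-line echo is
present and the option is written as -m9, -m10 or -m 10.
guess_fasta_m_format is a hypothetical helper, not an existing Bioperl
function. It returns nothing when the first line is not a '#' echo, so a
caller could treat that case as "format unknown".

```perl
use strict;
use warnings;

# Hypothetical helper: inspect the first line of a FASTA report and return
# the number given to -m, or nothing when the line is not a '#' command
# echo or carries no -m option.
sub guess_fasta_m_format {
    my ($first_line) = @_;
    return unless defined $first_line && $first_line =~ /^#/;
    # Accept both "-m10" and "-m 10".
    return $1 if $first_line =~ /\s-m\s*(\d+)\b/;
    return;
}

for my $line (
    '# fasta34.exe -m9 -d0 -Q test.faa test.faa',
    '# fasta34.exe -m10 -Q test.faa test.faa',
    ' FASTA searches a protein or DNA sequence data bank',
) {
    my $fmt = guess_fasta_m_format($line);
    print defined $fmt ? "m$fmt\n" : "no -m flag seen\n";
}
```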


-- 

*Ioannis Kirmitzoglou*, MSc
PhD. Student,
Bioinformatics Research Laboratory
Department of Biological Sciences
University of Cyprus


From cjfields at uiuc.edu  Mon Apr 23 12:46:40 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Apr 2007 07:46:40 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>
References: <10034698.post@talk.nabble.com>
	<b72662da0704170919u10133d2bnb73fc964be68b46a@mail.gmail.com>
	<b72662da0704170921h7d9f9a0do6ef2458479e538e7@mail.gmail.com>
	<639E3C4F-4A0B-4752-AC9F-9C96870550E1@uiuc.edu>
	<b72662da0704190706y106f1765sed632d350b231629@mail.gmail.com>
	<19646C47-F6A5-4FBD-BF72-D015F484BB1F@gmx.net>
	<E3D662F9-578F-4BE2-B509-1AB6E2C96F68@bioperl.org>
	<A5BEE2BE-B280-442A-9A15-3125BA886977@uiuc.edu>
	<b72662da0704221011h7b2a3f90sac21c32691014377@mail.gmail.com>
	<69873028-E766-46A2-A7A4-FEBE8650E1B7@bioperl.org>
	<b72662da0704230245g65ba31c4hd9b078c93bb845fd@mail.gmail.com>
Message-ID: <333A7BEF-71E3-4E15-B2EC-384AEBAA13B7@uiuc.edu>

That's true, but older versions of fasta don't do this.  For  
instance, the example files in the bioperl distribution in t/data  
(HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this  
line.

 From the fasta changelog:

-------------------------------------------------------------
 >>Nov 14-22, 2002  CVS fa34t20b6

Include compile-time define (-DPGM_DOC) that causes all the fasta
programs to provide the same command line echo that is provided by the
PVM and MPI parallel programs.  Thus, if you run the program:

     fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12

the first lines of output from FASTA will be:

     # fasta34_t -q gtt1_drome.aa /slib/swissprot
      FASTA searches a protein or DNA sequence data bank
      version 3.4t20 Nov 10, 2002
     Please cite:
      W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

This has been turned on by default in most FASTA Makefiles.
-------------------------------------------------------------

We could support only newer fasta output (newer than the above
version), since there have been several bug fixes and changes to
output; not sure how everyone else feels about this.

chris

On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:

> I don't know about older versions but the latest version of FASTA  
> starts its
> output with a line similar to those:
> # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
> # fasta34.exe -m10 -Q test.faa test.faa
>
> This very first line is also the only one in the output that starts  
> with
> '#'.
> Isn't this an easy way to determine the output type?
>
>
> -- 
>
> *Ioannis Kirmitzoglou*, MSc
> PhD. Student,
> Bioinformatics Research Laboratory
> Department of Biological Sciences
> University of Cyprus
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 23 13:49:45 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 23 Apr 2007 08:49:45 -0500
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>
References: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>
Message-ID: <12707EA8-F245-4AE7-BFD1-EE861F431F3D@uiuc.edu>

Aaron,

I find -m 10 defined way back in fasta2 notes:

--------------------------------------------------------------
Changes with 2.0x4  (January, 1996)

The major change in with 2.0x4 is the ability to get a parseable
output from FASTA/TFASTA/SSEARCH.  This can be done using output
option -m 10.  ...
--------------------------------------------------------------

It goes on to define it in more detail (which is nice to have  
around!).  It's possible it wasn't implemented until recently for  
fasta3, but I find references to it in the various fasta3 notes going  
back to at least 2001, so maybe it wasn't compiled by default  
until recently?  The extra '#' line was added in 2002 to all output  
as far as I can tell.

We could just have SearchIO::fasta fall back to default parsing if  
'#' isn't present.  The default format and m10 are different enough  
that we probably want to separate m10 parsing into its own parser  
subroutine so we don't screw with the default parsing too much.
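
A minimal sketch of that fallback, assuming a hypothetical helper named
detect_fasta_m_format (it is not part of SearchIO::fasta) that looks only
at the first line of a report. The parser names 'm10' and 'default' are
placeholders as well:

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::SearchIO::fasta.
# Returns the numeric -m value echoed on a leading '#' line, or undef
# when there is no echo line or no -m option on it.
sub detect_fasta_m_format {
    my ($first_line) = @_;
    return unless defined $first_line && $first_line =~ /^\s*#/;
    return $1 if $first_line =~ /\s-m\s*(\d+)\b/;
    return;
}

# Pick a parser name: 'm10' only when the echo line says so,
# otherwise fall back to the default parser.
sub choose_parser {
    my ($first_line) = @_;
    my $m = detect_fasta_m_format($first_line);
    return (defined $m && $m == 10) ? 'm10' : 'default';
}
```

Reports from older builds without the PGM_DOC echo line would take the
'default' branch, which matches the existing behaviour for the files in
t/data.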

chris

On Apr 23, 2007, at 8:29 AM, aaron.j.mackey at gsk.com wrote:

> Since -m10 is newer than PGM_DOC, you should be fine to use the  
> first line
> as a detection for m10, when that first line exists (when it does  
> not, the
> format cannot be m10, unless someone has re-compiled FASTA with an
> undefined PGM_DOC).
>
> -Aaron
>
> bioperl-l-bounces at lists.open-bio.org wrote on 04/23/2007 08:46:40 AM:
>
>> That's true, but older versions of fasta don't do this.  For
>> instance, the example files in the bioperl distribution in t/data
>> (HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this
>> line.
>>
>>  From the fasta changelog:
>>
>> -------------------------------------------------------------
>>>> Nov 14-22, 2002  CVS fa34t20b6
>>
>> Include compile-time define (-DPGM_DOC) that causes all the fasta
>> programs to provide the same command line echo that is provided by  
>> the
>> PVM and MPI parallel programs.  Thus, if you run the program:
>>
>>      fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12
>>
>> the first lines of output from FASTA will be:
>>
>>      # fasta34_t -q gtt1_drome.aa /slib/swissprot
>>       FASTA searches a protein or DNA sequence data bank
>>       version 3.4t20 Nov 10, 2002
>>      Please cite:
>>       W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
>>
>> This has been turned on by default in most FASTA Makefiles.
>> -------------------------------------------------------------
>>
>> We could support only newer fasta output (newer than the above
>> version) since there have been several bug fixes and changes to
>> output; not sure how everyone else feels about this.
>>
>> chris
>>
>> On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:
>>
>>> I don't know about older versions but the latest version of FASTA
>>> starts its
>>> output with a line similar to those:
>>> # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
>>> # fasta34.exe -m10 -Q test.faa test.faa
>>>
>>> This very first line is also the only one in the output that starts
>>> with
>>> '#'.
>>> Isn't this an easy way to determine the output type?
>>>
>>>
>>> -- 
>>>
>>> *Ioannis Kirmitzoglou*, MSc
>>> PhD. Student,
>>> Bioinformatics Research Laboratory
>>> Department of Biological Sciences
>>> University of Cyprus
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Robert Switzer
>> Dept of Biochemistry
>> University of Illinois Urbana-Champaign
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From aaron.j.mackey at gsk.com  Mon Apr 23 13:29:39 2007
From: aaron.j.mackey at gsk.com (aaron.j.mackey at gsk.com)
Date: Mon, 23 Apr 2007 09:29:39 -0400
Subject: [Bioperl-l] Parsing FASTA m10 output
In-Reply-To: <333A7BEF-71E3-4E15-B2EC-384AEBAA13B7@uiuc.edu>
Message-ID: <OFD1E9158F.539B37D5-ON852572C6.0049F684-852572C6.004A196E@gsk.com>

Since -m10 is newer than PGM_DOC, you should be fine to use the first line 
as a detection for m10, when that first line exists (when it does not, the 
format cannot be m10, unless someone has re-compiled FASTA with an 
undefined PGM_DOC).

-Aaron

bioperl-l-bounces at lists.open-bio.org wrote on 04/23/2007 08:46:40 AM:

> That's true, but older versions of fasta don't do this.  For 
> instance, the example files in the bioperl distribution in t/data 
> (HUMBETGLOA.FASTA, cysprot1.fasta, cysprot_vs_gadfly.fasta) lack this 
> line.
> 
>  From the fasta changelog:
> 
> -------------------------------------------------------------
>  >>Nov 14-22, 2002  CVS fa34t20b6
> 
> Include compile-time define (-DPGM_DOC) that causes all the fasta
> programs to provide the same command line echo that is provided by the
> PVM and MPI parallel programs.  Thus, if you run the program:
> 
>      fasta34_t -q -S gtt1_drome.aa /slib/swissprot 12
> 
> the first lines of output from FASTA will be:
> 
>      # fasta34_t -q gtt1_drome.aa /slib/swissprot
>       FASTA searches a protein or DNA sequence data bank
>       version 3.4t20 Nov 10, 2002
>      Please cite:
>       W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448
> 
> This has been turned on by default in most FASTA Makefiles.
> -------------------------------------------------------------
> 
> We could support only newer fasta output (newer than the above 
> version) since there have been several bug fixes and changes to 
> output; not sure how everyone else feels about this.
> 
> chris
> 
> On Apr 23, 2007, at 4:45 AM, Ioannis Kirmitzoglou wrote:
> 
> > I don't know about older versions but the latest version of FASTA 
> > starts its
> > output with a line similar to those:
> > # fasta34.exe -m9 -d0 -Q test.faa test.faa OR
> > # fasta34.exe -m10 -Q test.faa test.faa
> >
> > This very first line is also the only one in the output that starts 
> > with
> > '#'.
> > Isn't this an easy way to determine the output type?
> >
> >
> > -- 
> >
> > *Ioannis Kirmitzoglou*, MSc
> > PhD. Student,
> > Bioinformatics Research Laboratory
> > Department of Biological Sciences
> > University of Cyprus
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From bix at sendu.me.uk  Tue Apr 24 10:21:29 2007
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 24 Apr 2007 11:21:29 +0100
Subject: [Bioperl-l] WrapperBase / StandAloneBlast executable() method
	confusion
Message-ID: <462DDA29.4090104@sendu.me.uk>

Hi,

I'm a little unsure of the intent for executable() in wrapper modules. 
The WrapperBase version of the method and the StandAloneBlast version 
have the same POD but different implementations.

WrapperBase takes as a first arg an 'exe' which it will blindly trust is 
the path to a working executable. (That doesn't seem sensible already.) 
It is only capable of storing one such path.

If no arg is supplied it uses program_path() (which uses program_name()) 
to find the executable. Failing that it does a further direct test on 
program_name() to see if it's executable.


StandAloneBlast takes as a first arg merely the name of your exe and 
also (undocumented) the path to the corresponding executable (which is 
tested to see if it really is executable). It can store executable paths 
for multiple different exenames (corresponding better with the docs for 
the first arg: "name of executable to set path to").

If no second arg is supplied it does something similar to WrapperBase, 
except that it uses the first arg exename (or a default if that wasn't 
supplied) in place of program_name().


I'm trying to generalize this so StandAloneBlast can just use the 
WrapperBase version (and so other wrappers can then store executable 
paths for different sub-programs). Any suggestions for a good way of 
melding these two together whilst somehow retaining backward compatibility?
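
To make the question concrete, here is a rough sketch of one possible
merged behaviour. Everything in it is hypothetical: the class name
ExecutableSketch, the exe_for hash and the _search_path helper are made
up for illustration and are not the WrapperBase or StandAloneBlast code.
The heuristic it assumes is that a lone argument naming an existing
executable file is a path (the old WrapperBase call), and anything else
is an exe name:

```perl
package ExecutableSketch;   # hypothetical, not Bio::Tools::Run::WrapperBase
use strict;
use warnings;
use File::Spec;

sub new {
    my ($class, %args) = @_;
    return bless { program_name => $args{program_name}, exe_for => {} }, $class;
}

sub program_name { $_[0]{program_name} }

# executable()              -> path for the default program_name()
# executable($name)         -> path for the named sub-program
# executable($name, $path)  -> store $path for $name (must be executable)
# executable($path)         -> legacy form: a lone argument that is itself
#                              an executable file is stored for program_name()
sub executable {
    my ($self, $first, $path) = @_;
    my $name = defined $first ? $first : $self->program_name;

    if (defined $first && !defined $path && -f $first && -x _) {
        ($name, $path) = ($self->program_name, $first);
    }
    return unless defined $name;

    if (defined $path) {
        die "'$path' is not executable\n" unless -f $path && -x _;
        $self->{exe_for}{$name} = $path;
    }
    $self->{exe_for}{$name} = _search_path($name)
        unless defined $self->{exe_for}{$name};
    return $self->{exe_for}{$name};
}

# Look for $name in the directories listed in PATH.
sub _search_path {
    my ($name) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $name);
        return $candidate if -f $candidate && -x _;
    }
    return;
}

package main;
```

The weak point is the heuristic itself: a relative exe name that happens
to match an executable file in the current directory would be treated as
a path.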


From cjfields at uiuc.edu  Tue Apr 24 12:55:43 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 24 Apr 2007 07:55:43 -0500
Subject: [Bioperl-l] WrapperBase / StandAloneBlast executable() method
	confusion
In-Reply-To: <462DDA29.4090104@sendu.me.uk>
References: <462DDA29.4090104@sendu.me.uk>
Message-ID: <8F1427D6-8654-461E-B9AA-E51CC3A20318@uiuc.edu>

I'm not sure, but you might want to bring Torsten in on this as he  
took over maintaining StandAloneBlast.  Much of the confusion may  
stem from the independent evolution of StandAloneBlast and WrapperBase.

Also, (a bit unrelated), there were plans for unifying the  
Bio::Tools::Run BLAST modules described here:

http://www.bioperl.org/wiki/Module:Bio::Tools::Run::RemoteBlast

Seemed like there was a general consensus at the time on the need to  
refactor StandAloneBlast and RemoteBlast code, so maybe the best  
place to start is StandAloneBlast (the others could be added in from  
there).  We could just deprecate use of the older modules at some  
point in favor of the new scheme.

chris

On Apr 24, 2007, at 5:21 AM, Sendu Bala wrote:

> Hi,
>
> I'm a little unsure of the intent for executable() in wrapper modules.
> The WrapperBase version of the method and the StandAloneBlast version
> have the same POD but different implementations.
>
> WrapperBase takes as a first arg an 'exe' which it will blindly  
> trust is
> the path to a working executable. (That doesn't seem sensible  
> already.)
> It is only capable of storing one such path.
>
> If no arg is supplied it uses program_path() (which uses  
> program_name())
> to find the executable. Failing that it does a further direct test on
> program_name() to see if it's executable.
>
>
> StandAloneBlast takes as a first arg merely the name of your exe and
> also (undocumented) the path to the corresponding executable (which is
> tested to see if it really is executable). It can store executable paths
> for multiple different exenames (corresponding better with the docs  
> for
> the first arg: "name of executable to set path to").
>
> If no second arg is supplied it does something similar to WrapperBase,
> except that it uses the first arg exename (or a default if that wasn't
> supplied) in place of program_name().
>
>
> I'm trying to generalize this so StandAloneBlast can just use the
> WrapperBase version (and so other wrappers can then store executable
> paths for different sub-programs). Any suggestions for a good way of
> melding these two together whilst somehow retaining backward  
> compatibility?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From avilella at gmail.com  Tue Apr 24 16:10:19 2007
From: avilella at gmail.com (Albert Vilella)
Date: Tue, 24 Apr 2007 17:10:19 +0100
Subject: [Bioperl-l] lack of markers for some genotypes in some
	Bio::PopGen::Statistics methods
Message-ID: <358f4d650704240910u4c90864cqd6c4e38ecedef4c5@mail.gmail.com>

Hi,

I have some genotype data where some individuals don't have a given marker
in the population.

This means that some methods in Bio::PopGen::Statistics will fail when
trying to get them, so I've added a couple of "next unless (defined($sth));"
around to overcome this. But I am not sure if this breaks any assumption
made when implementing the methods.
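
To show the idea outside of BioPerl, here is a minimal sketch in which
plain hashes stand in for the individuals (these are not
Bio::PopGen::Individual objects, and count_alleles is a made-up name).
Markers with no genotype are skipped, so the number of alleles counted
can differ from marker to marker, which is exactly the assumption I am
unsure about:

```perl
use strict;
use warnings;

# Stand-in data: each individual maps marker name => array ref of alleles.
# An individual has no entry for a marker it was not typed at.
my @individuals = (
    { m1 => ['A', 'G'], m2 => ['T', 'T'] },
    { m1 => ['A', 'A'] },                    # no genotype for m2
);

# Count alleles per marker, skipping individuals that lack the marker.
sub count_alleles {
    my ($inds, @marker_names) = @_;
    my %count;
    for my $ind (@$inds) {
        for my $marker (@marker_names) {
            my $genotype = $ind->{$marker};
            next unless defined $genotype;   # skip missing data
            $count{$marker}{$_}++ for @$genotype;
        }
    }
    return \%count;
}

my $count = count_alleles(\@individuals, qw(m1 m2));
```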

Anyone able to check this?

Thanks,

    Albert.

avilella at magneto:~$ diff -u /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm.modif /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm
--- /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm.modif  2007-04-24 15:05:51.000000000 +0100
+++ /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Population.pm  2007-04-22 16:03:24.000000000 +0100
@@ -546,7 +546,6 @@
        # separate genotypes into 'chromosomes'
        for my $marker_name( @marker_names ) {
           my ($genotype) = $ind->get_Genotypes(-marker => $marker_name);
-           next unless defined($genotype); #FIXME -- is this correct?
           my $i =0;
           for my $allele ( $genotype->get_Alleles ) {
               push @{$chromosomes[$i]},

avilella at magneto:~$ diff -u /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm.modif /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm
--- /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm.modif  2007-04-24 15:04:51.000000000 +0100
+++ /home/avilella/bioperl/vanilla/bioperl-live/Bio/PopGen/Statistics.pm  2007-04-22 16:03:24.000000000 +0100
@@ -656,8 +656,6 @@
                return 0;
            }
            foreach my $m ( @marker_names ) {
-              my $genotype = $ind->get_Genotypes($m);
-              next unless defined($genotype); #FIXME -- is this correct?
                foreach my $allele (map { $_->get_Alleles}
                               $ind->get_Genotypes($m) ) {
                    $data{$m}->{$allele}++;


From MEC at stowers-institute.org  Thu Apr 26 16:48:45 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Thu, 26 Apr 2007 11:48:45 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
Message-ID: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>

Lincoln, et al,

I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
from a Bio::DB::SeqFeature::Store that were initially created with
-segments (i.e. whose location was discontiguous) does not display any
attributes in column 9 other than "Name".

What do you think of the following patch to Bio::Graphics::FeatureBase,
whose effect is to "contrive to return (duplicated) common group values"
(which otherwise get lost when "collapsing" "homogenous" parent/child
features)?

Another approach would be to copy the attributes from the parent to the
children when the -segments are first created.

Another approach would be to use Bio::SeqFeature::Generic as the db's
-seqfeature_class and save with -location being a Bio::Location::Split,
but this was fraught with other problems.

Any other suggestions?  Do you want me to commit this patch?

Cheers,

Malcolm
 
Patch follows:


Index: FeatureBase.pm
===================================================================
RCS file:
/home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
retrieving revision 1.29
diff -c -r1.29 FeatureBase.pm
*** FeatureBase.pm	16 Apr 2007 19:55:33 -0000	1.29
--- FeatureBase.pm	26 Apr 2007 16:30:23 -0000
***************
*** 581,587 ****
      foreach (@children) { 
        s/Parent=/ID=/g; 
      } # replace Parent tag with ID
!     return join "\n",@children;
    }
  
    return join("\n",$p,@children);
--- 581,589 ----
      foreach (@children) { 
        s/Parent=/ID=/g; 
      } # replace Parent tag with ID
!     #return join "\n",@children;
!     # Instead of above, additionally, contrive to return (duplicated) common group values
!     return(join("$group\n",@children) . $group);
    }
  
    return join("\n",$p,@children);
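
For anyone reading the patch without the module open, here is a small
standalone sketch of what the changed return expression does. The values
are made up, and it assumes $group holds the attribute string to be
appended (including its leading ';'); join_with_group is a name invented
for the example:

```perl
use strict;
use warnings;

# Made-up values for illustration only.
my $group    = ';Note=example';
my @children = (
    "chr1\tsrc\tmatch\t1\t10\t.\t+\t.\tID=f1",
    "chr1\tsrc\tmatch\t20\t30\t.\t+\t.\tID=f1",
);

# Same expression as in the patch: the separator carries $group to the
# end of every line but the last, and the trailing concatenation covers
# the last line.
sub join_with_group {
    my ($group, @lines) = @_;
    return join("$group\n", @lines) . $group;
}

my $gff3 = join_with_group($group, @children);
```

Each child line ends up with the same group string appended, which is
the "duplicated" part mentioned in the patch comment.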


From emeric.sevin at univ-rennes1.fr  Thu Apr 26 08:48:37 2007
From: emeric.sevin at univ-rennes1.fr (Emeric Sevin)
Date: Thu, 26 Apr 2007 10:48:37 +0200
Subject: [Bioperl-l] rpsblast results unsupported by
	Bio::SearchIO::Writer
In-Reply-To: <7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>
References: <46028EA0.7070901@crs4.it>
	<8015924160e6b1f3af747fe2a906503a@univ-rennes1.fr>
	<60b0ac03aedc2a3e61f4638e96edaa7a@univ-rennes1.fr>
	<7F2B71E5-6473-402C-B0AA-56AE619293E1@bioperl.org>
Message-ID: <4ef54906af35b3cbf231303285527055@univ-rennes1.fr>

Hi! Sorry for the delay, I took a little vacation ;-)

Indeed, I don't see any trouble in coding a supplementary test; I'm just 
not at all familiar with the patch release/bioperl package update process 
and would prefer to leave that to you. To that end, I'll take care of 
that bug post in the coming hours!
Thank you very much
Emeric
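
For the record, the kind of extra branch I understand from Jason's
message below would look roughly like this. It is only a sketch: the
helper name sequence_types_for and the returned labels are invented for
illustration and are not taken from TextResultWriter, and treating an
rpsblast report as protein query against protein database is my
assumption:

```perl
use strict;
use warnings;

# Hypothetical helper; names and labels are illustrative only.
# Returns (query type, database type) for a report's algorithm name.
sub sequence_types_for {
    my ($algorithm) = @_;
    if    ($algorithm =~ /RPS/i)     { return ('protein',    'protein')    }  # assumption
    elsif ($algorithm =~ /TBLASTX/i) { return ('translated', 'translated') }
    elsif ($algorithm =~ /TBLASTN/i) { return ('protein',    'translated') }
    elsif ($algorithm =~ /BLASTX/i)  { return ('translated', 'protein')    }
    elsif ($algorithm =~ /BLASTN/i)  { return ('nucleic',    'nucleic')    }
    elsif ($algorithm =~ /BLASTP/i)  { return ('protein',    'protein')    }
    return;   # unknown algorithm: let the caller decide how to warn
}
```

The RPS test comes first so that an algorithm name that also contains
BLASTP or TBLASTN is not caught by a later branch.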

On 13 Apr 07, at 22:13, Jason Stajich wrote:

> I think it just needs an edit to the code in to_string, which checks
> for the type of algorithm.  You'd need to add to the if/elsif cascade
> and add something for the RPSBLAST type that codes the query and
> target dbs and query and target sequence types properly.  This would
> be very trivial to code in, have you tried adding this to see if it
> works?
>
> if you submit a bug with an example report we'd be able to make
> appropriate changes faster.
>
> -jason
> On Apr 11, 2007, at 6:32 AM, Emeric Sevin wrote:
>
>> Hi everybody,
>>
>> I'm sorry to bug, but either I missed something so obvious nobody
>> bothered to answer, or I'm being a little boycotted here...
>> A little help would be very much appreciated
>>
>> On 22 Mar 07, at 16:07, Emeric Sevin wrote:
>>
>>> Hello,
>>>
>>> I am new to this community, and apologize if this subject has been
>>> posted before.
>>>
>>> I want to print out only selected results from a multiple blast-
>>> alignments results file. Problem is, the algorithm used is
>>> rpsblast. The parsing (with Bio::SearchIO) goes fine, but the
>>> actual writing task yields "unclean" warnings. Although an output
>>> is actually written, the writer
>>> (Bio::SearchIO::Writer::TextResultWriter) seems to be disturbed by
>>> the fact rpsblast DBs are not labeled with
>>> "protein"/"nucleic"/"translated".
>>> Does anybody know of an easy fix to that bug, or of another way to
>>> come around it?
>>>
>>> Thank you very much
>>>
>>> Emeric SEVIN
>>> Université de Rennes 1
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From budd at embl-heidelberg.de  Thu Apr 26 10:18:11 2007
From: budd at embl-heidelberg.de (Aidan Budd)
Date: Thu, 26 Apr 2007 12:18:11 +0200 (CEST)
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
Message-ID: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>

Hi Bioperlers,

I'm trying to parse a FASTA search output file (see attached .out file) 
using Bioperl 1.4. My Bioperl installation has otherwise been working 
fine; however, I currently get the following error when running a simple 
script that attempts to access results from this output file via Bioperl.

Is this a problem with the parser?
Or have I executed FASTA wrongly creating output that isn't covered by the 
parser?

Any suggestions on how to deal with this much appreciated.

Best wishes,

Aidan

Script:

#!/usr/bin/perl -w
$^W=1;
use strict;
use Bio::SearchIO;

my $fasta_report = new Bio::SearchIO ('-format' => 'fasta',
                                      '-file'   => $ARGV[0]);
                                      
my $result = $fasta_report->next_result();            

Errors:

Use of uninitialized value in concatenation (.) or string at 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/GenericHSP.pm 
line 231, <GEN3> line 47.

------------- EXCEPTION  -------------
MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm 
FASTP -score 62.4 -hit_frame 0 -hsp_length 180 -hit_seq  -hit_length 0 
-query_length 128 -query_frame 0 -swscore 122 -rank 1 -query_seq 
GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS--PALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD-LYCHKSD 
-homology_seq                              
MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKRAKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RKCSL...LL.SVNL.K.ADHE.A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTMEPATLSPKSMR 
-hit_name YFL031W -bits 19.4 -query_name CREB1_MONKEY -evalue 1.1 (qs='
STACK Bio::Search::HSP::GenericHSP::new 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/GenericHSP.pm:231
STACK Bio::Search::HSP::FastaHSP::new 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/FastaHSP.pm:97
STACK Bio::Factory::ObjectFactory::create_object 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Factory/ObjectFactory.pm:150
STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/SearchResultEventBuilder.pm:275
STACK Bio::SearchIO::fasta::end_element 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/fasta.pm:872
STACK Bio::SearchIO::fasta::next_result 
/Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/fasta.pm:403
STACK toplevel 
/Users/budd/scripts/test_scripts/test_parsing_fasta_output.pl:22

--------------------------------------

-- 
----------------------------------------------------------------------
Aidan Budd, PhD                               tel:+49 (0)6221 387 8530
EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
Meyerhofstr. 1, 69117 Heidelberg, Germany

URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html
-------------- next part --------------
# fasta34 -m 2 creb1_human.fasta yeast_bzips_from_ensembl.fasta
FASTA searches a protein or DNA sequence data bank
 version 34.26 January 12, 2007
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query library creb1_human.fasta vs yeast_bzips_from_ensembl.fasta library
searching yeast_bzips_from_ensembl.fasta library

  1>>>CREB1_MONKEY 341 aa - 341 aa
 vs  yeast_bzips_from_ensembl.fasta library

   3683 residues in    10 sequences
 MLE_cen statistics: Lambda= 0.0338;  K=8.757e-05 (cen=0)

FASTA (3.5 Sept 2006) function [optimized, BL50 matrix (15:-5)] ktup: 2
 join: 37, opt: 25, open/ext: -10/-2, width:  16
 Scan time:  0.000
The best scores are:                                      opt bits E(10)
YFL031W                                            ( 238)  122 19.4     1.1
YEL009C                                            ( 281)  121 19.4     1.3
YIL036W                                            ( 587)  129 19.8       2
YIR017C                                            ( 187)   83 17.5     2.9
YVNL167C                                           ( 647)  119 19.3     2.9
YIR018W                                            ( 245)   67 16.7     5.3
YER045C                                            ( 489)   73 17.0     7.1
YDR259C                                            ( 383)   62 16.5     7.5
YOR028C                                            ( 296)   41 15.5     8.9
YHL009C                                            ( 330)   33 15.1     9.6

>>YFL031W                                                 (238 aa)
 initn: 107 init1: 107 opt: 122  Z-score: 62.4  bits: 19.4 E():  1.1
Smith-Waterman score: 122;  27.660% identity (63.830% similar) in 94 aa overlap (248-337:2-95)

       220       230       240       250       260       270       
CREB1_ GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS--PALP
YFL031                              MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKR

         280       290       300       310       320        330    
CREB1_ TQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD
YFL031 AKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RKCSL...LL.SVNL.K.ADHE.

           340                                                     
CREB1_ -LYCHKSD                                                    
YFL031 A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTMEPATLSPKSMR

>>YEL009C                                                 (281 aa)
 initn: 138 init1:  83 opt: 121  Z-score: 60.8  bits: 19.4 E():  1.3
Smith-Waterman score: 121;  29.412% identity (55.462% similar) in 119 aa overlap (219-335:165-277)

      190       200       210       220       230       240        
CREB1_ GAIQLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGD
YEL009 VSLADKAIESTEEVSLVPSNLEVSTTSFLP.PV.ED.KL.QTRKVKK.NS--..KKSHHV

      250       260       270         280       290       300      
CREB1_ VQTYQIRTAPTSTIAPGVVMASSPALPTQP--AEEAARKREVRLMKNREAARECRRKKKE
YEL009 GKDDES.LDHLGVV.YNRKQR.I.LS.IV.ESSDP..L..----AR.T....RS.AR.LQ

        310       320       330       340 
CREB1_ YVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
YEL009 RM.Q..DK.EE.LSK.YH.EN.VAR..K.VGER  

>>YIL036W                                                 (587 aa)
 initn: 132 init1:  70 opt: 129  Z-score: 57.2  bits: 19.8 E():    2
Smith-Waterman score: 129;  18.750% identity (55.682% similar) in 352 aa overlap (2-335:137-477)

                                            10        20           
CREB1_                              MTMESGAENQQSGDAAVTEAENQQM--TVQA
YIL036 RVVKPSANSNYQQAAYLRQQQQQDQRQQSPS.KTEE.S.LY..ILMNSGVV.D.HQNLAT

      30        40        50        60        70        80         
CREB1_ QPQIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGVIQAAQPSVIQSPQVQTVQSS
YIL036 HTNLSQ.SSTRKS.PNDSTT...-NASNIA.--.AS.NKQMYFMNMNMNNN.HALNDP.I

      90         100       110         120       130       140     
CREB1_ CKDLKRLFS--GTQISTIAESEDS--QESVDSVTDSQKRREILSRRPSYRKILNDL----
YIL036 LET.SPF.QPF.VDVAHLPMTNPPIF.S.LPGCDEPIR..R.SISNGQISQLGE.IETLE

                150       160          170        180       190    
CREB1_ ---SSDAPGVPRIEEEKSEEET---SAPAITTVTVP-TPIYQTSSGQYIAITQGGAIQLA
YIL036 NLHNTQP.PM.NFHNYNGLSQ.RNV.NKPVFNQA..VSS.P.YNAKKV.NP.KDS.--.G

          200       210       220       230       240       250    
CREB1_ NNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQI
YIL036 DQSVIYSKSQ.RNFVNAPSKNT.AES.----SDLE.MTTFA.TTGGENRGK.ALRESHSN

           260       270       280       290       300       310   
CREB1_ RT-APTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLEN
YIL036 PSFT.K.QGSHLNLA.NTQGN.I-.GT-T.W..ARL.ER..I..SK..QR..VAQLQ.QK

           320       330       340                                 
CREB1_ RVAVLENQNKTLIEELKALKDLYCHKSD                                
YIL036 EFNEIKDE.RI.LKK.NYYEK.ISKFKKFSKIHLREHEKLNKDSDNNVNGTNSSNKNESM

>>YIR017C                                                 (187 aa)
 initn:  43 init1:  43 opt:  83  Z-score: 54.0  bits: 17.5 E():  2.9
Smith-Waterman score: 84;  22.785% identity (56.962% similar) in 158 aa overlap (176-330:9-148)

         150       160       170       180       190       200     
CREB1_ PGVPRIEEEKSEEETSAPAITTVTVPTPIYQTSSGQYIAITQGGAIQLANNGTDGVQGLQ
YIR017                       MSAKQGWEKK.TNID..SRK.MNV---..LSEHL.N.I

         210       220       230       240        250       260    
CREB1_ TLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQVVVQAASG-DVQTYQIRTAPTS--TI
YIR017 S------SDSEL.SRL.SLLLVSS.N-----AEELISMINN.Q..SQFKKLRE.RKGKVA

            270       280       290       300       310       320  
CREB1_ APGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQN
YIR017 .TTA.VVKEEEA.VSTSN.LDKIKQE.RR..T..SQRF.IR..Q--.NF..-MNK.Q.L.

            330       340                             
CREB1_ KTLIEELKALKDLYCHKSD                            
YIR017 -.Q.NK.RDRIEQLNKENEFWKAKLNDINEIKSLKLLNDIKRRNMGR

>>YVNL167C                                                (647 aa)
 initn: 142 init1: 119 opt: 119  Z-score: 53.8  bits: 19.3 E():  2.9
Smith-Waterman score: 119;  39.623% identity (62.264% similar) in 53 aa overlap (280-332:426-478)

     250       260       270       280       290       300         
CREB1_ QTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVK
YVNL16 RKNSAVTTAPAQKDDVENNKISNNVTLDEN..QE...KEF.ER..V..SKF.KR....I.

     310       320       330       340                             
CREB1_ CLENRVAVLENQNKTLIEELKALKDLYCHKSD                            
YVNL16 KI..DLQFY.SEYDD.TQVIGK.CGIIPSSSSNSQFNVNVSTPSSSSPPSTSLIALLESS

>>YIR018W                                                 (245 aa)
 initn:  61 init1:  61 opt:  67  Z-score: 47.6  bits: 16.7 E():  5.3
Smith-Waterman score: 67;  25.455% identity (61.818% similar) in 55 aa overlap (280-334:55-109)

     250       260       270       280       290       300         
CREB1_ QTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVK
YIR018 SKNWKLPPRLPHRAAQRRKRVHRLHEDYET..NDEELQKKKRQ..D.Q.AY.ER.NNKLQ

     310       320       330       340                             
CREB1_ CLENRVAVLENQNKTLIEELKALKDLYCHKSD                            
YIR018 V..ETIES.SKVV.NYETK.NR.QNELQAKESENHALKQKLETLTLKQASVPAQDPILQN

>>YER045C                                                 (489 aa)
 initn: 111 init1:  70 opt:  73  Z-score: 43.8  bits: 17.0 E():  7.1
Smith-Waterman score: 97;  22.826% identity (67.391% similar) in 92 aa overlap (3-92:210-300)

                                           10        20         30 
CREB1_                             MTMESGAENQQSGDAAVTEAE-NQQMTVQAQP
YER045 QTGSKNIYAAMTPYDSNIKLNIPAVAATCDIP.ATPSIP...STMNQ.YI.M.LRL...M

              40        50        60         70        80        90
CREB1_ QIATLAQVSMPAAHATSSAPTVTLVQLPNGQTVQVHGV-IQAAQPSVIQSPQVQTVQSSC
YER045 .TKAWKNAQL-NV.PCTP.SNSSVSSSSSC.NIND.NIEN.SVHS.ISHGVNHH..NN..

              100       110       120       130       140       150
CREB1_ KDLKRLFSGTQISTIAESEDSQESVDSVTDSQKRREILSRRPSYRKILNDLSSDAPGVPR
YER045 QNAELNISSSLPYESKCPDVNLTHANSKPQYKDATSALKNNINSEKDVHTAPFSSMHTTA

>>YDR259C                                                 (383 aa)
 initn:  84 init1:  52 opt:  62  Z-score: 42.8  bits: 16.5 E():  7.5
Smith-Waterman score: 81;  33.333% identity (64.583% similar) in 48 aa overlap (289-330:227-274)

      260       270       280       290       300       310        
CREB1_ TSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVL
YDR259 NDNNDNVTKPVPDKDTQLISSSGKTLRNTR.AAQ..T.QKAF.QR.EK.I.N..QKSKIF

           320        330       340                                
CREB1_ -----ENQN-KTLIEELKALKDLYCHKSD                               
YDR259 DDLLA..N.F.S.NDS.RNDNNILIAQHEAIRNAITMLRSEYDVLCNENNMLKNENSIIK

>>YOR028C                                                 (296 aa)
 initn:  35 init1:  35 opt:  41  Z-score: 39.3  bits: 15.5 E():  8.9
Smith-Waterman score: 80;  33.962% identity (66.038% similar) in 53 aa overlap (289-334:243-295)

      260       270       280       290       300       310        
CREB1_ TSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVL
YOR028 LSEQVFNEGERYNNDGQLIGKTGKPLRNTK.AAQ..S.QKAF.QRREK.I.N..EKSKLF

           320        330        340 
CREB1_ -----ENQN-KTLIEELKA-LKDLYCHKSD
YOR028 DGLMK..SEL.KM..S..SK..E*      

>>YHL009C                                                 (330 aa)
 initn:  33 init1:  33 opt:  33  Z-score: 36.4  bits: 15.1 E():  9.6
Smith-Waterman score: 91;  21.667% identity (57.500% similar) in 120 aa overlap (222-333:79-194)

             200       210       220       230             240     
CREB1_ QLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQI-LVP-----SNQVVVQAA
YHL009 EQTAPFPILEDQCPALNLDRSNNDLLLQNNISFPKGS.L.A.Q.T.ISGDY.TY.MADNN

         250         260       270       280       290       300   
CREB1_ SGDVQTYQIRT--APTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAARECRRK
YHL009 NN.NDS.SNTNYFSKNNG.S.SSRSP.VAHNENV.DDSK.K.KA----Q..A.QKAF.ER

           310       320       330       340                       
CREB1_ KKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD                      
YHL009 .EARM.E.QDKLLES.RNRQS.LK.IEE.RKANTEINAENRLLLRSGNENFSKDIEDDTN


341 residues in 1 query   sequences
3683 residues in 10 library sequences
 Scomplib [34.26]
 start: Thu Apr 26 11:52:16 2007 done: Thu Apr 26 11:52:16 2007
 Total Scan time:  0.000 Total Display time:  0.010

Function used was FASTA [version 34.26 January 12, 2007]
-------------- next part --------------
>CREB1_MONKEY
MTMESGAENQQSGDAAVTEAENQQMTVQAQPQIATLAQVSMPAAHATSSAPTVTLVQLPN
GQTVQVHGVIQAAQPSVIQSPQVQTVQSSCKDLKRLFSGTQISTIAESEDSQESVDSVTD
SQKRREILSRRPSYRKILNDLSSDAPGVPRIEEEKSEEETSAPAITTVTVPTPIYQTSSG
QYIAITQGGAIQLANNGTDGVQGLQTLTMTNAAATQPGTTILQYAQTTDGQQILVPSNQV
VVQAASGDVQTYQIRTAPTSTIAPGVVMASSPALPTQPAEEAARKREVRLMKNREAAREC
RRKKKEYVKCLENRVAVLENQNKTLIEELKALKDLYCHKSD
-------------- next part --------------
>YIL036W
MFTGQEYHSVDSNSNKQKDNNKRGIDDTSKILNNKIPHSVSDTSAAATTTSTMNNSALSR
SLDPTDINYSTNMAGVVDQIHDYTTSNRNSLTPQYSIAAGNVNSHDRVVKPSANSNYQQA
AYLRQQQQQDQRQQSPSMKTEEESQLYGDILMNSGVVQDMHQNLATHTNLSQLSSTRKSA
PNDSTTAPTNASNIANTASVNKQMYFMNMNMNNNPHALNDPSILETLSPFFQPFGVDVAH
LPMTNPPIFQSSLPGCDEPIRRRRISISNGQISQLGEDIETLENLHNTQPPPMPNFHNYN
GLSQTRNVSNKPVFNQAVPVSSIPQYNAKKVINPTKDSALGDQSVIYSKSQQRNFVNAPS
KNTPAESISDLEGMTTFAPTTGGENRGKSALRESHSNPSFTPKSQGSHLNLAANTQGNPI
PGTTAWKRARLLERNRIAASKCRQRKKVAQLQLQKEFNEIKDENRILLKKLNYYEKLISK
FKKFSKIHLREHEKLNKDSDNNVNGTNSSNKNESMTVDSLKIIEELLMIDSDVTEVDKDT
GKIIAIKHEPYSQRFGSDTDDDDIDLKPVEGGKDPDNQSLPNSEKIK
>YIR017C
MSAKQGWEKKSTNIDIASRKGMNVNNLSEHLQNLISSDSELGSRLLSLLLVSSGNAEELI
SMINNGQDVSQFKKLREPRKGKVAATTAVVVKEEEAPVSTSNELDKIKQERRRKNTEASQ
RFRIRKKQKNFENMNKLQNLNTQINKLRDRIEQLNKENEFWKAKLNDINEIKSLKLLNDI
KRRNMGR
>YVNL167C
MSSEERSRQPSTVSTFDLEPNPFEQSFASSKKALSLPGTISHPSLPKELSRNNSTSTITQ
HSQRSTHSLNSIPEENGNSTVTDNSNHNDVKKDSPSFLPGQQRPTIISPPILTPGGSKRL
PPLLLSPSILYQANSTTNPSQNSHSVSVSNSNPSAIGVSSTSGSLYPNSSSPSGTSLIRQ
PRNSNVTTSNSGNGFPTNDSQMPGFLLNLSKSGLTPNESNIRTGLTPGILTQSYNYPVLP
SINKNTITGSKNVNKSVTVNGSIENHPHVNIMHPTVNGTPLTPGLSSLLNLPSTGVLANP
VFKSTPTTNTTDGTVNNSISNSNFSPNTSTKAAVKMDNPAEFNAIEHSAHNHKENENLTT
QIENNDQFNNKTRKRKRRMSSTSSTSKASRKNSISRKNSAVTTAPAQKDDVENNKISNNV
TLDENEEQERKRKEFLERNRVAASKFRKRKKEYIKKIENDLQFYESEYDDLTQVIGKLCG
IIPSSSSNSQFNVNVSTPSSSSPPSTSLIALLESSISRSDYSSAMSVLSNMKQLICETNF
YRRGGKNPRDDMDGQEDSFNKDTNVVKSENAGYPSVNSRPIILDKKYSLNSGANISKSNT
TTNNVGNSAQNIINSCYSVTNPLVINANSDTHDTNKHDVLSTLPHNN
>YER045C
MDYKHNFATSPDSFLDGRQNPLLYTDFLSSNKELIYKQPSGPGLVDSAYNFHHQNSLHDR
SVQENLGPMFQPFGVDISHLPITNPPIFQSSLPAFDQPVYKRRISISNGQISQLGEDLET
VENLYNCQPPILSSKAQQNPNPQQVANPSAAIYPSFSSNELQNVPQPHEQATVIPEAAPQ
TGSKNIYAAMTPYDSNIKLNIPAVAATCDIPSATPSIPSGDSTMNQAYINMQLRLQAQMQ
TKAWKNAQLNVHPCTPASNSSVSSSSSCQNINDHNIENQSVHSSISHGVNHHTVNNSCQN
AELNISSSLPYESKCPDVNLTHANSKPQYKDATSALKNNINSEKDVHTAPFSSMHTTATF
QIKQEARPQKIENNTAGLKDGAKAWKRARLLERNRIAASKCRQRKKMSQLQLQREFDQIS
KENTMMKKKIENYEKLVQKMKKISRLHMQECTINGGNNSYQSLQNKDSDVNGFLKMIEEM
IRSSSLYDE
>YIR018W
MALPLIKPKESEESHLALLSKIHVSKNWKLPPRLPHRAAQRRKRVHRLHEDYETEENDEE
LQKKKRQNRDAQRAYRERKNNKLQVLEETIESLSKVVKNYETKLNRLQNELQAKESENHA
LKQKLETLTLKQASVPAQDPILQNLIENFKPMKAIPIKYNTAIKRHQHSTELPSSVKCGF
CNDNTTCVCKELETDHRKSDDGVATEQKDMSMPHAECNNKDNPNGLCSNCTNIDKSCIDI
RSIIH
>YHL009C
MTPSNMDDNTSGFMKFINPQCQEEDCCIRNSLFQEDSKCIKQQPDLLSEQTAPFPILEDQ
CPALNLDRSNNDLLLQNNISFPKGSDLQAIQLTPISGDYSTYVMADNNNNDNDSYSNTNY
FSKNNGISPSSRSPSVAHNENVPDDSKAKKKAQNRAAQKAFRERKEARMKELQDKLLESE
RNRQSLLKEIEELRKANTEINAENRLLLRSGNENFSKDIEDDTNYKYSFPTKDEFFTSMV
LESKLNHKGKYSLKDNEIMKRNTQYTDEAGRHVLTVPATWEYLYKLSEERDFDVTYVMSK
LQGQECCHTHGPAYPRSLIDFLVEEATLNE
>YOR028C
MLMQIKMDNHPFNFQPILASHSMTRDSTKPKKMTDTAFVPSPPVGFIKEENKADLHTISV
VASNVTLPQIQLPKIATLEEPGYESRTGSLTDLSGRRNSVNIGALCEDVPNTAGPHIARP
VTINNLIPPSLPRLNTYQLRPQLSDTHLNCHFNSNPYTTASHAPFESSYTTASTFTSQPA
ASYFPSNSTPATRKNSATTNLPSEERRRVSVSLSEQVFNEGERYNNDGQLIGKTGKPLRN
TKRAAQNRSAQKAFRQRREKYIKNLEEKSKLFDGLMKENSELKKMIESLKSKLKE*
>YEL009C
MSEYQPSLFALNPMGFSPLDGSKSTNENVSASTSTAKPMVGQLIFDKFIKTEEDPIIKQD
TPSNLDFDFALPQTATAPDAKTVLPIPELDDAVVESFFSSSTDSTPMFEYENLEDNSKEW
TSLFDNDIPVTTDDVSLADKAIESTEEVSLVPSNLEVSTTSFLPTPVLEDAKLTQTRKVK
KPNSVVKKSHHVGKDDESRLDHLGVVAYNRKQRSIPLSPIVPESSDPAALKRARNTEAAR
RSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
>YDR259C
MQNPPLIRPDMYNQGSSSMATYNASEKNLNEHPSPQIAQPSTSQKLPYRINPTTTNGDTD
ISVNSNPIQPPLPNLMHLSGPSDYRSMHQSPIHPSYIIPPHSNERKQSASYNRPQNAHVS
IQPSVVFPPKSYSISYAPYQINPPLPNGLPNQSISLNKEYIAEEQLSTLPSRNTSVTTAP
PSFQNSADTAKNSADNNDNNDNVTKPVPDKDTQLISSSGKTLRNTRRAAQNRTAQKAFRQ
RKEKYIKNLEQKSKIFDDLLAENNNFKSLNDSLRNDNNILIAQHEAIRNAITMLRSEYDV
LCNENNMLKNENSIIKNEHNMSRNENENLKLENKRFHAEYIRMIEDIENTKRKEQEQRDE
IEQLKKKIRSLEEIVGRHSDSAT
>YFL031W
MEMTDFELTSNSQSNLAIPTNFKSTLPPRKRAKTKEEKEQRRIERILRNRRAAHQSREKK
RLHLQYLERKCSLLENLLNSVNLEKLADHEDALTCSHDAFVASLDEYRDFQSTRGASLDT
RASSHSSSDTFTPSPLNCTMEPATLSPKSMRDSASDQETSWELQMFKTENVPESTTLPAV
DNNNLFDAVASPLADPLCDDIAGNSLPFDNSIDLDNWRNPEAQSGLNSFELNDFFITS

From jason at bioperl.org  Thu Apr 26 19:27:24 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Apr 2007 12:27:24 -0700
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
In-Reply-To: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
References: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
Message-ID: <7C782DA2-5A80-413A-9B5A-94EEBEA9EF6E@bioperl.org>

Unfortunately, the FASTA output format changed in that version. The
latest release of Bioperl, 1.5.2, can handle it, so you'll need to
upgrade Bioperl.

-jason
On Apr 26, 2007, at 3:18 AM, Aidan Budd wrote:

> Hi Bioperlers,
>
> I'm trying to parse a FASTA search output file (see attached .out  
> file)
> using Bioperl 1.4. My Bioperl installation has otherwise been working
> fine, however I currently get the following error when running a  
> simple
> script that attempts to access result from this outfile via bioperl.
>
> Is this a problem with the parser?
> Or have I executed FASTA wrongly creating output that isn't covered  
> by the
> parser?
>
> Any suggestions on how to deal with this much appreciated.
>
> Best wishes,
>
> Aidan
>
> Script:
>
> #!/usr/bin/perl -w
> $^W=1;
> use strict;
> use Bio::SearchIO;
>
> my $fasta_report = new Bio::SearchIO ('-format' => 'fasta',
>                                       '-file'   => $ARGV[0]);
>
> my $result = $fasta_report->next_result();
>
> Errors:
>
> Use of uninitialized value in concatenation (.) or string at
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/ 
> GenericHSP.pm
> line 231, <GEN3> line 47.
>
> ------------- EXCEPTION  -------------
> MSG: Did not specify a Query End or Query Begin -verbose 0 -algorithm
> FASTP -score 62.4 -hit_frame 0 -hsp_length 180 -hit_seq  -hit_length 0
> -query_length 128 -query_frame 0 -swscore 122 -rank 1 -query_seq
> GTTILQYAQTTDGQQILVPSNQVVVQAASGDVQTYQIRTAPTSTIAPGVVMASS-- 
> PALPTQPAEEAARKREVRLMKNREAARECRRKKKEYVKCLENRVAVLENQ-NKTLIEELKALKD- 
> LYCHKSD
> -homology_seq
> MEMTDFELTSNSQ.NL.IPTNFK.TLP.RKRAKTK..KEQR.IE.ILR..R..HQS.E..RLHLQY..RK 
> CSL...LL.SVNL.K.ADHE.A.T.SHDAFVASLDEYRDFQSTRGASLDTRASSHSSSDTFTPSPLNCTM 
> EPATLSPKSMR
> -hit_name YFL031W -bits 19.4 -query_name CREB1_MONKEY -evalue 1.1  
> (qs='
> STACK Bio::Search::HSP::GenericHSP::new
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/ 
> GenericHSP.pm:231
> STACK Bio::Search::HSP::FastaHSP::new
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Search/HSP/ 
> FastaHSP.pm:97
> STACK Bio::Factory::ObjectFactory::create_object
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/Factory/ 
> ObjectFactory.pm:150
> STACK Bio::SearchIO::SearchResultEventBuilder::end_hsp
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/ 
> SearchResultEventBuilder.pm:275
> STACK Bio::SearchIO::fasta::end_element
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/ 
> fasta.pm:872
> STACK Bio::SearchIO::fasta::next_result
> /Users/budd/perl_modules/bioperl_1_4/bioperl-1.4/Bio/SearchIO/ 
> fasta.pm:403
> STACK toplevel
> /Users/budd/scripts/test_scripts/test_parsing_fasta_output.pl:22
>
> --------------------------------------
>
> -- 
> ----------------------------------------------------------------------
> Aidan Budd, PhD                               tel:+49 (0)6221 387 8530
> EMBL - European Molecular Biology Laboratory  fax:+49 (0)6221 387 8517
> Meyerhofstr. 1, 69117 Heidelberg, Germany
>
> URL: http://www-db.embl.de/jss/EmblGroupsHD/per_1807.html
> <creb_vs_yeast_manual_fasta_changed_infile_formats.out>
> <creb1_human.fasta>
> <yeast_bzips_from_ensembl.fasta>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From dmessina at wustl.edu  Thu Apr 26 19:42:02 2007
From: dmessina at wustl.edu (David Messina)
Date: Thu, 26 Apr 2007 14:42:02 -0500
Subject: [Bioperl-l] problem parsing FASTA output - bug or my fault?
In-Reply-To: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
References: <Pine.LNX.4.44.0704261159590.28337-400000@bibo.EMBL-Heidelberg.DE>
Message-ID: <D41F5BDD-B992-4787-91C5-732B41683908@wustl.edu>

Hi Aidan,

Bioperl 1.4 is ~3 years old now, and FASTA output has probably  
changed since then. Your code should work if you install Bioperl  
1.5.2, the latest release.

	http://www.bioperl.org/wiki/Installing_BioPerl

Please let us know if that doesn't solve the problem.

Dave


From gopu_36 at yahoo.com  Fri Apr 27 01:29:03 2007
From: gopu_36 at yahoo.com (gopu_36)
Date: Thu, 26 Apr 2007 18:29:03 -0700 (PDT)
Subject: [Bioperl-l] check for the continous segments to extract the
	sequences
Message-ID: <10211951.post@talk.nabble.com>


As a newbie to programming, thanks for the support from this group. Please
ignore this message if it is not relevant to the group, as my problem may
be a typical computer science recursion problem (I am not sure).

I have an array like @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
6001, 7000, 7001, 8000, 12001, 13000);
The array holds sequence positions in pairs: the first element, '1', is a
start position and the second element, '1000', is the matching end
position, and so on. All the even indices (0, 2, 4, ...) hold the start
points, and all the odd indices (1, 3, 5, ...) hold the END positions, such
as 1000, 2000 and 5000, of the sequences to be retrieved. Basically I have
to see whether any contiguous segments lie in the list and join them
together to form one whole chunk.
For example, 1-1000 and 1001-2000 can be joined together to extract the
sequence from 1-2000. In the same way 4001-8000 should be extracted, then
12001-13000, and so on. After checking the positions, I will be able to
extract those parts of the sequence from a whole genome. Thanks for taking
your time. Any tip or help would be greatly appreciated.

Regards
Gopu 
-- 
View this message in context: http://www.nabble.com/check-for-the-continous-segments-to-extract-the-sequences-tf3655281.html#a10211951
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From jason at bioperl.org  Fri Apr 27 01:54:59 2007
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 26 Apr 2007 18:54:59 -0700
Subject: [Bioperl-l] check for the continous segments to extract the
	sequences
In-Reply-To: <10211951.post@talk.nabble.com>
References: <10211951.post@talk.nabble.com>
Message-ID: <EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>

You want a connectivity algorithm.  One can be found on perlmonks.org,
and another is the collapse_nums() method in Bio::Search::SearchUtils.
You'll have to modify aspects of it to deal with ranges.
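
For the range case described in the question, a minimal sketch in plain
Perl might look like the following. It is not the collapse_nums() code
from Bio::Search::SearchUtils; merge_ranges() is a hypothetical helper
name, and it assumes the flat list always holds complete start/end pairs
and that two ranges should be joined when the next start is at most one
past the current end.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge a flat list (start, end, start, end, ...) into contiguous blocks.
# merge_ranges() is an illustrative helper, not part of BioPerl.
sub merge_ranges {
    my @flat = @_;
    die "expected an even number of coordinates\n" if @flat % 2;

    # Pair up the coordinates and sort by start, in case the input is unordered.
    my @pairs;
    while (@flat) {
        my ($start, $end) = splice(@flat, 0, 2);
        push @pairs, [$start, $end];
    }
    @pairs = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @pairs;

    my @merged;
    for my $pair (@pairs) {
        if (@merged && $pair->[0] <= $merged[-1][1] + 1) {
            # Adjacent or overlapping: extend the current block if needed.
            $merged[-1][1] = $pair->[1] if $pair->[1] > $merged[-1][1];
        }
        else {
            # Gap found: start a new block (copy the pair).
            push @merged, [@$pair];
        }
    }
    return @merged;
}

my @array = (1, 1000, 1001, 2000, 4001, 5000, 5001, 6000,
             6001, 7000, 7001, 8000, 12001, 13000);
print join(",", map { "$_->[0]-$_->[1]" } merge_ranges(@array)), "\n";
# prints 1-2000,4001-8000,12001-13000
```

Each returned element is a [start, end] pair, so the merged blocks can be
passed straight to whatever routine extracts the subsequence.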

Good luck.
-jason
On Apr 26, 2007, at 6:29 PM, gopu_36 wrote:

>
> As a newbee to programming, thx for the support from this group.  
> Please
> ignore the message if this message is not relevant to this group as my
> problem may be a typical computer science recursive one! (as I am  
> not aware)
>
> I have an array like @array = (1, 1000, 1001, 2000, 4001, 5000,  
> 5001, 6000,
> 6001, 7000, 7001, 8000, 12001, 13000);
> The above array gives the posiiton of sequences like '1' shows the  
> start
> position and the second element '1000' gives the end of the  
> sequence and so
> on. All the even positions like 0,2,4... shows the starting points  
> of the
> sequence and odd positions like 1000, 2000, 5000 gives the END  
> positions of
> the sequences to be retrieved. basically I have to see whwther any  
> continous
> segments lie in the list and add them together to form a one whole  
> chunk.
> For example 1-1000 and 1001-2000 can be joined together to extract  
> sequences
> from 1-2000. In the same way 4001-8000 should be extracted and  
> 12001-13000
> and so on. As I said earlier, after checking the position, I will  
> be able to
> extract that part of sequence from a whole genome. Thanks for  
> taking ur
> time. Any tip or help would be greatly appreciated.
>
> Regards
> Gopu
> -- 
> View this message in context: http://www.nabble.com/check-for-the- 
> continous-segments-to-extract-the-sequences-tf3655281.html#a10211951
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason at bioperl.org
http://jason.open-bio.org/


From MEC at stowers-institute.org  Fri Apr 27 13:52:10 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 27 Apr 2007 08:52:10 -0500
Subject: [Bioperl-l] check for the continous segments to extract
	thesequences
In-Reply-To: <EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>
References: <10211951.post@talk.nabble.com>
	<EB2A0110-B09A-4E46-9EC6-487DACA3D988@bioperl.org>
Message-ID: <CED81D34E37D5043A1211565277A51E507E22F28@exchkc02.stowers-institute.org>

Gopu/Jason,

Another option is Set::IntSpan, available on CPAN at
http://search.cpan.org/~swmcd/Set-IntSpan-1.11/IntSpan.pm

Here's a perl one-liner that shows you how easy it is:

perl -MSet::IntSpan -e 'my @array = ( 1, 1000, 1001, 2000, 4001, 5000,
5001, 6000, 6001, 7000, 7001, 8000, 12001, 13000); my $is =
Set::IntSpan->new;  while (@array) {$is->U(shift(@array) . "-" .
shift(@array))}; print $is;'
1-2000,4001-8000,12001-13000

I use it all the time to great effect and have utility functions that
convert between bioperl split locations and IntSpans.

There is another module which extends it nicely, Set::IntSpan::Island,
worth a gander.

Cheers,

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Jason Stajich
> Sent: Thursday, April 26, 2007 8:55 PM
> To: gopu_36
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] check for the continous segments to 
> extract thesequences
> 
> You want a connectivity algorithm.  One can be found on 
> perlmonks.org  
> as well as in Bio::Search::SearchUtils the method collapse_nums().  
> You'll have to modify aspects of it to deal with ranges.
> 
> Good luck.
> -jason
> On Apr 26, 2007, at 6:29 PM, gopu_36 wrote:
> 
> >
> > As a newbee to programming, thx for the support from this group.  
> > Please
> > ignore the message if this message is not relevant to this 
> group as my
> > problem may be a typical computer science recursive one! (as I am  
> > not aware)
> >
> > I have an array like @array = (1, 1000, 1001, 2000, 4001, 5000,  
> > 5001, 6000,
> > 6001, 7000, 7001, 8000, 12001, 13000);
> > The above array gives the posiiton of sequences like '1' shows the  
> > start
> > position and the second element '1000' gives the end of the  
> > sequence and so
> > on. All the even positions like 0,2,4... shows the starting points  
> > of the
> > sequence and odd positions like 1000, 2000, 5000 gives the END  
> > positions of
> > the sequences to be retrieved. basically I have to see whwther any  
> > continous
> > segments lie in the list and add them together to form a one whole  
> > chunk.
> > For example 1-1000 and 1001-2000 can be joined together to extract  
> > sequences
> > from 1-2000. In the same way 4001-8000 should be extracted and  
> > 12001-13000
> > and so on. As I said earlier, after checking the position, I will  
> > be able to
> > extract that part of sequence from a whole genome. Thanks for  
> > taking ur
> > time. Any tip or help would be greatly appreciated.
> >
> > Regards
> > Gopu
> > -- 
> > View this message in context: http://www.nabble.com/check-for-the- 
> > continous-segments-to-extract-the-sequences-tf3655281.html#a10211951
> > Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> jason at bioperl.org
> http://jason.open-bio.org/
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From lstein at cshl.edu  Fri Apr 27 17:44:59 2007
From: lstein at cshl.edu (Lincoln Stein)
Date: Fri, 27 Apr 2007 13:44:59 -0400
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
References: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
Message-ID: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>

Hi Malcolm,

This is absolutely ok and you can go ahead and commit. Thanks for figuring
this out!

Lincoln

On 4/26/07, Cook, Malcolm <MEC at stowers-institute.org> wrote:
>
> Lincoln, et al,
>
> I find that the gff3_string for Bio::DB::SeqFeature objects retrieved
> from a Bio::DB::SeqFeature::Store that were initially created with
> -segments (i.e. whose location was discontiguous) does not display any
> other attributes in column 9 than "Name".
>
> What do you think of the following patch to Bio::Graphics::FeatureBase,
> whose effect is to "contrive to return (duplicated) common group values"
> (which otherwise get lost when "collapsing" "homogeneous" parent/child
> features)
>
> Another approach would be to copy the attributes from the parent to the
> children when the -segments are first created.
>
> Another approach would be to use Bio::SeqFeature::Generic as the db's
> -seqfeature_class and save with -location being a Bio::Location::Split,
> but this was fraught with other problems.
>
> Any other suggestions?  Do you want me to commit this patch?
>
> Cheers,
>
> Malcolm
>
> Patch follows:
>
>
>
>
> Index: FeatureBase.pm
> ===================================================================
> RCS file:
> /home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
> retrieving revision 1.29
> diff -c -r1.29 FeatureBase.pm
> *** FeatureBase.pm      16 Apr 2007 19:55:33 -0000      1.29
> --- FeatureBase.pm      26 Apr 2007 16:30:23 -0000
> ***************
> *** 581,587 ****
>       foreach (@children) {
>         s/Parent=/ID=/g;
>       } # replace Parent tag with ID
> !     return join "\n",@children;
>     }
>
>     return join("\n",$p,@children);
> --- 581,589 ----
>       foreach (@children) {
>         s/Parent=/ID=/g;
>       } # replace Parent tag with ID
> !     #return join "\n",@children;
> !     # Instead of above, additionally, contrive to return (duplicated) common group values
> !     return(join("$group\n",@children) . $group);
>     }
>
>     return join("\n",$p,@children);
>
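
To see what the patched return expression produces, here is a small
stand-alone sketch. The child lines and the value of $group are made-up
placeholders (the real ones come from FeatureBase.pm, which is not shown
in this thread), and collapse_children() is a hypothetical wrapper around
the two statements from the patch.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: two child GFF3 lines and a shared attribute string.
my @children = ("ctg1\t.\texon\t1\t10\t.\t+\t.\tParent=f1",
                "ctg1\t.\texon\t20\t30\t.\t+\t.\tParent=f1");
my $group = ";Note=shared";

# Illustrative wrapper around the statements from the patch.
sub collapse_children {
    my ($group, @children) = @_;
    s/Parent=/ID=/g foreach @children;   # replace Parent tag with ID
    # join() puts "$group\n" between children; the trailing . $group
    # appends the same string to the last child.
    return join("$group\n", @children) . $group;
}

print collapse_children($group, @children), "\n";
```

Every child line ends up carrying the same $group suffix. Note that with
an empty @children the expression returns $group on its own.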


-- 
Lincoln D. Stein
Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724
(516) 367-8380 (voice)
(516) 367-8389 (fax)
FOR URGENT MESSAGES & SCHEDULING,
PLEASE CONTACT MY ASSISTANT,
SANDRA MICHELSEN, AT michelse at cshl.edu


From MEC at stowers-institute.org  Fri Apr 27 18:45:04 2007
From: MEC at stowers-institute.org (Cook, Malcolm)
Date: Fri, 27 Apr 2007 13:45:04 -0500
Subject: [Bioperl-l] Handling discontiguous feature locations in
	Bio::DB::SeqFeature::Store -- proposed patch to
	Bio::Graphics::FeatureBase
In-Reply-To: <6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
References: <CED81D34E37D5043A1211565277A51E507E22EFB@exchkc02.stowers-institute.org>
	<6dce9a0b0704271044w2484708n949b00c65dc841dc@mail.gmail.com>
Message-ID: <CED81D34E37D5043A1211565277A51E507E22F59@exchkc02.stowers-institute.org>

Hi Lincoln,
 
Cool.
 
I still think the principle of what I figured out holds, but the
implementation is slightly broken.  Improved patch forthcoming next week.
 

Malcolm Cook
Database Applications Manager - Bioinformatics
Stowers Institute for Medical Research - Kansas City, Missouri
  

________________________________

	From: lincoln.stein at gmail.com [mailto:lincoln.stein at gmail.com]
On Behalf Of Lincoln Stein
	Sent: Friday, April 27, 2007 12:45 PM
	To: Cook, Malcolm
	Cc: lstein at cshl.org; bioperl list
	Subject: Re: Handling discontiguous feature locations in
Bio::DB::SeqFeature::Store -- proposed patch to
Bio::Graphics::FeatureBase
	
	
	Hi Malcom,
	
	This is absolutely ok and you can go ahead and commit. Thanks
for figuring this out!
	
	Lincoln
	
	
	On 4/26/07, Cook, Malcolm < MEC at stowers-institute.org
<mailto:MEC at stowers-institute.org> > wrote: 

		Lincoln, et al,
		
		I find that the gff3_string for Bio::DB::SeqFeature
objects retreived 
		from a Bio::DB::SeqFeature::Store that were initially
created with
		-seqments (i.e. whose location was discontiguous) does
not display any
		other attributes in column 9 than "Name".
		
		What do you think of the following patch to
Bio::Graphics::FeatureBase, 
		whose effect is to "contrive to return (duplicated)
common group values"
		(which otherwise get lost when "collapsing" "homogenous"
parent/child
		features)
		
		Another approach would be to copy the attributes from
the parent to the 
		children when the -seqments are first created.
		
		Another approach would be to use
Bio::SeqFeature::Generic  as the db's
		-seqfeature_class and save with -location being a
Bio::Location::Split,
		but this was wrougth with other problems. 
		
		Any other suggestions?  Do you want me to commit this
patch?
		
		Cheers,
		
		Malcolm
		
		Patch follows:
		
		
		Index: FeatureBase.pm
	
=================================================================== 
		RCS file:
	
/home/repository/bioperl/bioperl-live/Bio/Graphics/FeatureBase.pm,v
		retrieving revision 1.29
		diff -c -r1.29 FeatureBase.pm
		*** FeatureBase.pm      16 Apr 2007 19:55:33 -0000
1.29
		--- FeatureBase.pm       26 Apr 2007 16:30:23 -0000
		***************
		*** 581,587 ****
		      foreach (@children) {
		        s/Parent=/ID=/g;
		      } # replace Parent tag with ID
		!     return join "\n",@children;
		    }
		
		    return join("\n",$p,@children);
		--- 581,589 ----
		      foreach (@children) {
		        s/Parent=/ID=/g;
		      } # replace Parent tag with ID
		!     #return join "\n",@children;
		!     # Instead of above, additionally, contrive to return (duplicated) common group values
		!     return(join("$group\n",@children) . $group);
		    }
		
		    return join("\n",$p,@children);
		

	-- 
	Lincoln D. Stein
	Cold Spring Harbor Laboratory
	1 Bungtown Road
	Cold Spring Harbor, NY 11724
	(516) 367-8380 (voice)
	(516) 367-8389 (fax)
	FOR URGENT MESSAGES & SCHEDULING, 
	PLEASE CONTACT MY ASSISTANT, 
	SANDRA MICHELSEN, AT michelse at cshl.edu 


From bernd at kirx.de  Sat Apr 28 14:36:07 2007
From: bernd at kirx.de (Bernd Mueller)
Date: Sat, 28 Apr 2007 16:36:07 +0200
Subject: [Bioperl-l] bioperl::db
Message-ID: <46335BD7.8040306@kirx.de>

Hi,

I followed the instructions on bioperl.org for installing bioperl via
cpan, but I have found it impossible to install the bioperl::db
module.

How does this work?

Moreover, none of the BIRNEY distributions are installable on my system.
After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always
fail. So I had to install the CRAFFI bundle, but the Bio::DB module does
not seem to be included in this bundle, because my programs that use that
module do not work.

Help would be appreciated :)

Cheers,
Bernd

Appendix:

cpan[6]> d /bioperl/
Distribution    BIRNEY/bioperl-1.2.1.tar.gz
Distribution    BIRNEY/bioperl-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-1.2.3.tar.gz
Distribution    BIRNEY/bioperl-1.2.tar.gz
Distribution    BIRNEY/bioperl-1.4.tar.gz
Distribution    BIRNEY/bioperl-db-0.1.tar.gz
Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
Distribution    BIRNEY/bioperl-run-1.4.tar.gz
Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
12 items found


-- 
Dipl.-Inform.(FH)
Bernd Mueller
phone: +49 179 2336692
email: bernd at kirx.de


From cydeweys at gmail.com  Sun Apr 29 13:43:55 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 09:43:55 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
Message-ID: <4634A11B.6090809@umd.edu>

I'm trying to load up a table of codon usage frequencies I've downloaded
from the web using Bio::CodonUsage::IO.  My code looks like this:

    use Bio::CodonUsage::Table;
    use Bio::CodonUsage::IO;
    # ...
    my $io = Bio::CodonUsage::IO->new(-file=>$freqFile);
    my $codonTable = $io->next_data();

Unfortunately, I can't seem to find any documentation on what format the
codon usage table file is expected to be in, and all of my best guesses
seem to be invalid, yielding the following error message:

-------------------- WARNING ---------------------
MSG: probable parsing error - should be 21 entries for 20aa + stop codon
---------------------------------------------------

I've tried using both formats that are available from the Codon Usage
Database (easily the largest source of codon usage frequencies),
available here: http://www.kazusa.or.jp/codon/

The two formats I've tried and failed look like this:

UUU 32.5( 45732)  UCU 15.3( 21588)  UAU 27.8( 39146)  UGU  6.3(  8796)
UUC 14.3( 20101)  UCC  3.2(  4458)  UAC  9.3( 13016)  UGC  2.1(  2971)
...


AND

AmAcid  Codon      Number    /1000     Fraction   ..

Gly     GGG     13198.00      9.38      0.14
Gly     GGA     34123.00     24.26      0.36
...


So, anyone know how to get this downloaded codon usage data loaded up
into a Bio::CodonUsage::Table object?  Bio::CodonUsage::IO doesn't seem
to like parsing the standard formats.  Thanks.


From cjfields at uiuc.edu  Sun Apr 29 14:05:59 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 09:05:59 -0500
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <4634A11B.6090809@umd.edu>
References: <4634A11B.6090809@umd.edu>
Message-ID: <469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>

One example file (MmCT) can be found in the test data directory in  
the bioperl distribution (t/data directory) and some tests relevant  
to codon table usage are found in DBCUTG.t.

chris

On Apr 29, 2007, at 8:43 AM, Ben McIlwain wrote:

> I'm trying to load up a table of codon usage frequencies I've  
> downloaded
> from the web using Bio::CodonUsage::IO.  My code looks like this:
>
>     use Bio::CodonUsage::Table;
>     use Bio::CodonUsage::IO;
>     # ...
>     my $io = Bio::CodonUsage::IO->new(-file=>$freqFile);
>     my $codonTable = $io->next_data();
>
> Unfortunately, I can't seem to find any documentation on what  
> format the
> codon usage table file is expected to be in, and all of my best  
> guesses
> seem to be invalid, yielding the following error message:
>
> -------------------- WARNING ---------------------
> MSG: probable parsing error - should be 21 entries for 20aa + stop  
> codon
> ---------------------------------------------------
>
> I've tried using both formats that are available from the Codon Usage
> Database (easily the largest source of codon usage frequencies),
> available here: http://www.kazusa.or.jp/codon/
>
> The two formats I've tried and failed look like this:
>
> UUU 32.5( 45732)  UCU 15.3( 21588)  UAU 27.8( 39146)  UGU  6.3(  8796)
> UUC 14.3( 20101)  UCC  3.2(  4458)  UAC  9.3( 13016)  UGC  2.1(  2971)
> ...
>
>
> AND
>
> AmAcid  Codon      Number    /1000     Fraction   ..
>
> Gly     GGG     13198.00      9.38      0.14
> Gly     GGA     34123.00     24.26      0.36
> ...
>
>
> So, anyone know how to get this downloaded codon usage data loaded up
> into a Bio::CodonUsage::Table object?  Bio::CodonUsage::IO doesn't  
> seem
> to like parsing the standard formats.  Thanks.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cydeweys at gmail.com  Sun Apr 29 14:06:12 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 10:06:12 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
Message-ID: <4634A654.7010708@gmail.com>

Chris Fields wrote:
> One example file (MmCT) can be found in the test data directory in the
> bioperl distribution (t/data directory) and some tests relevant to codon
> table usage are found in DBCUTG.t.

I still get the same warning message even when running on the given test
data?  That doesn't sound right.

-------------------- WARNING ---------------------
MSG: probable parsing error - should be 21 entries for 20aa + stop codon
---------------------------------------------------


From cjfields at uiuc.edu  Sun Apr 29 21:50:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 16:50:15 -0500
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <4634A654.7010708@gmail.com>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
	<4634A654.7010708@gmail.com>
Message-ID: <DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>

Odd, when I run 'perl -I. t/DBCUTG.t' from CVS it works fine.  Of  
course, I am assuming that you are running the latest release (1.5.2).

Could you post a bug report with a script that generates the error?

chris

On Apr 29, 2007, at 9:06 AM, Ben McIlwain wrote:

> Chris Fields wrote:
>> One example file (MmCT) can be found in the test data directory in  
>> the
>> bioperl distribution (t/data directory) and some tests relevant to  
>> codon
>> table usage are found in DBCUTG.t.
>
> I still get the same warning message even when running on the given  
> test
> data?  That doesn't sound right.
>
> -------------------- WARNING ---------------------
> MSG: probable parsing error - should be 21 entries for 20aa + stop  
> codon
> ---------------------------------------------------
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cydeweys at gmail.com  Sun Apr 29 22:15:32 2007
From: cydeweys at gmail.com (Ben McIlwain)
Date: Sun, 29 Apr 2007 18:15:32 -0400
Subject: [Bioperl-l] What file format does Bio::CodonUsage::IO expect?
In-Reply-To: <DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>
References: <4634A11B.6090809@umd.edu>
	<469A610B-90DE-451A-BDEE-688634DB735E@uiuc.edu>
	<4634A654.7010708@gmail.com>
	<DA2592CF-04C3-4F6A-AEC3-7F781B070DC8@uiuc.edu>
Message-ID: <46351904.4070202@gmail.com>

Chris Fields wrote:
> Odd, when I run 'perl -I. t/DBCUTG.t' from CVS it works fine.  Of
> course, I am assuming that you are running the latest release (1.5.2).
> 
> Could you post a bug report with a script that generates the error?

Sorry, it was my mistake.  I had turned off warnings and strict earlier
for debugging purposes and then forgot to turn them back on.  It turns
out I was trying to read in the codon frequencies when the filename was
an uninitialized string variable (I typoed the name).  Whoops.  Now that
I've spelled the variable name correctly, it is working.
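
For anyone who runs into the same thing, a minimal sketch of a guard that would have caught it early. The helper name checked_filename and the variable $cut_file are made up for illustration (they are not part of BioPerl), and the Bio::CodonUsage::IO call is only shown in a comment, not executed:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): refuse to continue when the
# filename is undefined, empty, or does not point at a readable file.
sub checked_filename {
    my ($file) = @_;
    die "No filename given\n"
        unless defined $file && length $file;
    die "Cannot read file '$file'\n"
        unless -f $file && -r _;
    return $file;
}

# With strict enabled, a misspelled variable name is a compile-time error
# instead of a silent undef being passed to the parser.
my $cut_file = $ARGV[0];

if (defined $cut_file) {
    my $file = checked_filename($cut_file);
    print "Would parse: $file\n";
    # e.g. my $io = Bio::CodonUsage::IO->new(-file => $file);
}
```

Keeping strict and warnings switched on while debugging is the simpler fix; the explicit check just turns a confusing parser warning into a clear message about the file.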


From bernd at kirx.de  Sun Apr 29 22:57:53 2007
From: bernd at kirx.de (Bernd Mueller)
Date: Mon, 30 Apr 2007 00:57:53 +0200
Subject: [Bioperl-l] bioperl::db
In-Reply-To: <46335BD7.8040306@kirx.de>
References: <46335BD7.8040306@kirx.de>
Message-ID: <463522F1.2010406@kirx.de>

Hello list,

I figured out my problem. It was caused by confusion over the versioning
of bioperl. The instructions describe how to list the versions of bioperl
available in CPAN, but then say to install a much higher version which is
not listed as a distribution in CPAN. So it works fine now. Thanks anyway.
Proficiency in reading results in success ;-)

But I have another question: Does anyone know how to retrieve only free
fulltext documents from Pubmed Central with EUtilities? All my queries
return a mix of free and non-free articles.

Thanks and regards,

Bernd


Bernd Mueller wrote:
> Hi,
> 
> I followed the instructions on bioperl.org for installing bioperl via
> cpan. But it is impossible for me to install the bioperl::db module.
>
> How does this work?
>
> Moreover, none of these Birney distributions are installable on my system.
> After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always
> fail. So I had to install the CRAFFI bundle, but the Bio::DB module does
> not seem to be included in this bundle because my programs using that
> module do not work.
> 
> Help would be appreciated :)
> 
> Cheers,
> Bernd
> 
> Appendix:
> 
> cpan[6]> d /bioperl/
> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
> Distribution    BIRNEY/bioperl-1.2.tar.gz
> Distribution    BIRNEY/bioperl-1.4.tar.gz
> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
> 12 items found
> 
> 

-- 
Dipl.-Inform.(FH)
Bernd Mueller
phone: +49 179 2336692
email: bernd at kirx.de


From cjfields at uiuc.edu  Mon Apr 30 00:16:11 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:16:11 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
Message-ID: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>

Allen (or anyone),

What is the status of this module?  It requires a module not listed
in the dependencies (WWW::Mechanize) and has no tests.

chris


From allenday at ucla.edu  Mon Apr 30 00:21:19 2007
From: allenday at ucla.edu (Allen Day)
Date: Sun, 29 Apr 2007 17:21:19 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
Message-ID: <5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>

Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
ago.  I only implemented it for a few journals, so it never worked for a
large fraction of publications.  It probably barely works, or does not
work at all, now because of how the PDFs are scraped out of the HTML.  The
publisher sites are always modifying their HTML, presumably trying to
prevent automated downloads like this.

-Allen

On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Allen (or anyone),
>
> What is the status of this module?  It requires a module not listed
> in the dependencies (WWW::Mechanize) and has no tests.
>
> chris
>


From cjfields at uiuc.edu  Mon Apr 30 00:28:47 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:28:47 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
Message-ID: <29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>

Quick response!  Yep, I've run into this with a few publishers.
Though they're supposed to have 'permanent' links for those of us who
like to link to our pubs, the links frequently change (scary if that's
their definition of permanent).

Did you want us to remove the code from CVS?

chris

On Apr 29, 2007, at 7:21 PM, Allen Day wrote:

> Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
> ago.  I only implemented for a few journals, so it never worked for a
> large fraction of publications.  Probably it barely works or does not
> work at all now b/c of how the PDF are scraped out of the HTML.  The
> publisher sites are always modifying their HTML, presumably trying to
> prevent automated download like this.
>
> -Allen
>
> On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
>> Allen (or anyone),
>>
>> What is the status of this module?  It requires a module not listed
>> in the dependencies (WWW::Mechanize) and has no tests.
>>
>> chris
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Mon Apr 30 00:31:15 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 29 Apr 2007 19:31:15 -0500
Subject: [Bioperl-l] PMC and EUtilities, was  bioperl::db
In-Reply-To: <463522F1.2010406@kirx.de>
References: <46335BD7.8040306@kirx.de> <463522F1.2010406@kirx.de>
Message-ID: <01DC1D72-8AFA-4C11-8795-D4787506C602@uiuc.edu>

There may be a way to limit the initial query to full text docs from  
esearch, then use the history to retrieve only the XML docs you  
want.  Is that what you mean?
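
To make the two steps concrete, here is a rough sketch that only builds the E-utilities URLs (no network access, core Perl only). The helper names, the search term, and the EXAMPLE_WEBENV value are placeholders for illustration; whether a given filter term actually restricts PMC results to free full text is an assumption to be checked against the NCBI documentation:

```perl
use strict;
use warnings;

# Base URL of the NCBI E-utilities.
my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

# Percent-encode everything except unreserved characters.
# ASCII input is assumed in this sketch.
sub uri_escape_simple {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build a query string from key/value pairs, keeping the given order.
sub query_string {
    my @pairs = @_;
    my @parts;
    while (my ($key, $value) = splice @pairs, 0, 2) {
        push @parts, "$key=" . uri_escape_simple($value);
    }
    return join '&', @parts;
}

# Step 1: esearch with usehistory=y so the result set stays on the server.
sub esearch_url {
    my ($db, $term) = @_;
    return "$BASE/esearch.fcgi?"
         . query_string(db => $db, term => $term, usehistory => 'y');
}

# Step 2: efetch from the stored result set, one batch at a time.
# $webenv and $query_key come from the esearch XML reply
# (its <WebEnv> and <QueryKey> elements).
sub efetch_url {
    my ($db, $webenv, $query_key, $start, $batch) = @_;
    return "$BASE/efetch.fcgi?"
         . query_string(
               db        => $db,
               WebEnv    => $webenv,
               query_key => $query_key,
               retstart  => $start,
               retmax    => $batch,
               retmode   => 'xml',
           );
}

print esearch_url('pmc', 'bioperl AND free fulltext[filter]'), "\n";
print efetch_url('pmc', 'EXAMPLE_WEBENV', 1, 0, 100), "\n";
```

The point of the history mechanism is that the ID list never has to travel back and forth; only WebEnv and query_key are passed to efetch.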

BioPerl-based access to PMC is limited at best.  Bio::DB::EUtilities
only returns raw PMC XML with no post-processing of raw data (for
good reason, as EUtilities is meant to be an intermediate step).
Allen Day's Bio::DB::Biblio::eutils module supposedly allows PMC
queries.  I'm also pretty sure that PubMed XML != PMC XML; in other
words, the Bio::Biblio XML format parsers may not work on PMC XML.

chris

On Apr 29, 2007, at 5:57 PM, Bernd Mueller wrote:

> Hello list,
>
> I figured out my problem. It was caused by confusion over the versioning
> of bioperl. The instructions describe how to list the versions of bioperl
> available in CPAN, but then say to install a much higher version which is
> not listed as a distribution in CPAN. So it works fine now. Thanks anyway.
> Proficiency in reading results in success ;-)
>
> But I have another question: Does anyone know how to retrieve only free
> fulltext documents from Pubmed Central with EUtilities? All my queries
> return a mix of free and non-free articles.
>
> Thanks and regards,
>
> Bernd
>
>
> Bernd Mueller wrote:
>> Hi,
>>
>> I followed the instructions on bioperl.org for installing bioperl via
>> cpan. But it is impossible for me to install the bioperl::db module.
>>
>> How does this work?
>>
>> Moreover, none of these Birney distributions are installable on my system.
>> After typing cpan> install BIRNEY/bioperl-x.x.x.x, the tests always
>> fail. So I had to install the CRAFFI bundle, but the Bio::DB module does
>> not seem to be included in this bundle because my programs using that
>> module do not work.
>>
>> Help would be appreciated :)
>>
>> Cheers,
>> Bernd
>>
>> Appendix:
>>
>> cpan[6]> d /bioperl/
>> Distribution    BIRNEY/bioperl-1.2.1.tar.gz
>> Distribution    BIRNEY/bioperl-1.2.2.tar.gz
>> Distribution    BIRNEY/bioperl-1.2.3.tar.gz
>> Distribution    BIRNEY/bioperl-1.2.tar.gz
>> Distribution    BIRNEY/bioperl-1.4.tar.gz
>> Distribution    BIRNEY/bioperl-db-0.1.tar.gz
>> Distribution    BIRNEY/bioperl-ext-1.4.tar.gz
>> Distribution    BIRNEY/bioperl-gui-0.7.tar.gz
>> Distribution    BIRNEY/bioperl-run-1.2.2.tar.gz
>> Distribution    BIRNEY/bioperl-run-1.4.tar.gz
>> Distribution    BOZO/Fry-Lib-BioPerl-0.15.tar.gz
>> Distribution    CRAFFI/Bundle-BioPerl-2.1.8.tar.gz
>> 12 items found
>>
>>
>
> -- 
> Dipl.-Inform.(FH)
> Bernd Mueller
> phone: +49 179 2336692
> email: bernd at kirx.de

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From allenday at ucla.edu  Mon Apr 30 00:57:55 2007
From: allenday at ucla.edu (Allen Day)
Date: Sun, 29 Apr 2007 17:57:55 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
Message-ID: <5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>

Doesn't matter to me if it stays or not.  If you're cleaning house
feel free to get rid of it.

-Allen

On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> Quick response!  Yep, I've run into this with a few publishers.
> Though they're supposed to have 'permanent' links for those of us who
> like to link to our pubs they frequently change (scary if that's
> their definition of permanent).
>
> Did you want us to remove the code from CVS?
>
> chris
>
> On Apr 29, 2007, at 7:21 PM, Allen Day wrote:
>
> > Incomplete.  I wrote it to do some bulk scraping of PDFs a few years
> > ago.  I only implemented for a few journals, so it never worked for a
> > large fraction of publications.  Probably it barely works or does not
> > work at all now b/c of how the PDF are scraped out of the HTML.  The
> > publisher sites are always modifying their HTML, presumably trying to
> > prevent automated download like this.
> >
> > -Allen
> >
> > On 4/29/07, Chris Fields <cjfields at uiuc.edu> wrote:
> >> Allen (or anyone),
> >>
> >> What is the status of this module?  It requires a module not listed
> >> in the dependencies (WWW::Mechanize) and has no tests.
> >>
> >> chris
> >>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
>


From cjfields at uiuc.edu  Mon Apr 30 15:15:16 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Apr 2007 10:15:16 -0500
Subject: [Bioperl-l] PMC and EUtilities, was  bioperl::db
In-Reply-To: <4635B1BD.9030402@kirx.de>
References: <46335BD7.8040306@kirx.de> <463522F1.2010406@kirx.de>
	<01DC1D72-8AFA-4C11-8795-D4787506C602@uiuc.edu>
	<4635B1BD.9030402@kirx.de>
Message-ID: <D11CE380-EDEC-4F7F-80EA-09D915EA79F0@uiuc.edu>

Bernd,

As a preface to this discussion, I am in the middle of refactoring
EUtilities; the next incarnation should have a similar API but will
likely set parameters via simpler methods (no need for all the
getter/setters).

You'll likely have to parse out the tags yourself, AFAIK there is no  
BioPerl XML parser for PMC XML and a quick grep search turns up  
nothing but PubMed parsers.  If you aren't familiar with XML parsing  
you could try XML::Simple to get at what you want.  I would pass the  
XML in as small chunks (maybe by retrieving them in batches of 100 or  
less) and initially use Data::Dumper to determine the data structure  
XML::Simple returns (PMC XML has attributes and elements, so the  
structure will be more complex).  Then just iterate through articles  
and grab what you want.
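
Roughly, the approach above could look like the sketch below. The sample XML is made up and heavily simplified (real PMC records are much larger and use different element names), and the helper name articles_from_xml is only for illustration. XML::Simple is a CPAN module, so the sketch loads it at run time and skips the parsing step if it is missing:

```perl
use strict;
use warnings;
use Data::Dumper;

# Made-up, heavily simplified stand-in for one batch of article XML.
my $xml = <<'XML';
<articleset>
  <article id="A1">
    <title>First example article</title>
    <body>Text of the first article.</body>
  </article>
  <article id="A2">
    <title>Second example article</title>
    <body>Text of the second article.</body>
  </article>
</articleset>
XML

# Parse one chunk of XML into a list of hashes, one per article.
sub articles_from_xml {
    my ($chunk) = @_;
    require XML::Simple;    # loaded only when parsing is requested
    my $parser = XML::Simple->new(
        ForceArray => ['article'],    # always an array, even for one article
        KeyAttr    => [],             # do not fold the list on the id attribute
    );
    my $data = $parser->XMLin($chunk);
    return @{ $data->{article} || [] };
}

if (eval { require XML::Simple; 1 }) {
    my @articles = articles_from_xml($xml);

    # First look at the raw structure XML::Simple returned ...
    local $Data::Dumper::Sortkeys = 1;
    print Dumper($articles[0]);

    # ... then iterate through the articles and grab what is needed.
    printf "%s: %s\n", $_->{id}, $_->{title} for @articles;
}
else {
    warn "XML::Simple is not installed; nothing parsed\n";
}
```

Setting ForceArray and KeyAttr explicitly keeps the structure the same whether a batch holds one article or a hundred, which makes the iteration step much less surprising.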

I think most, if not all, of the articles in PubMed Central are
available as free full text:

http://www.pubmedcentral.nih.gov/about/faq.html#q9

You can retrieve them via ftp:

ftp://ftp.ncbi.nlm.nih.gov/pub/pmc

which contains an index file of all articles and their dir. location  
(the readme gives more info).

chris

On Apr 30, 2007, at 4:07 AM, Bernd Mueller wrote:

> Hello,
>
> I think so. The ids of the articles I want are retrieved by
> Bio::DB::EUtilities::esearch. Afterwards I download the articles
> with Bio::DB::EUtilities::efetch. It is only possible to download
> in XML format from PMC, so post-processing is actually needed
> because I want the articles in plain text format.
>
> But I don't know why my results include non-free articles, i.e.
> abstracts where full articles should be found, when the query is
> constrained to free fulltext only. In the query I limit the search
> with the filter "AND free fulltext[filter]". Probably this is not
> directly a matter of bioperl but of the eutilities interface
> of PMC.
>
> Regards,
> Bernd


From allenday at ucla.edu  Mon Apr 30 16:44:12 2007
From: allenday at ucla.edu (Allen Day)
Date: Mon, 30 Apr 2007 09:44:12 -0700
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <4635FDD8.8030704@jouy.inra.fr>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
	<5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>
	<4635FDD8.8030704@jouy.inra.fr>
Message-ID: <5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>

DOI is definitely the right way to do this.  It wasn't implemented
widely at the time I wrote this module.

-Allen

On 4/30/07, Stéphane Téletchéa <steletch at jouy.inra.fr> wrote:
> Allen Day wrote:
> > Doesn't matter to me if it stays or not.  If you're cleaning house
> > feel free to get rid of it.
> >
> > -Allen
> >
>
> I've worked on something that goes the other way around: getting
> information about a PDF from its DOI, if present. Most recent publications
> do have a DOI, and I use this as the target for my request.
>
> This does not solve the problem, but may help others. Feel free to ask
> if it can help the ongoing work; the code is quite dirty ...
>
> Stéphane
>
>
> --
> Stéphane Téletchéa, PhD.                  http://www.steletch.org
> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>
>


From cjfields at uiuc.edu  Mon Apr 30 17:55:01 2007
From: cjfields at uiuc.edu (Chris Fields)
Date: Mon, 30 Apr 2007 12:55:01 -0500
Subject: [Bioperl-l] Bio::DB::Biblio::PDF - incomplete?
In-Reply-To: <5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>
References: <F2EB482D-1545-44E2-BEC6-4B7B40DE1DFB@uiuc.edu>
	<5c24dcc30704291721h3664c8afl848cfa482a1c10d8@mail.gmail.com>
	<29AD199B-5A31-43F7-B252-0967C25A9658@uiuc.edu>
	<5c24dcc30704291757l6cc4148tc41b2890bb161277@mail.gmail.com>
	<4635FDD8.8030704@jouy.inra.fr>
	<5c24dcc30704300944p5641970kcd120c5f3db381d2@mail.gmail.com>
Message-ID: <34F19F02-1B7B-41A1-90B1-F373C49BC012@uiuc.edu>

Agreed; even some sequence records may have a DOI now.  PubMed and PMC
XML contain this, so it is possible to parse the DOI out if one were
inclined to incorporate this into Bio::Biblio (I added a doi()
getter/setter into Bio::Annotation::Reference a few months back).
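
As a rough illustration of parsing the DOI out, here is a regex-based sketch. The helper name extract_doi and the sample fragments (including the DOIs) are made up; a real XML parser would be more robust than a regex, and the doi() call is only shown in a comment:

```perl
use strict;
use warnings;

# Return the first DOI found in a chunk of PubMed or PMC XML, or undef.
# PubMed XML:  <ArticleId IdType="doi">...</ArticleId>
# PMC XML:     <article-id pub-id-type="doi">...</article-id>
sub extract_doi {
    my ($xml) = @_;
    return $1
        if $xml =~ m{<ArticleId\s+IdType="doi"\s*>\s*([^<\s]+)\s*</ArticleId>};
    return $1
        if $xml =~ m{<article-id\s+pub-id-type="doi"\s*>\s*([^<\s]+)\s*</article-id>};
    return;
}

# Made-up sample fragments; the DOIs are placeholders.
my $pubmed = '<ArticleIdList><ArticleId IdType="pubmed">12345</ArticleId>'
           . '<ArticleId IdType="doi">10.1000/example.1</ArticleId></ArticleIdList>';
my $pmc    = '<article-meta><article-id pub-id-type="doi">10.1000/example.2'
           . '</article-id></article-meta>';

print extract_doi($pubmed), "\n";
print extract_doi($pmc), "\n";

# The value could then be stored with the doi() getter/setter,
# e.g. $ref->doi($doi) on a Bio::Annotation::Reference object.
```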

chris

On Apr 30, 2007, at 11:44 AM, Allen Day wrote:

> DOI is definitely the right way to do this.  It wasn't implemented
> widely at the time I wrote this module.
>
> -Allen
>
> On 4/30/07, Stéphane Téletchéa <steletch at jouy.inra.fr> wrote:
>> Allen Day wrote:
>>> Doesn't matter to me if it stays or not.  If you're cleaning house
>>> feel free to get rid of it.
>>>
>>> -Allen
>>>
>>
>> I've worked on something that goes the other way around: getting
>> information about a PDF from its DOI, if present. Most recent publications
>> do have a DOI, and I use this as the target for my request.
>>
>> This does not solve the problem, but may help others. Feel free to ask
>> if it can help the ongoing work; the code is quite dirty ...
>>
>> Stéphane
>>
>>
>> --
>> Stéphane Téletchéa, PhD.                  http://www.steletch.org
>> Unité Mathématique Informatique et Génome http://migale.jouy.inra.fr/mig
>> INRA, Domaine de Vilvert                  Tél : (33) 134 652 891
>> 78352 Jouy-en-Josas cedex, France         Fax : (33) 134 652 901
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign


From gdorjee at hotmail.com  Mon Apr 30 20:05:45 2007
From: gdorjee at hotmail.com (DeeGee)
Date: Mon, 30 Apr 2007 13:05:45 -0700 (PDT)
Subject: [Bioperl-l] generate a fasta file from the blast report
Message-ID: <10259461.post@talk.nabble.com>


Hi all,
If I have the following script working on my BLAST report, can anyone please
tell me how I can generate a FASTA format file of just the hit (subject)
sequences?
Thanks a lot.
 
use strict;
use warnings;
use Bio::SearchIO;

# Open the BLAST report for parsing.
my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => 'report.bls');

# Walk every result (query), hit (subject) and HSP (alignment) in turn.
while( my $result = $in->next_result ) {
    while( my $hit = $result->next_hit ) {
        while( my $hsp = $hit->next_hsp ) {
            # Report only long, high-identity alignments.
            if( $hsp->length('total') > 100 &&
                $hsp->percent_identity >= 75 ) {
                print "Hit= ", $hit->name,
                      ", len=", $hsp->length('total'),
                      ", percent_id=", $hsp->percent_identity, "\n";
            }
        }
    }
}
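
One possible direction, sketched under assumptions rather than offered as the definitive answer: the aligned subject string of each qualifying HSP can be written out as a FASTA record. The helper names fasta_record and hits_to_fasta and the output file name are made up for illustration; Bio::SearchIO is loaded only inside hits_to_fasta, so the formatting helper can be tried without BioPerl installed.

```perl
use strict;
use warnings;

# Format one FASTA record: strip alignment gaps, wrap at 60 columns.
sub fasta_record {
    my ($id, $aligned) = @_;
    (my $seq = $aligned) =~ s/[-\s]//g;    # HSP strings use '-' for gaps
    $seq =~ s/(.{1,60})/$1\n/g;            # wrap the sequence lines
    return ">$id\n$seq";
}

# Write the aligned subject region of every qualifying HSP to $out_file.
sub hits_to_fasta {
    my ($report, $out_file) = @_;
    require Bio::SearchIO;
    my $in = Bio::SearchIO->new(-format => 'blast', -file => $report);
    open my $out, '>', $out_file or die "Cannot write $out_file: $!\n";
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                next unless $hsp->length('total') > 100
                         && $hsp->percent_identity >= 75;
                # Label each record with the hit name and subject coordinates.
                my $id = sprintf '%s/%d-%d',
                    $hit->name, $hsp->start('hit'), $hsp->end('hit');
                print {$out} fasta_record($id, $hsp->hit_string);
            }
        }
    }
    close $out or die "Cannot close $out_file: $!\n";
}

# Usage: perl hits2fasta.pl report.bls hits.fa
hits_to_fasta(@ARGV) if @ARGV == 2;
```

Note that this only gives the aligned portion of each subject, because that is all the BLAST report contains; the full-length hit sequences would have to be fetched from the searched database by ID.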
-- 
View this message in context: http://www.nabble.com/generate-a-fasta-file-from-the-blast-report-tf3671549.html#a10259461
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From Francoise.LECOMTE at biogemma.com  Mon Apr 30 10:35:03 2007
From: Francoise.LECOMTE at biogemma.com (Francoise.LECOMTE at biogemma.com)
Date: Mon, 30 Apr 2007 12:35:03 +0200
Subject: [Bioperl-l] Pb makefile
Message-ID: <OF183C15DF.0D5F2AA0-ONC12572CD.0039B57A-C12572CD.003A3585@LGLimagrain.com>

Hi,
I am trying to install bioperl 1.4 on a Tru64 platform and I get the message
"make:line too long" when I run the command make install.
How can I solve it? How can I disable man page installation in Makefile.PL,
if that would solve this problem?

Best regards

Françoise Lecomte