From avilella at gmail.com  Sat Jan  2 03:57:28 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sat, 2 Jan 2010 08:57:28 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
Message-ID: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>

Hi all and happy 2010 for those that follow the Gregorian calendar,

A question that sits somewhere between bioperl and NCBI. I would like to use
bioperl to download sequences from dbEST. For that, my idea is to use
Bio::DB::GenBank and get the sequences by gi id.

Now, I want my script to download sequences for a given NCBI taxonomy clade.

For example, if I want to download all fish (clupeocephala) sequences in dbEST,
I can browse them on the dbEST webpage using "clupeocephala[taxonomy]",
so I am thinking there should be a way to do it programmatically.

How can I query NCBI dbEST through bioperl to give me the list of GI ids I am
looking for given a taxon id?

Thanks in advance,

Albert.

From jason at bioperl.org  Sat Jan  2 11:35:22 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Jan 2010 08:35:22 -0800
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
Message-ID: <D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>

Did you try Bio::DB::Query::GenBank?
You'd want to use -db => 'nucest' and then just put in an Entrez
query as per the example. You can include dates in the query, so you
can update your locally retrieved data from a script that runs
periodically.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From avilella at gmail.com  Sun Jan  3 04:08:33 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 3 Jan 2010 09:08:33 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
	<D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>
Message-ID: <358f4d651001030108p6a92fb27k5fa39be6bebb3a9c@mail.gmail.com>

Thanks Jason!
For the sake of completeness, here is the script I needed:

---------------------
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Getopt::Long;

my $keyword_type = 'EST';
my $outdir = '.';
my $taxon_name = undef;
my $db_type = 'nucest';

GetOptions('keyword_type:s' => \$keyword_type,
           't|taxon_name:s' => \$taxon_name,
           'db_type:s' => \$db_type,
           'outdir:s' => \$outdir);
die "usage: $0 -t TAXON_NAME [--keyword_type EST] [--db_type nucest] [--outdir DIR]\n"
  unless defined $taxon_name && length $taxon_name;

# Entrez query restricted to the taxon and the keyword
my $query_string = $taxon_name ."[Organism] AND ". $keyword_type ."[Keyword]";
my $query = Bio::DB::Query::GenBank->new
  (-db => $db_type,
   -query => $query_string,
   -mindate => '2007',
   -maxdate => '2010');

# Output file name: spaces in the taxon name become underscores
(my $taxon_name_string = $taxon_name) =~ s/ /_/g;
my $outfile = $outdir . "/" . $taxon_name_string . ".". $db_type . ".fasta";
my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

# Number of records matching the query
print $query->count,"\n";
my $gb = Bio::DB::GenBank->new();
my $stream = $gb->get_Stream_by_query($query);
while (my $seq = $stream->next_seq) {
  # Keep only reads longer than 800 bp
  next unless ($seq->length > 800);
  $out->write_seq($seq);
}
$out->close;
---------------------
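
The original question asked for the list of GI ids rather than the sequences themselves. A minimal sketch of that step, assuming the ids() method that Bio::DB::Query::GenBank provides for retrieving the ids matched by a query. The helper names clade_query_string and gi_ids_for_clade are made up for illustration, and the 'nucest' database name is simply the one used in this thread; Entrez may not accept it any more.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Query::GenBank;

# Build the Entrez query string for one clade (illustrative helper name).
sub clade_query_string {
    my ($clade) = @_;
    return $clade . '[Organism]';
}

# Run the query and return the matching ids without fetching any sequences.
# The database name is passed in by the caller (an assumption, see above).
sub gi_ids_for_clade {
    my ($clade, $db) = @_;
    my $query = Bio::DB::Query::GenBank->new(
        -db    => $db,
        -query => clade_query_string($clade),
    );
    return $query->ids;
}

# NCBI is contacted only when a clade name is given on the command line.
if (@ARGV) {
    my $clade = shift @ARGV;
    print "$_\n" for gi_ids_for_clade($clade, 'nucest');
}
```

The ids printed this way could then be passed to Bio::DB::GenBank in batches, which keeps the search step and the download step separate.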



From Jean-Marc.Frigerio at pierroton.inra.fr  Mon Jan  4 09:12:18 2010
From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA)
Date: Mon, 04 Jan 2010 15:12:18 +0100
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
Message-ID: <4B41F742.2030209@pierroton.inra.fr>

> Message: 1
> Date: Thu, 31 Dec 2009 11:26:45 +1800
> From: Peng Yu <pengyu.ut at gmail.com>
> Subject: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: bioperl-l at lists.open-bio.org
> Message-ID:
> 	<366c6f340912300926k5af5cc88nc3c3babda541fd1 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 30 Dec 2009 13:04:53 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: "bioperl-l at lists.open-bio.org" <bioperl-l at lists.open-bio.org>
> Message-ID:
> 	<264855a00912301004t396e0d4fwf9d291c5d82c3fb9 at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> 
> In perl, you can use an array to store the records.  You could also
> use a hash if you have reasonable keys for the entries.
> 
> Sean
> 
> 
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 30 Dec 2009 11:58:54 -0800
> From: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: BioPerl List <bioperl-l at lists.open-bio.org>
> Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B at bioperl.org>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> 
> or use a database object so you can retrieve sequences that have a  
> particular id. See Bio::DB::Fasta
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Wed, 30 Dec 2009 16:20:31 -0500
> From: "Mark A. Jensen" <maj at fortinbras.us>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: "Peng Yu" <pengyu.ut at gmail.com>, <bioperl-l at lists.open-bio.org>
> Message-ID: <2646F627E6D14AADB412A6E6B51E24DA at NewLife>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
> 
> I think you might want Bio::AlignIO:
> 
> $alnio = Bio::AlignIO->new(-file => 'my.fas');
> $aln = $alnio->next_aln;
> @seqs = $aln->each_seq;
> 
> MAJ


Hi,

I wrote and currently use a module I named Bio::SeqIO::multifasta, which 
is basically a copy of Bio::SeqIO::fasta plus a few methods:
get_by_id(), get_by_order(), first_seq() and previous_seq()

It would need review, validation, etc. Should I submit it to Bugzilla?

	-- jmf

From jason at bioperl.org  Mon Jan  4 11:03:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 4 Jan 2010 08:03:45 -0800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
	<4B41F742.2030209@pierroton.inra.fr>
Message-ID: <16D7C8C1-E4BE-406F-9D60-379876178CAB@bioperl.org>

We typically think of SeqIO as parsing a stream of data, not as
relying on that data being a file, which I think is what these methods
would imply. This sounds a lot like a database: does Bio::DB::Fasta
not provide some of the functionality you need from these methods? I
realize there isn't a by_order(), but lookup by id (get_Seq_by_id())
is implemented to allow random access.
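
To make the two approaches in this thread concrete, here is a minimal sketch. The first part reads every record into an array and a hash with Bio::SeqIO, as suggested earlier in the thread; the second uses Bio::DB::Fasta for lookup by id. The helper name load_fasta and the command-line layout are assumptions for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::DB::Fasta;

# Read every record of a FASTA file into memory.
# Returns an array ref (file order) and a hash ref keyed by display id.
# If two records share an id, the later one wins in the hash.
sub load_fasta {
    my ($path) = @_;
    my $in = Bio::SeqIO->new(-file => $path, -format => 'fasta');
    my (@in_order, %by_id);
    while (my $seq = $in->next_seq) {
        push @in_order, $seq;
        $by_id{ $seq->display_id } = $seq;
    }
    return (\@in_order, \%by_id);
}

# Usage (hypothetical): script.pl FILE.fasta SEQ_ID
if (@ARGV == 2) {
    my ($fasta, $id) = @ARGV;

    # In-memory approach: everything is loaded up front.
    my ($in_order, $by_id) = load_fasta($fasta);
    printf "%d records in memory\n", scalar @$in_order;
    print "first record: ", $in_order->[0]->display_id, "\n" if @$in_order;

    # Indexed approach: Bio::DB::Fasta builds an index next to the file
    # and fetches single records on demand.
    my $db  = Bio::DB::Fasta->new($fasta);
    my $seq = $db->get_Seq_by_id($id);
    print "length of $id: ", $seq->length, "\n" if $seq;
}
```

The in-memory version gives access by position and by id at the cost of holding the whole file; the indexed version scales to large files but only offers lookup by id.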

-jason




From avilella at gmail.com  Mon Jan  4 15:00:24 2010
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 4 Jan 2010 20:00:24 +0000
Subject: [Bioperl-l] indexed fastq files
Message-ID: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>

Hi all,

What is the best way to index fastq files so that, once clustered, I
can provide a list of seq_ids and get them back in fastq format from
the indexed db?

Cheers,

Albert.

From cjfields at illinois.edu  Mon Jan  4 16:59:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Jan 2010 15:59:50 -0600
Subject: [Bioperl-l] indexed fastq files
In-Reply-To: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>
References: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>
Message-ID: <07EBA105-6A34-490C-B0B9-4772DF386CBA@illinois.edu>

Bio::Index::Fastq, maybe?  To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work.
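
A minimal sketch of what that could look like, following the usual Bio::Index pattern (new with -write_flag, make_index, fetch). It is untested against the refactored FASTQ parser mentioned above, and the helper names build_fastq_index and write_fastq_by_id as well as the command-line layout are assumptions for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Bio::Index::Fastq;
use Bio::SeqIO;

# Create (or extend) an index file covering the given FASTQ files.
sub build_fastq_index {
    my ($index_file, @fastq_files) = @_;
    my $inx = Bio::Index::Fastq->new(
        -filename   => $index_file,
        -write_flag => 1,
    );
    # Absolute paths keep the index usable from any working directory.
    $inx->make_index(map { File::Spec->rel2abs($_) } @fastq_files);
    return $inx;
}

# Look up each id in the index and write the records found to $out_file
# in FASTQ format.
sub write_fastq_by_id {
    my ($inx, $out_file, @ids) = @_;
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'fastq');
    for my $id (@ids) {
        # Assumption: fetch returns nothing for ids without a record.
        my $seq = $inx->fetch($id) or next;
        $out->write_seq($seq);
    }
    $out->close;
}

# Usage (hypothetical): script.pl INDEX READS.fastq OUT.fastq ID [ID ...]
if (@ARGV >= 4) {
    my ($index_file, $fastq, $out_file, @ids) = @ARGV;
    my $inx = build_fastq_index($index_file, $fastq);
    write_fastq_by_id($inx, $out_file, @ids);
}
```

For a cluster, the list of seq_ids would simply be passed as the trailing arguments, or read from a file before calling write_fastq_by_id.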

chris



From cjfields at illinois.edu  Mon Jan  4 22:54:03 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Jan 2010 21:54:03 -0600
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
	<4B41F742.2030209@pierroton.inra.fr>
Message-ID: <1BAE5508-0DB7-41B4-92E3-49256582131F@illinois.edu>

Jean-Marc,

You can do that, yes.  Just curious, but have you looked at the various flat file indexing modules for FASTA?  Bio::DB::Fasta and Bio::Index::Fasta are commonly used and allow lookups by primary ID (and I think in some cases secondary IDs).
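
As an illustration of lookup by more than one ID, here is a minimal sketch using Bio::Index::Fasta with a custom id_parser, which receives the FASTA header line and returns every id the record should be indexed under. The parser shown (full id plus the id without a trailing version number) and the helper names are assumptions chosen for the example, not something taken from this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use Bio::Index::Fasta;

# Custom id parser: index each record under its full id and, when the id
# carries a version suffix such as ".1", under the unversioned id as well.
sub ids_with_and_without_version {
    my ($header) = @_;                 # the FASTA header line, including ">"
    my ($id) = $header =~ /^>\s*(\S+)/ or return;
    (my $bare = $id) =~ s/\.\d+$//;
    return $bare eq $id ? ($id) : ($id, $bare);
}

# Build an index over the given FASTA files using the parser above.
sub build_fasta_index {
    my ($index_file, @fasta_files) = @_;
    my $inx = Bio::Index::Fasta->new(
        -filename   => $index_file,
        -write_flag => 1,
    );
    $inx->id_parser(\&ids_with_and_without_version);
    $inx->make_index(map { File::Spec->rel2abs($_) } @fasta_files);
    return $inx;
}

# Usage (hypothetical): script.pl INDEX FILE.fasta SEQ_ID
if (@ARGV == 3) {
    my ($index_file, $fasta, $id) = @ARGV;
    my $inx = build_fasta_index($index_file, $fasta);
    my $seq = $inx->fetch($id);
    print $seq ? $seq->seq . "\n" : "no record for $id\n";
}
```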

chris



From fs5 at sanger.ac.uk  Wed Jan  6 17:16:13 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 06 Jan 2010 22:16:13 +0000
Subject: [Bioperl-l] Bio::DB::Sam strange behaviour for read pairs
Message-ID: <4B450BAD.3050807@sanger.ac.uk>

I'm trying to extract paired reads from a BAM file that span a given
region. I would then like to get the two read ends of the sequenced
clone that spans the region.
I use Bio::DB::Sam->get_features_by_location for this and it does give
me the correct read pairs as a region match, but it doesn't give me both
mates of the pair in all cases.

Here is the script:

#!/usr/bin/perl
use Bio::DB::Sam;

my $usage = "usage: $0 BAMFILE CHROMOSOME STARTPOS ENDPOS\n" ;
my ($bam_file,$chrom,$start,$end) = @ARGV ;
die $usage unless $bam_file && $chrom && $start && $end;

my $bam = Bio::DB::Sam->new(-bam => $bam_file);

my @pairs = $bam->get_features_by_location(
    -type   => 'read_pair',
    -seq_id => $chrom,
    -start  => $start,
    -end    => $end);

print "region: $chrom:$start..$end\n" ;
foreach my $pair (@pairs) {
  print "  pair: id: ".$pair->id.", start".$pair->start.', end:'.$pair->end."\n";
  my ($first_mate,$second_mate) = $pair->get_SeqFeatures;
  print "    first_mate: start:".$first_mate->start.', end:'.$first_mate->end."\n";
  if ($second_mate){
    print "    second_mate: start:".$second_mate->start.', end:'.$second_mate->end."\n";
  } else {
    print "    no second mate\n";
  }
}

And here are the matching pairs that it produces with one of my files
for the region tal12:22479..29232:

region: tal12:22479..29232
  pair: id: tal-2446c08, start17496, end:29423
    first_mate: start:28540, end:29423
    no second mate
  pair: id: tal-2463d10, start23534, end:31363
    first_mate: start:23534, end:24448
    no second mate
  pair: id: tal-2371c09, start20860, end:28230
    first_mate: start:27604, end:28230
    no second mate
  pair: id: tal-2440b06, start19232, end:27099
    first_mate: start:26025, end:27099
    no second mate
  pair: id: tal-2327g09, start18909, end:26129
    first_mate: start:25354, end:26129
    no second mate
  pair: id: tal-2381b05, start25658, end:35054
    first_mate: start:25658, end:26295
    no second mate
  pair: id: tal-2377c11, start20898, end:28230
    first_mate: start:27473, end:28230
    no second mate
  pair: id: tal-2426e12, start21975, end:27562
    first_mate: start:21975, end:23008
    second_mate: start:26396, end:27562
  pair: id: tal-2365h10, start22843, end:31944
    first_mate: start:22843, end:23184
    no second mate
  pair: id: tal-2388h09, start19016, end:28238
    first_mate: start:27475, end:28238
    no second mate


So it finds a lot of pairs that span the region, and the start/end from
the pair is also correct, but it only gives me both individual mates in
one case:

  pair: id: tal-2426e12, start21975, end:27562
    first_mate: start:21975, end:23008
    second_mate: start:26396, end:27562

In this case, both mates are actually inside the query region (at least
partially), whereas in the other cases one of the mates is not inside,
e.g. this one:

  pair: id: tal-2388h09, start19016, end:28238
    first_mate: start:27475, end:28238
    no second mate
  
Here is this read pair from the BAM file:
$ samtools view clones.bam | grep tal-2388h09

tal-2388h09    99      tal12  19016   205     
36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M      =       
27475   9223    
CTTTGGATGAAATAGTTTTTAAATAATACTTATTAAATATTAAATATATAACACATAAATAAGTATTGATGCAAATTTTAAAGTATTATAGAAAACTAGGTTTGATTATATTGTTATACTGTACTTTAAGAGGAGAGAGATAAGATATCTTTGCTCTTTTAATATATAAATTTAGATAAATATTCGTTAAATTTTCTACATAGTTATTTTTTATCTTATATATTATACTGCTATAGTTATCAATGTATATACATTCAAATAATTTATTAAAAATTCTATATTATATTAATTCTATGATAAAATAATCCTGTTTGTGATTTAAAAAATGATGATTCAATAAAAACTAATAATATAATACGAGTTAATATGGAATAATAAAATGGCATTTAACATGAATTTAGTCTTTAACCTTTTCTTTGTTTGTCAAGTTTTTTAAAACATAAAACCACACATTTCAAAATGGATTTTTAGCAAATATATAAAAATTATACATTTATAATGTATTGTTATGCGTCTTTTCGATAGAATCAATATTTAATTATATGAAGTTTCCACAATAAAATAATATTTAATATTATTTATTAGTAGAGTATTTGATTATATATATAGGCATATAATAATAACTCTAGTTCTATCTACCATATTATTTATAATTATTATAACAAAATGTGATATGAAATTTTATTATATACTTATATTATTTTTTTAACTATTTTAAAATATATTTATTTATACCTCAAAACTATAAAATTGAAATTATTAATAATAATCTAATATATACCTTTATAAAAATAAACGTATAAACTAAT   
 ><:4/+1+*)+4>BEH=9-,,66IIIIIIIIEDA>>>>A at DDFFIHHHHHITIIIIIHIIHHHHHHIYYYYYTTTYDDDHDDDDDDDIIINNTNHHHHHIYYYYIIIIIINNNNTTIIIIIIIIIIITTNTTTTTYYYTTTTTTYYNNNNNNLLLLLLLLLLLNNNNNNTTTTTTTTTTTTTNNTNNNTTTYYTLLLLLLTTTTTTTTYTTTNNNNNNTTTTTTTNNTTNNTTTTTTTTTTYYTTTTTTTNNNNNNTTTTTTTYYTTTTTTYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTNNINTTYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYTTTTTOOOIFFFIFIIOICC>>II@>>>>>>C>>>>>>CIBECCCHIIOOOOOOOOTTTIIFDDEIQQA:55839AA>99>@IIIIII>>::;;I;>>CC>>>>>@III<::=>AAA<>>>>I>:>>99:>842225006824855;5>68844//.//00:>::338:99<:/-+*-./0)((((+00+..,++(((+-()(*((((()*)***))3)''')*..+*++((*1++--+*''''((+/)*42.((***)+,+('*'''*((''''((,'%%''''''''(     
AS:i:614        MS:i:50
tal-2388h09    147     tal12  27475   205     1H764M40H       =       
19016   -9223   
ATTAAATCGGTATCGCCAACACAATGAGTATAATCATTGTCAAATATGCGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTAATTTTTGTGAGTATAATCATTGGAACG  
(((0))*,-1-../2((())03---03266300271+*.-0-*''''+*,+/+))*-05330+)..4>7=77273911**((+20+03688633:93036<8;::5:<99379>>::>>>:57:<:7--)))1435::333228>::>II>::>A>>3/.958677AA=AA:>:==IIII8338<>A>>>>IIIIIIIIYYYYYKKYYYMIFFFFEIIIMI::4..8AIIC>9>=EIQQQMCAAAAAACIIIIAICIIIOOYTIIIMOQQMIIIIC>>AAABCCCCCEAI>C>>IQQIIIIIIIIIIKKYYYYYYYYYYYYYYYYYYYYYTIIIIIIYYYYTNINNNTYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYSSYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTYYYYYTTTTTTYYYTTNNNNTTYYYYYYYYYTTTTLLTTNNTTTTYTTTTTTYYYYYYYYYYYTTOOKKKLKOOTYYYYYYYYYYYYYYYYTNNNNNNNNNTTTNYNNNNTNNNNTTYYYYYYYYTTNNNNTTYNNNNNITTTTTYYYYYYYYYYTTNNIIIIIDIIIIHTNNNNTTYYYYTNNNIIIIIITTTINIIIINNNNTTTYYYYIHHHDDHHDDIHDDGDFFFTIIINTTYYYYTTTTHHHHCCIIIHIHHHHCAI9:++**1168>ACCIIDDDDDDI>>>>>?NNN  
AS:i:688        MS:i:50

So the read in the first line starts before the start of the query 
region and is not accessible via $pair->get_SeqFeatures although this is 
a valid pair.
Am I doing something wrong, is this the desired behaviour or is it a bug?

Thanks for your help!


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From hlapp at drycafe.net  Thu Jan  7 11:55:00 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 7 Jan 2010 11:55:00 -0500
Subject: [Bioperl-l] Data missing into Annotation object using
	Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr>
	<29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <240F198A-83FA-4304-ACA8-80A702A68D8C@drycafe.net>

I don't know to what extent this was followed up on further, and I
guess it's too long ago to be of much help, but if it hasn't been
mentioned before, I wanted to point out
Bio::SeqFeature::AnnotationAdaptor, which integrates tag/value
annotation and Bio::Annotation annotation into one
AnnotationCollection, so it doesn't matter whether something is
attached as a tag or as an annotation object.
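
A minimal sketch of how the adaptor might be used for the CDS report discussed in this thread, assuming BioPerl 1.6 or later (for display_text on annotation objects). The helper name feature_values and the command-line layout are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::SeqFeature::AnnotationAdaptor;

# Return the text of every value stored under $key for a feature,
# regardless of whether it is held as a tag or as an annotation object.
sub feature_values {
    my ($feat, $key) = @_;
    my $anncoll = Bio::SeqFeature::AnnotationAdaptor->new(-feature => $feat);
    return map { $_->display_text } $anncoll->get_Annotations($key);
}

# Usage (hypothetical): script.pl FILE.gbk
if (@ARGV) {
    my $in = Bio::SeqIO->new(-file => $ARGV[0], -format => 'genbank');
    while (my $seq = $in->next_seq) {
        for my $feat (grep { $_->primary_tag eq 'CDS' } $seq->get_SeqFeatures) {
            my @locus   = feature_values($feat, 'locus_tag');
            my @gene    = feature_values($feat, 'gene');
            my @product = feature_values($feat, 'product');
            # Fall back to the locus tag when the CDS has no gene name.
            print join('#', @locus, (@gene ? @gene : @locus), @product), "\n";
        }
    }
}
```

Because the adaptor looks in both places, the same code should work whether the parser stored the qualifiers as tag/value pairs or as Bio::Annotation objects.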

	-hilmar

On Dec 16, 2009, at 10:09 AM, Chris Fields wrote:

> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags
> as Bio::Annotation. The way this was implemented was considered
> unsatisfactory for various reasons, so we reverted to using simple
> tag-value pairs as the default. You can get at the data this way
> (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>    print "primary tag: ", $feat_object->primary_tag, "\n";
>    for my $tag ($feat_object->get_all_tags) {
>        print "  tag: ", $tag, "\n";
>        for my $value ($feat_object->get_tag_values($tag)) {
>            print "    value: ", $value, "\n";
>        }
>    }
> }
>
> You can also convert all the tag-value data into a  
> Bio::Annotation::Collection using the  
> Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
> On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:
>
>> Hi,
>>
>> I wrote a small GenBank parser a few months ago, before BioPerl
>> release 1.6.0.
>> I tried to use my code once again, but now the output of my parser
>> is empty.
>> It looks like the annotation from seqfeatures is no longer filled.
>>
>> Here is the code I used previously:
>>
>> while(my $seq = $streamer->next_seq()){
>>
>>   #We only want to retrieve CDS features...
>>   foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
>>       print $ofh join("#",
>>                       $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
>>                       $feat->annotation()->get_Annotations('gene')
>>                         ? $feat->annotation()->get_Annotations('gene')      # Gene name
>>                         : $feat->annotation()->get_Annotations('locus_tag'),
>>                       $feat->annotation()->get_Annotations('product'),      # Description
>>                      ),"\n";
>>   }
>> }
>>
>> $feat is a Bio::SeqFeature::Generic object
>>
>> If I print Dumper($feat->annotation()) here is the output :
>>
>> $VAR1 = bless( {
>>                '_typemap' => bless( {
>>                                       '_type' => {
>>                                                    'comment' => 'Bio::Annotation::Comment',
>>                                                    'reference' => 'Bio::Annotation::Reference',
>>                                                    'dblink' => 'Bio::Annotation::DBLink'
>>                                                  }
>>                                     }, 'Bio::Annotation::TypeManager' ),
>>                '_annotation' => {}
>>              }, 'Bio::Annotation::Collection' );
>>
>> Have some changes been made to the way the annotation object is
>> populated?
>>
>> Thanks for any clue, and sorry if my question looks stupid
>>
>> Regards
>>
>> Emmanuel
>>
>> -- 
>> -------------------------
>> Emmanuel Quevillon
>> Biological Software and Databases Group
>> Institut Pasteur
>> +33 1 44 38 95 98
>> tuco at_ pasteur dot fr
>> -------------------------
>>

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From rtbio.2009 at gmail.com  Fri Jan  8 10:00:21 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 8 Jan 2010 16:00:21 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>

Hello all,

I am trying to run a remote BLAST using Bioperl. My input data is a Trypanosoma
brucei sequence in Fasta format. When I submit it to BLAST with
the step
$r=$factory->submit_blast($input)
nothing is returned, which I checked by debugging the code. My input
sequence is not being blasted even though I set all the parameters. I
have pasted the code below.

Please help me sort out this problem. It is very urgent.

Regards
Roopa.

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;


use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds. Roopa";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";


open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
 This page will automatically reload in 30 seconds Roopa <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

#$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= 'Trypanosoma Brucei';

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

            # open(OUTFILE,'>',$debugfile);
             #  print OUTFILE @params;
             # close(OUTFILE);


 my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter

 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => 'Trypanosoma Brucei' );


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   # The program stops here: submit_blast() does not return any value and
   # the while loop below is never entered. Please help me in this regard.
   my $r = $factory->submit_blast($input);
                open(OUTFILE,'>',$debugfile);
                print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
               print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
               print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
               print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
               print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);

}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

    print OUTFILE substr ($in{'Inputseq'}, $i, 1);

    if ( ($i+1)%10==0){
        print OUTFILE " ";
    }
    if ( ($i+1)%60==0){
        print OUTFILE "<br>\n";
    }
}


print OUTFILE "</font> <p>";

$z=@compseqs;

for($k=1;$k<$z;$k++) {
    print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
Sequence: <br>";

    for ($i=0; $i<length ($compseqs[$k]); $i++) {

        print OUTFILE substr ($compseqs[$k], $i, 1);

        if ( ($i+1)%10==0){
            print OUTFILE " ";
        }
        if ( ($i+1)%60==0){
            print OUTFILE "<br>\n";
        }
    }
    print OUTFILE "<p></font>";
}

print OUTFILE "<p>
Window: <br>$in{'Windowsize'}
<p>
<p>
Threshold: <br>$in{'Threshold'}
<p>";
my $j=0;

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

    if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
        if ($out[$i]->{similar}<=$in{'Threshold'}){
            $j=$in{'Windowsize'};
        }
        $height=$out[$i]->{similar}*5;
    }

    if ($j>0) {
        print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
height=\"5\">";
        $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
1)."</font>";
        $j--;
    }
    else {
        print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
height=\"5\">";
        $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
1)."</font>";
    }

    if ( ($i+1)%10==0){
        $outstring .= " ";
    }
    if ( ($i+1)%60==0){
        $outstring .= "<br>\n";

    }
    if ( ($i+1)%800==0){
        print OUTFILE "<br><br>\n";

    }
}

print OUTFILE "<br><br><font face=\"Courier, monospace font
set\">$outstring</font>";

#foreach (@out) {
#print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
#if ($_->{similar}<=$in{'Threshold'}){

#    }
#}

print OUTFILE "</BODY>\n</HTML>\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}
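
Two details of the script above may be worth a look independently of the BLAST call: time() is called separately for the results file and for each link, so the names can differ if the clock ticks in between, and the debug file is reopened with '>' each time, which keeps only the last message. A minimal sketch of one way around both follows; the helper names request_paths and debug_log are hypothetical and not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: build every per-request name from ONE timestamp,
# so the page that is written and the page that is linked always agree.
sub request_paths {
    my ($serverpath, $serverurl, $stamp) = @_;
    return (
        outfile   => "$serverpath/rnairesult_$stamp.html",
        outurl    => "$serverurl/rnairesult_$stamp.html",
        nuc       => "$serverpath/nuc$stamp.txt",
        debugfile => "$serverpath/debug_$stamp.txt",
    );
}

# Hypothetical helper: append a line to the debug file instead of
# truncating it on every call, so earlier messages are preserved.
sub debug_log {
    my ($file, $msg) = @_;
    open(my $fh, '>>', $file) or die "Can't append to $file: $!";
    print {$fh} "$msg\n";
    close($fh) or die "Can't close $file: $!";
    return;
}

# The paths are the ones used in the script; time() is called only once.
my %p = request_paths('/srv/www/htdocs/rain/RNAi',
                      'http://141.84.66.66/rain/RNAi', time());
print "results will appear at $p{outurl}\n";
```

With this, the refresh URL printed to the browser and the file opened for writing come from the same hash, and debug_log($p{debugfile}, "while entered") leaves a trace of every step rather than just the last one.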

From maj at fortinbras.us  Fri Jan  8 10:36:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 8 Jan 2010 10:36:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
Message-ID: <F19004692A4A4350856B23DF25E09074@NewLife>

Hi Roopa--

I got your code to work with the following changes:

+# the input should be a valid FASTA file...
 ...
 open(NUC,'>',$nuc);
+print NUC ">seq (need a name line for valid fasta)\n";
 print NUC $inpu1, "\n";
 close(NUC);
...

+# you can set these header parms in the call itself...
- my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
+ my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
+     -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');

  #change a parameter
+# commented this out...
+# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

MAJ
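
The first change above writes a name line unconditionally. If the form field may or may not already contain a FASTA header, a small guard could add one only when it is missing. This is a sketch under that assumption; ensure_fasta and the default name 'query' are hypothetical and not part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return $text as FASTA,
# adding a ">name" header line only when the text does not already have one.
sub ensure_fasta {
    my ($text, $name) = @_;
    $name = 'query' unless defined $name && length $name;
    $text =~ s/\r\n?/\n/g;    # normalise line endings coming from web forms
    $text =~ s/\A\s+//;       # drop leading blank lines and spaces
    $text =~ s/\s+\z//;       # drop trailing whitespace
    $text = ">$name\n$text" unless $text =~ /\A>/;
    return "$text\n";
}

# A bare sequence gets a header; input that already has one is left alone.
print ensure_fasta("ATGCATGC\nGGCC");
print ensure_fasta(">myseq\nATGCATGC\n");
```

The result can then be written to the NUC file in place of the raw $inpu1 before Bio::SeqIO reads it.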


From julian.onions at gmail.com  Fri Jan  8 11:53:50 2010
From: julian.onions at gmail.com (Julian Onions)
Date: Fri, 8 Jan 2010 16:53:50 +0000
Subject: [Bioperl-l] Cladogram construction
Message-ID: <cbeabfd41001080853m50c75779q4155cd02af17670a@mail.gmail.com>

Does anyone have any sample code for building cladograms from input in the
format used by Pars (one of the PHYLIP tools), or in any other format?
I've got something sort of working, but I get no weights on the tree:
everything appears as nan. I'd also like to set one of the species to be an
outgroup. This is the closest sample I've found so far.


#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::Tree::DistanceFactory;
use Bio::Align::ProteinStatistics;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;
my $alnfile = shift @ARGV || die "need a file to run";

my $input= Bio::AlignIO->new(-format => 'fasta',
    -file    => $alnfile);

if( my $aln = $input->next_aln ) {
 my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
 my $stats = Bio::Align::ProteinStatistics->new;
 my $distmat = $stats->distance(-align => $aln,
         -method => 'Kimura');
 my $treeout = Bio::TreeIO->new(-format => 'newick');
 my $tree = $dfactory->make_tree($distmat);
 $treeout->write_tree($tree);
  my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree    => $tree,
                                             -compact => 0);
  $obj1->print(-file => "tree.eps");
} else {
 die "could not find any alignments in the file $alnfile";
}


Pars input looks like
3 4
Robin   101
Blackbird 100
Sparrow 100
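
The script above computes a Kimura protein distance, which is defined for amino acid sequences; whether that is the cause of the nan values for 0/1 characters is an assumption, not something confirmed here. As a rough cross-check, pairwise mismatch fractions can be computed directly from Pars-style rows in plain Perl. The helpers parse_characters and p_distance below are hypothetical names, and the sketch skips the count line rather than validating it.

```perl
use strict;
use warnings;

# Parse PHYLIP-style discrete-character rows ("Name  0101") into a hash.
# The first line (species and character counts, e.g. "3 4") is skipped.
sub parse_characters {
    my ($text) = @_;
    my %chars;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*\d+\s+\d+\s*$/;                # count line
        next unless $line =~ /^\s*(\S+)\s+([01?]+)\s*$/;     # name + characters
        $chars{$1} = $2;
    }
    return %chars;
}

# Fraction of positions at which two character strings differ,
# compared over the shorter of the two strings.
sub p_distance {
    my ($x, $y) = @_;
    my $len = length($x) < length($y) ? length($x) : length($y);
    return if $len == 0;
    my $diff = 0;
    for my $i (0 .. $len - 1) {
        $diff++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $diff / $len;
}

my %chars = parse_characters("3 4\nRobin   101\nBlackbird 100\nSparrow 100\n");
my @names = sort keys %chars;
for my $i (0 .. $#names) {
    for my $j ($i + 1 .. $#names) {
        printf "%s\t%s\t%.3f\n", $names[$i], $names[$j],
            p_distance($chars{$names[$i]}, $chars{$names[$j]});
    }
}
```

If these fractions look sensible while the tree still shows nan, the distance method is a more likely suspect than the input parsing.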


Thanks,
Julian.

From rtbio.2009 at gmail.com  Sat Jan  9 11:57:09 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sat, 9 Jan 2010 17:57:09 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <F19004692A4A4350856B23DF25E09074@NewLife>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<F19004692A4A4350856B23DF25E09074@NewLife>
Message-ID: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>

Hello all,

Thanks a lot for your reply, Mark. It works with Trypanosoma brucei hard-coded
as the organism, but when I take the Organism parameter from the user it does
not work, i.e. I am unable to get the target sequences.
Please help me in this regard. My code is

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;


use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds. Roopa";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";

open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
 This page will automatically reload in 30 seconds Roopa <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1,"\n";
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
        '-Organism'   => $organism );

             open(OUTFILE,'>',$debugfile);
             print OUTFILE $inpu1;
              close(OUTFILE);


my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
'$organ[ORGN]');

 #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter

 #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => $organ );


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             #open(OUTFILE,'>',$debugfile);
              # print OUTFILE $input;
              #close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);

   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
   #   open(OUTFILE,'>',$debugfile);
    #           print OUTFILE "while entered";
     #         close(OUTFILE);
     foreach my $rid ( @rids ) {

      #         open(OUTFILE,'>',$debugfile);
       #        print OUTFILE "foreach entered";
        #      close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
         #      print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          #    open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "else entered";
            #  close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);
  # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);

}

Regards,
Roopa.
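
One thing that stands out in the code above is the quoting of the Entrez query: Perl does not interpolate variables inside single quotes, so '$organ[ORGN]' is sent as that literal text rather than as the organism name. Switching to double quotes alone would not help either, because "$organ[ORGN]" is parsed as an element of an array @organ. A minimal sketch using concatenation follows; the helper name entrez_organism_query is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a user-supplied organism name into an
# Entrez organism filter such as "Trypanosoma brucei[ORGN]".
sub entrez_organism_query {
    my ($organ) = @_;
    die "organism name is empty\n" unless defined $organ && $organ =~ /\S/;
    $organ =~ s/^\s+|\s+$//g;    # trim whitespace from the form field
    # Concatenate instead of interpolating: inside double quotes,
    # "$organ[ORGN]" would be read as an element of the array @organ.
    return $organ . '[ORGN]';
}

my $organ = 'Trypanosoma brucei';
print '$organ[ORGN]', "\n";                   # prints the literal text $organ[ORGN]
print entrez_organism_query($organ), "\n";    # prints Trypanosoma brucei[ORGN]

# In the script, the factory call would then become something like:
# my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
#     -ENTREZ_QUERY => entrez_organism_query($organ));
```

The helper also rejects an empty organism field, which would otherwise produce the query "[ORGN]".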


On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> Hi Roopa--
>
> I got your code to work with the following changes:
>
> +# the input should be a valid FASTA file...
> ...
> open(NUC,'>',$nuc);
> +print NUC ">seq (need a name line for valid fasta)\n";
> print NUC $inpu1, "\n";
> close(NUC);
> ...
>
> +# you can set these header parms in the call itself...
> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
> ''Trypanosoma Brucei[ORGN]');
>
>  #change a paramter
> +# commented this out...
> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
> Brucei[ORGN]';
>
> MAJ
> ----- Original Message ----- From: "Roopa Raghuveer" <rtbio.2009 at gmail.com
> >
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 08, 2010 10:00 AM
> Subject: [Bioperl-l] Regarding blast in Bioperl
>
>
>  Hello all,
>>
>> I was trying Remote blast using Bioperl. My input data is a Trypanosoma
>> brucei sequence in Fasta format. When I was trying to submit to BLAST
>> using
>> the step
>> $r=$factory->submit_blast($input)
>> It was not returning anything which I checked by debugging the code. It is
>> not blasting my input sequence even though I mentioned all the
>> parameters.I
>> would paste the code below.
>>
>> Please help me in solving put this problem. It is very urgent.
>>
>> Regards
>> Roopa.
>>
>> #!/usr/bin/perl
>>
>> #path for extra camel module
>> use lib "/srv/www/htdocs/rain/RNAi/";
>> use Roopablast;
>>
>>
>> use Bio::SearchIO;
>> use Bio::Search::Result::BlastResult;
>> use Bio::Perl;
>> use Bio::Tools::Run::RemoteBlast;
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::DB::GenBank;
>>
>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>> $serverurl = "http://141.84.66.66/rain/RNAi";
>> $outfile = $serverpath."/rnairesult_".time().".html";
>> $nuc = $serverpath."/nuc".time().".txt";
>> $debugfile = $serverpath."/debug_".time().".txt";
>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>
>> my $outstring ="";
>>
>> &parse_form;
>>
>> print "Content-type: text/html\n\n";
>> print "<HTML>\n";
>> print "<head><title>RNAi Result</title>";
>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>> print "</head>\n";
>> print "<body>\n";
>> print " Your results will appear <a
>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>> print " Please be patient, runtime can be up to 5 minutes<br>";
>> print " This page will automatically reload in 30 seconds. Roopa";
>> print "</BODY>\n";
>> print "</HTML>\n";
>>
>> defined(my $pid = fork) or die "Can't fork: $!";
>> exit if $pid;
>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>
>>
>>
>> open(OUTFILE, '>',$outfile);
>>
>> print OUTFILE "<HTML>\n
>> <head><title>RNAi Result</title>
>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>> URL=$serverurl//rnairesult_".time().".html\"> \n
>> <meta http-equiv=\"expires\" content=\"0\">
>> </head>\n
>> <body>\n
>>  Your results will appear <a
>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>  Please be patient, runtime can be up to 5 minutes wait wait
>> wait......<br>
>> This page will automatically reload in 30 seconds Roopa <br>
>> </BODY>\n
>> </HTML>\n";
>>
>> close(OUTFILE);
>>
>>
>> @compseqs = blastcode($in{'Inputseq'});
>>
>> $in{'Inputseq'} =~ s/>.*$//m;
>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>
>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>> $in{'Threshold'});
>>
>>
>> sub blastcode
>> {
>>
>> $inpu1= $_[0];
>>
>> #$organ= $_[1];
>>
>> open(NUC,'>',$nuc);
>> print NUC $inpu1;
>> close(NUC);
>>
>> my $prog = 'blastn';
>> my $db   = 'refseq_rna';
>> my $e_val= '1e-10';
>> my $organism= 'Trypanosoma Brucei';
>>
>> $gb = new Bio::DB::GenBank;
>>
>> my @params = ( '-prog' => $prog,
>>        '-data' => $db,
>>        '-expect' => $e_val,
>>        '-readmethod' => 'SearchIO',
>>        '-Organism'   => $organism );
>>
>>           # open(OUTFILE,'>',$debugfile);
>>            #  print OUTFILE @params;
>>            # close(OUTFILE);
>>
>>
>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>
>>  #change a parameter
>>
>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>
>> #change a parameter
>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>
>>  my $v = 1;
>>  #$v is just to turn on and off the messages
>>
>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>> '-organism' => 'Trypanosoma Brucei' );
>>
>>
>> while (my $input = $str->next_seq())
>> {
>>  #Blast a sequence against a database:
>>   #Alternatively, you could  pass in a file with many
>>   #sequences rather than loop through sequence one at a time
>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>   #and swap the two lines below for an example of that.
>>
>>            open(OUTFILE,'>',$debugfile);
>>              print OUTFILE $input;
>>             close(OUTFILE);
>>
>>
>>  my $r = $factory->submit_blast($input);
>>  # The program stops here: it does not return any value and it does not
>>  # enter the while loop. Please help me in this regard.
>>               open(OUTFILE,'>',$debugfile);
>>               print OUTFILE $r;
>>               close(OUTFILE);
>>
>>
>>  print STDERR "waiting...." if($v>0);
>>
>>  while ( my @rids = $factory->each_rid ) {
>>     open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "while entered";
>>             close(OUTFILE);
>>    foreach my $rid ( @rids ) {
>>
>>              open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "foreach entered";
>>             close(OUTFILE);
>>
>>       my $rc = $factory->retrieve_blast($rid);
>>
>>       if( !ref($rc) )
>>       {
>>       if( $rc < 0 )
>>       {
>>       $factory->remove_rid($rid);
>>       }
>>        open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "if entered";
>>             close(OUTFILE);
>>        print STDERR "." if ( $v > 0 );
>>        sleep 5;
>>       }
>>      else {
>>             open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "else entered";
>>             close(OUTFILE);
>>
>>         my $result = $rc->next_result();
>>        #save the output
>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>
>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>         print BLASTDEBUGFILE $result->next_hit();
>>         close(BLASTDEBUGFILE);
>>
>>       my $filename =
>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>
>>        # open(DEBUGFILE,'>',$debugfile);
>>        # open(new,'>',$filename);
>>        # @arra=<new>;
>>        # print DEBUGFILE @arra;
>>        # close(DEBUGFILE);
>>        # close(new);
>>
>>        $factory->save_output($filename);
>>
>>      # open(BLASTDEBUGFILE,'>',$debugfile);
>>      # print BLASTDEBUGFILE  "Hello $rid";
>>      # close(BLASTDEBUGFILE);
>>
>>      $factory->remove_rid($rid);
>>
>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>      print BLASTDEBUGFILE  $organism;
>>       close(BLASTDEBUGFILE);
>>
>>   # open(OUTFILE,'>',$outfile);
>>   # print OUTFILE "Test2 $result->database_name()";
>>   # close(OUTFILE);
>>
>> #$hit = $result->next_hit;
>> #open(new,'>',$debugfile);
>> #print $hit;
>> #close(new);
>>
>>  while ( my $hit = $result->next_hit ) {
>>
>>           next unless ( $v > 0);
>>
>>         #     open(OUTFILE,'>',$debugfile);
>>          #    print OUTFILE "$hit in while hits";
>>           #  close(OUTFILE);
>>
>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>          my $dna = $sequ->seq();        # get the sequence as a string
>>                 push(@seqs,$dna);
>>         }
>>       }
>>     }
>>   }
>>  }
>>
>>  #open(OUTFILE,'>',$debugfile);
>>  #print OUTFILE $seqs[0];
>>  #close(OUTFILE);
>>
>> return(@seqs);
>>
>> }
>>
>> open(OUTFILE, '>',$outfile) || die ;
>>
>> print OUTFILE "<HTML>\n
>> <head><title>RNAi Result</title>
>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>> <body>\n
>> <p><font face=\"Courier, monospace font set\">
>> Inputsequence: <br>";
>>
>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>
>>   print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>
>>   if ( ($i+1)%10==0){
>>       print OUTFILE " ";
>>   }
>>   if ( ($i+1)%60==0){
>>       print OUTFILE "<br>\n";
>>   }
>> }
>>
>>
>>
>> print OUTFILE "</font> <p>";
>>
>> $z=@compseqs;
>>
>> for($k=1;$k<$z;$k++) {
>>   print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
>> Sequence: <br>";
>>
>>   for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>
>>       print OUTFILE substr ($compseqs[$k], $i, 1);
>>
>>       if ( ($i+1)%10==0){
>>           print OUTFILE " ";
>>       }
>>       if ( ($i+1)%60==0){
>>           print OUTFILE "<br>\n";
>>       }
>>   }
>>   print OUTFILE "<p></font>";
>> }
>>
>> print OUTFILE "<p>
>> Window: <br>$in{'Windowsize'}
>> <p>
>> <p>
>> Threshold: <br>$in{'Threshold'}
>> <p>";
>> my $j=0;
>>
>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>
>>   if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>       if ($out[$i]->{similar}<=$in{'Threshold'}){
>>           $j=$in{'Windowsize'};
>>       }
>>       $height=$out[$i]->{similar}*5;
>>   }
>>
>>   if ($j>0) {
>>       print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>> height=\"5\">";
>>       $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
>> 1)."</font>";
>>       $j--;
>>   }
>>   else {
>>       print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>> height=\"5\">";
>>       $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
>> 1)."</font>";
>>   }
>>
>>   if ( ($i+1)%10==0){
>>       $outstring .= " ";
>>   }
>>   if ( ($i+1)%60==0){
>>       $outstring .= "<br>\n";
>>
>>   }
>>   if ( ($i+1)%800==0){
>>       print OUTFILE "<br><br>\n";
>>
>>   }
>> }
>>
>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>> set\">$outstring</font>";
>>
>> #foreach (@out) {
>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
>> #if ($_->{similar}<=$in{'Threshold'}){
>>
>> #    }
>> #}
>>
>> print OUTFILE "</BODY>\n</HTML>\n";
>>
>> close OUTFILE;
>>
>> #nameprint();
>>
>> sub parse_form {
>>   local ($buffer, @pairs, $pair, $name, $value);
>>   # Read in text
>>   $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>   if ($ENV{'REQUEST_METHOD'} eq "POST")
>>   {
>>       read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>   }
>>   else
>>   {
>>       $buffer = $ENV{'QUERY_STRING'};
>>   }
>>   @pairs = split(/&/, $buffer);
>>   foreach $pair (@pairs)
>>   {
>>       ($name, $value) = split(/=/, $pair);
>>       $value =~ tr/+/ /;
>>       $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>       $in{$name} = $value;
>>   }
>> }
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>

From maj at fortinbras.us  Sat Jan  9 13:05:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 9 Jan 2010 13:05:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com><F19004692A4A4350856B23DF25E09074@NewLife>
	<c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
Message-ID: <4C2E8133F916495B876628EF3E8FCBB2@NewLife>

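I see it immediately (from making the same bug many times): single quotes
do not interpolate, and inside double quotes the variable name needs braces
so that [ORGN] is not read as an index into @organ:

 my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
- '$organ[ORGN]');
+"${organ}[ORGN]");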

MAJ

----- Original Message ----- 
From: "Roopa Raghuveer" <rtbio.2009 at gmail.com>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Saturday, January 09, 2010 11:57 AM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl


> Hello all,
> 
> Thanks a lot for your reply, Mark. It was working with Trypanosoma brucei as
> the hard-coded organism, but when I tried to take the Organism parameter from
> the user it did not work, i.e., I was unable to get the target sequences.
> Please help me in this regard. My code is
> 
> #!/usr/bin/perl
> 
> #path for extra camel module
> use lib "/srv/www/htdocs/rain/RNAi/";
> use Roopablast;
> 
> 
> use Bio::SearchIO;
> use Bio::Search::Result::BlastResult;
> use Bio::Perl;
> use Bio::Tools::Run::RemoteBlast;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> 
> $serverpath = "/srv/www/htdocs/rain/RNAi";
> $serverurl = "http://141.84.66.66/rain/RNAi";
> $outfile = $serverpath."/rnairesult_".time().".html";
> $nuc = $serverpath."/nuc".time().".txt";
> $debugfile = $serverpath."/debug_".time().".txt";
> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
> my $outstring ="";
> 
> &parse_form;
> 
> print "Content-type: text/html\n\n";
> print "<HTML>\n";
> print "<head><title>RNAi Result</title>";
> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
> URL=$serverurl/rnairesult_".time().".html\"> \n";
> print "</head>\n";
> print "<body>\n";
> print " Your results will appear <a
> href=$serverurl/rnairesult_".time().".html>here</a><br>";
> print " Please be patient, runtime can be up to 5 minutes<br>";
> print " This page will automatically reload in 30 seconds. Roopa";
> print "</BODY>\n";
> print "</HTML>\n";
> 
> defined(my $pid = fork) or die "Can't fork: $!";
> exit if $pid;
> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
> 
> open(OUTFILE, '>',$outfile);
> 
> print OUTFILE "<HTML>\n
> <head><title>RNAi Result</title>
> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
> URL=$serverurl//rnairesult_".time().".html\"> \n
> <meta http-equiv=\"expires\" content=\"0\">
> </head>\n
> <body>\n
>  Your results will appear <a
> href=$serverurl/rnairesult_".time().".html>here</a><br>
>  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
> This page will automatically reload in 30 seconds Roopa <br>
> </BODY>\n
> </HTML>\n";
> 
> close(OUTFILE);
> 
> 
> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
> 
> $in{'Inputseq'} =~ s/>.*$//m;
> $in{'Inputseq'} =~ s/[^TAGC]//gim;
> $in{'Inputseq'} =~ tr/actg/ACTG/;
> 
> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
> $in{'Threshold'});
> 
> 
> sub blastcode
> {
> 
> $inpu1= $_[0];
> 
> $organ= $_[1];
> 
> open(NUC,'>',$nuc);
> print NUC $inpu1,"\n";
> close(NUC);
> 
> my $prog = 'blastn';
> my $db   = 'refseq_rna';
> my $e_val= '1e-10';
> my $organism= $organ;
> 
> $gb = new Bio::DB::GenBank;
> 
> my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO',
>        '-Organism'   => $organism );
> 
>             open(OUTFILE,'>',$debugfile);
>             print OUTFILE $inpu1;
>              close(OUTFILE);
> 
> 
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
> '$organ[ORGN]');
> 
> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> 
>  #change a parameter
> 
> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
> 
> #change a parameter
> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
> 
>  my $v = 1;
>  #$v is just to turn on and off the messages
> 
> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => $organ );
> 
> 
> while (my $input = $str->next_seq())
> {
>   #Blast a sequence against a database:
>    #Alternatively, you could  pass in a file with many
>    #sequences rather than loop through sequence one at a time
>    #Remove the loop starting 'while (my $input = $str->next_seq())'
>    #and swap the two lines below for an example of that.
> 
>             #open(OUTFILE,'>',$debugfile);
>              # print OUTFILE $input;
>              #close(OUTFILE);
> 
> 
>   my $r = $factory->submit_blast($input);
> 
>                open(OUTFILE,'>',$debugfile);
>             #   print OUTFILE $r;
>                close(OUTFILE);
> 
>   print STDERR "waiting...." if($v>0);
> 
>  while ( my @rids = $factory->each_rid ) {
>   #   open(OUTFILE,'>',$debugfile);
>    #           print OUTFILE "while entered";
>     #         close(OUTFILE);
>     foreach my $rid ( @rids ) {
> 
>      #         open(OUTFILE,'>',$debugfile);
>       #        print OUTFILE "foreach entered";
>        #      close(OUTFILE);
> 
>        my $rc = $factory->retrieve_blast($rid);
> 
>        if( !ref($rc) )
>        {
>        if( $rc < 0 )
>        {
>        $factory->remove_rid($rid);
>        }
>         open(OUTFILE,'>',$debugfile);
>         #      print OUTFILE "if entered";
>              close(OUTFILE);
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>        }
>       else {
>          #    open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "else entered";
>            #  close(OUTFILE);
> 
>          my $result = $rc->next_result();
>         #save the output
>        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
>          open(BLASTDEBUGFILE,'>',$blastdebugfile);
>          print BLASTDEBUGFILE $result->next_hit();
>          close(BLASTDEBUGFILE);
> 
>        my $filename =
> $serverpath."/blastdata_".time().$result->query_name()."\.out";
> 
>         # open(DEBUGFILE,'>',$debugfile);
>         # open(new,'>',$filename);
>         # @arra=<new>;
>         # print DEBUGFILE @arra;
>         # close(DEBUGFILE);
>         # close(new);
> 
>         $factory->save_output($filename);
>  # open(BLASTDEBUGFILE,'>',$debugfile);
>       # print BLASTDEBUGFILE  "Hello $rid";
>       # close(BLASTDEBUGFILE);
> 
>       $factory->remove_rid($rid);
> 
>       open(BLASTDEBUGFILE,'>',$blastdebugfile);
>       print BLASTDEBUGFILE  $organism;
>        close(BLASTDEBUGFILE);
> 
>    # open(OUTFILE,'>',$outfile);
>    # print OUTFILE "Test2 $result->database_name()";
>    # close(OUTFILE);
> 
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
> 
>   while ( my $hit = $result->next_hit ) {
> 
>            next unless ( $v > 0);
> 
>          #     open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "$hit in while hits";
>            #  close(OUTFILE);
> 
>       my $sequ = $gb->get_Seq_by_version($hit->name);
>           my $dna = $sequ->seq();        # get the sequence as a string
>                  push(@seqs,$dna);
>          }
>        }
>      }
>    }
>  }
> 
>  #open(OUTFILE,'>',$debugfile);
>  #print OUTFILE $seqs[0];
>  #close(OUTFILE);
> 
> return(@seqs);
> 
> }
> 
> Regards,
> Roopa.
> 
> 
> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> 
>> Hi Roopa--
>>
>> I got your code to work with the following changes:
>>
>> +# the input should be a valid FASTA file...
>> ...
>> open(NUC,'>',$nuc);
>> +print NUC ">seq (need a name line for valid fasta)\n";
>> print NUC $inpu1, "\n";
>> close(NUC);
>> ...
>>
>> +# you can set these header parms in the call itself...
>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
>> 'Trypanosoma Brucei[ORGN]');
>>
>>  #change a parameter
>> +# commented this out...
>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>
>> MAJ
>> ----- Original Message -----
>> From: "Roopa Raghuveer" <rtbio.2009 at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 08, 2010 10:00 AM
>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>
>>
>>> Hello all,
>>>
>>> I was trying remote BLAST using BioPerl. My input data is a Trypanosoma
>>> brucei sequence in FASTA format. When I tried to submit it to BLAST using
>>> the step
>>> $r=$factory->submit_blast($input)
>>> it did not return anything, which I checked by debugging the code. It is
>>> not blasting my input sequence even though I specified all the parameters.
>>> I paste the code below.
>>>
>>> Please help me in solving this problem. It is very urgent.
>>>
>>> Regards
>>> Roopa.
>>>
>>> [script snipped: it is quoted in full, unchanged, earlier in this thread]
>>

From robert.bradbury at gmail.com  Sat Jan  9 14:52:53 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 14:52:53 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<F19004692A4A4350856B23DF25E09074@NewLife>
	<c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
Message-ID: <deaa866a1001091152u4e85b1eboc99feb52a5b45b5@mail.gmail.com>

Roopa,

Mark is correct: you have to be very careful about single vs. double quotes
in Perl. As I understand it, double-quoted strings are interpolated
(variables are expanded) while single-quoted strings are taken literally.
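
A minimal sketch of the difference, assuming a hypothetical $organ variable
that stands in for the organism name from the form. One extra catch, as far
as I know: in a double-quoted string Perl reads $organ followed directly by
[ORGN] as an element of the array @organ, so the variable name needs braces
(or the bracket needs a backslash) to get the text you want.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the organism name taken from the form.
my $organ = 'Trypanosoma brucei';

my $literal = '$organ[ORGN]';     # single quotes: nothing is interpolated
my $braced  = "${organ}[ORGN]";   # braces end the variable name before '['
my $escaped = "$organ\[ORGN]";    # escaping '[' has the same effect

print "$literal\n";   # prints: $organ[ORGN]
print "$braced\n";    # prints: Trypanosoma brucei[ORGN]
print "$escaped\n";   # prints: Trypanosoma brucei[ORGN]
```

Either of the last two forms should be safe to pass as the Entrez query
string; the first one sends the variable name itself to NCBI.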

I tried to run your script (with fixes) but without the supporting files it
appears to be impossible.

What I am curious about is what it is trying to do. I was particularly
intrigued by some apparent efforts to parse blast results into
color-enhanced HTML, and without thinking about the code in detail it seems
easier to simply ask: what are you trying to do?  I find "classical" blast
results particularly tedious, and I long for blast results that display
concise information the way the NCBI HomoloGene cross-species comparisons do.
Unfortunately NCBI has deemed their methods (I have asked them) "too complex
to disclose". (For a person comfortable dealing with assembly language, or
even gate-level electronics, "too complex" is a very relative concept.)
One has the option of using NCBI, with a limited number of species but good
display methodologies, or Ensembl, with many more species but less desirable
display methodologies (a phylogenetic tree derived from cross-species
comparisons).  For the WRN protein, which may play a key role in aging
(through the activity of its exonuclease domain mutating DNA sequences and
inducing microdeletions and microinsertions), this gets important because it
appears that the *C. elegans* genome is missing the exonuclease domain (so
it may be useless from the perspective of studying aging), and the other 4
nematode species which have been sequenced aren't in either the NCBI or the
Ensembl databases.  Needless to say, if we manage in the near future, given
the drop in sequencing costs, to sequence the nematodes which are
freeze/thaw tolerant (freezing and thawing induces DSBs that have to be
repaired), those genomes are unlikely to be in the NCBI/Ensembl databases
either.  So there is a requirement for the user to develop the ability to
mix and match public and obscure databases in creative ways to provide
easy-to-interpret information.

Robert Bradbury

From robert.bradbury at gmail.com  Sat Jan  9 15:27:54 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 15:27:54 -0500
Subject: [Bioperl-l] Ensembl problems
Message-ID: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>

I am trying to get the examples provided by EMBL/Ensembl to work and am
encountering problems.

For example, about 1/3 of the way through the Compara API tutorial [1] there
is what is supposed to be a completely functional script.  It does not
work.  This is in contrast to some of the earlier simple scripts (listing
the species in Ensembl etc.), which do work on my machine, so I have all the
libraries installed correctly.

It is very poor form to document scripts which do not function on a properly
set up system.

I have modified my invocation of the script slightly:
  Align.pl --set_of_species \
"Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus
scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"

which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on
an undefined value at ./Align.pl line 132." (Align.pl is my slightly
modified copy of the Compara tutorial code.)
As these are slightly modified Perl scripts from the documentation, the line
numbers may vary.

I can print out the genome_dbs, and that gives me a list of genomes (hash
tables), so the failure in Align.pl occurs in spite of the fact that just
before that call I dumped "genome_dbs" and got back some 25 hash tables
(as expected).  I believe this occurs whether one is comparing "human:mouse"
or the more complex species set I have outlined above.
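
For what it is worth, the message itself says that the thing the method was
called on is undefined, which would point at the adaptor object rather than
at the genome_dbs list. A minimal sketch that reproduces the same kind of
message in plain Perl follows; lookup_adaptor(), the 'SOME_TYPE' string and
the two genome names are hypothetical placeholders and not part of the
Ensembl API.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an adaptor lookup that found nothing.
sub lookup_adaptor { return undef }

my $mlss_adaptor = lookup_adaptor();
my @genome_dbs   = ('human', 'mouse');   # placeholder values

# Calling any method on undef dies at run time; eval traps the message.
my $error = '';
eval {
    $mlss_adaptor->fetch_by_method_link_type_GenomeDBs('SOME_TYPE', \@genome_dbs);
    1;
} or $error = $@;

print "caught: $error";

# Checking the lookup result first gives a clearer failure point.
my $status = defined $mlss_adaptor
    ? 'adaptor ok'
    : 'adaptor lookup returned undef';
print "$status\n";
```

If that reading is right, the place to look is the call that is supposed to
return the adaptor, not the call that uses it.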


Has anyone else attempted to run the code documented in the Ensembl API
tutorial?
Any suggestions as to what direction to go in would be appreciated. (When
one is trying to copy code out of a tutorial and it fails, it's kind of hard
to know where to go.)

There do appear to be some problems in specifying a Compara
version/database, and there don't appear to be many resources informing
one of what is currently available.

Robert


1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html

From ak at ebi.ac.uk  Sat Jan  9 17:01:21 2010
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Sat, 9 Jan 2010 22:01:21 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
Message-ID: <20100109220121.GA9521@quux.windows.ebi.ac.uk>

On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
> I am trying to get the examples provided by EMBL/Ensembl to work and am
> encountering problems.

Hi Robert,

The ensembl-dev list is the appropriate forum for this type of question,
as it has nothing to do with bioperl.

There is also the Ensembl helpdesk.  If you send your problem to
<helpdesk at ensembl.org> I'm sure that it will be picked up by the
appropriate people (I do not myself know enough about the Compara API to
be able to diagnose this problem straight away, I'm afraid).

Be sure to submit a minimal script that still exhibits the problem, and
information about what version of the APIs you're using (we will assume
that you're not mixing a newer version of the API with older databases or
vice versa).

We are generally very happy to have bugs in documentation or code
pointed out to us, and will correct errors as we are made aware of them.


Kind regards,
Andreas

-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom

From cjfields at illinois.edu  Sat Jan  9 17:01:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Jan 2010 16:01:19 -0600
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
Message-ID: <743C998D-BBB5-4832-BA25-24D7D7288F78@illinois.edu>

Robert,

Ensembl errors probably should be redirected to the ensembl mail list.  I can't speak to the problems with it (they appear specific to the Ensembl tool set).

chris



From robert.bradbury at gmail.com  Sun Jan 10 14:47:00 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sun, 10 Jan 2010 14:47:00 -0500
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <20100109220121.GA9521@quux.windows.ebi.ac.uk>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
	<20100109220121.GA9521@quux.windows.ebi.ac.uk>
Message-ID: <deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>

As it turns out, the example from the file I cited (the Compara API
tutorial) does work.  The code that I started with may have been from
a "MS-WORD" document distributed with the documentation (which could
quite well be out-of-date).

But even the corrected code does not work for various uncommon
comparisons between species (which they may not have archived in
Ensembl).  I also don't understand enough about the functions yet as
to whether they are comparing the same regions from the same
chromosomes that just happen to be identical or whether they are
comparing the same region with a homologous region on a different
chromosome (i.e. conserved genes).  I'm going to have to dig into this
some more to figure out what is going on.

Thanks for the pointers, I'll refer future questions to the Ensembl
list/help-desk.

However, if anyone knows Ensembl very well, the database has in it
some of these interspecies comparisons already.  They are accessed
when one does a phylogeny tree for specific genes (and generally for
highly conserved genes you will get a tree that includes nearly all 50
species in the database).  As I don't think they are computed
on-the-fly, the information must be precomputed and stored someplace
in the database.  I would very much like to know how to access this
information.

Thanks,
Robert




From Russell.Smithies at agresearch.co.nz  Sun Jan 10 15:34:39 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 09:34:39 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>

An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. 
In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms)

If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this:

   my $taxid  = $gi_taxid_nucl{$accession};
   my $org_name = $names{$taxid};
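
A minimal sketch of building those two hashes, assuming the usual layout of
the dumps (GI and taxid separated by a tab in gi_taxid_nucl; fields
separated by tab-pipe-tab in names.dmp).  The sample rows are made up for
illustration, and load_gi_taxid and load_names are hypothetical helper
names.  Note that the gi_taxid dump is keyed by GI number, so accessions
would first have to be mapped to GIs.

```perl
use strict;
use warnings;

# Made-up sample rows in the assumed layout of the NCBI dumps.
my $gi_taxid_sample = "1001\t9606\n1002\t10090\n";
my $names_sample = join '',
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
    "10090\t|\tMus musculus\t|\t\t|\tscientific name\t|\n";

# Read GI -> taxid pairs, keeping only the GIs of interest to save memory
# (the full dump is far too large to hold comfortably in one hash).
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && $wanted->{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Read taxid -> scientific name, skipping synonyms and common names.
sub load_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;    # drop the row terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line, -1;
        next unless defined $class && $class eq 'scientific name';
        $name_for{$taxid} = $name;
    }
    return \%name_for;
}

my @gis    = (1001, 1002, 1003);    # 1003 is deliberately absent from the sample
my %wanted = map { $_ => 1 } @gis;

# In-memory filehandles stand in for the real, unpacked dump files.
open my $gi_fh,    '<', \$gi_taxid_sample or die "Cannot open GI sample: $!";
open my $names_fh, '<', \$names_sample    or die "Cannot open names sample: $!";

my $taxid_for = load_gi_taxid($gi_fh, \%wanted);
my $name_for  = load_names($names_fh);

for my $gi (@gis) {
    my $taxid = $taxid_for->{$gi};
    my $name  = defined $taxid ? $name_for->{$taxid} : undef;
    printf "%s\t%s\n", $gi, defined $name ? $name : 'unknown';
}
```

For real data, the in-memory handles would be replaced by handles opened on
the unpacked files.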

--Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Saturday, 26 December 2009 4:52 p.m.
> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Bhakti,
> The following example (using EUtilities) may serve your purpose:
> 
> use Bio::DB::EUtilities;
> 
> my (%taxa, @taxa);
> my (%names, %idmap);
> 
> # these are protein ids; nuc ids will work by changing -dbfrom => 'nucleotide'
> # (probably)
> 
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> 
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
> 
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
> 
> @taxa = @taxa{@ids};
> 
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>         -db    => 'taxonomy',
>         -id    => \@taxa );
> 
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> ($_->get_contents_by_name('ScientificName'))[0];
> }
> 
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
> 
> # %idmap is
> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> #    68536103 => 'Corynebacterium jeikeium K411'
> #    730439 => 'Bacillus caldolyticus'
> #    89318838 => undef    (this record has been removed from the db)
> 
> 1;
> 
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
> 
> sleep 3;
> 
> or so separating the queries.
> MAJ
> ----- Original Message -----
> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, December 25, 2009 9:46 PM
> Subject: [Bioperl-l] how to retrieve organism name from accession number?
> 
> 
> > Hi,
> >
> > Does anyone know how to retrieve the "Source" or the "Species name" given
> > the accession number using Bioperl.   I have these 30,000 accession numbers
> > for which I need to get the source organisms.  Any kind of help will be
> > appreciated.
> >
> > Thanks
> >
> > BD
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> 
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Sun Jan 10 15:49:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 14:49:40 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
Message-ID: <F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>

One could also use Bio::DB::Taxonomy, which indexes the same files or (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the details).

chris



From Russell.Smithies at agresearch.co.nz  Sun Jan 10 16:05:06 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 10:05:06 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>

I've started to go off eUtils recently (not BioPerl's fault) as I've often been finding that with large queries, chunks of the resulting data are missing.
For example, before Xmas I was creating species-specific databases by using eUtils to get a list of GI numbers back for a taxid, then retrieving the fasta sequences in chunks of 500.
Very regularly, in the middle of the fasta there would be a message about a resource being unavailable, e.g.:
  >test_sequence_1
  TACGATCATCGCTResource UnavailableTACGACTCTGCT
  >test_sequence_2
  TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT

Often this wasn't detected until formatdb complained about invalid characters.
Inquiries to NCBI as to why this was happening and what to do about it returned stupid answers ("do each sequence manually thru the web interface", or "use eUtils").
As we have a nice fast network connection, I now prefer to download very large gzip files (i.e. all of refseq) and extract what I need.
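
For anyone who still fetches in chunks, a minimal sketch of catching this
kind of corruption before formatdb sees it: split the IDs into batches and
check each returned chunk of nucleotide FASTA for characters outside the
IUPAC nucleotide codes.  The batches and bad_records helpers are
hypothetical names, and the actual fetch step is left out; the sample chunk
is the example above.

```perl
use strict;
use warnings;

# Split a list of IDs into batches of at most $size.
sub batches {
    my ($size, @ids) = @_;
    my @out;
    push @out, [ splice @ids, 0, $size ] while @ids;
    return @out;
}

# Return the header of every record whose sequence lines contain anything
# other than IUPAC nucleotide codes or a gap character.
sub bad_records {
    my ($fasta) = @_;
    my @bad;
    my ($header, $is_bad) = ('(no header)', 0);
    for my $line (split /\n/, $fasta) {
        $line =~ s/\r$//;                      # tolerate CRLF line endings
        if ($line =~ /^>(.*)/) {
            push @bad, $header if $is_bad;     # close the previous record
            ($header, $is_bad) = ($1, 0);
        }
        elsif ($line =~ /[^ACGTURYKMSWBDHVN-]/i) {
            $is_bad = 1;
        }
    }
    push @bad, $header if $is_bad;             # close the last record
    return @bad;
}

my $chunk = <<'FASTA';
>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
FASTA

my @bad = bad_records($chunk);
print "Refetch needed for: @bad\n" if @bad;

my @groups = batches(500, 1 .. 1200);
printf "%d batches, last has %d ids\n", scalar @groups, scalar @{ $groups[-1] };
```

A chunk that reports bad records could then be fetched again (after a
pause) rather than written to the database file.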

I can't help but think that NCBI could solve a lot of problems if they gzipped the output from eUtils queries - it's something I've requested regularly for the last 5 years or so!!

--Russell




From avilella at gmail.com  Sun Jan 10 16:05:13 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 10 Jan 2010 21:05:13 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
	<20100109220121.GA9521@quux.windows.ebi.ac.uk>
	<deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>
Message-ID: <358f4d651001101305q1b75cfe3q558a245ab1ab1238@mail.gmail.com>

> However, if anyone knows Ensembl very well, the database has in it
> some of these interspecies comparisons already.  They are accessed
> when one does a phylogeny tree for specific genes (and generally for
> highly conserved gene you will get a tree that includes nearly all 50
> species in the database).  As I don't think they are computed
> on-the-fly, the information must be precomputed and stored someplace
> in the database.  I would very much like to know how to access this
> information.

Yes, they are. You can access the data programmatically by installing
the ensembl and ensembl-compara Perl APIs.
There are a few example scripts for the GeneTrees:

ensembl-compara/scripts/examples/homology*.pl

Cheers,

Albert.

> Thanks,
> Robert
>
>
>
>
> On 1/9/10, Andreas K?h?ri <ak at ebi.ac.uk> wrote:
>> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
>>> I am trying to get the examples provided by EMBL/Ensembl to work and am
>>> encountering problems.
>>
>> Hi Robert,
>>
>> The ensembl-dev list is the appropriate forum for this type of questions
>> as it has nothing to do with bioperl.
>>
>> There is also the Ensembl helpdesk. ?If you send your problem to
>> <helpdesk at ensembl.org> I'm sure that it will be picked up by the
>> appropriate people (I do myself not know enough about the Compara API to
>> be able to diagnose this problem straight away I'm afraid).
>>
>> Be sure to submit a minimal script that still exhibit the problem, and
>> information about what version of the APIs you're using (we will assume
>> that you're not mixing newer version of the API with older databases or
>> vice versa).
>>
>> We are generally very happy to have bugs in documentation or code
>> pointed out to us, and will correct errors as we are made aware of them.
>>
>>
>> Kind regards,
>> Andreas
>>
>>> For example, about 1/3 of the way through the Compara API tutorial [1]
>>> there is what is supposed to be a completely functional script.  It does
>>> not work.  This is in contrast to some of the earlier simple scripts
>>> (listing the species in Ensembl etc.), which do work on my machine, so I
>>> have all the libraries and so on installed correctly.
>>>
>>> Very poor form to document scripts which do not function on a properly
>>> set up system.
>>>
>>> I have modified my invocation of the script slightly:
>>>   Align.pl --set_of_species \
>>> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
>>> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
>>> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis
>>> familiaris:Sus
>>> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
>>> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
>>> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"
>>>
>>> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs"
>>> on an undefined value at ./Align.pl line 132." (Align.pl is my slightly
>>> modified copy of the Compara tutorial code.)  As these are slightly
>>> modified Perl scripts from the documentation, the line numbers may vary.
>>>
>>> I can print out the genome_dbs, and that gives me a list of genome names
>>> (hash tables), yet the call in the Align.pl script still fails, in spite
>>> of the fact that just before that call I dumped "genome_dbs" and got back
>>> some 25 hash tables (as expected).  I believe this occurs whether one is
>>> comparing "human:mouse" or the more complex species set I have outlined
>>> above.
>>>
>>>
>>>
>>> Has anyone else attempted to run the code documented in the Ensembl API
>>> Tutorial?
>>> Any suggestions as to what direction to go in would be appreciated (when
>>> one is trying to copy code out of a tutorial and it fails, it's kind of
>>> hard to know where to go).
>>>
>>> There do appear to be some problems in the specification of a Compara
>>> version/database, and there don't appear to be many resources informing
>>> one of what is currently available.
>>>
>>> Robert
>>>
>>>
>>> 1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> Andreas Kähäri, Ensembl Software Developer
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge CB10 1SD, United Kingdom
>>


From alessandra.bilardi at gmail.com  Sun Jan 10 18:21:12 2010
From: alessandra.bilardi at gmail.com (Alessandra)
Date: Mon, 11 Jan 2010 00:21:12 +0100
Subject: [Bioperl-l] GBrowse.org project
In-Reply-To: <e0996aca1001101515q8121c87o9b90310691fcd640@mail.gmail.com>
References: <e0996aca1001101515q8121c87o9b90310691fcd640@mail.gmail.com>
Message-ID: <e0996aca1001101521p30b46829p93ee75dd797829b1@mail.gmail.com>

 Hi all,

   I'm Alessandra and I run GBrowse.org.
GBrowse.org is a resource for using and setting up GBrowse genome
browsers. The site provides one location where biologists and
bioinformaticians can find:

  1. Genome browser web sites for any organism that has them. If a
species has a genome browser anywhere on the web, then we aim to link
to it.
  2. Links to sequence and annotation files that are available online.
  3. Links to genome browser configuration files, when available.
  4. An FTP site containing genome annotation and configuration files
for each annotated genome that does not have its own web site.

GBrowse.org emphasizes the GBrowse genome browser in its organization,
but also links to sites that use other browser packages such as UCSC,
Ensembl, and JBrowse.

Also, we are currently conducting a survey seeking input on future
project direction. Please take a few minutes now to provide your
feedback.

   Survey link: http://gbrowse.org/survey/index.php?sid=64264&lang=en
   GBrowse.org introduction link:
http://gmod.org/wiki/August_2009_GMOD_Meeting#GBrowse.org

   Thank you for your help,

   Alessandra Bilardi.
   http://gbrowse.org/
   CRIBI Genomics, University of Padua
   http://genomics.cribi.unipd.it/

From cjfields at illinois.edu  Sun Jan 10 22:04:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 21:04:13 -0600
Subject: [Bioperl-l] GMOD BioPerl Meeting
Message-ID: <7D72ECC2-E856-4C09-B67A-62AFFB59B377@illinois.edu>

Just a quick reminder that we're having a BioPerl satellite meeting after the PAG Conference (just prior to the GMOD Meeting).  The meeting is this Wednesday, Jan. 13, starting at 11:30am, at the Best Western Seven Seas in San Diego.  I will update the relevant BioPerl and GMOD pages with more details as they become available.  At the moment, we will be meeting in the hotel lobby prior to starting the meeting and possible hackathon.  

http://www.bioperl.org/wiki/GMOD_2010_Meeting
http://gmod.org/wiki/January_2010_GMOD_Meeting#Satellite_Meetings

Thanks!

chris

From bernd.jagla at pasteur.fr  Mon Jan 11 05:11:16 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:11:16 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
Message-ID: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>

Hi,

 
First off, I am not sure if this is supposed to be addressed to the Bioperl
or Gbrowse mailing list, so apologies if this is the wrong list and please
let me know.

 
I am writing a program in Java that needs to access genome annotation data.
Since I am already using GBrowse, I was thinking that I could combine both
approaches, which should eventually make life easier for me. I am mainly
interested in getting a gene/feature name for a given position. The position
is stored in the feature table, and by linking typelist, locationlist, (maybe
sequence), and feature I can get all the information I need. Unfortunately,
it seems that the feature name is stored in the object blob of the feature
table.

 
That seems a bit suspicious to me: I don't understand how searching for a
name in GBrowse can be so fast if the name is not indexed in MySQL.

 
So my question is: how do I correctly parse the Bio::DB::SeqFeature object
in Java to get the name of the feature, and possibly further information as
well?

 
Any suggestions are greatly appreciated. Maybe there is a better solution
than parsing Perl code with Java?

 
Thanks a lot,

 
Bernd


From biopython at maubp.freeserve.co.uk  Mon Jan 11 05:48:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 10:48:52 +0000
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
Message-ID: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>

On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla <bernd.jagla at pasteur.fr> wrote:
> Hi,
>
> First off, I am not sure if this is supposed to be addressed to the Bioperl
> or Gbrowse mailing list, so apologies if this is the wrong list and please
> let me know.
>
> I am writing a program in Java that needs to access genome annotation data.
> Since I am using Gbrowse already I was thinking that I could combine both
> approaches making life eventually easier for me. I am mainly interested in
> getting a gene/feature name for a given position. The position is stored in
> the feature table and through linking typelist, locationlist, (maybe
> sequence), and feature I can get all the information I need. Unfortunately
> it seems that the feature name is stored in the object blog of the feature
> table.

How are you storing the data in Gbrowse? There are several back ends,
and this will make a big difference for accessing the raw data.

One option would be to use Gbrowse with BioSQL as the backend.
You can then use BioJava (or BioPerl, or BioPython, etc) to access the
database. The only downside is Gbrowse isn't working 100% on top
of BioSQL right now (I'd like to see this fixed, but I don't know Perl).
There is an open bug on this [ gmod-Bugs-2168597 ].

Peter

From bernd.jagla at pasteur.fr  Mon Jan 11 05:53:20 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:53:20 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
	<320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
Message-ID: <9056164A8A744A77B6CD1E8E4E20B104@zillumina>

I am using bp_seqfeature_load.pl to load my features. That uses
Bio::DB::SeqFeature::Store with MySQL as the backend... That's all I
understood...

B

> -----Original Message-----
> From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On
> Behalf Of Peter
> Sent: Monday, January 11, 2010 11:49 AM
> To: Bernd Jagla
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
> 
> On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla <bernd.jagla at pasteur.fr>
> wrote:
> > Hi,
> >
> > First off, I am not sure if this is supposed to be addressed to the
> Bioperl
> > or Gbrowse mailing list, so apologies if this is the wrong list and
> please
> > let me know.
> >
> > I am writing a program in Java that needs to access genome annotation
> data.
> > Since I am using Gbrowse already I was thinking that I could combine
> both
> > approaches making life eventually easier for me. I am mainly interested
> in
> > getting a gene/feature name for a given position. The position is stored
> in
> > the feature table and through linking typelist, locationlist, (maybe
> > sequence), and feature I can get all the information I need.
> Unfortunately
> > it seems that the feature name is stored in the object blog of the
> feature
> > table.
> 
> How are you storing the data in Gbrowse? There are several back ends,
> and this will make a big difference for accessing the raw data.
> 
> One option would be to use Gbrowse with BioSQL as the backend.
> You can then use BioJava (or BioPerl, or BioPython, etc) to access the
> database. The only downside is Gbrowse isn't working 100% on top
> of BioSQL right now (I'd like to see this fixed, but I don't know Perl).
> There is an open bug on this [ gmod-Bugs-2168597 ].
> 
> Peter


From awitney at sgul.ac.uk  Mon Jan 11 07:21:07 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 12:21:07 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
Message-ID: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>

Hi,

I am writing a script to automate the running of Phylip Pars. In the process I have to create a Bio::AlignIO object from a set of data that I have in a hash.

I could write the hash data into a phylip file and then load the Bio::AlignIO from that file, but I wondered if I could skip the writing and then reading of a temporary file?

thanks for any help

adam

From roy.chaudhuri at gmail.com  Mon Jan 11 08:54:25 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:54:25 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2A51.9040602@gmail.com>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com>
Message-ID: <4B4B2D91.70906@gmail.com>

Actually, I guess some sample code would be more helpful:

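use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# With -start => 1, -end is the number of non-gap residues in each row
my $seq1 = Bio::LocatableSeq->new(-id => 'one',   -seq => 'AT-CG', -start => 1, -end => 4);
my $seq2 = Bio::LocatableSeq->new(-id => 'two',   -seq => 'A--CG', -start => 1, -end => 3);
my $seq3 = Bio::LocatableSeq->new(-id => 'three', -seq => 'ATTCG', -start => 1, -end => 5);

# Build the alignment in memory, then write it to STDOUT in PHYLIP format
my $aln = Bio::SimpleAlign->new(-seqs => [$seq1, $seq2, $seq3]);
Bio::AlignIO->new(-format => 'phylip')->write_aln($aln);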

Cheers,
Roy.
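
Since the original question started from data held in a hash, here is a
minimal sketch of the same idea driven by a hash. It assumes the hash maps
sequence ids to already-aligned (gapped) strings of equal length; the hash
name %aligned and its contents are made up for illustration, and the -end
coordinate is simply taken to be the number of non-gap characters in each row.

```perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Hypothetical input: id => gapped sequence, all rows the same length
my %aligned = (
    one   => 'AT-CG',
    two   => 'A--CG',
    three => 'ATTCG',
);

my $aln = Bio::SimpleAlign->new;
for my $id (sort keys %aligned) {
    my $gapped   = $aligned{$id};
    my $residues = ($gapped =~ tr/-//c);    # count the non-gap characters
    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id    => $id,
            -seq   => $gapped,
            -start => 1,
            -end   => $residues,
        )
    );
}

# Write the in-memory alignment to STDOUT in PHYLIP format
Bio::AlignIO->new(-format => 'phylip', -fh => \*STDOUT)->write_aln($aln);
```

The $aln object can presumably be handed straight to a run wrapper instead
of being printed, which would avoid the temporary file altogether.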


On 11/01/2010 13:40, Roy Chaudhuri wrote:
> Hi Adam,
>
> I'm guessing you actually want to create a Bio::SimpleAlign object
> (representing an alignment), rather than a Bio::AlignIO object (which is
> just for reading/writing alignment files). Bio::SimpleAlign has a
> documented new method that allows you to construct an alignment from
> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but
> include gaps and start/end coordinates to describe their relationship to
> other sequences in the alignment.
>
> Roy.
>
> On 11/01/2010 12:21, Adam Witney wrote:
>> Hi,
>>
>> I am writing a script to automate the running of Phylip Pars. In the
>> process i have to create a Bio::AlignIO object from a set of data
>> that i have in a hash.
>>
>> I could write the hash data into a phylip file and then load the
>> Bio::AlignIO from that file, but i wondered if i could skip the
>> writing and then reading of a temporary file ?
>>
>> thanks for any help
>>
>> adam
>


From roy.chaudhuri at gmail.com  Mon Jan 11 08:40:33 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:40:33 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
Message-ID: <4B4B2A51.9040602@gmail.com>

Hi Adam,

I'm guessing you actually want to create a Bio::SimpleAlign object 
(representing an alignment), rather than a Bio::AlignIO object (which is 
just for reading/writing alignment files). Bio::SimpleAlign has a 
documented new method that allows you to construct an alignment from 
Bio::LocatableSeq objects, which are similar to Bio::Seq objects but 
include gaps and start/end coordinates to describe their relationship to 
other sequences in the alignment.

Roy.

On 11/01/2010 12:21, Adam Witney wrote:
> Hi,
>
> I am writing a script to automate the running of Phylip Pars. In the
> process i have to create a Bio::AlignIO object from a set of data
> that i have in a hash.
>
> I could write the hash data into a phylip file and then load the
> Bio::AlignIO from that file, but i wondered if i could skip the
> writing and then reading of a temporary file ?
>
> thanks for any help
>
> adam


From biopython at maubp.freeserve.co.uk  Mon Jan 11 09:16:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 14:16:45 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
Message-ID: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>

Hi,

I'm running bioperl-live from SVN, just updated to revision 16648.

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069

I am trying to get Bio::SeqIO to convert a multiple record EMBL
file into GenBank format, piping the data via stdin/stdout using
the following trivial Perl script:

#!/usr/bin/env perl
use Bio::SeqIO;
my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
my $out = Bio::SeqIO->new(-format => 'genbank');
while (my $seq = $in->next_seq) { $out->write_seq($seq) };

This only seems to find the first EMBL record in my example
files. For example, this simple file has just two contig records:
http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl

This is just the first two records taken from a much larger EMBL file
rel_con_hum_01_r102.dat downloaded and uncompressed from:
ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

Trying both these examples as input, BioPerl just gives a single
GenBank record as output (the first EMBL entry in the input).

Is this a BioPerl bug, or am I missing something?

Peter

From maj at fortinbras.us  Mon Jan 11 10:04:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 11 Jan 2010 10:04:00 -0500
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>

Hi Peter, 
I found the issue-- there are no SQ lines in the data, and 
having them is a key stop condition in the parser (line 438 embl.pm).
We evidently need to be more liberal in what we accept, even as we 
are strict in what we emit. Could you make a bug report?
thanks for the heads-up--
MAJ
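
To illustrate the point about stop conditions, here is a minimal sketch
(not the actual Bio::SeqIO::embl code, and not a proposed patch) that finds
record boundaries using only the "//" line that terminates every EMBL entry,
so entries without an SQ block are still separated. The helper name
split_embl_entries and the two toy entries are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: return the entries of an EMBL
# flat file, using only the "//" terminator line to find record boundaries.
sub split_embl_entries {
    my ($fh) = @_;
    local $/ = "\n//\n";                # read one "//"-terminated entry at a time
    my @entries;
    while (my $entry = <$fh>) {
        next unless $entry =~ /^ID /m;  # skip trailing blank lines and the like
        push @entries, $entry;
    }
    return @entries;
}

# Two made-up, heavily abbreviated CON-style entries (CO lines, no SQ block)
my $toy = <<'END_EMBL';
ID   TOY0001; SV 1; linear; genomic DNA; CON; HUM; 100 BP.
CO   join(TOYPART1.1:1..60,gap(40))
//
ID   TOY0002; SV 1; linear; genomic DNA; CON; HUM; 80 BP.
CO   join(TOYPART2.1:1..80)
//
END_EMBL

open my $fh, '<', \$toy or die "Cannot open in-memory file: $!";
my @entries = split_embl_entries($fh);
close $fh;

printf "Found %d entries\n", scalar @entries;
```

This only splits the text into entries; turning each entry into a sequence
object is still the parser's job.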
----- Original Message ----- 
From: "Peter" <biopython at maubp.freeserve.co.uk>
To: "bioperl-l list" <bioperl-l at lists.open-bio.org>
Sent: Monday, January 11, 2010 9:16 AM
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records


> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz
> 
> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter

From biopython at maubp.freeserve.co.uk  Mon Jan 11 10:17:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 15:17:37 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
Message-ID: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>

On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
> them is a key stop condition in the parser (line 438 embl.pm).
> We evidently need to be more liberal in what we accept, even as we are
> strict in what we emit. Could you make a bug report?
> thanks for the heads-up--
> MAJ

Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982

These are EMBL contig records, so they don't have SQ lines,
but instead CO lines.

Peter

From cjfields at illinois.edu  Mon Jan 11 10:24:24 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 09:24:24 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
	<320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
Message-ID: <CDB3F40D-0298-410B-9814-3D9721380EBA@illinois.edu>


On Jan 11, 2010, at 9:17 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>> 
>> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
>> them is a key stop condition in the parser (line 438 embl.pm).
>> We evidently need to be more liberal in what we accept, even as we are
>> strict in what we emit. Could you make a bug report?
>> thanks for the heads-up--
>> MAJ
> 
> Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982
> 
> These are EMBL contig records, so they don't have SQ lines,
> but instead CO lines.
> 
> Peter

Peter, 

Just curious, but have you tried the experimental EMBL parser 'embldriver'?  I don't think it's bound to the same strictures, but I may be mistaken.

chris

From cjfields at illinois.edu  Mon Jan 11 10:23:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 09:23:00 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <0D0D9DB5-56FA-414E-8D1D-3FE18198F7EC@illinois.edu>

Just saw that Mark responded, so if possible please submit a bug report.  We may be doing a mini-hackathon this Wednesday, so we can probably tackle it in the process (possibly along with a few other pressing issues).

chris

On Jan 11, 2010, at 8:16 AM, Peter wrote:

> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz
> 
> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter


From biopython at maubp.freeserve.co.uk  Mon Jan 11 10:55:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 15:55:26 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <C771056E.6204%hrh@fmi.ch>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
Message-ID: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>

On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>
> These entries form the CON data class, see:
> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
> and they don't contain any sequence information.

I know - GenBank files have a similar system with CONTIG
lines instead of sequences. I was expecting BioPerl to be
able to convert these EMBL files with CO lines into GenBank
files with CONTIG lines.

> If you take the 'expanded' entries from
> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
> your script will work.

That's a useful tip - thanks.

Peter

From hrh at fmi.ch  Mon Jan 11 10:42:22 2010
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Mon, 11 Jan 2010 16:42:22 +0100
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <C771056E.6204%hrh@fmi.ch>


On 1/11/10 3:16 PM, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

These entries form the CON data class, see:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
and they don't contain any sequence information.

If you take the 'expanded' entries from
ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
your script will work.


Hans


> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter


From awitney at sgul.ac.uk  Mon Jan 11 11:27:15 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 16:27:15 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2D91.70906@gmail.com>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
Message-ID: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>


Ah excellent, thanks Roy. I was indeed thinking about it the wrong way.

In the process of writing this I have created a

Bio::Tools::Run::Phylo::Phylip::Pars class

which is essentially just a modified copy of ProtPars. I have also fixed a few typos and possible bugs in

Bio/Tools/Run/Phylo/Phylip/Base.pm
Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm
Bio/AlignIO/phylip.pm
Bio/Tools/Run/Alignment/Clustalw.pm

I am of course happy to send these back in to the project... how would I best do this?

Cheers

adam


On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote:

> Actually, I guess some sample code would be more helpful:
> 
> use Bio::LocatableSeq;
> use Bio::SimpleAlign;
> use Bio::AlignIO;
> my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4);
> my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3);
> my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5);
> my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);
> 
> Cheers,
> Roy.
> 
> 
> On 11/01/2010 13:40, Roy Chaudhuri wrote:
>> Hi Adam,
>> 
>> I'm guessing you actually want to create a Bio::SimpleAlign object
>> (representing an alignment), rather than a Bio::AlignIO object (which is
>> just for reading/writing alignment files). Bio::SimpleAlign has a
>> documented new method that allows you to construct an alignment from
>> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but
>> include gaps and start/end coordinates to describe their relationship to
>> other sequences in the alignment.
>> 
>> Roy.
>> 
>> On 11/01/2010 12:21, Adam Witney wrote:
>>> Hi,
>>> 
>>> I am writing a script to automate the running of Phylip Pars. In the
>>> process i have to create a Bio::AlignIO object from a set of data
>>> that i have in a hash.
>>> 
>>> I could write the hash data into a phylip file and then load the
>>> Bio::AlignIO from that file, but i wondered if i could skip the
>>> writing and then reading of a temporary file ?
>>> 
>>> thanks for any help
>>> 
>>> adam
>> 
> 


From Russell.Smithies at agresearch.co.nz  Mon Jan 11 22:41:02 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 12 Jan 2010 16:41:02 +1300
Subject: [Bioperl-l] BioPerl version?
In-Reply-To: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>

Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ?

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Mon Jan 11 22:59:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 21:59:44 -0600
Subject: [Bioperl-l] BioPerl version?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
	<18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>
Message-ID: <795BD926-4AE9-4478-AAD5-E36558350745@illinois.edu>

Not dumb, but a frequently asked one: that's a FAQ question ;>

http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

chris
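
If the one-liner is awkward to use (for example inside a setup check), the
same lookup can be wrapped in a small script. This is only a sketch: the sub
name bioperl_version is made up, and it assumes nothing beyond
Bio::Root::Version setting $VERSION, as the one-liner above does. It reports
a missing installation instead of dying.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: return the installed BioPerl version string,
# or undef if Bio::Root::Version cannot be loaded from @INC.
sub bioperl_version {
    return eval {
        require Bio::Root::Version;
        $Bio::Root::Version::VERSION;
    };
}

my $version = bioperl_version();
print defined $version
    ? "BioPerl version: $version\n"
    : "BioPerl does not appear to be installed (Bio::Root::Version not found)\n";
```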

On Jan 11, 2010, at 9:41 PM, Smithies, Russell wrote:

> Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ?
> 
> --Russell


From cjfields at illinois.edu  Tue Jan 12 11:02:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 10:02:02 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
	<320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
Message-ID: <ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>

On Jan 11, 2010, at 9:55 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>> 
>> These entries form the CON data class, see:
>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>> and they don't contain any sequence information.
> 
> I know - GenBank files have a similar system with CONTIG
> lines instead of sequences. I was expecting BioPerl to be
> able to convert these EMBL files with CO lines into GenBank
> files with CONTIG lines.

IIRC the contig information for GenBank is stored in annotation.  We can try to ensure the data is carried over to EMBL properly.
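
For anyone who wants to check what actually gets parsed, here is a rough sketch that dumps every annotation key for each record. The file name 'contig.gb' is just a placeholder, and the sketch makes no assumption about which key the CONTIG data ends up under; it simply lists them all:

```perl
use strict;
use warnings;
use Bio::SeqIO;

# 'contig.gb' is a placeholder; pass a real GenBank file on the command line.
my $file = @ARGV ? $ARGV[0] : 'contig.gb';
my $in   = Bio::SeqIO->new(-file => $file, -format => 'genbank');

while (my $seq = $in->next_seq) {
    print "Record: ", $seq->display_id, "\n";
    my $collection = $seq->annotation;
    # Walk every annotation key and print its values as text.
    for my $key ($collection->get_all_annotation_keys) {
        for my $ann ($collection->get_Annotations($key)) {
            print "  $key => ", $ann->as_text, "\n";
        }
    }
}
```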

>> If you take the 'expanded' entries from
>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>> your script will work.
> 
> That's a useful tip - thanks.
> 
> Peter

NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence).
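
In case it helps, a minimal sketch of asking for that via Bio::DB::EUtilities. The IDs come from the command line, and the e-mail address and output file name are placeholders, so treat this as an illustration rather than tested code:

```perl
use strict;
use warnings;
use Bio::DB::EUtilities;

# Placeholders: supply your own GI numbers/accessions on the command line.
my @ids = @ARGV or die "usage: $0 id [id ...]\n";

my $factory = Bio::DB::EUtilities->new(
    -eutil   => 'efetch',
    -db      => 'nucleotide',
    -rettype => 'gbwithparts',      # full record, sequence included
    -email   => 'you@example.org',  # placeholder address
    -id      => \@ids,
);

# Write the raw GenBank text straight to a file (placeholder name).
$factory->get_Response(-file => 'records_with_parts.gb');
```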

chris

From biopython at maubp.freeserve.co.uk  Tue Jan 12 11:19:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 16:19:32 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
	<320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
	<ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>
Message-ID: <320fb6e01001120819u50e73fa8k9bde8aa1abdf942d@mail.gmail.com>

On Tue, Jan 12, 2010 at 4:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Jan 11, 2010, at 9:55 AM, Peter wrote:
>
>> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>>>
>>> These entries form the CON data class, see:
>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>>> and they don't contain any sequence information.
>>
>> I know - GenBank files have a similar system with CONTIG
>> lines instead of sequences. I was expecting BioPerl to be
>> able to convert these EMBL files with CO lines into GenBank
>> files with CONTIG lines.
>
> IIRC the contig information for GenBank is stored in annotation.
> We can try to ensure the data is carried over to EMBL properly.

For contig records (where there is no sequence) I think we just
need to map the GenBank CONTIG lines to the EMBL CO lines,
and vice versa. At least, that's what Biopython now does (trunk
code, not yet released).

>>> If you take the 'expanded' entries from
>>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>>> your script will work.
>>
>> That's a useful tip - thanks.
>>
>> Peter
>
> NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence).

Indeed. This is a useful workaround for when a parser can't
cope with the contig version of a GenBank file for some reason, e.g.
http://bugzilla.open-bio.org/show_bug.cgi?id=2745

Peter

From maj at fortinbras.us  Tue Jan 12 12:33:30 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 12:33:30 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
Message-ID: <231A8D9473704E7697F7A486A0CDA86A@NewLife>

Hi All--

The beta of Bio::DB::SoapEUtilities is now available in the
bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
service. The system is fully WSDL based, and all eutils are
available. The best thing (IMHO) are the result adaptors, which
provide conversion and iteration of SOAP results into BioPerl
objects. Schau, mal:

 use Bio::DB::EUtilities;
 my $fac = Bio::DB::EUtilities->new(); # step 1
 my $seqio = $fac->esearch(
       -db => 'nucleotide', 
       -term => 'HIV1 and CCR5 and Brazil'
    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
 # yes, it's already done the efetch under the hood...
 while ( my $seq = $seqio->next_seq ) { # step 4
  # do something with $seq, a Bio::Seq object...
 }

or this:

 my $links = $fac->elink( -db => 'protein', 
                          -dbfrom => 'nucleotide',
                          -id => \@nucids )->run( -auto_adapt => 1 );
 
 # maybe more than one associated id...
 my @prot_0 = $links->id_map( $nucids[0] );
   
 while ( my $ls = $links->next_linkset ) {
    @ids = $ls->ids;
    @submitted_ids = $ls->submitted_ids;
    # etc.
 }

and much, much more. See

http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service

and of course, the POD, for all the details, including 
download/installation. Tests in bioperl-run/t.

cheers, 
MAJ

-- No new dependencies were added or animals mistreated 
-- during the making of these modules.


From sheldon.mckay at gmail.com  Tue Jan 12 13:02:53 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 12 Jan 2010 10:02:53 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
Message-ID: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>

Hi all,

I keep timing out trying to do an svn checkout of bioperl-live from
code.open-bio.org.  Any suggestions?

Thanks,
Sheldon

----
Sheldon McKay, PhD
Lead, iPlant Tree of Life Engagement Team;
Research Investigator
Cold Spring Harbor Laboratory
http://mckay.cshl.edu
Google Voice:  (203) 701-9204


On Tue, Nov 3, 2009 at 9:09 AM, Aaron Mackey <amackey at virginia.edu> wrote:
> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous svn co
> failed with the same message:
>
> [...]
> A    bioperl-live/Bio/Structure/StructureI.pm
> A    bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command:
> svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live


From biopython at maubp.freeserve.co.uk  Tue Jan 12 13:12:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 18:12:46 +0000
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
Message-ID: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>

On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
> Hi all,
>
> I keep timing out trying to do an svn checkout of bioperl-live from
> code.open-bio.org.  Any suggestions?
>
> Thanks,
> Sheldon

The OBF team know about this (it's being discussed on root-l);
hopefully they'll have it fixed before too long.

Peter


From cjfields at illinois.edu  Tue Jan 12 13:18:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 12:18:45 -0600
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
	<320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
Message-ID: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>

On Jan 12, 2010, at 12:12 PM, Peter wrote:

> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
>> Hi all,
>> 
>> I keep timing out trying to do an svn checkout of bioperl-live from
>> code.open-bio.org.  Any suggestions?
>> 
>> Thanks,
>> Sheldon
> 
> The OBF team know about this (its being discussed on root-l),
> hopefully they'll have it fixed before too long.
> 
> Peter

We probably need to set up some automatic syncing of our read-only code.google.com repo as a backup.  Jason had originally set that up, hopefully he'll respond.

chris

From jason at bioperl.org  Tue Jan 12 13:27:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 12 Jan 2010 10:27:55 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
	<320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
	<8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
Message-ID: <C9DDBB08-DB88-4596-AED3-B3FD89893C55@bioperl.org>

Hi - I had set up the Google Code sync, but then came the unfortunate
realization that the revision numbers are shared between the wiki and
the code SVN (it's all one repo). When I added a wiki page on the site I
screwed up the numbering, and it was no longer possible to sync (that
I could figure out) without resetting it, and I haven't gone back to
that. Sorry - I wasn't sure whether we had figured out what we wanted to
do for repositories, so I sort of stopped worrying about it.


-jason
On Jan 12, 2010, at 10:18 AM, Chris Fields wrote:

> On Jan 12, 2010, at 12:12 PM, Peter wrote:
>
>> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com 
>> > wrote:
>>> Hi all,
>>>
>>> I keep timing out trying to do an svn checkout of bioperl-live from
>>> code.open-bio.org.  Any suggestions?
>>>
>>> Thanks,
>>> Sheldon
>>
>> The OBF team know about this (its being discussed on root-l),
>> hopefully they'll have it fixed before too long.
>>
>> Peter
>
> We probably need to set up some automatic syncing of our read-only  
> code.google.com repo as a backup.  Jason had originally set that up,  
> hopefully he'll respond.
>
> chris

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From virajj at gmail.com  Wed Jan  6 13:20:39 2010
From: virajj at gmail.com (Vijayaraj Nagarajan)
Date: Wed, 6 Jan 2010 13:20:39 -0500
Subject: [Bioperl-l] targetp request
Message-ID: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>

Hi,

I am trying to use targetP in bioperl.
the documentation at the bioperl site is a bit confusing to me...

I would appreciate if you could give a very small example, as to how to use
"Bio::Tools::TargetP" to predict the localization of a protein sequence that
i have stored as a string.

Thanks,
Vijay

From cjfields at illinois.edu  Tue Jan 12 18:36:53 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 17:36:53 -0600
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
Message-ID: <D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>

Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API conflict with the current EUtilities tools.

chris

On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:

> Hi All--
> 
> The beta of Bio::DB::SoapEUtilities is now available in the
> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
> service. The system is fully WSDL based, and all eutils are
> available. The best thing (IMHO) are the result adaptors, which
> provide conversion and iteration of SOAP results into BioPerl
> objects. Schau, mal:
> 
> use Bio::DB::EUtilities;
> my $fac = Bio::DB::EUtilities->new(); # step 1
> my $seqio = $fac->esearch(
>       -db => 'nucleotide', 
>       -term => 'HIV1 and CCR5 and Brazil'
>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
> # yes, it's already done the efetch under the hood...
> while ( my $seq = $seqio->next_seq ) { # step 4
>  # do something with $seq, a Bio::Seq object...
> }
> 
> or this:
> 
> my $links = $fac->elink( -db => 'protein', 
>                          -dbfrom => 'nucleotide',
>                          -id => \@nucids )->run( -auto_adapt => 1 );
> 
> # maybe more than one associated id...
> my @prot_0 = $links->id_map( $nucids[0] );
> 
> while ( my $ls = $links->next_linkset ) {
>    @ids = $ls->ids;
>    @submitted_ids = $ls->submitted_ids;
>    # etc.
> }
> 
> and much, much more. See
> 
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
> 
> and of course, the POD, for all the details, including 
> download/installation. Tests in bioperl-run/t.
> 
> cheers, 
> MAJ
> 
> -- No new dependencies were added or animals mistreated 
> -- during the making of these modules.


From cjfields at illinois.edu  Tue Jan 12 19:22:10 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 18:22:10 -0600
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
	<D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
	<5AD210CB0C444A57881BBDD34DE99149@NewLife>
Message-ID: <B536964F-8F2F-4E07-9FD3-B7D0A945253E@illinois.edu>

Okay, just making sure (I was getting a bit paranoid).  Great work on the SOAP interface, BTW!

chris

On Jan 12, 2010, at 6:08 PM, Mark A. Jensen wrote:

> Um, yeah.
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "BioPerl List" <bioperl-l at bioperl.org>
> Sent: Tuesday, January 12, 2010 6:36 PM
> Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service
> 
> 
> Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API conflict with the current EUtilities tools.
> 
> chris
> 


From maj at fortinbras.us  Tue Jan 12 19:08:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 19:08:12 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
	<D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
Message-ID: <5AD210CB0C444A57881BBDD34DE99149@NewLife>

Um, yeah.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Tuesday, January 12, 2010 6:36 PM
Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web 
service


Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's 
Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API 
conflict with the current EUtilities tools.

chris



From maj at fortinbras.us  Tue Jan 12 20:09:28 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 20:09:28 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP
	webservice
In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife><D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
	<5AD210CB0C444A57881BBDD34DE99149@NewLife>
Message-ID: <A5829F72FD6F469D9CBCC94FC69C068F@NewLife>

corrected:

use Bio::DB::SoapEUtilities;
my $fac = Bio::DB::SoapEUtilities->new(); # step 1
my $seqio = $fac->esearch(
       -db => 'nucleotide',
       -term => 'HIV1 and CCR5 and Brazil'
    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
 # yes, it's already done the efetch under the hood...
 while ( my $seq = $seqio->next_seq ) { # step 4
  # do something with $seq, a Bio::Seq object...
 }

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Tuesday, January 12, 2010 7:08 PM
Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP 
webservice


> Um, yeah.
> ----- Original Message ----- 
> From: "Chris Fields" <cjfields at illinois.edu>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "BioPerl List" <bioperl-l at bioperl.org>
> Sent: Tuesday, January 12, 2010 6:36 PM
> Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web 
> service
>
>
> Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's 
> Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API 
> conflict with the current EUtilities tools.
>
> chris
>


From tuco at pasteur.fr  Wed Jan 13 05:24:34 2010
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 13 Jan 2010 11:24:34 +0100
Subject: [Bioperl-l] targetp request
In-Reply-To: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
Message-ID: <4B4D9F62.5010306@pasteur.fr>

On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> Hi,
>
> I am trying to use targetP in bioperl.
> the documentation at the bioperl site is a bit confusing to me...
>
> I would appreciate if you could give a very small example, as to how to use
> "Bio::Tools::TargetP" to predict the localization of a protein sequence that
> i have stored as a string.
>
> Thanks,
> Vijay

Dear Vijay,

Bio::Tools::TargetP is not intended to run targetp on a sequence, but to
read and parse the results of a targetp run.

From the Pod doc:

DESCRIPTION
        TargetP modules will provides parsed informations about protein
        localization.  It reads in a targetp output file.  It parses the
        results, and returns a Bio::SeqFeature::Generic object for each
        sequences found to have a subcellular localization


So to analyze your sequence, you'll first need to run targetp on your
sequence file to create a targetp output file. Then use the
Bio::Tools::TargetP module to parse this result file and extract only the
information you want to display, as shown in the SYNOPSIS of the
module's Pod documentation.
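
Roughly, the parsing step looks like the sketch below, which follows the module's SYNOPSIS. The file name 'targetp.out' is only a placeholder for your own targetp output, and the loop just prints whatever tags the parser attached to each feature:

```perl
use strict;
use warnings;
use Bio::Tools::TargetP;

# 'targetp.out' is a placeholder for the file produced by running targetp.
my $file = @ARGV ? $ARGV[0] : 'targetp.out';
open my $fh, '<', $file or die "Cannot open $file: $!\n";

my $parser = Bio::Tools::TargetP->new(-fh => $fh);

while (my $feat = $parser->next_prediction) {
    my $id = defined $feat->seq_id ? $feat->seq_id : 'unknown';
    print "Sequence: $id\n";
    # Print every tag/value pair stored on the feature.
    for my $tag ($feat->get_all_tags) {
        print "  $tag: ", join(', ', $feat->get_tag_values($tag)), "\n";
    }
}
close $fh;
```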

HTH

Regards

Emmanuel

From roy.chaudhuri at gmail.com  Wed Jan 13 07:52:58 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 13 Jan 2010 12:52:58 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
Message-ID: <4B4DC22A.8080701@gmail.com>

Upload them to Bugzilla as patches, and one of the devs will review your 
changes and incorporate them into bioperl-live:
http://www.bioperl.org/wiki/HOWTO:SubmitPatch

Roy.

On 11/01/2010 16:27, Adam Witney wrote:
>
> Ah excellent, thanks Roy. I was indeed thinking about it the wrong
> way.
>
> In the process of writing this i have created a
>
> Bio::Tools::Run::Phylo::Phylip::Pars class
>
> which is essentially just a modified copy of ProtPars. I have also
> fixed a few typos and possible bugs in
>
> Bio/Tools/Run/Phylo/Phylip/Base.pm
> Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm
> Bio/Tools/Run/Alignment/Clustalw.pm
>
> I am of course happy to send these back in to the project... how
> would i best do this?
>
> Cheers
>
> adam
>
>
> On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote:
>
>> Actually, I guess some sample code would be more helpful:
>>
>> use Bio::LocatableSeq;
>> use Bio::SimpleAlign;
>> use Bio::AlignIO;
>> my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4);
>> my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3);
>> my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5);
>> my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
>> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);
>>
>> Cheers, Roy.
>>
>>
>> On 11/01/2010 13:40, Roy Chaudhuri wrote:
>>> Hi Adam,
>>>
>>> I'm guessing you actually want to create a Bio::SimpleAlign
>>> object (representing an alignment), rather than a Bio::AlignIO
>>> object (which is just for reading/writing alignment files).
>>> Bio::SimpleAlign has a documented new method that allows you to
>>> construct an alignment from Bio::LocatableSeq objects, which are
>>> similar to Bio::Seq objects but include gaps and start/end
>>> coordinates to describe their relationship to other sequences in
>>> the alignment.
>>>
>>> Roy.
>>>
>>> On 11/01/2010 12:21, Adam Witney wrote:
>>>> Hi,
>>>>
>>>> I am writing a script to automate the running of Phylip Pars.
>>>> In the process i have to create a Bio::AlignIO object from a
>>>> set of data that i have in a hash.
>>>>
>>>> I could write the hash data into a phylip file and then load
>>>> the Bio::AlignIO from that file, but i wondered if i could skip
>>>> the writing and then reading of a temporary file ?
>>>>
>>>> thanks for any help
>>>>
>>>> adam
>>>
>>
>


From marcelo011982 at gmail.com  Wed Jan 13 13:12:04 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Wed, 13 Jan 2010 16:12:04 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
Message-ID: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>

Hi..
I have a simple BLAST result, such as one from blastn.
Is there a script in BioPerl to convert such a result to Clustalw
format (.aln)?

Thanx for any help.

From Kevin.M.Brown at asu.edu  Wed Jan 13 13:01:42 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 13 Jan 2010 11:01:42 -0700
Subject: [Bioperl-l] targetp request
In-Reply-To: <4B4D9F62.5010306@pasteur.fr>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
	<4B4D9F62.5010306@pasteur.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B4067C133E@EX02.asurite.ad.asu.edu>

Sounds like this module might be in the wrong place then. Sounds more
like a SeqIO or AlignIO module, heheh. Also looks like the docs might
need to be cleaned up a bit for English readability (at least that
initial sentence).

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Emmanuel Quevillon
> Sent: Wednesday, January 13, 2010 3:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] targetp request
> 
> On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> > Hi,
> >
> > I am trying to use targetP in bioperl.
> > the documentation at the bioperl site is a bit confusing to me...
> >
> > I would appreciate if you could give a very small example, 
> as to how to use
> > "Bio::Tools::TargetP" to predict the localization of a 
> protein sequence that
> > i have stored as a string.
> >
> > Thanks,
> > Vijay
> 
> Dear Vivay,
> 
> Bio::Tools::TargetP is not intended to run targetp on a 
> sequence but to 
> read and parse results from targetp run.
> 
>  From the Pod doc :
> 
> DESCRIPTION
>         TargetP modules will provides parsed informations 
> about protein 
> localization.  It
>         reads in a targetp output file.  It parses the results, and 
> returns a
>         Bio::SeqFeature::Generic object for each sequences 
> found to have 
> a subcellular
>         localization
> 
> 
> So to analyze your sequence, you'll first need to run targetp on your 
> sequence file to create a targetp result output file. Then use 
> Bio::Tools::TargetP module to parse this result file and get only 
> informations you want/need from the result to be display as 
> shown in the 
> SYNOPSIS of the Pod documentation of the module.
> 
> HTH
> 
> Regards
> 
> Emmanuel


From maj at fortinbras.us  Wed Jan 13 13:44:36 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 13 Jan 2010 13:44:36 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
Message-ID: <C85EC8A05E884B328AFDAA055341E9E2@NewLife>

Marcelo-
Yes-- look at the code snip at
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
combined with the snip at 
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
(using -format => 'clustalw')
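
Put together, the two snips come out roughly like this. It is only a sketch: the file names are placeholders, and note that it writes one pairwise alignment per HSP rather than a single multiple alignment:

```perl
use strict;
use warnings;
use Bio::SearchIO;
use Bio::AlignIO;

# Placeholder file names; substitute your own BLAST report and output file.
my $in  = Bio::SearchIO->new(-format => 'blast',    -file => 'report.bls');
my $out = Bio::AlignIO->new( -format => 'clustalw', -file => '>report.aln');

while (my $result = $in->next_result) {
    while (my $hit = $result->next_hit) {
        while (my $hsp = $hit->next_hsp) {
            # Each HSP can be turned into a pairwise Bio::SimpleAlign.
            $out->write_aln($hsp->get_aln);
        }
    }
}
```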
cheers MAJ
----- Original Message ----- 
From: "Marcelo Iwata" <marcelo011982 at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, January 13, 2010 1:12 PM
Subject: [Bioperl-l] Blast to Clustalw Format


> Hi..
> I have an simple Blast result, such as blastn.
> Is there an  scrip  to transform such result to Clustalw format in Bioperl
> ?(.aln)
> 
> Thanx for any help.

From dan.kortschak at adelaide.edu.au  Wed Jan 13 23:26:46 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 14:56:46 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
Message-ID: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>

Hi All,

I'm having a stupid problem that for some reason I just can't figure
out. I'm putting together a B:A:IO:bowtie module to wrap around the
B:A:IO:sam module so bowtie output can be used as an assembly start
point.

For some reason that is escaping me I can't create tempfiles!

What should be the relevant code in the module:

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );


and the line (there are a couple of others that are likely to fail in the
same way, but I've not got that far)

my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

Which dies with:
Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.

Relevant environment vars:
  DB<10> x @ISA 
0  'Bio::Root::Root'
1  'Bio::Root::IO'
2  'Bio::Assembly::IO'

DB<11> x $self
0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
   '_no_head' => undef
   '_no_sq' => undef
   '_root_verbose' => 0


Can someone suggest what I'm missing?

cheers
Dan


From maj at fortinbras.us  Thu Jan 14 00:11:01 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:11:01 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <84196F01FF584C64A79B89FECE2DD86F@NewLife>

Hey Dan-- what does your constructor look like? I wonder if something's getting 
lost in new() and _initialize() chaining spaghetti- MAJ


From dan.kortschak at adelaide.edu.au  Thu Jan 14 00:35:35 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:35 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <84196F01FF584C64A79B89FECE2DD86F@NewLife>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
Message-ID: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>

Thanks Mark, I'm not sure about that since @ISA still includes
Bio::Root::IO at the point of the call, but it might be.

cheers
Dan

Here is the entirety of the code (it's reasonably short):

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );

our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
our $PG = "\@PG\tID=Bowtie\n";

our $HAVE_IO_UNCOMPRESS;
BEGIN {
# check requirements
    unless ( eval "require Bio::Tools::Run::Bowtie;") {
	Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
    }
    unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
	Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
    }
}

sub new {
	my $class = shift;
	my @args = @_;
	my $self = $class->SUPER::new(@args);
	my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
	$file =~ s/^<//;
	$self->{'_no_head'} = $no_head;
	$self->{'_no_sq'} = $no_sq;
	# get the sequence so samtools can work with it
	my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
	my $refdb = $inspector->run($index);
	my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));
	my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );
	return $sam;
}

sub _bowtie_to_sam {
	my ($self, $file, $refdb) = @_;

	$self->throw("'$file' does not exist or is not readable.")
		unless ( -e $file && -r $file );
	my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
	$self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;

	my %SQ;
	my $mapq = 255;
	my $in_pair;
	my @mate_line;
	my $mlen;

	if ($file =~ m/\.gz[^.]*$/) {
		unless ($HAVE_IO_UNCOMPRESS) {
			croak( "IO::Uncompress::Gunzip not available, can't expand '$file'" );
		}
		my ($tfh, $tf) = $self->io->tempfile;
		# decompress the named input file, not whatever $_ happens to hold
		my $z = IO::Uncompress::Gunzip->new($file);
		while (<$z>) { print $tfh $_ }
		close $tfh;
		$file = $tf;
	}

	open(my $fh, '<', $file) or
		$self->throw("Can not open '$file' for reading: $!");

	# create temp file for working
	my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

	# read the bowtie output line by line into $_
	while (<$fh>) {
		chomp;
		my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
		$SQ{$rname} = 1;
		
		my $paired_f =  ($qname =~ m#/[12]#) ? 0x03 : 0;
		my $strand_f = ($strand eq '-') ? 0x10 : 0;
		my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
		my $first_f =  ($qname =~ m#/1#) ? 0x40 : 0;
		my $second_f = ($qname =~ m#/2#) ? 0x80 : 0;
		my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;

		$pos++;
		my $len = length $seq;
		die unless $len == length $qual;
		my $cigar = $len.'M';
		my @detail = split(',',$details);
		my $dist = 'NM:i:'.scalar @detail;
		
		my @mismatch;
		my $last_pos = 0;
		for (@detail) {
			m/(\d+):(\w)>\w/;
			my $err = ($1-$last_pos);
			$last_pos = $1+1;
			push @mismatch,($err,$2);
		}
		push @mismatch, $len-$last_pos;
		@mismatch = reverse @mismatch if $strand eq '-';
		my $mismatch = join('', 'MD:Z:', @mismatch);

		if ($paired_f) {
			my $mrnm = '=';
			if ($in_pair) {
				my $mpos = $mate_line[3];
				$mate_line[7] = $pos;
				my $isize = $mpos-$pos-$len;
				$mate_line[8] = -$isize;
				print $sam_tmp_h join("\t", @mate_line),"\n";
				print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
				$in_pair = 0;
			} else {
				$mlen = $len;
				@mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
				$in_pair = 1;
			}
		} else {
			my $mrnm = '*';
			my $mpos = 0;
			my $isize = 0;
			print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
		}
	}

	close($fh);
	$sam_tmp_h->close;
	
	return $sam_tmp_f if $self->{'_no_head'};

	my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

	# print header
	print $samh $HD;
	
	# print sequence dictionary
	unless ($self->{'_no_sq'}) {
		my $db  = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
		while ( my $seq = $db->next_seq() ) {
			$SQ{$seq->id} = $seq->length if $SQ{$seq->id};
		}
	
		map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
	}
	
	# print program
	print $samh $PG;
	
	open($sam_tmp_h, '<', $sam_tmp_f) or
		$self->throw("Can not open '$sam_tmp_f' for reading: $!");

	# append the alignment records after the header
	print $samh $_ while (<$sam_tmp_h>);
	
	close($sam_tmp_h);
	$samh->close;
	
	return $samf;
}

sub _make_bam {
	my ($self, $file) = @_;
	
	$self->throw("'$file' does not exist or is not readable")
		unless ( -e $file && -r $file );

	# make a sorted bam file from a sam file input
	my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
	my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
	$_->close for ($bamh, $srth);
	
	my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
						   -sam_input => 1,
						   -bam_output => 1 );

	$samt->run( -bam => $file, -out => $bamf );

	$samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );

	$samt->run( -bam => $bamf, -pfx => $srtf);

	return $srtf.'.bam'
}

1;


On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
> Hey Dan-- what does your constructor look like? I wonder if
> something's getting 
> lost in new() and _initialize() chaining spaghetti- MAJ
> 


From dan.kortschak at adelaide.edu.au  Thu Jan 14 00:35:48 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:48 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
	<B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
Message-ID: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>

I've had a bit of a play with that, but no luck.

Dan

On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
> I've found that rearranging the items in the 'use base' array can
> sometimes 
> recover
> lost methods. I don't know enough of the arcana to know why it works. 
> (Sometimes,
> java starts looking pretty good from here...)
> 


From maj at fortinbras.us  Thu Jan 14 00:38:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:38:00 -0500
Subject: [Bioperl-l] Fw:  not able to use Bio::Root::IO method
Message-ID: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>

up to list
----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
Sent: Thursday, January 14, 2010 12:36 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method


> Aha-- check out the pod for Bio::Root::IO:
> 
> "This module provides methods that will usually be needed for any sort
> of file- or stream-related input/output, e.g., keeping track of a file
> handle, transient printing and reading from the file handle, a close
> method, automatically closing the handle on garbage collection, etc.
> 
> To use this for your own code you will either want to inherit from
> this module, or instantiate an object for every file or stream you are
> dealing with. In the first case this module will most likely not be
> the first class off which your class inherits; therefore you need to
> call _initialize_io() with the named parameters in order to set file
> handle, open file, etc automatically."
> 
> I think you're wanting a call to $self->_initialize_io(). (There is no io() 
> method explicitly defined in any of the base classes.)
> MAJ

From maj at fortinbras.us  Thu Jan 14 00:50:11 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:50:11 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
	<B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
	<1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <82BFF47099684EF496DB3875D39DCA14@NewLife>

For the benefit of the list, I categorically deny ever making the 
statement about java below....
MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 14, 2010 12:35 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method


> I've had a bit of a play with that, but no luck.
> 
> Dan
> 
> On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
>> I've found that rearranging the items in the 'use base' array can
>> sometimes 
>> recover
>> lost methods. I don't know enough of the arcana to know why it works. 
>> (Sometimes,
>> java starts looking pretty good from here...)
>> 
> 
>

From cjfields at illinois.edu  Thu Jan 14 02:23:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:23:41 -0600
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>

You can remove the separate 'use' directives for any modules already named in 'use base' (use base loads them for you).  Also, Bio::Root::IO inherits from Bio::Root::Root, and Bio::Assembly::IO should inherit from Bio::Root::IO, so the only base module you should need is Bio::Assembly::IO.  It's possible that listing all three is confusing the interpreter.
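
To illustrate why a single base class is enough when the parents form a linear chain, here is a minimal sketch. The classes My::Root, My::IO and My::AssemblyIO are hypothetical stand-ins, not the BioPerl modules, and 'use parent -norequire' is used only because the toy classes all live in one file. It compares the ancestors Perl searches for a class that lists all three parents with one that lists only the most-derived parent.

```perl
use strict;
use warnings;
use mro;    # core module; provides mro::get_linear_isa

# Hypothetical stand-ins for a linear chain:
# My::AssemblyIO -> My::IO -> My::Root
package My::Root;
sub new { my ($class, %args) = @_; return bless {%args}, $class }

package My::IO;
use parent -norequire, 'My::Root';
sub tempfile_name { return 'from My::IO' }

package My::AssemblyIO;
use parent -norequire, 'My::IO';

# Redundant declaration: every ancestor is listed by hand
package My::Redundant;
use parent -norequire, 'My::Root', 'My::IO', 'My::AssemblyIO';

# Lean declaration: only the most-derived parent is named
package My::Lean;
use parent -norequire, 'My::AssemblyIO';

package main;

# Return the sorted list of ancestors Perl will search for methods
sub ancestors {
    my ($class) = @_;
    my @isa = @{ mro::get_linear_isa($class) };
    shift @isa;    # drop the class itself
    return join ',', sort @isa;
}

print "Redundant: ", ancestors('My::Redundant'), "\n";
print "Lean:      ", ancestors('My::Lean'), "\n";
print "My::Lean inherits tempfile_name\n" if My::Lean->can('tempfile_name');
```

Both classes end up searching the same set of ancestors; only the search order differs, which is why the shorter declaration loses nothing.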

chris



From cjfields at illinois.edu  Thu Jan 14 02:25:05 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:25:05 -0600
Subject: [Bioperl-l] Fw:  not able to use Bio::Root::IO method
In-Reply-To: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
References: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
Message-ID: <1DB926E1-9C6F-4B96-8D7E-28317DD7DE42@illinois.edu>

Yes, that's true.  The call to an io() is a Bio::Tools::Run::WrapperBase thing (the io() is a Bio::Root::IO instance).
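
A minimal sketch of what the Bio::Root::IO pod passage quoted earlier in the thread implies, using toy classes (My::Base, My::IOMixin and My::Parser are hypothetical, not BioPerl modules). When the IO class is not the first parent, SUPER::new never reaches it, so the child constructor has to call the initializer itself. In BioPerl the corresponding call would presumably be $self->_initialize_io(@args), as Mark suggests.

```perl
use strict;
use warnings;

# Hypothetical first parent: its constructor knows nothing about IO setup
package My::Base;
sub new {
    my ($class, %args) = @_;
    my $self = bless {}, $class;
    $self->{name} = $args{-name};
    return $self;
}

# Hypothetical IO mix-in with an initializer the child must call explicitly
package My::IOMixin;
sub _initialize_io {
    my ($self, %args) = @_;
    $self->{file} = $args{-file};
    return 1;
}
sub file { return $_[0]->{file} }

package My::Parser;
use parent -norequire, 'My::Base', 'My::IOMixin';
sub new {
    my ($class, %args) = @_;
    # SUPER::new resolves to My::Base::new only (the first parent wins) ...
    my $self = $class->SUPER::new(%args);
    # ... so the IO side has to be initialised by hand
    $self->_initialize_io(%args);
    return $self;
}

package main;

my $parser = My::Parser->new( -name => 'demo', -file => 'reads.sam' );
print "file: ", $parser->file, "\n";
```

Without the explicit _initialize_io call, file() would return undef even though the method itself is inherited.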

chris



From dan.kortschak at adelaide.edu.au  Thu Jan 14 02:59:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 18:29:20 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
	<B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>
Message-ID: <1263455960.4630.3.camel@epistle>

Thanks Chris,

I've done that, and since the inheritance is direct (rather than being a
constructed attribute in the object hash) the calls are $obj->temp*
rather than the $obj->io->temp* that I was using.
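
The difference between the two call styles can be sketched with toy classes. My::IO, My::Wrapper and My::Reader are hypothetical, and File::Temp stands in for the real temp-file machinery. A class that keeps a helper object in its hash needs an io() accessor; a class that inherits from the helper calls the method on itself and has no io() at all.

```perl
use strict;
use warnings;
use File::Temp ();

# Hypothetical helper that owns the temp-file logic
package My::IO;
sub new { return bless {}, shift }
sub tempfile {
    my ($self, %args) = @_;
    # File::Temp::tempfile returns (handle, filename) in list context
    return File::Temp::tempfile( SUFFIX => $args{-suffix} || '', UNLINK => 1 );
}

# "Has-a": keeps a helper in the object hash and exposes it via io()
package My::Wrapper;
sub new { return bless { _io => My::IO->new }, shift }
sub io  { return $_[0]->{_io} }

# "Is-a": inherits the helper's methods; there is no io() accessor
package My::Reader;
use parent -norequire, 'My::IO';

package main;

my ($fh1, $file1) = My::Wrapper->new->io->tempfile( -suffix => '.sam' );
my ($fh2, $file2) = My::Reader->new->tempfile( -suffix => '.sam' );
print "wrapper temp file: $file1\n";
print "reader temp file:  $file2\n";
print "My::Reader has no io() method\n" unless My::Reader->can('io');
```

Calling My::Reader->new->io would fail with the same kind of "Can't locate object method" error reported at the start of this thread.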

It works now and is much clearer, having gotten rid of many of the
declarations.

cheers
Dan

On Thu, 2010-01-14 at 01:23 -0600, Chris Fields wrote:
> You can remove separate 'use' directives if they are declared with
> 'use base' (they will be imported then).  Also, Bio::Root::IO inherits
> Bio::Root::Root, and Bio::Assembly::IO should inherit from
> Bio::Root::IO, so the only base module you should need is
> Bio::Assembly::IO.  It's possible having all three is confusing the
> interpreter.
> 
> chris


From marcelo011982 at gmail.com  Thu Jan 14 08:44:25 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:44:25 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <C85EC8A05E884B328AFDAA055341E9E2@NewLife>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
	<C85EC8A05E884B328AFDAA055341E9E2@NewLife>
Message-ID: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>

Thanks Mark.
I think most of you already know this,
but I'll post it for new users:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while ( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while ( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      $aln = $hsp->get_aln;
      $alnIO->write_aln($aln);

    }
  }
}


On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> Marcelo-
> Yes-- look at the code snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
> combined with the snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> (using -format => 'clustalw')
> cheers MAJ
> ----- Original Message ----- From: "Marcelo Iwata" <
> marcelo011982 at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, January 13, 2010 1:12 PM
> Subject: [Bioperl-l] Blast to Clustalw Format
>
>
>  Hi..
>> I have an simple Blast result, such as blastn.
>> Is there an  scrip  to transform such result to Clustalw format in Bioperl
>> ?(.aln)
>>
>> Thanx for any help.
>>
>>

From marcelo011982 at gmail.com  Thu Jan 14 08:46:21 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:46:21 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
	<C85EC8A05E884B328AFDAA055341E9E2@NewLife>
	<1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
Message-ID: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>

Sorry, the correct code is:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while ( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while ( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      $aln = $hsp->get_aln;
      $alnIO->write_aln($aln);

    }
  }
}



From maj at fortinbras.us  Thu Jan 14 08:54:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 08:54:31 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com><C85EC8A05E884B328AFDAA055341E9E2@NewLife><1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
	<1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
Message-ID: <1B8891488AA746F49BCAAB531FBE4D0B@NewLife>

Thanks Marcelo-- code snips always appreciated! MAJ
----- Original Message ----- 
From: "Marcelo Iwata" <marcelo011982 at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 14, 2010 8:46 AM
Subject: Re: [Bioperl-l] Blast to Clustalw Format


> Sorry , the correct code is:
>
>
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SearchIO;
> use Bio::AlignIO;
>
> my $in = Bio::SearchIO->new(-format => 'blast',
>                             -file   => '../../fontes/exemplos/blat/teste2/output.blast');
> my $aln;
> my $alnIO;
> $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
> while ( my $result = $in->next_result ) {
>  ## $result is a Bio::Search::Result::ResultI compliant object
>  while ( my $hit = $result->next_hit ) {
>    ## $hit is a Bio::Search::Hit::HitI compliant object
>    while ( my $hsp = $hit->next_hsp ) {
>      ## $hsp is a Bio::Search::HSP::HSPI compliant object
>      $aln = $hsp->get_aln;
>      $alnIO->write_aln($aln);
>
>    }
>  }
> }
>
>
> On Thu, Jan 14, 2010 at 11:44 AM, Marcelo Iwata 
> <marcelo011982 at gmail.com>wrote:
>
>> Thanks Mark.
>> I think that most of you already know it.
>> But , i'll put it for new users:
>>
>>
>> #!/usr/bin/perl -w
>>
>> use strict;
>> use Bio::SearchIO;
>> use Bio::AlignIO;
>>
>> my $in = Bio::SearchIO->new(-format => 'blast',
>>                             -file   => '../../fontes/exemplos/blat/teste2/output.blast');
>> my $aln;
>> my $alnIO;
>> $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
>> while ( my $result = $in->next_result ) {
>>   ## $result is a Bio::Search::Result::ResultI compliant object
>>   while ( my $hit = $result->next_hit ) {
>>     ## $hit is a Bio::Search::Hit::HitI compliant object
>>     while ( my $hsp = $hit->next_hsp ) {
>>       ## $hsp is a Bio::Search::HSP::HSPI compliant object
>>       $aln = $hsp->get_aln;
>>       $alnIO->write_aln($aln);
>>
>>
>>     }
>>   }
>> }
>>
>>
>> On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>> Marcelo-
>>> Yes-- look at the code snip at
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
>>> combined with the snip at
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>>> (using -format => 'clustalw')
>>> cheers MAJ
>>> ----- Original Message ----- From: "Marcelo Iwata" <
>>> marcelo011982 at gmail.com>
>>> To: <bioperl-l at lists.open-bio.org>
>>> Sent: Wednesday, January 13, 2010 1:12 PM
>>> Subject: [Bioperl-l] Blast to Clustalw Format
>>>
>>>
>>>  Hi..
>>>> I have an simple Blast result, such as blastn.
>>>> Is there an  scrip  to transform such result to Clustalw format in
>>>> Bioperl
>>>> ?(.aln)
>>>>
>>>> Thanx for any help.
>>>>
>>>>
>>
> 


From sidd.basu at gmail.com  Thu Jan 14 14:15:04 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 13:15:04 -0600
Subject: [Bioperl-l] reading blast report
Message-ID: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>

Hi,
I have a script that reads a tblastn report (13000 records) and loads it
into a chado database (Bio::Chado::Schema module), but the machine runs out
of memory. Leaving the database loading aside, I am trying to figure out
whether reading the report with the SearchIO module could be consuming a
lot of memory. So, when I am reading a blast file and getting the result
object ....

while (my $result = $searchio->next_result)

* Does the searchio object load a huge chunk of the file into memory, or
  does it read only part of the report on each iteration?

* Would indexing the blast report and then reading from the index be much
  faster, and why? And is there any way I could iterate through each
  record in the index? Would that be helpful?

-siddhartha


From jason at bioperl.org  Thu Jan 14 14:53:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 11:53:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
Message-ID: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>

What aspects of the report are you loading?  You might consider the  
blast report as tab-delimited (-m 8 format) if you only are interested  
in start/end positions and scores of alignments, which is a simpler and  
reduced dataset that has a lower memory footprint in the parser.

SearchIO (default) -format => blast - you can try -format =>  
blast_pull instead, which lazily parses to create objects and will reduce  
memory consumption.
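
A minimal sketch of the tab-delimited route, assuming the report was
written with -m 8 (12 tab-separated columns per HSP). It reads one line at
a time and keeps only the fields of the current HSP, so memory use should
not grow with the size of the report. The sub names parse_m8_line and
each_m8_hsp and the hash keys are made up for illustration and are not part
of BioPerl; within BioPerl, Bio::SearchIO with -format => 'blasttable' reads
the same kind of file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column order of legacy BLAST "-m 8" output.
my @M8_FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                   q_start q_end s_start s_end evalue bit_score);

# Parse one line into a hash ref. Returns nothing for blank lines,
# comment lines, or lines without exactly 12 columns.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @cols = split /\t/, $line;
    return unless @cols == @M8_FIELDS;
    my %hsp;
    @hsp{@M8_FIELDS} = @cols;
    return \%hsp;
}

# Stream a report from a filehandle, calling $callback once per HSP.
# Only one HSP hash is alive at any time.
sub each_m8_hsp {
    my ($fh, $callback) = @_;
    while (my $line = <$fh>) {
        my $hsp = parse_m8_line($line) or next;
        $callback->($hsp);
    }
    return;
}
```

Called as each_m8_hsp($fh, sub { my $hsp = shift; ... }), each HSP can be
loaded into the database inside the callback and then dropped.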

-jason
On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:

> Hi,
> I have a script that reads a tblastn report(13000 records) and loads  
> in
> a chado database(Bio::Chado::Schema module),  however the machine  
> runs of memory. I am trying to figure
> out other than loading the database stuff
> if it the reading of SearchIO module could consume a lot of memory.  
> So,
> when i am reading a blast file and getting the result object ....
>
> while (my $result = $searchio->next_result)
>
> * Does the searchio object loads a huge chunk of file in the memory or
>  for each iteration it only reads a part of the result.
>
> * Does doing an index on blast report and then reading from it be much
>  faster and why. And is there any way i could iterate through each
>  record in the index,  will that be helpful.
>
> -siddhartha
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From sidd.basu at gmail.com  Thu Jan 14 15:15:45 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 14:15:45 -0600
Subject: [Bioperl-l]  Re: reading blast report
In-Reply-To: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
Message-ID: <4b4f7b74.5744f10a.7087.4813@mx.google.com>

On Thu, 14 Jan 2010, Jason Stajich wrote:

> What aspects of the report are you loading?  You might consider the blast 
> report as tab-delimited (-m 8 format) if you only are interested in 
> start/end positions and scores of ailgnments which is a simpler and reduced 
> dataset that has lower memory footprint by the parser.

I think this would be a better approach, as I am mostly interested in
start/end/score data only.

>
> Searchio (default) -format => blast - you can try the BLAST -format => 
> blast_pull instead which lazy parses to create objects and will reduce 
> memory consumption.

That's another good option. But just out of curiosity: does the
regular blast parser load the entire file into memory when the
output consists of multiple Results concatenated together into a
single file? Could anybody clarify?

thanks, 
-siddhartha


>
> -jason
> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
>
> > Hi,
> > I have a script that reads a tblastn report(13000 records) and loads in
> > a chado database(Bio::Chado::Schema module),  however the machine runs of 
> > memory. I am trying to figure
> > out other than loading the database stuff
> > if it the reading of SearchIO module could consume a lot of memory. So,
> > when i am reading a blast file and getting the result object ....
> >
> > while (my $result = $searchio->next_result)
> >
> > * Does the searchio object loads a huge chunk of file in the memory or
> >  for each iteration it only reads a part of the result.
> >
> > * Does doing an index on blast report and then reading from it be much
> >  faster and why. And is there any way i could iterate through each
> >  record in the index,  will that be helpful.
> >
> > -siddhartha
> >
> >
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>

From jason at bioperl.org  Thu Jan 14 16:28:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 13:28:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
Message-ID: <CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>


On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
>
>> What aspects of the report are you loading?  You might consider the  
>> blast
>> report as tab-delimited (-m 8 format) if you only are interested in
>> start/end positions and scores of ailgnments which is a simpler and  
>> reduced
>> dataset that has lower memory footprint by the parser.
>
> I think this would be a better approach i am mostly interested in
> start/end/score data only.
>
>>
>> Searchio (default) -format => blast - you can try the BLAST -format  
>> =>
>> blast_pull instead which lazy parses to create objects and will  
>> reduce
>> memory consumption.
>
> It's another good option though. But just out of curosity,  so the
> regular blast parser do load the entire file in the memory consider  
> the
> output consist of multiple Results concatenated together into a
> single file. Could anybody clarify.
>
> thanks,
> -siddhartha

Each result is parsed (1 result per query) and all the hits and HSPs  
are parsed and brought into memory with the standard (non-pull)  
approach.
The SearchIO iterates at the level of result - that is why you call  
next_result which parses each one at a time.

>
>
>>
>> -jason
>> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
>>
>>> Hi,
>>> I have a script that reads a tblastn report(13000 records) and  
>>> loads in
>>> a chado database(Bio::Chado::Schema module),  however the machine  
>>> runs of
>>> memory. I am trying to figure
>>> out other than loading the database stuff
>>> if it the reading of SearchIO module could consume a lot of  
>>> memory. So,
>>> when i am reading a blast file and getting the result object ....
>>>
>>> while (my $result = $searchio->next_result)
>>>
>>> * Does the searchio object loads a huge chunk of file in the  
>>> memory or
>>> for each iteration it only reads a part of the result.
>>>
>>> * Does doing an index on blast report and then reading from it be  
>>> much
>>> faster and why. And is there any way i could iterate through each
>>> record in the index,  will that be helpful.
>>>
>>> -siddhartha
>>>
>>>
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From sidd.basu at gmail.com  Thu Jan 14 16:40:42 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 15:40:42 -0600
Subject: [Bioperl-l]  Re: reading blast report
In-Reply-To: <CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
	<CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>
Message-ID: <4b4f8f5d.5644f10a.2be2.47dc@mx.google.com>

Thanks Jason for the clarification.

On Thu, 14 Jan 2010, Jason Stajich wrote:

>
> On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote:
>
> > On Thu, 14 Jan 2010, Jason Stajich wrote:
> >
> >> What aspects of the report are you loading?  You might consider the blast
> >> report as tab-delimited (-m 8 format) if you only are interested in
> >> start/end positions and scores of ailgnments which is a simpler and 
> >> reduced
> >> dataset that has lower memory footprint by the parser.
> >
> > I think this would be a better approach i am mostly interested in
> > start/end/score data only.
> >
> >>
> >> Searchio (default) -format => blast - you can try the BLAST -format =>
> >> blast_pull instead which lazy parses to create objects and will reduce
> >> memory consumption.
> >
> > It's another good option though. But just out of curosity,  so the
> > regular blast parser do load the entire file in the memory consider the
> > output consist of multiple Results concatenated together into a
> > single file. Could anybody clarify.
> >
> > thanks,
> > -siddhartha
>
> Each result is parsed (1 result per query) and all the hits and HSPs are 
> parsed and brought into memory with the standard (non-pull) approach.
> The SearchIO iterates at the level of result - that is why you call 
> next_result which parses each one at a time.
>
> >
> >
> >>
> >> -jason
> >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
> >>
> >>> Hi,
> >>> I have a script that reads a tblastn report(13000 records) and loads in
> >>> a chado database(Bio::Chado::Schema module),  however the machine runs 
> >>> of
> >>> memory. I am trying to figure
> >>> out other than loading the database stuff
> >>> if it the reading of SearchIO module could consume a lot of memory. So,
> >>> when i am reading a blast file and getting the result object ....
> >>>
> >>> while (my $result = $searchio->next_result)
> >>>
> >>> * Does the searchio object loads a huge chunk of file in the memory or
> >>> for each iteration it only reads a part of the result.
> >>>
> >>> * Does doing an index on blast report and then reading from it be much
> >>> faster and why. And is there any way i could iterate through each
> >>> record in the index,  will that be helpful.
> >>>
> >>> -siddhartha
> >>>
> >>>
> >>
> >> --
> >> Jason Stajich
> >> jason.stajich at gmail.com
> >> jason at bioperl.org
> >> http://fungalgenomes.org/
> >>
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>

From SMarkel at accelrys.com  Thu Jan 14 17:58:06 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 14 Jan 2010 14:58:06 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>

We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
from our customers.  Due to network irregularities (not sure what else
to call it) users see the getting of remote BLAST results as somewhat
random.  When results come back the hits are fine, but sometimes no
information comes back at all.  Retrying helps.

In looking at RemoteBlast.pm there are four "return -1" cases.

* $status eq 'ERROR'      (return on line 614)
* $line =~ /ERROR/i       (return on line 628)
* !$got_content           (return on line 648)
* !$response->is_success  (return on line 655)

In the case of no content we'd like to retry remote BLAST.  We're happy
to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
module, but we only want to retry in that case, not the other three.

What would happen if that third "return -1" changed to a different
return value?

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


From nickjd at gmail.com  Wed Jan 13 08:18:12 2010
From: nickjd at gmail.com (NickJD)
Date: Wed, 13 Jan 2010 05:18:12 -0800 (PST)
Subject: [Bioperl-l] Parsing PSI-BLAST results with SearchIO
Message-ID: <65554589-081b-4297-ab68-9ddfbd3d9944@c34g2000yqn.googlegroups.com>

I am trying to parse PSI-BLAST results using SearchIO and some very
basic code just to read the number of hits, number of hsps, etc.  I
have done 10 rounds on 1 input sequence and parsed it but it seems to
treat each round as a separate result, so round/iteration is always 1
and new_hits is always the total list, not the ones that are new to
that round.  Does anyone have any experience of this?

Thanks,

Nick

From dsidote at waksman.rutgers.edu  Wed Jan 13 10:08:48 2010
From: dsidote at waksman.rutgers.edu (David J Sidote)
Date: Wed, 13 Jan 2010 10:08:48 -0500
Subject: [Bioperl-l] Bioinformatician position - Waksman Institute
Message-ID: <4b42af671001130708i703ecce0u47348484321714f@mail.gmail.com>

Bioinformatician - Research Assistant Professor


The Waksman Institute of Microbiology located on the New Brunswick campus of
Rutgers University is seeking a highly motivated and talented bioinformatics
scientist for a Research Assistant Professor appointment.  The successful
candidate will analyze genome, transcriptome, and epigenome data generated
on the Life Sciences 454, Illumina, and AB SOLiD high-throughput sequencing
platforms. Excellent communication and teamwork skills are essential as the
successful candidate will work closely with individual research groups to
develop software to facilitate the visualization, quantification, and
interpretation of the data. The successful candidate will be expected to
contribute to the publication of scientific literature and to present at
seminars and conferences.


Qualifications:


-       PhD in molecular biology, genetics, bioinformatics, systems biology
or other related fields; candidates with a PhD in physics, mathematics, or
computer science with some working knowledge of biology and experience are
encouraged to apply.

-       Demonstrated scientific track record

-       Highly proficient in perl, python, or ruby programming, linux/unix
scripting, and SQL.

-       Experience with R is desirable but not required

-       Experience with high-throughput sequencing, microarrays, or other
high-throughput biological platforms

-       Excellent communication and organizational skills


How to Apply:


Please send a cover letter stating your current research interests, why you
are interested in this position, and how your skill set complements this
position along with a curriculum vitae, and the names and contact
information of three references to hr at waksman.rutgers.edu. Please include
"Bioinformatics Assistant Research Professor" in the subject line. Rutgers
is an equal opportunity employer.


For more information about this position please contact:

Dr. David Sidote (dsidote at waksman.rutgers.edu)


From albezg at gmail.com  Wed Jan 13 20:57:27 2010
From: albezg at gmail.com (albezg)
Date: Wed, 13 Jan 2010 20:57:27 -0500
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with
 negative PDB ranges
In-Reply-To: <49C405F0.5050100@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com>
Message-ID: <4B4E7A07.7070805@gmail.com>

Hi all,

I have a problem using AlignIO to read the Pfam database:
ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
The database is in STOCKHOLM 1.0 format. AlignIO can read the alignment 
OK until the alignment PF00331.13. There it crashes with the following 
message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: '1-344' is not an integer.

STACK: Error::throw
STACK: Bio::Root::Root::throw 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
STACK: Bio::Range::end 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
STACK: Bio::Annotation::Target::new 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
STACK: Bio::AlignIO::stockholm::next_aln 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
STACK: /home/albezg/scripts/pfam2fasta.pl:22
-----------------------------------------------------------

It appears this is caused by this entry:
#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;

I don't care about residues in PDB, so I have just removed minus signs 
from the ranges. This seems to have fixed the crashing.
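
The workaround described above could be scripted roughly as follows. This
is only a sketch: the sub name strip_negative_pdb_range is made up, and it
assumes the "#=GS <name> DR PDB; <id> <chain>; <start>-<end>;" layout seen
in the offending entry. Dropping the minus signs makes the PDB coordinates
wrong, so it is only reasonable when the PDB residues are ignored anyway.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rewrite a "#=GS ... DR PDB; <id> <chain>; <start>-<end>;" line so that
# the range carries no minus signs. Any other line is returned unchanged.
sub strip_negative_pdb_range {
    my ($line) = @_;
    return $line unless $line =~ /^#=GS\s+\S+\s+DR\s+PDB;/;
    # "-1-344" becomes "1-344"; a negative end such as "-5--2" becomes "5-2".
    $line =~ s/;(\s*)-?(\d+)-+(\d+);(\s*)$/;$1$2-$3;$4/;
    return $line;
}
```

The seed file could be passed through this sub line by line into a cleaned
copy, which is then handed to AlignIO.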

Is it a known problem? Is there a solution for it?

Thanks,
Alexandr


On 03/20/2009 05:09 PM, albezg wrote:
>
> I'm trying to change FASTA header(display_id) for a sequence in an
> alignment(SimpleAlign).
>
> There are no issues when I print it, however when I use AlignIO to write
> the alignment to a FASTA file, it does not work. Is this behavior intended?
>
> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>
> The error:
> ------------- EXCEPTION -------------
> MSG: No sequence with name [1/1-11]
> STACK Bio::SimpleAlign::displayname
> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
> STACK Bio::AlignIO::fasta::write_aln
> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
> STACK toplevel ./demo.pl:14
> -------------------------------------
>
> Alexandr


From mitch_skinner at berkeley.edu  Thu Jan 14 17:10:53 2010
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Thu, 14 Jan 2010 14:10:53 -0800
Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory
Message-ID: <4B4F966D.3030300@berkeley.edu>

Hi,

Some people haven't been getting all of the features in their GFF3 into 
JBrowse, and a nice test case that James Casbon posted to the list 
helped me track it down.

Here's an example of the behavior I was seeing with BioPerl 1.6.1 (using 
Devel::REPL):

==============
$ use Bio::DB::SeqFeature::Store

$ my $db = Bio::DB::SeqFeature::Store->new(-adaptor=>"memory", 
-dsn=>"casbon.gff3")
$Bio_DB_SeqFeature_Store_memory1 = 
Bio::DB::SeqFeature::Store::memory=HASH(0xa27ceec);

$ $db->features(-seq_id=>"CYP2C8")
$ARRAY1 = [
             Feature:src(41),
             region(CYP2C8),
             Feature:src(37),
             Feature:src(39),
             Feature:src(42),
             Feature:src(40),
             Feature:src(38)
           ];
==============

I expected to also see the features with IDs 43 and 44 (the gff3 file is 
attached).

I think there's a problem in the filter_by_location method.  If start 
and end parameters aren't passed to the method, it sets default start 
and end values that lead it to examine all of the bins in its index.  
But the end value that it creates is at the beginning of the last bin, 
and I think it should be at the end of the last bin instead.  The 
attached patch changes it to be at the end of the last bin.
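
To illustrate the effect described above (this is not the code from the
patch or from the memory adaptor): the sketch below uses a made-up
BIN_SIZE of 1000 and a plain linear scan in place of the real index. It
only shows why a default end at the beginning of the last bin misses
features that lie further inside that bin, while an end at the end of the
last bin does not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant BIN_SIZE => 1000;   # hypothetical value, not taken from BioPerl

# Bin k covers coordinates k*BIN_SIZE .. (k+1)*BIN_SIZE - 1.
sub bin_of    { my ($pos) = @_; return int($pos / BIN_SIZE) }
sub bin_start { my ($bin) = @_; return $bin * BIN_SIZE }
sub bin_end   { my ($bin) = @_; return ($bin + 1) * BIN_SIZE - 1 }

# Return the features (hash refs with start/end keys) overlapping
# the closed interval [$start, $end].
sub overlapping {
    my ($features, $start, $end) = @_;
    return grep { $_->{start} <= $end && $_->{end} >= $start } @$features;
}
```

With features at 100-200 and 4300-4400, the last bin is bin 4. A default
range ending at bin_start(4) = 4000 excludes the second feature, whereas
one ending at bin_end(4) = 4999 includes both.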

Regards,
Mitch
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: casbon.gff3
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100114/8494aaa7/attachment.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bdsfsm-filter_by_location.patch
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100114/8494aaa7/attachment-0001.pl>

From jason at bioperl.org  Thu Jan 14 19:20:43 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 16:20:43 -0800
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <4B4E7A07.7070805@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
Message-ID: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>

Seems like improper data really -- "-1" is an improper coordinate as  
far as the parser is concerned. You may want to tell Pfam that there  
is a possible error in the dumper, since that was the only record that  
had this problem?

-jason
On Jan 13, 2010, at 5:57 PM, albezg wrote:

> Hi all,
>
> I have a problem using AlignIO to read Pfam database:
> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
> The database is in STOCKHOLM 1.0 format. AlignIO can read the  
> alignment OK until the alignment PF00331.13. There it crashes with  
> the following message:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: '1-344' is not an integer.
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/ 
> 5.10.0/Bio/Root/Root.pm:368
> STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/ 
> Range.pm:228
> STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/ 
> 5.10.0/Bio/Annotation/Target.pm:82
> STACK:  
> Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/ 
> albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ 
> GenericAlignHandler.pm:293
> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler / 
> home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ 
> GenericAlignHandler.pm:73
> STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/ 
> site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
> STACK: /home/albezg/scripts/pfam2fasta.pl:22
> -----------------------------------------------------------
>
> It appears this is caused by this entry:
> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>
> I don't care about residues in PDB, so I have just removed minus  
> signs from the ranges. This seems to have fixed the crashing.
>
> Is it a known problem? Is there a solution for it?
>
> Thanks,
> Alexandr
>
>
> On 03/20/2009 05:09 PM, albezg wrote:
>>
>> I'm trying to change FASTA header(display_id) for a sequence in an
>> alignment(SimpleAlign).
>>
>> There are no issues when I print it, however when I use AlignIO to  
>> write
>> the alignment to a FASTA file, it does not work. Is this behavior  
>> intended?
>>
>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>
>> The error:
>> ------------- EXCEPTION -------------
>> MSG: No sequence with name [1/1-11]
>> STACK Bio::SimpleAlign::displayname
>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>> STACK Bio::AlignIO::fasta::write_aln
>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>> STACK toplevel ./demo.pl:14
>> -------------------------------------
>>
>> Alexandr
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From maj at fortinbras.us  Thu Jan 14 21:00:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 21:00:31 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
Message-ID: <CD613D33411040F8921DE3098FD6DF41@NewLife>

How about returning 1, 2, 4 for the non-zero cases, with some
error constants set for convenience? MAJ
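
A sketch of how distinct codes plus a retry could look from the caller's
side. The constant names and the sub retry_on_no_content are hypothetical,
and the value 8 is only added here because there are four failure cases;
the module as discussed in this thread returns -1 for all of them. $fetch
is a code ref standing in for the retrieval step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical failure codes, one per "return -1" case in the thread.
use constant {
    RB_STATUS_ERROR => 1,   # $status eq 'ERROR'
    RB_LINE_ERROR   => 2,   # an ERROR line in the response body
    RB_NO_CONTENT   => 4,   # the response had no content
    RB_HTTP_FAILURE => 8,   # the HTTP request did not succeed
};

# Call $fetch until it returns something other than RB_NO_CONTENT,
# at most $max_tries times. A reference (e.g. a result object) or any
# other code is handed straight back to the caller.
sub retry_on_no_content {
    my ($fetch, $max_tries) = @_;
    my $rc;
    for my $try (1 .. $max_tries) {
        $rc = $fetch->($try);
        last unless !ref($rc) && $rc == RB_NO_CONTENT;
    }
    return $rc;
}
```

Only the no-content case loops; the other three codes come back on the
first attempt, so the wrapper can report them immediately.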
----- Original Message ----- 
From: "Scott Markel" <SMarkel at accelrys.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Thursday, January 14, 2010 5:58 PM
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes


> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
> from our customers.  Due to network irregularities (not sure what else
> to call it) users see the getting of remote BLAST results as somewhat
> random.  When results come back the hits are fine, but sometimes no
> information comes back at all.  Retrying helps.
> 
> In looking at RemoteBlast.pm there are four "return -1" cases.
> 
> * $status eq 'ERROR'      (return on line 614)
> * $line =~ /ERROR/I       (return on line 628)
> * !$got_content           (return on line 648)
> * !$response->is_success  (return on line 655)
> 
> In the case of no content we'd like to retry remote BLAST.  We're happy
> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
> module, but we only want to retry in that case, not the other three.
> 
> What would happen if that third "return -1" changed to a different
> return value?
> 
> Scott
> 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
> 
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>    International Society for Computational Biology
> Chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
> 
> 
> 
> 
>

From cjfields at illinois.edu  Thu Jan 14 19:42:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 18:42:31 -0600
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
Message-ID: <0B76CCA7-C37C-4E24-BBDF-C8FD805DBBF2@illinois.edu>


On Jan 14, 2010, at 2:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
> 
>> What aspects of the report are you loading?  You might consider the blast 
>> report as tab-delimited (-m 8 format) if you only are interested in 
>> start/end positions and scores of ailgnments which is a simpler and reduced 
>> dataset that has lower memory footprint by the parser.
> 
> I think this would be a better approach i am mostly interested in
> start/end/score data only.
> 
>> Searchio (default) -format => blast - you can try the BLAST -format => 
>> blast_pull instead which lazy parses to create objects and will reduce 
>> memory consumption.
> 
> It's another good option though. But just out of curosity,  so the
> regular blast parser do load the entire file in the memory consider the
> output consist of multiple Results concatenated together into a
> single file. Could anybody clarify.

Yes, the original SearchIO parsers all load the data into objects.  This was based on the presumption that one wouldn't want very large BLAST reports, but this assumption probably doesn't hold today.  The pull parser is one answer to that, in that it pulls the data only upon request (creating objects on the fly), so it should be more amenable to parsing very large BLAST reports.

> thanks, 
> -siddhartha
> 
>> -jason

chris

From cjfields at illinois.edu  Fri Jan 15 01:33:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 00:33:50 -0600
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
Message-ID: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>

Scott,

I think this is fine (to change the third condition and retry with a specific code).  The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance).
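
A rough sketch of the exception idea, in plain Perl: the caller wraps the fetch in eval and retries only when the error says there was no content. This is not what RemoteBlast does today; the NO_CONTENT message prefix, the retry_on_no_content helper, and the stand-in fetch routine are all hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch up to $max_tries times, retrying only when
# it dies with a message that starts with "NO_CONTENT". Any other exception
# is rethrown immediately.
sub retry_on_no_content {
    my ($fetch, $max_tries) = @_;
    for (1 .. $max_tries) {
        my $result;
        my $ok = eval { $result = $fetch->(); 1 };
        return $result if $ok;
        my $err = $@;
        die $err unless $err =~ /^NO_CONTENT\b/;
    }
    die "NO_CONTENT: gave up after $max_tries tries\n";
}

# Stand-in for a remote call: comes back empty twice, then succeeds.
my $calls = 0;
my $flaky_fetch = sub {
    $calls++;
    die "NO_CONTENT: empty response\n" if $calls < 3;
    return "report text";
};

my $report = retry_on_no_content($flaky_fetch, 5);
print "got '$report' after $calls calls\n";
```

The point of distinct exceptions is that the wrapper never has to guess: server-side errors propagate on the first attempt, and only the empty-response case loops.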

One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3.  Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.

chris

PS - BTW, nice to finally meet you at GMOD!

On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:

> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
> from our customers.  Due to network irregularities (not sure what else
> to call it) users see the getting of remote BLAST results as somewhat
> random.  When results come back the hits are fine, but sometimes no
> information comes back at all.  Retrying helps.
> 
> In looking at RemoteBlast.pm there are four "return -1" cases.
> 
> * $status eq 'ERROR'      (return on line 614)
> * $line =~ /ERROR/I       (return on line 628)
> * !$got_content           (return on line 648)
> * !$response->is_success  (return on line 655)
> 
> In the case of no content we'd like to retry remote BLAST.  We're happy
> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
> module, but we only want to retry in that case, not the other three.
> 
> What would happen if that third "return -1" changed to a different
> return value?
> 
> Scott
> 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
> 
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>    International Society for Computational Biology
> Chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics


From cjfields1 at gmail.com  Fri Jan 15 01:35:35 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Fri, 15 Jan 2010 00:35:35 -0600
Subject: [Bioperl-l] filter_by_location in
	Bio::DB::SeqFeature::Store::memory
In-Reply-To: <4B4F966D.3030300@berkeley.edu>
References: <4B4F966D.3030300@berkeley.edu>
Message-ID: <992796AC-B85B-4555-88A1-36000C0A2002@gmail.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100115/b772ee67/attachment.html>

From David.Messina at sbc.su.se  Fri Jan 15 10:17:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 16:17:14 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
Message-ID: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>

Hi everybody,

I'm having a little trouble with names in Bio::Species objects.

According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method:

my $my_species_obj = Bio::Species->new();
$my_species_obj->species('Homo sapiens');

print $my_species_obj->species;     # 'Homo sapiens'


That works fine if I create the Bio::Species object myself.

But if I try to get that string back out from a Bio::Species object created by SeqIO from a genbank file, I get just 'sapiens' back:

my $io = Bio::SeqIO->new('-format' => 'genbank',
                         '-file'   => 'hoxa2.gb');
my $seq_obj = $io->next_seq;
my $io_species_obj = $seq_obj->species;

print $io_species_obj->species;     # 'sapiens'


I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom, phylum, order, etc.). So the genus is stored separately.

Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object:

print $my_species_obj->binomial;    # 'Homosapiens'
print $io_species_obj->binomial;    # 'Homo sapiens'


I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way?

If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite.


Thanks,
Dave


From maj at fortinbras.us  Fri Jan 15 10:31:16 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 10:31:16 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>

I'm not that familiar with Bio::Species either, but this looks
like conflicting semantics between Bio::Species and Bio::SeqIO.
Bio::SeqIO sets the species accessor to the 'species' element of
the lineage array, I believe.
FWIW, I'd prefer "binomial" = "genus" . "species"
MAJ


From maj at fortinbras.us  Fri Jan 15 10:24:06 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 10:24:06 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
	<E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
Message-ID: <F1C8FA379C5746FB8987C1D41905C3F3@NewLife>

True-- blast+ allows remote dbs. I just committed a patch that makes
this easy in StandAloneBlastPlus: specify '-remote => 1' in the
factory, and downstream command calls will take care of it.
MAJ

# ex...
use Bio::Tools::Run::StandAloneBlastPlus;
use Bio::Seq;

# $where_it_is is a placeholder: set it to the location of your BLAST+ installation
$ENV{BLASTPLUSDIR} = $where_it_is;
my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'wgs',
    -remote  => 1    # send the search to the remote database
    );
my $result = $fac->blastn(
    -query => Bio::Seq->new(
        -seq => 'ggcaacaaacctggtaaagaagacggcaacaagcctggtaaagaagatggcaacaagcct',
        -id  => "proteinA")
    );


1;



From SMarkel at accelrys.com  Fri Jan 15 10:40:31 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 07:40:31 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
	<E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>

Chris,

It was nice meeting you and Scott C., too.  And seeing Jason again.

If you and Mark

> How about returning 1, 2, 4 for the non-zero cases, with some
> error constants set for convenience? MAJ

are okay with adding more return values, that works best for us in
Pipeline Pilot.

I'll add a Bugzilla entry.
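
A sketch of how the 1, 2, 4 suggestion quoted above might look from the caller's side. Only the values come from that suggestion; the constant names, the assignment of values to failure cases, and the should_retry helper are hypothetical and would need to match whatever ends up in RemoteBlast.pm.

```perl
use strict;
use warnings;

# Hypothetical names and value assignments; only the values 1, 2, 4 come
# from the suggestion quoted above.
use constant {
    RB_ERROR_STATUS => 1,   # server reported an error
    RB_NO_CONTENT   => 2,   # response came back empty
    RB_HTTP_FAILURE => 4,   # HTTP request itself failed
};

# True only for the "no content" code; result objects (references) and
# other codes are never retried.
sub should_retry {
    my ($rc) = @_;
    return 0 if !defined $rc || ref $rc;
    return $rc == RB_NO_CONTENT ? 1 : 0;
}

for my $rc (RB_ERROR_STATUS, RB_NO_CONTENT, RB_HTTP_FAILURE) {
    printf "code %d: %s\n", $rc, should_retry($rc) ? 'retry' : 'give up';
}
```

Named constants keep the wrapper readable, and callers that only test for a non-zero failure value would keep working unchanged.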

Scott




From cjfields at illinois.edu  Fri Jan 15 11:00:21 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 10:00:21 -0600
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
	<C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
Message-ID: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>

> FWIW, I'd prefer "binomial" = "genus" . "species"


That's the way Bio::Species is supposed to work, at least when it was refactored by Sendu.  But just a note: Bio::Species was considered deprecated (scheduled for the 1.7 release IIRC) for many very good reasons in favor of Bio::Taxon.  First and foremost among these is the fact that we cannot consistently parse out the genus/species/strain/variant/etc. for every organism in GenBank w/o knowing its full lineage, which means including some taxonomic information.  And even then it's highly problematic.

We've had several heated discussions on list about how to handle this in a somewhat backwards-compatible way, and the main solution was to forgo compatibility issues altogether and eventually deprecate Bio::Species in favor of Bio::Taxon, a class that doesn't make the same assumptions.  Bio::Species, in the interim, is-a Bio::Taxon.  You'll note that a minimal Bio::DB::Taxonomy instance is constructed from the classification scheme in some instances, but if one had a proper DB link one could link to Entrez Taxonomy or a local indexed flat-file DB and grab the info.  Bio::Taxon (correct me if I'm wrong on this Sendu, if you're out there) eschews various methods (species, etc.) for simpler, consistent ones based on Taxonomy, and doesn't force us to handle every exception to getting the genus/species out of a name.  That is left up to the user, at their peril.

For either one, if you are reproducing the fully qualified name, you probably should use something like node_name() for consistency.  Bio::Species also has scientific_name().  With a true Bio::Taxon one would need to check that this is performed on the species node.

chris



From SMarkel at accelrys.com  Fri Jan 15 11:10:34 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 08:10:34 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <FE85CD2526044E8797D5A1A248AF6866@NewLife>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net><E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
	<5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
	<FE85CD2526044E8797D5A1A248AF6866@NewLife>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B30A7@EXCH1-COLO.accelrys.net>

Mark,

Thank you.

Scott




From maj at fortinbras.us  Fri Jan 15 11:09:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:09:38 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net><E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
	<5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
Message-ID: <FE85CD2526044E8797D5A1A248AF6866@NewLife>

can do Scott-- cheers MAJ


From maj at fortinbras.us  Fri Jan 15 11:10:02 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:10:02 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se><C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
	<16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
Message-ID: <C4C0A0697FCE4CFD897AD58FA7FD58AA@NewLife>

excellent summary--thanks!!
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 11:00 AM
Subject: Re: [Bioperl-l] getting/setting species names with Bio::Species


>> FWIW, I'd prefer "binomial" = "genus" . "species"
>
>
> That's the way Bio::Species is supposed to work, at least when it was 
> refactored by Sendu.  But just a note: Bio::Species was considered deprecated 
> (scheduled for the 1.7 release IIRC) for many very good reasons in favor of 
> Bio::Taxon.  First and foremost among these is the fact we cannot consistently 
> parse out the genus/species/strain/variant/etc for every organism in GenBank 
> w/o knowing it's full lineage, which means including some taxonomic 
> information.  And even then it's highly problematic.
>
> We've had several heated discussions on list about how to handle this in a 
> somewhat backwards-compatible way, and the main solution was to forego 
> compatibility issues altogether and eventually deprecate Bio::Species 
> altogether in favor of Bio::Taxon, a class that doesn't make the same 
> assumptions.  Bio::Species, in the interim, is-a Bio::Taxon.  You'll note that 
> a minimal Bio::DB::Taxonomy instance is constructed from the classification 
> scheme in some instances, but if one had a proper DB link one could link to 
> Entrez Taxonomy or a local flat file indexes DB and grab the info.  Bio::Taxon 
> (correct me if I'm wrong on this Sendu, if you're out there) eschews various 
> methods (species, etc) for simpler consistent ones based on Taxonomy, and 
> doesn't force us to handle every exception to getting the genus/species out of 
> a name.  That is left up to the user, at their peril.
>
> For either one, if you are reproducing the fully qualified name, you probably 
> should use something like node_name() for consistency.  Bio::Species also has 
> scientific_name().  With a true Bio::Taxon one would need to be check this is 
> performed on the species node.
>
> chris
>
> On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote:
>
>> I'm not that familiar with Bio::Species either, but this looks
>> like conflicting semantics betwen Bio::Species and Bio::SeqIO.
>> Bio::SeqIO sets the species accessor to the 'species' element of
>> the lineage array, I believe.
>> FWIW, I'd prefer "binomial" = "genus" . "species"
>> MAJ
>> ----- Original Message ----- From: "Dave Messina" <David.Messina at sbc.su.se>
>> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 15, 2010 10:17 AM
>> Subject: [Bioperl-l] getting/setting species names with Bio::Species
>>
>>
>>> Hi everybody,
>>>
>>> I'm having a little trouble with names in Bio::Species objects.
>>>
>>> According to the Bio::Species documentation, if I have a species name as a 
>>> string, like "Homo sapiens", I can get and set that using the species 
>>> method:
>>>
>>> my $my_species_obj = Bio::Species->new();
>>> $my_species_obj->species('Homo sapiens');
>>>
>>> print $my_species_obj->species;     # 'Homo sapiens'
>>>
>>>
>>> That works fine if I create the Bio::Species object myself.
>>>
>>> But if I try to get that string back out from a BIo::Species object created 
>>> by SeqIO from a genbank file, I get just 'sapiens' back:
>>>
>>> my $io = Bio::SeqIO->new('-format' => 'genbank',
>>>                        '-file'   => 'hoxa2.gb');
>>> my $seq_obj = $io->next_seq;
>>> my $io_species_obj = $seq_obj->species;
>>>
>>> print $io_species_obj->species;     # 'sapiens'
>>>
>>>
>>> I think that happens because genbank records have more taxonomic info about 
>>> the species name, like the genus (and in fact the whole taxonomic 
>>> categorization: kingdom phylum order, etc). So the genus is stored 
>>> separately.
>>>
>>> Poking around a bit more in Bio::Species, I turned up the method 'binomial', 
>>> which appears to do the right thing, returning genus and species in both 
>>> cases. Except, as you can see, the space is stripped out for my 
>>> species-name-is-just-a-string object:
>>>
>>> print $my_species_obj->binomial;    # 'Homosapiens'
>>> print $io_species_obj->binomial;    # 'Homo sapiens'
>>>
>>>
>>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I 
>>> using it correctly above, or is there a better way?
>>>
>>> If not, this kinda looks like a bug to me. I've got a patch which works and 
>>> passes the BioPerl test suite.
>>>
>>>
>>> Thanks,
>>> Dave
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hlapp at drycafe.net  Fri Jan 15 12:04:43 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 15 Jan 2010 12:04:43 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>


On Jan 15, 2010, at 10:17 AM, Dave Messina wrote:

> According to the Bio::Species documentation, if I have a species  
> name as a string, like "Homo sapiens", I can get and set that using  
> the species method:
>
> my $my_species_obj = Bio::Species->new();
> $my_species_obj->species('Homo sapiens');


If that's really what the documentation says, it's wrong. It is the  
binomial() method that does this (as getter and setter).

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From David.Messina at sbc.su.se  Fri Jan 15 13:37:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 19:37:17 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
	<2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
Message-ID: <24798E45-CF24-47D9-AB39-E66C35A5FA8B@sbc.su.se>

Thanks guys.

Well, looks like I ignored the deprecation warnings at my own peril. :)

I'll reimplement my code using Bio::Taxon directly instead. I made a little test using the node_name() method as Chris suggested, and it seems to do the trick nicely.


> If that's really what the documentation says, it's wrong.

I'm afraid so. In the POD
>  Title   : species
>  Usage   : $self->species( $species );
>            $species = $self->species();
>  Function: Get or set the scientific species name.
>  Example : $self->species('Homo sapiens');
>  Returns : Scientific species name as string
>  Args    : Scientific species name as string

and the HOWTO 
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object
> # legible and long
> my $species_object = $seq_object->species;
> my $species_string = $species_object->species;
> 
> # Perlish
> my $species_string = $seq_object->species->species;
> # either way, $species_string is "Homo sapiens"


Unless there's objection, I'll fix both of those.


> It is the binomial() method that does this (as getter and setter).

Great, thanks for the clarification, Hilmar.
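The naming rule discussed in this thread (a lineage read from a GenBank record keeps only the epithet in the species slot, and the full name is genus plus species) can be sketched in plain Perl. This is only an illustration under stated assumptions, not the Bio::Species implementation: the helper name binomial_from_lineage is hypothetical, and the lineage is assumed to be listed most-specific rank first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: build "Genus species" from a
# lineage listed most-specific first, e.g. ('sapiens', 'Homo', 'Hominidae').
sub binomial_from_lineage {
    my @lineage = @_;
    return '' unless @lineage;
    my ($species, $genus) = @lineage;
    # If the species element already holds the full name, return it unchanged.
    return $species if $species =~ /\s/;
    # Otherwise join genus and epithet with a single space.
    return defined $genus ? "$genus $species" : $species;
}

# Lineage as it might come from a GenBank record (epithet and genus separate).
print binomial_from_lineage(qw(sapiens Homo Hominidae)), "\n";

# Name supplied by hand as one string.
print binomial_from_lineage('Homo sapiens'), "\n";
```

Both calls give the same string, which is the consistency the thread is asking for; real code would more likely call binomial() or node_name() on the BioPerl object itself.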


From bhakti.dwivedi at gmail.com  Sun Jan 17 11:02:47 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 11:02:47 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
Message-ID: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>

Hi

Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
&& hit1 -> query1)  from a blast table report?

Thanks

BD
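For readers who only need the simple case in the question, reciprocal best hits can be pulled from two tabular BLAST reports with plain Perl. The following is a minimal sketch, not a BioPerl module: it assumes the standard 12-column tabular layout (query in column 1, subject in column 2, bit score in column 12), picks the best hit by highest bit score, keeps the first hit on ties, and uses made-up sequence names. The function names best_hits and reciprocal_best_hits are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return a hash ref mapping each query to its best-scoring subject.
# Input: lines of 12-column tabular BLAST output (assumed layout).
sub best_hits {
    my @lines = @_;
    my (%best, %score);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*(#|$)/;      # skip comments and blank lines
        my @f = split /\t/, $line;
        next if @f < 12;                   # skip malformed rows
        my ($query, $hit, $bits) = @f[0, 1, 11];
        next if $query eq $hit;            # ignore self hits
        if (!exists $score{$query} || $bits > $score{$query}) {
            $best{$query}  = $hit;
            $score{$query} = $bits;
        }
    }
    return \%best;
}

# Return [query, hit] pairs where each is the other's best hit.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my @pairs;
    for my $query (sort keys %$a_vs_b) {
        my $hit = $a_vs_b->{$query};
        push @pairs, [$query, $hit]
            if defined $b_vs_a->{$hit} && $b_vs_a->{$hit} eq $query;
    }
    return @pairs;
}

# Made-up example rows: genome A searched against B, and B against A.
my @a_vs_b = (
    join("\t", qw(a1 b1 98.0 100 2 0 1 100 1 100 1e-50 200)),
    join("\t", qw(a1 b2 60.0 100 40 0 1 100 1 100 1e-10 80)),
    join("\t", qw(a2 b2 90.0 100 10 0 1 100 1 100 1e-40 150)),
);
my @b_vs_a = (
    join("\t", qw(b1 a1 98.0 100 2 0 1 100 1 100 1e-50 200)),
    join("\t", qw(b2 a1 70.0 100 30 0 1 100 1 100 1e-30 120)),
    join("\t", qw(b2 a2 65.0 100 35 0 1 100 1 100 1e-25 100)),
);

print join("\t", @$_), "\n"
    for reciprocal_best_hits(best_hits(@a_vs_b), best_hits(@b_vs_a));
```

In the example only a1/b1 is reported: a2's best hit is b2, but b2's best hit is a1, so that pair is not reciprocal. Bio::SearchIO with the blasttable format could replace the hand-rolled split if the rest of the pipeline already uses BioPerl.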

From cjfields at illinois.edu  Sun Jan 17 12:45:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 11:45:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <4FC546A8-079F-4A17-AB96-D4A0060904D6@illinois.edu>

It's probably not best to use BioPerl directly for this.  Have you tried OrthoMCL, or InParanoid? 

chris

On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote:

> Hi
> 
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?
> 
> Thanks
> 
> BD


From maj at fortinbras.us  Sun Jan 17 16:03:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 17 Jan 2010 16:03:24 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <B602C24552CF42C58F80F3883198121C@NewLife>

re Chris's answer, check out this archived post:
http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
cheers MAJ
----- Original Message ----- 
From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 17, 2010 11:02 AM
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?


> Hi
> 
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?
> 
> Thanks
> 
> BD

From bhakti.dwivedi at gmail.com  Sun Jan 17 16:10:03 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 16:10:03 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <B602C24552CF42C58F80F3883198121C@NewLife>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<B602C24552CF42C58F80F3883198121C@NewLife>
Message-ID: <b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>

Thank you!


On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> re Chris's answer, check out this archived post:
> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
> cheers MAJ
> ----- Original Message ----- From: "Bhakti Dwivedi" <
> bhakti.dwivedi at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Sunday, January 17, 2010 11:02 AM
> Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
>
>
>> Hi
>>
>> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
>> && hit1 -> query1)  from a blast table report?
>>
>> Thanks
>>
>> BD

From cjfields at illinois.edu  Sun Jan 17 17:00:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 16:00:02 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<B602C24552CF42C58F80F3883198121C@NewLife>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
Message-ID: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>

OrthoMCL has updated to v2 and no longer uses BioPerl, just plain perl.  Database is available here:

http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi

Package (you'll need a few other things to get it working):

http://orthomcl.org/common/downloads/software/

chris

On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote:

> Thank you!
> 
> 
> On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> 
>> re Chris's answer, check out this archived post:
>> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
>> cheers MAJ
>> ----- Original Message ----- From: "Bhakti Dwivedi" <
>> bhakti.dwivedi at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Sunday, January 17, 2010 11:02 AM
>> Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
>> 
>> 
>>> Hi
>>>
>>> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
>>> && hit1 -> query1)  from a blast table report?
>>>
>>> Thanks
>>>
>>> BD


From tristan.lefebure at gmail.com  Sun Jan 17 18:12:56 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 18:12:56 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
	<392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
Message-ID: <201001171812.56238.tristan.lefebure@gmail.com>

The transition to orthoMCL v2 being a bit painful (you need
a MySQL database), I recently switched directly to MCL and
the accompanying mclblastline and companion programs. Modular,
simple and very fast. In some simulations, it gives
better results with incomplete genomes than orthoMCL v1.x
...

http://micans.org/mcl/

--Tristan

On Sunday 17 January 2010 17:00:02 Chris Fields wrote:
> OrthoMCL has updated to v2 and no longer uses BioPerl,
>  just plain perl.  Database is available here:
> 
> http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi
> 
> Package (you'll need a few other things to get it
>  working):
> 
> http://orthomcl.org/common/downloads/software/
> 
> chris
> 
> On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote:
> > Thank you!
> >
> > On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> >> re Chris's answer, check out this archived post:
> >> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
> >> cheers MAJ
> >> ----- Original Message ----- From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> >> To: <bioperl-l at lists.open-bio.org>
> >> Sent: Sunday, January 17, 2010 11:02 AM
> >> Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
> >>
> >>
> >> Hi
> >>
> >>> Is there a Bio-perl module to parse the reciprocal
> >>> best hits (query1-> hit1
> >>> && hit1 -> query1)  from a blast table report?
> >>>
> >>> Thanks
> >>>
> >>> BD


From jason at bioperl.org  Sun Jan 17 18:59:05 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 17 Jan 2010 15:59:05 -0800
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <201001171812.56238.tristan.lefebure@gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
	<392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
	<201001171812.56238.tristan.lefebure@gmail.com>
Message-ID: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>

yes - but mcl alone is something slightly different in that it doesn't
correct for inparalogs, but for incomplete genomes this is probably
okay.

orthomcl2 does fix the major memory hog problem and the parsing
inefficiencies of the previous version by relying on the db for the
indexing and lookup of the reciprocal hits.

-jason
On Jan 17, 2010, at 3:12 PM, Tristan Lefebure wrote:

> The transition to orthoMCL v2 being a bit painful (you need
> a MySQL database), I recently switched directly to MCL and
> the accompanying mclblastline and companion programs. Modular,
> simple and very fast. In some simulations, it gives
> better results with incomplete genomes than orthoMCL v1.x
> ...
>
> http://micans.org/mcl/
>
> --Tristan
>
> On Sunday 17 January 2010 17:00:02 Chris Fields wrote:
>> OrthoMCL has updated to v2 and no longer uses BioPerl,
>> just plain perl.  Database is available here:
>>
>> http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi
>>
>> Package (you'll need a few other things to get it
>> working):
>>
>> http://orthomcl.org/common/downloads/software/
>>
>> chris
>>
>> On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote:
>>> Thank you!
>>>
>>> On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>>> re Chris's answer, check out this archived post:
>>>> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
>>>> cheers MAJ
>>>> ----- Original Message ----- From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>>>> To: <bioperl-l at lists.open-bio.org>
>>>> Sent: Sunday, January 17, 2010 11:02 AM
>>>> Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
>>>>
>>>>
>>>> Hi
>>>>
>>>>> Is there a Bio-perl module to parse the reciprocal
>>>>> best hits (query1-> hit1
>>>>> && hit1 -> query1)  from a blast table report?
>>>>>
>>>>> Thanks
>>>>>
>>>>> BD

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From tristan.lefebure at gmail.com  Sun Jan 17 20:36:38 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 20:36:38 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
Message-ID: <201001172036.39032.tristan.lefebure@gmail.com>

On Sunday 17 January 2010 18:59:05 Jason Stajich wrote:
> yes - but mcl alone is something slightly different in
>  that it doesn't   correct for inparalogs, but for
>  incomplete genomes this is probably okay.

interestingly, my experience with not too divergent
bacterial genomes (same genus) does not support the
normalization used in orthoMCL (which, as far as I
understand, is a standardization of the -Log10(evalue) per
taxon combination, including a taxon with itself). MCL, which
does not do any normalization (just -Log10(evalue)), gives
about the same number of false negatives (i.e. missed
orthologs), but a lot fewer false positives (false orthologs).
In other words, you get many fake singletons. I don't know
exactly if the problem lies in the normalization process or
the fact that orthoMCLv1.x is using a very old version of
MCL. What I do know is that many false positives are made of
short or incomplete proteins that are very common in draft
genomes and automatic annotations... Things might be
completely different with more divergent and globally longer
proteins. Testing orthoMCLv2 on the same data set would
probably give the answer.

--Tristan
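The unnormalized weighting mentioned above, plain -Log10(evalue) per hit, can be sketched as a conversion from tabular BLAST rows to label/label/weight triples of the kind MCL reads with its --abc option. This is a rough sketch under assumptions rather than what mclblastline does internally: the 12-column layout with the e-value in column 11, the ceiling of 200 for e-values reported as 0, and the function names neg_log10 and blast_to_abc are all choices made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed ceiling for hits whose e-value is reported as 0.
my $CAP = 200;

# Convert an e-value to a non-negative -log10 weight, capped at $CAP.
sub neg_log10 {
    my ($evalue) = @_;
    return $CAP if $evalue <= 0;
    my $weight = -log($evalue) / log(10);
    $weight = 0 if $weight < 0;            # e-values above 1 get weight 0
    return $weight > $CAP ? $CAP : $weight;
}

# Turn 12-column tabular BLAST rows into "label<TAB>label<TAB>weight" lines.
sub blast_to_abc {
    my @lines = @_;
    my @edges;
    for my $line (@lines) {
        chomp $line;
        my @f = split /\t/, $line;
        next if @f < 12;                   # skip malformed rows
        push @edges, sprintf "%s\t%s\t%.2f", $f[0], $f[1], neg_log10($f[10]);
    }
    return @edges;
}

# Made-up rows for illustration.
my @rows = (
    join("\t", qw(p1 p2 98.0 100 2 0 1 100 1 100 1e-50 200)),
    join("\t", qw(p1 p3 99.0 300 1 0 1 300 1 300 0.0 600)),
);
print "$_\n" for blast_to_abc(@rows);
```

The output lines could be written to a file and clustered with mcl; whether a per-taxon normalization on top of these weights helps or hurts is exactly the open question raised in this message.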

From robert.bradbury at gmail.com  Mon Jan 18 05:20:33 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 18 Jan 2010 05:20:33 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <201001172036.39032.tristan.lefebure@gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
Message-ID: <deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>

My comment might be that the problem with OrthoMCL is that it is
primarily lower organisms.  The problem with Ensembl (and some other
databases) is that it is primarily higher organisms (though they do
include Drosophila, C. elegans and Yeast).

The problem arises when one wants to cross those boundaries.  For
example the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
many of the mitochondrial (ETC) proteins, the ribosomal rRNAs &
tRNAs, and the fundamental biochemistry (EC) proteins are homologous
all the way from the most ancient bacteria through H. sapiens.  The
only way to play in the mixed arena of prokaryotes and eukaryotes
involving fundamental vectors in evolution is to either construct one's
own databases (which presumably means getting involved with MySQL, and
probably spending some $$$ on hardware) or to develop some BioPerl
modules that can do the SpeciesX vs. SpeciesY comparisons on demand
using some part of the cloud.  This problem isn't going to get smaller;
it's only going to get larger, now that the cost of sequencing
(pseudo-resequencing) a vertebrate genome is starting to come in under
$10,000 and people are starting to seriously talk about 10,000
vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
people are going to undertake very soon.

Robert


On 1/17/10, Tristan Lefebure <tristan.lefebure at gmail.com> wrote:
> On Sunday 17 January 2010 18:59:05 Jason Stajich wrote:
>> yes - but mcl alone is something slightly different in
>>  that it doesn't   correct for inparalogs, but for
>>  incomplete genomes this is probably okay.
>
> interestingly, my experience with not too divergent
> bacterial genomes (same genus) does not support the
> normalization used in orthoMCL (which, as far as I
> understand, is a standardization of the -Log10(evalue) per
> taxon combination, including a taxon with itself). MCL, which
> does not do any normalization (just -Log10(evalue)), gives
> about the same number of false negatives (i.e. missed
> orthologs), but a lot fewer false positives (false orthologs).
> In other words, you get many fake singletons. I don't know
> exactly if the problem lies in the normalization process or
> the fact that orthoMCLv1.x is using a very old version of
> MCL. What I do know is that many false positives are made of
> short or incomplete proteins that are very common in draft
> genomes and automatic annotations... Things might be
> completely different with more divergent and globally longer
> proteins. Testing orthoMCLv2 on the same data set would
> probably give the answer.
>
> --Tristan

From ghhu at sibs.ac.cn  Sun Jan 17 21:34:23 2010
From: ghhu at sibs.ac.cn (Guohong Hu)
Date: Mon, 18 Jan 2010 10:34:23 +0800
Subject: [Bioperl-l] Bioperl 1.6
Message-ID: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>

Hi there,

I was trying to install BioPerl on Windows using ppm, following the
instructions at
"http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
the repositories and searched for BioPerl packages. The latest version
available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
install it, a number of prerequisite modules were installed too, including
Bioperl 1.4. Then an error message showed up during installation:

"ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
BioPerl has already installed a file that package bioperl wants to install."

It looks to me as if BioPerl 1.6.1 had installed a file that bioperl 1.4
wanted to install again. I don't know why bioperl 1.4 was one of the
prerequisites for 1.6.1. If I install just 1.4, it installs without
errors. But I need a newer version, because some modules (like
Bio::Tools::HMM) are not included in 1.4.

I saw on the internet that somebody had the same problem when trying to
install BioPerl 1.5, but I didn't find a solution.

Does anybody have a clue about this? Thank you for your time.

GH

 
From cjfields at illinois.edu  Mon Jan 18 10:30:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 09:30:20 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
Message-ID: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>

Guohong, 

1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.  Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused).  Just curious but where is the v 1.4 PPM located?  If it is local to our PPM repo I can physically remove it to prevent this from happening.

chris

On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:

> Hi there,
> 
> 
> 
> I was trying to install BioPerl in windows using ppm, by following the
> instruction in
> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
> the repositories, and did the search of Bioperl packages. The latest version
> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
> install it, a number of prerequisite modules were being installed too, which
> include Bioperl 1.4. Then an error message showed up during installation:
> 
> 
> 
> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
> BioPerl has already installed a file that package bioperl wants to install."
> 
> 
> 
> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
> wanted to install again. I don't know why bioperl 1.4 was one of the
> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
> errors. But I need a newer version, because some modules (like
> 
> Bio::Tools::HMM) is not included in 1.4.
> 
> 
> 
> I saw on internet that somebody had the same problem when he was trying to
> install BioPerl 1.5, but I didn't find the solution.
> 
> 
> 
> Anybody has a clue on that? Thank you for your time.
> 
> 
> 
> GH
> 
> 
> 


From cjfields at illinois.edu  Mon Jan 18 11:12:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 10:12:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
	<deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
Message-ID: <B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>

(my small rant on this)

On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote:

> My comment might be that the problem with OrthoMCL is that it is
> primarily lower organisms.  The problem with Ensembl (and some other
> databases) is that it is primarily higher organisms (though they do
> include Drosophila, C. elegans and Yeast).

OrthoMCL v2 handles both lower and higher organisms; I've used it for both, with decent success.  Most other ortholog tools do as well (if I'm not mistaken, ensembl also uses MCL under the hood, unless that's changed).  I don't believe one should be completely bound to one toolset, particularly in this case (there are lots of nice ortholog clustering tools using various means of comparison out there), but I do think OrthoMCL is very good as an initial pass.  If anything, I would like a set of (possibly bioperl-based, definitely DB-based) modules that can deal with this information.

The more imperative issue in my opinion is that one is prisoner to the gene models for those specific organisms of interest, and this may vary widely depending on the source of those gene models (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc).  For instance, if gene models are poorly curated or rarely updated, the comparisons may be significantly flawed.  Some of these issues may also be (somewhat) alleviated once more transcriptome data is available that helps clear up gene model ambiguities, but that won't be true for all organisms, at least initially.

Note this isn't meant as a slam on any specific DBs or MODs in general, the problem is one born of the fact that there isn't a single, centralized, trusted, consistently updated source for this data, specifically something that will handle moderated third-party annotation.  That's a very difficult problem to solve effectively.  Some of these very issues crept up at the GMOD conference, and there appears to be consensus that a real attempt is needed to address this.  

I don't know, maybe it's just unicorns and rainbows.  Personally I do think the situation will improve, as there seems to be great demand for it, but it requires time, resources, manpower, money, cat herding, etc.

> The problem arises when one wants to cross those boundaries.  For
> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
> many of the mitochondrial (ETC) proteins, the ribosomal rRNAs &
> tRNAs, and the fundamental biochemistry (EC) proteins are homologous
> all the way from the most ancient bacteria through H. sapiens.  The
> only way to play in the mixed arena of prokaryotes and eukaryotes
> involving fundamental vectors in evolution is to either construct one's
> own databases (which presumably means getting involved with MySQL, and
> probably spending some $$$ on hardware) or to develop some BioPerl
> modules that can do the SpeciesX vs. SpeciesY comparisons on demand
> using some part of the cloud.  This problem isn't going to get smaller;
> it's only going to get larger, now that the cost of sequencing
> (pseudo-resequencing) a vertebrate genome is starting to come in under
> $10,000 and people are starting to seriously talk about 10,000
> vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
> people are going to undertake very soon.
> 
> Robert

They're already undertaking it now using a broad range of organisms, in and out of the cloud.  In most cases one can amend a prior recip. comparative analysis with new data fairly easily, if one takes care to do so early on (i.e. set up the BLAST databases with a specified defined size for comparative stats between separate analyses).  OrthoMCL v2 describes a procedure to do this, and I believe others have similar methodology.  

I could also see possible ways one can further optimize this, for instance in cases where two very closely-related organisms are compared, where translated seqs are 100% identical, etc.  IIRC, the OrthoMCL DB site already has a way to upload custom sets of protein data for mapping to (already pre-run) clusters.  Just the fact that the tools are available as OS, they're semi-automated, and can be generically applied to data of personal interest is a great boon.  Not sure I see the downside of that, and I'm pretty confident the scalability issues will be addressed in some way.

chris

From maj at fortinbras.us  Mon Jan 18 11:33:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 11:33:12 -0500
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
Message-ID: <6093E45F17B543438AC02E6C626439E1@NewLife>

this issue's come up before, see this thread
http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Guohong Hu" <ghhu at sibs.ac.cn>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 10:30 AM
Subject: Re: [Bioperl-l] Bioperl 1.6


> Guohong,
>
> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed 
> first.  Make sure the repos are set according to the Windows installation 
> instructions on the BioPerl wiki:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> IIRC the actual order of the PPM repository can be critical (PPM pulls based 
> on highest version, first repo, but sometimes it gets confused).  Just curious 
> but where is the v 1.4 PPM located?  If it is local to our PPM repo I can 
> physically remove it to prevent this from happening.
>
> chris
>
> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
>
>> Hi there,
>>
>>
>>
>> I was trying to install BioPerl in windows using ppm, by following the
>> instruction in
>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>> the repositories, and did the search of Bioperl packages. The latest version
>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>> install it, a number of prerequisite modules were being installed too, which
>> include Bioperl 1.4. Then an error message showed up during installation:
>>
>>
>>
>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>> BioPerl has already installed a file that package bioperl wants to install."
>>
>>
>>
>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>> wanted to install again. I don't know why bioperl 1.4 was one of the
>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
>> errors. But I need a newer version, because some modules (like
>>
>> Bio::Tools::HMM) is not included in 1.4.
>>
>>
>>
>> I saw on internet that somebody had the same problem when he was trying to
>> install BioPerl 1.5, but I didn't find the solution.
>>
>>
>>
>> Anybody has a clue on that? Thank you for your time.
>>
>>
>>
>> GH
>>
>>
>>


From cjfields at illinois.edu  Mon Jan 18 12:18:34 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 11:18:34 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <6093E45F17B543438AC02E6C626439E1@NewLife>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<6093E45F17B543438AC02E6C626439E1@NewLife>
Message-ID: <E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>

Mark,

Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this?  Regardless, it's problematic for me to test this out directly, at least for the next few days.  Maybe someone could try it?

Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this).

chris

On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote:

> this issue's come up before, see this thread
> http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html
> MAJ
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "Guohong Hu" <ghhu at sibs.ac.cn>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Monday, January 18, 2010 10:30 AM
> Subject: Re: [Bioperl-l] Bioperl 1.6
> 
> 
>> Guohong,
>> 
>> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.  Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki:
>> 
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>> 
>> IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused).  Just curious but where is the v 1.4 PPM located?  If it is local to our PPM repo I can physically remove it to prevent this from happening.
>> 
>> chris
>> 
>> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
>> 
>>> Hi there,
>>> 
>>> 
>>> 
>>> I was trying to install BioPerl in windows using ppm, by following the
>>> instruction in
>>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>>> the repositories, and did the search of Bioperl packages. The latest version
>>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>>> install it, a number of prerequisite modules were being installed too, which
>>> include Bioperl 1.4. Then an error message showed up during installation:
>>> 
>>> 
>>> 
>>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>>> BioPerl has already installed a file that package bioperl wants to install."
>>> 
>>> 
>>> 
>>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>>> wanted to install again. I don't know why bioperl 1.4 was one of the
>>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
>>> errors. But I need a newer version, because some modules (like
>>> 
>>> Bio::Tools::HMM) is not included in 1.4.
>>> 
>>> 
>>> 
>>> I saw on internet that somebody had the same problem when he was trying to
>>> install BioPerl 1.5, but I didn't find the solution.
>>> 
>>> 
>>> 
>>> Anybody has a clue on that? Thank you for your time.
>>> 
>>> 
>>> 
>>> GH
>>> 
>>> 
>>> 
>> 
>> 
> 


From clarsen at vecna.com  Mon Jan 18 12:42:13 2010
From: clarsen at vecna.com (Chris Larsen)
Date: Mon, 18 Jan 2010 12:42:13 -0500
Subject: [Bioperl-l] Reciprocal best blast hits using BioPerl?
In-Reply-To: <B7BD7C2F-4A70-49B5-9074-7EBAF5094AE9@vecna.com>
References: <B0218AEF-3CEB-4E06-B8DF-7B302D024797@vecna.com>
	<B7BD7C2F-4A70-49B5-9074-7EBAF5094AE9@vecna.com>
Message-ID: <ED172CDA-A8C3-4488-9648-1FBA7036BAD6@vecna.com>

Bhakti, (and Chris, Mark)--

Yes, there is some Perl available to parse reciprocal best BLAST hits.

The archived post Mark referenced was mine; we were looking to do what
you want. Picking up that thread here:

We ended up implementing OrthoMCL 1.4, as Chris F pointed to, and then
made a simple Perl parser that takes the raw OrthoMCL output, does
splits, and spits out a delimited table of all the orthologs in a
group, say for the genus Mycobacterium, so you can stuff it into DBLoader.

The link to the script, SOP, and method is at:
http://www.biohealthbase.org/brcDocs/documents/BHB_ORTHOLOG_SOP.pdf

Giving e.g.:

Francisella 1 110321310
Francisella 1 110321361
Francisella 1 56707275
Francisella 1 56707366
Francisella 1 56707462

That is five members of Ortholog Group 1, with just their GI numbers. You
can see the results of that parsing, supported by a database, being
used to load BioHealthbase with all the reciprocal best BLAST hits
plus other OrthoMCL parsing, for mycobacterial PolA at:

http://www.biohealthbase.org/brc/details.do?locus=MAV_3155&decorator=mycobacterium

See? Pretty? We were just interested in making ortholog groups on the
basis of paralog-conscious reciprocal BLAST results, like you. This
package and doc I've made does what you want, I think, as long as you
stay in prokaryotes. But careful: garbage in, garbage out. We
started with clean Genuses. (. o O Genii?). You'll get more junky HUGE
and TINY ortholog groups if you put in different orders of microbes.
It's taxa-sensitive. OrthoMCL author David Roos is great at it, though,
and designed it with higher unicellular eukaryotes in mind too; comb the
docs for that. Sorry, I was doing bacterial work at the time and can't
guide you if that's what you want. If you end up installing OrthoMCL 1.4, you
can pipe the output to this method and get out useable stuff.
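
For a rough idea of what such a parser looks like, here is a minimal
sketch. It is not the script from the SOP. It assumes the OrthoMCL 1.4
style group file with one group per line, "ORTHOMCL<n>(<g> genes,<t> taxa):"
followed by whitespace-separated "id(taxon)" members; check that against
your own output. The sub name, the taxon codes and the sample line are
made up for illustration.

```perl
use strict;
use warnings;

# Turn one OrthoMCL group line into rows of [label, group number, member id].
# Assumed line format (an assumption, verify against your own output):
#   ORTHOMCL1(3 genes,2 taxa):	 110321310(FtuA) 110321361(FtuA) 56707275(FtuB)
sub orthomcl_line_to_rows {
    my ($line, $label) = @_;
    my ($group, $members) = $line =~ /^ORTHOMCL(\d+)\([^)]*\):\s*(.*)$/
        or return;                              # not a group line, skip it
    my @rows;
    for my $member (split ' ', $members) {
        (my $id = $member) =~ s/\([^)]*\)$//;   # strip the trailing "(taxon)"
        push @rows, [ $label, $group, $id ];
    }
    return @rows;
}

# Hypothetical sample line; the label is whatever you want in column one.
my $example =
    "ORTHOMCL1(3 genes,2 taxa):\t 110321310(FtuA) 110321361(FtuA) 56707275(FtuB)";
print join("\t", @$_), "\n" for orthomcl_line_to_rows($example, 'Francisella');
```

Each row comes out as label, group number, member id, tab-delimited, which
is the same shape as the table above.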

Hope it works for you.

Cheers,

Chris L

-- 

Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525


From maj at fortinbras.us  Mon Jan 18 14:37:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 14:37:43 -0500
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<6093E45F17B543438AC02E6C626439E1@NewLife>
	<E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>
Message-ID: <61F331117B7C4E2282684FA240B9710F@NewLife>

I will play around with it-- in the meantime, Guohong, please look at the 
following
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
where there is a workaround for this issue, using the ppm-shell--
cheers,
Mark
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Guohong Hu" <ghhu at sibs.ac.cn>; <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 12:18 PM
Subject: Re: [Bioperl-l] Bioperl 1.6


Mark,

Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing 
this?  Regardless, it's problematic for me to test this out directly, at least 
for the next few days.  Maybe someone could try it?

Also, there is the Strawberry Perl alternative, which uses CPAN (I think 
ActiveState also supports this).

chris

On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote:

> this issue's come up before, see this thread
> http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html
> MAJ
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "Guohong Hu" <ghhu at sibs.ac.cn>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Monday, January 18, 2010 10:30 AM
> Subject: Re: [Bioperl-l] Bioperl 1.6
>
>
>> Guohong,
>>
>> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed 
>> first.  Make sure the repos are set according to the Windows installation 
>> instructions on the BioPerl wiki:
>>
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>>
>> IIRC the actual order of the PPM repository can be critical (PPM pulls based 
>> on highest version, first repo, but sometimes it gets confused).  Just 
>> curious but where is the v 1.4 PPM located?  If it is local to our PPM repo I 
>> can physically remove it to prevent this from happening.
>>
>> chris
>>
>> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
>>
>>> Hi there,
>>>
>>>
>>>
>>> I was trying to install BioPerl in windows using ppm, by following the
>>> instruction in
>>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>>> the repositories, and did the search of Bioperl packages. The latest version
>>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>>> install it, a number of prerequisite modules were being installed too, which
>>> include Bioperl 1.4. Then an error message showed up during installation:
>>>
>>>
>>>
>>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>>> BioPerl has already installed a file that package bioperl wants to install."
>>>
>>>
>>>
>>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>>> wanted to install again. I don't know why bioperl 1.4 was one of the
>>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
>>> errors. But I need a newer version, because some modules (like
>>>
>>> Bio::Tools::HMM) is not included in 1.4.
>>>
>>>
>>>
>>> I saw on internet that somebody had the same problem when he was trying to
>>> install BioPerl 1.5, but I didn't find the solution.
>>>
>>>
>>>
>>> Anybody has a clue on that? Thank you for your time.
>>>
>>>
>>>
>>> GH
>>>
>>>
>>>
>>
>>
>


From jason at bioperl.org  Mon Jan 18 15:24:33 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Jan 2010 12:24:33 -0800
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
	<deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
	<B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>
Message-ID: <68DF70A5-63A6-428D-A7F1-7B3D01528375@bioperl.org>


On Jan 18, 2010, at 8:12 AM, Chris Fields wrote:

> (my small rant on this)
>
> On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote:
>
>> My comment might be that the problem with OrthoMCL is that it is
>> primarily lower organisms.  The problem with Ensembl (and some other
>> databases) is that it is primarily higher organisms (though they do
>> include Drosophila, C. elegans and Yeast).
>
> OrthoMCL v2 handles both lower and higher organisms; I've used it for  
> both, with decent success.  Most other ortholog tools do as well (if  
> I'm not mistaken, ensembl also uses MCL under the hood, unless  
> that's changed).  I don't believe one should be completely bound to  
> one toolset, particularly in this case (there are lots of nice  
> ortholog clustering tools using various means of comparison out  
> there), but I do think OrthoMCL is very good as an initial pass.  If  
> anything, I would like a set of (possibly bioperl-based, definitely  
> DB-based) modules that can deal with this information.
>
> The more imperative issue in my opinion is that one is prisoner to  
> the gene models for those specific organisms of interest, and this  
> may vary widely depending on the source of those gene models  
> (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc).  For  
> instance, if gene models are poorly curated or rarely updated, the  
> comparisons may be significantly flawed.  Some of these issues may  
> also be (somewhat) alleviated once more transcriptome data is  
> available that helps clear up gene model ambiguities, but that won't  
> be true for all organisms, at least initially.
>
> Note this isn't meant as a slam on any specific DBs or MODs in  
> general, the problem is one born of the fact that there isn't a  
> single, centralized, trusted, consistently updated source for this  
> data, specifically something that will handle moderated third-party  
> annotation.  That's a very difficult problem to solve effectively.   
> Some of these very issues crept up at the GMOD conference, and there  
> appears to be consensus that a real attempt is needed to address this.
>
> I don't know, maybe it's just unicorns and rainbows.  Personally I  
> do think the situation will improve, as there seems to be great  
> demand for it, but it requires time, resources, manpower, money, cat  
> herding, etc.
>
>> The problem arises when one wants to cross those boundaries.  For
>> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
>> many of the mitochondrial (ETC) proteins, the ribosomal rRNA's &
>> tRNAs, and the fundamental biochemistry (EC) proteins are homologous
>> all the way from the most ancient bacteria through H. sapiens.  The
>> only way to play in the mixed arena of prokaryotes and eukaryotes
>> involving fundamental vectors in evolution is to either construct  
>> ones
>> own databases (which presumably means getting involved with MySQL,  
>> and
>> probably spending some $$$ on hardware) or to develop some BioPerl
>> modules that can do the  SpeciesX vs. SpeciesY comparisons on demand
>> using some part of the cloud.  This problem isn't going to get  
>> smaller
>> its only going to get larger, now that the cost of sequencing
>> (pseudo-resequencing) a vertebrate genome is starting to come in  
>> under
>> $10,000 and people are starting to seriously talk about 10,000
>> vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
>> people are going to undertake very soon.
>>
>> Robert
>
> They're already undertaking it now using a broad range of organisms,  
> in and out of the cloud.  In most cases one can amend a prior recip.  
> comparative analysis with new data fairly easily, if one takes care  
> to do so early on (i.e. set up the BLAST databases with a specified  
> defined size for comparative stats between separate analyses).   
> OrthoMCL v2 describes a procedure to do this, and I believe others  
> have similar methodology.
>
> I could also see possible ways one can further optimize this, for  
> instance in cases where two very closely-related organisms are  
> compared, where translated seqs are 100% identical, etc.  IIRC, the  
> OrthoMCL DB site already has a way to upload custom sets of protein  
> data for mapping to (already pre-run) clusters.  Just the fact that  
> the tools are available as OS, they're semi-automated, and can be  
> generically applied to data of personal interest is a great boon.   
> Not sure I see the downside of that, and I'm pretty confident the  
> scalability issues will be addressed in some way.


I think that the approach that Paul Thomas's group at SRI
(http://www.ai.sri.com/esb/) is taking is really what you'd want to
focus on if you are only interested in a particular set of gene
families rather than de novo clustering. That, or the PhyloFacts
approach (http://phylogenomics.berkeley.edu/phylofacts/). That is
where HMMs are more appropriate: you focus on your initial seed set of
protein families and build HMMs for them, with some automated
clustering initially to get better resolution. Once you start throwing
in multiples of 10^6 proteins, the unsupervised clustering approach may
not be able to give results that are as accurate or timely, but it can
be a good initial filtering step, depending on how much initial
knowledge you are starting with. Using HMMs won't be as
computationally expensive either, if you are compute limited.

TreeFam also provides curated phylogenies of gene families
(http://www.treefam.org/) that span the opisthokonts, in that a few
fungi are sprinkled in.

Also things like http://boinc.bio.wzw.tum.de/boincsimap/ provide ways  
to use distributed computing to calculate the matrix of similarities  
among proteins if you are interested in the exhaustive approach.

-jason

>
> chris

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jay at jays.net  Mon Jan 18 18:36:20 2010
From: jay at jays.net (Jay Hannah)
Date: Mon, 18 Jan 2010 17:36:20 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <9AA13F94-3336-4CC1-89C4-249D0EB7C857@jays.net>

On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote:
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?

If all the advice and resources in this thread have not dissuaded you from writing your own, you could glance at cross_blast() here as reference:

   https://clabsvn.ist.unomaha.edu/anonsvn/user/jhannah/UNO/seqlab/seqlab/tutorial.pod

About the (abandoned) project:

   http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29

I wrote that in 2006 for clustering a few hundred proteins based on custom criteria.
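
If you do roll your own, the core of a reciprocal-best-hit pass is small.
A minimal sketch, not taken from cross_blast(): it assumes tab-separated
BLAST output (-m 8 style, with query in column 1, subject in column 2 and
bit score in the last column) and takes the best hit per query by bit
score. The sub names and the sample lines are hypothetical, and ties are
resolved by keeping the first hit seen.

```perl
use strict;
use warnings;

# Best hit per query from tabular BLAST lines
# (query, subject, ..., bit score in the last column).
sub best_hits {
    my (@lines) = @_;
    my (%best, %score);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blanks
        my @f = split /\t/, $line;
        my ($query, $subject, $bits) = @f[0, 1, -1];
        if (!exists $score{$query} or $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $subject;
        }
    }
    return \%best;
}

# Pairs where a's best hit in B is b, and b's best hit in A is a.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my @pairs;
    for my $query (sort keys %$a_vs_b) {
        my $hit = $a_vs_b->{$query};
        push @pairs, [ $query, $hit ]
            if defined $b_vs_a->{$hit} and $b_vs_a->{$hit} eq $query;
    }
    return @pairs;
}

# Made-up report lines for two searches: set A against B, and B against A.
my @a_vs_b = (
    "a1\tb1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t190",
    "a1\tb2\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-10\t60",
    "a2\tb2\t90.0\t100\t10\t0\t1\t100\t1\t100\t1e-40\t150",
);
my @b_vs_a = (
    "b1\ta1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t190",
    "b2\ta1\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-10\t60",
);
print join("\t", @$_), "\n"
    for reciprocal_best_hits(best_hits(@a_vs_b), best_hits(@b_vs_a));
```

Here only a1/b1 is reported: a2's best hit is b2, but b2's best hit is a1,
so that pair is not reciprocal.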

Cheers,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jay at jays.net  Mon Jan 18 19:22:48 2010
From: jay at jays.net (Jay Hannah)
Date: Mon, 18 Jan 2010 18:22:48 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
Message-ID: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>

I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.

   http://github.com/jhannah/bio-broodcomb

It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. 

The first two functions I stuck in the framework:

Find subsequences (Bio::BroodComb::SubSeq):

   use Bio::BroodComb;
   my $bc = Bio::BroodComb->new();
   $bc->load_large_seq(file => "large_seq.fasta");
   $bc->load_small_seq(file => "small_seq.fasta");
   $bc->find_subseqs();
   print $bc->subseq_report1;

In-silico PCR (Bio::BroodComb::PCR):

  use Bio::BroodComb;
  my $bc = Bio::BroodComb->new();
  $bc->load_large_seq(file => "large_seq.fasta");
  $bc->add_primerset(
     description    => "U5/R",   # however you want it reported
     forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
     reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
  );
  $bc->find_pcr_hits();
  $bc->find_pcr_products();
  print $bc->pcr_report1;

I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.

Suggestions, contributions welcome.   :)

   http://github.com/jhannah/bio-broodcomb

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From ocornejo at gmail.com  Mon Jan 18 19:46:10 2010
From: ocornejo at gmail.com (Omar Cornejo)
Date: Mon, 18 Jan 2010 16:46:10 -0800 (PST)
Subject: [Bioperl-l] installing bioperl for mac
Message-ID: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>

Dear People,
  I have tried to install BioPerl on my new MacBook, which carries
the latest perl distribution (5.10.0), and for some reason I can't
(using fink) make it recognize this version of perl.
  I have tried:
fink install bioperl-pm510
fink install bioperl-pm5100

but neither one works.  Is it fine installing bioperl for perl v 5.9?

thank you,
Omar Cornejo

From jason at bioperl.org  Mon Jan 18 20:04:31 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Jan 2010 17:04:31 -0800
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <4B5502D9.2010706@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
	<4B5502D9.2010706@gmail.com>
Message-ID: <F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>

Alexandr -

Thanks for getting back to us - I am guessing the parser needs to  
recognize negative coordinates around about line 370 in Bio/AlignIO/ 
Handler/GenericAlignHandler.pm which assumes a split on '-' will be  
sufficient.
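
To illustrate the idea (a sketch only, not the actual GenericAlignHandler
code, and the sub name is made up): a plain split on '-' yields an empty
first field for "-1-344", whereas matching two optionally signed integers
around the separator keeps the sign with the coordinate.

```perl
use strict;
use warnings;

# Parse a "start-end" range where either coordinate may be negative,
# e.g. "-1-344" from: #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
sub parse_signed_range {
    my ($range) = @_;
    my ($start, $end) = $range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*$/
        or return;                      # not a recognizable range
    return ($start, $end);
}

for my $range ('263-608', '-1-344', '-5--2') {
    my ($start, $end) = parse_signed_range($range);
    print "$range => start $start, end $end\n";
}
```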

Can you post it as a bug to Bugzilla, attaching a record and a script
that replicates the problem, so a test can be written for this?
http://bugzilla.open-bio.org/

-jason
On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote:

> I have contacted Pfam, and I have been told that The PDB file actually
> does include a reference to residue "-1":
>
> DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>
> DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>
>
> Since negative numbers are allowed in PDB, the data should probably be
> considered valid.
>
> There are quite a few records like this, so this is not an isolated  
> issue.
>
> Alexandr
>
> On 1/14/2010 7:20 PM, Jason Stajich wrote:
>> Seems like improper data really -- "-1" is an improper coordinate  
>> as far
>> as the parser is concerned. You may want to tell Pfam that there is
>> possible error in the dumper since that was the only record that had
>> this problem?
>>
>> -jason
>> On Jan 13, 2010, at 5:57 PM, albezg wrote:
>>
>>> Hi all,
>>>
>>> I have a problem using AlignIO to read Pfam database:
>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>>> alignment OK until the alignment PF00331.13. There it crashes with  
>>> the
>>> following message:
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: '1-344' is not an integer.
>>>
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>>> STACK: Bio::Range::end
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>>> STACK: Bio::Annotation::Target::new
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ 
>>> GenericAlignHandler.pm:293
>>>
>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ 
>>> GenericAlignHandler.pm:73
>>>
>>> STACK: Bio::AlignIO::stockholm::next_aln
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>>> -----------------------------------------------------------
>>>
>>> It appears this is caused by this entry:
>>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>>
>>> I don't care about residues in PDB, so I have just removed minus  
>>> signs
>>> from the ranges. This seems to have fixed the crashing.
>>>
>>> Is it a known problem? Is there a solution for it?
>>>
>>> Thanks,
>>> Alexandr
>>>
>>>
>>> On 03/20/2009 05:09 PM, albezg wrote:
>>>>
>>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>>> alignment(SimpleAlign).
>>>>
>>>> There are no issues when I print it, however when I use AlignIO  
>>>> to write
>>>> the alignment to a FASTA file, it does not work. Is this behavior
>>>> intended?
>>>>
>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>>
>>>> The error:
>>>> ------------- EXCEPTION -------------
>>>> MSG: No sequence with name [1/1-11]
>>>> STACK Bio::SimpleAlign::displayname
>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>>> STACK Bio::AlignIO::fasta::write_aln
>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>>> STACK toplevel ./demo.pl:14
>>>> -------------------------------------
>>>>
>>>> Alexandr
>>>
>>
>> -- 
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From cjfields at illinois.edu  Mon Jan 18 21:19:30 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 20:19:30 -0600
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
	<4B5502D9.2010706@gmail.com>
	<F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>
Message-ID: <46FD172A-69C0-436C-A005-AC38668C3347@illinois.edu>

Alexandr,

Posting the bug report would be great, should be an easy enough fix.

chris

On Jan 18, 2010, at 7:04 PM, Jason Stajich wrote:

> Alexandr -
> 
> Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient.
> 
> Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/
> 
> -jason
> On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote:
> 
>> I have contacted Pfam, and I have been told that The PDB file actually
>> does include a reference to residue "-1":
>> 
>> DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>> 
>> DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>> 
>> 
>> Since negative numbers are allowed in PDB, the data should probably be
>> considered valid.
>> 
>> There are quite a few records like this, so this is not an isolated issue.
>> 
>> Alexandr
>> 
>> On 1/14/2010 7:20 PM, Jason Stajich wrote:
>>> Seems like improper data really -- "-1" is an improper coordinate as far
>>> as the parser is concerned. You may want to tell Pfam that there is
>>> possible error in the dumper since that was the only record that had
>>> this problem?
>>> 
>>> -jason
>>> On Jan 13, 2010, at 5:57 PM, albezg wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I have a problem using AlignIO to read Pfam database:
>>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>>>> alignment OK until the alignment PF00331.13. There it crashes with the
>>>> following message:
>>>> 
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: '1-344' is not an integer.
>>>> 
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>>>> STACK: Bio::Range::end
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>>>> STACK: Bio::Annotation::Target::new
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>>> 
>>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>>> 
>>>> STACK: Bio::AlignIO::stockholm::next_aln
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>>>> -----------------------------------------------------------
>>>> 
>>>> It appears this is caused by this entry:
>>>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>>> 
>>>> I don't care about residues in PDB, so I have just removed minus signs
>>>> from the ranges. This seems to have fixed the crashing.
>>>> 
>>>> Is it a known problem? Is there a solution for it?
>>>> 
>>>> Thanks,
>>>> Alexandr
>>>> 
>>>> 
>>>> On 03/20/2009 05:09 PM, albezg wrote:
>>>>> 
>>>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>>>> alignment(SimpleAlign).
>>>>> 
>>>>> There are no issues when I print it, however when I use AlignIO to write
>>>>> the alignment to a FASTA file, it does not work. Is this behavior
>>>>> intended?
>>>>> 
>>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>>> 
>>>>> The error:
>>>>> ------------- EXCEPTION -------------
>>>>> MSG: No sequence with name [1/1-11]
>>>>> STACK Bio::SimpleAlign::displayname
>>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>>>> STACK Bio::AlignIO::fasta::write_aln
>>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>>>> STACK toplevel ./demo.pl:14
>>>>> -------------------------------------
>>>>> 
>>>>> Alexandr
>>>> 
>>> 
>>> -- 
>>> Jason Stajich
>>> jason.stajich at gmail.com
>>> jason at bioperl.org
>>> http://fungalgenomes.org/
>>> 
>> 
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 


From cjfields at illinois.edu  Mon Jan 18 21:20:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 20:20:31 -0600
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
Message-ID: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>

On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:

> Dear People,
>  I have tried to install Bioperl in my new Mac Book, which carries
> the latest perl distribution (5.10.0) and for some reason I can't
> (using fink) make it recognize this version or perl.
>  I have tried:
> fink install bioperl-pm510
> fink install bioperl-pm5100
> 
> but neither one works.  Is it fine installing bioperl for perl v 5.9?
> 
> thank you,
> Omar Cornejo

fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

chris

From dan.kortschak at adelaide.edu.au  Mon Jan 18 21:47:47 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 19 Jan 2010 13:17:47 +1030
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie
 now available BETA
Message-ID: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>

Hi All,

A wrapper and output parser for bowtie 'ultrafast, memory-efficient
short read aligner' are now available in the bioperl-live and
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
bioperl-run/trunk at 16726). Bowtie details are available here:

http://bowtie-bio.sourceforge.net/index.shtml

The modules can return a Bio::Assembly::Scaffold object (operating via
MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
uses large amounts of memory - the test suite works for me with >=2GB
but not with 1GB due to this. (Is there a disk file system based tool
for this for large projects?)

Bowtie (>0.12.0) can align in colour space, but this is not currently
supported by the wrapper though it should not be difficult to add. If
someone can point me to a small set of colour space reads and a
reference sequence I will be able to use these for testing.

Thanks to the core devs for helping me with many of my problems in
putting this together.

Dan


From maj at fortinbras.us  Mon Jan 18 22:31:36 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 22:31:36 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
	Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <D26A5B3DAFDA4068863C7735BAF7894B@NewLife>

Excellent Dan! Thanks for all this work-- MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 9:47 PM
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now 
available BETA


> Hi All,
>
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
>
> http://bowtie-bio.sourceforge.net/index.shtml
>
> The modules can return a Bio::Assembly::Scaffold object (operating via
> MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
>
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
>
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
>
> Dan


From cjfields at illinois.edu  Mon Jan 18 22:36:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 21:36:12 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
	Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <CD36CE88-DC05-4A17-86A7-17A85C14F67A@illinois.edu>

On Jan 18, 2010, at 8:47 PM, Dan Kortschak wrote:

> Hi All,
> 
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
> 
> http://bowtie-bio.sourceforge.net/index.shtml
> 
> The modules can return a Bio::Assembly::Scaffold object (operating via
> MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
> 
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
> 
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
> 
> Dan

And (on behalf of the core devs) thank you for putting this together!

chris

From scott at scottcain.net  Mon Jan 18 22:41:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Mon, 18 Jan 2010 22:41:43 -0500
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
Message-ID: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>

But make sure you have the developers tools installed before the first
time you run the cpan shell; it will make your life easier.

Scott


On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
>
>> Dear People,
>>  I have tried to install Bioperl in my new Mac Book, which carries
>> the latest perl distribution (5.10.0) and for some reason I can't
>> (using fink) make it recognize this version of perl.
>>  I have tried:
>> fink install bioperl-pm510
>> fink install bioperl-pm5100
>>
>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
>>
>> thank you,
>> Omar Cornejo
>
> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>
> chris


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Jan 18 23:04:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 22:04:57 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <009801c8b957$2af4f8d0$80deea70$@ac.cn>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<009801c8b957$2af4f8d0$80deea70$@ac.cn>
Message-ID: <79D53148-1FDA-4025-99A6-77A7F124E6BD@illinois.edu>

Hmm, the trouchelle repo is the only one that had a working DB_File for perl 5.10 (not sure but I think 5.8.9 was fine).  Probably worth contacting them about this to see if they can drop the (way out-of-date) 1.4 distribution.

chris

On May 18, 2008, at 9:22 PM, Guohong Hu wrote:

> Thank you all. The problem is solved. The bioperl 1.4 version is from
> the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I
> added all the repos according to the bioperl wiki instructions, somehow 1.4
> became a prerequisite for 1.6. But Chris's question reminded me, so I
> removed the Trouchelle repo, and the installation proceeded without errors. I
> suggest we put a note on the wiki, since it looks like an odd issue that
> may not affect just me.
> 
> Best,
> Guohong
> 
> 
> 
> _________________________________________
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: 18 January 2010 23:30
> To: Guohong Hu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bioperl 1.6
> 
> Guohong, 
> 
> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed
> first.  Make sure the repos are set according to the Windows installation
> instructions on the BioPerl wiki:
> 
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> 
> IIRC the actual order of the PPM repository can be critical (PPM pulls based
> on highest version, first repo, but sometimes it gets confused).  Just
> curious but where is the v 1.4 PPM located?  If it is local to our PPM repo
> I can physically remove it to prevent this from happening.
> 
> chris
> 
> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
> 
>> Hi there,
>> 
>> 
>> 
>> I was trying to install BioPerl on Windows using ppm, by following the
>> instructions at
>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>> the repositories and searched for Bioperl packages. The latest version
>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>> install it, a number of prerequisite modules were installed too, which
>> included Bioperl 1.4. Then an error message showed up during installation:
>>
>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>> BioPerl has already installed a file that package bioperl wants to install."
>>
>> It looks to me as if BioPerl 1.6.1 had installed a file that bioperl 1.4
>> wanted to install again. I don't know why bioperl 1.4 was one of the
>> prerequisites for 1.6.1. If I install just 1.4, it installs without
>> errors. But I need a newer version, because some modules (like
>> Bio::Tools::HMM) are not included in 1.4.
>>
>> I saw on the internet that somebody had the same problem when trying to
>> install BioPerl 1.5, but I didn't find the solution.
>>
>> Does anybody have a clue about that? Thank you for your time.
>> 
>> 
>> 
>> GH


From ocornejo at gmail.com  Mon Jan 18 23:18:00 2010
From: ocornejo at gmail.com (Omar Eduardo Cornejo Ordaz)
Date: Mon, 18 Jan 2010 23:18:00 -0500
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
	<4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
	<5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>
Message-ID: <ddd346a41001182018o5952415fx7930d85a9430453@mail.gmail.com>

I see.
  thank you Scott and Chris.
  I had already installed the latest version of the Xcode Developer Tools.
  I will go the cpan way then.

have a nice one,
Omar

On Mon, Jan 18, 2010 at 10:58 PM, Chris Fields <cjfields at illinois.edu> wrote:

> Yes, definitely!
>
> -c
>
> On Jan 18, 2010, at 9:41 PM, Scott Cain wrote:
>
> > But make sure you have the developers tools installed before the first
> > time you run the cpan shell; it will make your life easier.
> >
> > Scott
> >
> >
> > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
> >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
> >>
> >>> Dear People,
> >>>  I have tried to install Bioperl in my new Mac Book, which carries
> >>> the latest perl distribution (5.10.0) and for some reason I can't
> >>> (using fink) make it recognize this version of perl.
> >>>  I have tried:
> >>> fink install bioperl-pm510
> >>> fink install bioperl-pm5100
> >>>
> >>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
> >>>
> >>> thank you,
> >>> Omar Cornejo
> >>
> >> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
> >>
> >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
> >>
> >> chris
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   scott at scottcain dot net
> > GMOD Coordinator (http://gmod.org/)                     216-392-3087
> > Ontario Institute for Cancer Research

From cjfields at illinois.edu  Mon Jan 18 22:58:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 21:58:36 -0600
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
	<4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
Message-ID: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>

Yes, definitely!

-c

On Jan 18, 2010, at 9:41 PM, Scott Cain wrote:

> But make sure you have the developers tools installed before the first
> time you run the cpan shell; it will make your life easier.
> 
> Scott
> 
> 
> On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
>> 
>>> Dear People,
>>>  I have tried to install Bioperl in my new Mac Book, which carries
>>> the latest perl distribution (5.10.0) and for some reason I can't
>>> (using fink) make it recognize this version of perl.
>>>  I have tried:
>>> fink install bioperl-pm510
>>> fink install bioperl-pm5100
>>> 
>>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
>>> 
>>> thank you,
>>> Omar Cornejo
>> 
>> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
>> 
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>> 
>> chris
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research


From albezg at gmail.com  Mon Jan 18 19:54:49 2010
From: albezg at gmail.com (Alexandr Bezginov)
Date: Mon, 18 Jan 2010 19:54:49 -0500
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
 with negative PDB ranges
In-Reply-To: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
Message-ID: <4B5502D9.2010706@gmail.com>

I have contacted Pfam, and I have been told that the PDB file actually
does include a reference to residue "-1":

DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611

DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611


Since negative numbers are allowed in PDB, the data should probably be
considered valid.

There are quite a few records like this, so this is not an isolated issue.
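
To make the point concrete, here is a minimal sketch of range parsing that
accepts signed residue numbers. It is written in Python purely as an
illustration and is not BioPerl code or a proposed patch; the helper names
parse_pdb_range and parse_gs_pdb_line are made up for this example, and the
field layout is assumed from the single "#=GS ... DR PDB;" line quoted in
this thread.

```python
import re

# Hypothetical helpers for illustration only (not part of BioPerl).
# A range is "<start>-<end>", where either number may carry a leading
# minus sign, e.g. "-1-344" means residues -1 through 344.
RANGE_RE = re.compile(r"^\s*(-?\d+)\s*-\s*(-?\d+)\s*$")


def parse_pdb_range(text):
    """Split a residue range such as '-1-344' into signed integers."""
    match = RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a residue range: {text!r}")
    return int(match.group(1)), int(match.group(2))


def parse_gs_pdb_line(line):
    """Parse '#=GS <seq> DR PDB; <id> <chain>; <range>;' (assumed layout)."""
    fields = line.split(None, 3)
    if len(fields) != 4 or fields[0] != "#=GS" or fields[2] != "DR":
        raise ValueError(f"not a #=GS DR line: {line!r}")
    parts = [p.strip() for p in fields[3].split(";") if p.strip()]
    if len(parts) != 3 or parts[0] != "PDB":
        raise ValueError(f"not a PDB cross-reference: {line!r}")
    id_fields = parts[1].split()
    pdb_id = id_fields[0]
    chain = id_fields[1] if len(id_fields) > 1 else ""
    start, end = parse_pdb_range(parts[2])
    return fields[1], pdb_id, chain, start, end


if __name__ == "__main__":
    print(parse_gs_pdb_line(
        "#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;"))
```

The only real change from a naive split on "-" is that the first minus sign
is treated as part of the start coordinate rather than as the separator.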

Alexandr

On 1/14/2010 7:20 PM, Jason Stajich wrote:
> Seems like improper data really -- "-1" is an improper coordinate as far
> as the parser is concerned. You may want to tell Pfam that there is a
> possible error in the dumper, since that was the only record that had
> this problem?
> 
> -jason
> On Jan 13, 2010, at 5:57 PM, albezg wrote:
> 
>> Hi all,
>>
>> I have a problem using AlignIO to read the Pfam database:
>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>> alignment OK until the alignment PF00331.13. There it crashes with the
>> following message:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: '1-344' is not an integer.
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>> STACK: Bio::Range::end
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>> STACK: Bio::Annotation::Target::new
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>
>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>
>> STACK: Bio::AlignIO::stockholm::next_aln
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>> -----------------------------------------------------------
>>
>> It appears this is caused by this entry:
>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>
>> I don't care about residues in PDB, so I have just removed minus signs
>> from the ranges. This seems to have fixed the crashing.
>>
>> Is it a known problem? Is there a solution for it?
>>
>> Thanks,
>> Alexandr
>>
>>
>> On 03/20/2009 05:09 PM, albezg wrote:
>>>
>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>> alignment(SimpleAlign).
>>>
>>> There are no issues when I print it, however when I use AlignIO to write
>>> the alignment to a FASTA file, it does not work. Is this behavior
>>> intended?
>>>
>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>
>>> The error:
>>> ------------- EXCEPTION -------------
>>> MSG: No sequence with name [1/1-11]
>>> STACK Bio::SimpleAlign::displayname
>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>> STACK Bio::AlignIO::fasta::write_aln
>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>> STACK toplevel ./demo.pl:14
>>> -------------------------------------
>>>
>>> Alexandr
> 
> -- 
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 


From ghhu at sibs.ac.cn  Mon Jan 18 21:22:19 2010
From: ghhu at sibs.ac.cn (Guohong Hu)
Date: Tue, 19 Jan 2010 02:22:19 -0000
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
Message-ID: <009801c8b957$2af4f8d0$80deea70$@ac.cn>

Thank you all. The problem is solved. The bioperl 1.4 version is from
the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I
added all the repos according to the bioperl wiki instructions, somehow 1.4
became a prerequisite for 1.6. But Chris's question reminded me, so I
removed the Trouchelle repo, and the installation proceeded without errors. I
suggest we put a note on the wiki, since it looks like an odd issue that
may not affect just me.

Best,
Guohong


_________________________________________
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: 18 January 2010 23:30
To: Guohong Hu
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bioperl 1.6

Guohong, 

1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed
first.  Make sure the repos are set according to the Windows installation
instructions on the BioPerl wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

IIRC the actual order of the PPM repository can be critical (PPM pulls based
on highest version, first repo, but sometimes it gets confused).  Just
curious but where is the v 1.4 PPM located?  If it is local to our PPM repo
I can physically remove it to prevent this from happening.
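
As a toy model of the "highest version, first repo" rule mentioned above,
here is a minimal Python sketch. It is not PPM's actual resolution code and
makes no claim about how PPM is implemented; the pick_package helper, the
repository names and the package listings are all hypothetical and exist
only to show why repository order matters when two repositories carry the
same package.

```python
# Toy model only: NOT PPM's real algorithm. All names and data below
# are hypothetical and chosen for illustration.

def version_key(version):
    """Turn a dotted version such as '1.6.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pick_package(name, repos):
    """Return (repo_name, version) for the preferred copy of a package.

    repos is an ordered list of (repo_name, {package_name: version}).
    The highest version wins; on a tie the earlier repository wins,
    because a later candidate must be strictly newer to replace it.
    """
    best = None
    for repo_name, packages in repos:
        if name not in packages:
            continue
        key = version_key(packages[name])
        if best is None or key > best[0]:
            best = (key, repo_name, packages[name])
    if best is None:
        return None
    return best[1], best[2]


if __name__ == "__main__":
    repos = [
        ("releases", {"BioPerl": "1.6.1"}),   # hypothetical listing
        ("third-party", {"BioPerl": "1.4"}),  # hypothetical listing
    ]
    print(pick_package("BioPerl", repos))
```

Under this simplified rule an out-of-date 1.4 package would never be chosen
over 1.6.1, which is consistent with the report that something else (here,
an unexpected prerequisite) pulled it in.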

chris

On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:

> Hi there,
> 
> 
> 
> I was trying to install BioPerl on Windows using ppm, by following the
> instructions at
> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
> the repositories and searched for Bioperl packages. The latest version
> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
> install it, a number of prerequisite modules were installed too, which
> included Bioperl 1.4. Then an error message showed up during installation:
>
> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
> BioPerl has already installed a file that package bioperl wants to install."
>
> It looks to me as if BioPerl 1.6.1 had installed a file that bioperl 1.4
> wanted to install again. I don't know why bioperl 1.4 was one of the
> prerequisites for 1.6.1. If I install just 1.4, it installs without
> errors. But I need a newer version, because some modules (like
> Bio::Tools::HMM) are not included in 1.4.
>
> I saw on the internet that somebody had the same problem when trying to
> install BioPerl 1.5, but I didn't find the solution.
>
> Does anybody have a clue about that? Thank you for your time.
> 
> 
> 
> GH


From jw12 at sanger.ac.uk  Tue Jan 19 05:41:12 2010
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 19 Jan 2010 10:41:12 +0000
Subject: [Bioperl-l] DAS Workshop Registrations now Open (workshop date 7-9
	April 2010)
Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk>

If you don't know about DAS and wish to know how to distribute your
latest biological annotation to the world then the upcoming DAS
workshop may be for you.
If you know about DAS and are maybe a DAS client developer then the  
upcoming DAS workshop is for you (as you will need to know about the  
upcoming DAS 1.6 Specification and how it may affect your software).

For information on the workshop and registration please go to:

http://www.ebi.ac.uk/training/handson/DAS_070410.html


Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 

From SMarkel at accelrys.com  Tue Jan 19 13:00:22 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 19 Jan 2010 10:00:22 -0800
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
 Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>

Dan,

Life Tech has sample data for E. coli at

http://solidsoftwaretools.com/gf/project/ecoli2x50/

and

http://solidsoftwaretools.com/gf/project/dh10bfrag/.

Reference sequences are included.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
Sent: Monday, 18 January 2010 6:48 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA

Hi All,

A wrapper and output parser for bowtie 'ultrafast, memory-efficient
short read aligner' are now available in the bioperl-live and
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
bioperl-run/trunk at 16726). Bowtie details are available here:

http://bowtie-bio.sourceforge.net/index.shtml

The modules can return a Bio::Assembly::Scaffold object (operating via
MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
uses large amounts of memory - the test suite works for me with >=2GB
but not with 1GB due to this. (Is there a disk file system based tool
for this for large projects?)

Bowtie (>0.12.0) can align in colour space, but this is not currently
supported by the wrapper though it should not be difficult to add. If
someone can point me to a small set of colour space reads and a
reference sequence I will be able to use these for testing.

Thanks to the core devs for helping me with many of my problems in
putting this together.

Dan



From dan.kortschak at adelaide.edu.au  Tue Jan 19 16:18:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 07:48:20 +1030
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
 Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
	<5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>
Message-ID: <1263935900.4813.0.camel@epistle>

Great.

Thanks, Scott.

Dan

On Tue, 2010-01-19 at 10:00 -0800, Scott Markel wrote:
> Dan,
> 
> Life Tech has sample data for E. coli at
> 
> http://solidsoftwaretools.com/gf/project/ecoli2x50/
> 
> and
> 
> http://solidsoftwaretools.com/gf/project/dh10bfrag/.
> 
> Reference sequences are included.
> 
> Scott
> 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
> 
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>     International Society for Computational Biology
> Chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
> Sent: Monday, 18 January 2010 6:48 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA
> 
> Hi All,
> 
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
> 
> http://bowtie-bio.sourceforge.net/index.shtml
> 
> The modules can return a Bio::Assembly::Scaffold object (operating via
> MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
> 
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
> 
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
> 
> Dan


From dan.kortschak at adelaide.edu.au  Wed Jan 20 00:32:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 16:02:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
Message-ID: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris (or others),

I've been looking at ways to do large assemblies (really rnaseq/readseq
comparisons for coverage) with maq/bowtie output, and it's clear that for
the size of project that I'm working on the space complexity is too
nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
go.

I was thinking: Bio::Tools::Run::Bowtie ~> Bio::SeqFeature::Generic ->
Bio::Tools::GFF -> Bio::DB::GFF

This depends on the behaviour of Bio::DB::GFF->features(-merge=>1). I've
read through the docs, and it's not entirely clear (I'm hoping I've
interpreted it the right way): does this return features such that
overlapping features come back as a single feature while non-overlapping
features come back separately? If so, it would satisfy my requirements
perfectly.
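
To be clear about the result I am after, here is a minimal sketch of the
aggregation itself, in Python rather than BioPerl. It only illustrates the
desired outcome and says nothing about what -merge actually does; the
merge_overlapping function and the sample coordinates are invented for this
example, and coordinates are assumed to be 1-based and inclusive.

```python
from collections import defaultdict

# Illustration of the desired outcome only; function name and sample
# data are hypothetical. Coordinates are assumed 1-based, inclusive.

def merge_overlapping(features):
    """Collapse overlapping (seq_id, start, end) features per sequence.

    Features that share at least one base are merged into one region;
    features that merely abut (end + 1 == next start) are kept apart.
    """
    by_seq = defaultdict(list)
    for seq_id, start, end in features:
        by_seq[seq_id].append((start, end))

    merged = []
    for seq_id in sorted(by_seq):
        current_start = current_end = None
        for start, end in sorted(by_seq[seq_id]):
            if current_start is None:
                current_start, current_end = start, end
            elif start <= current_end:
                # Overlaps the open region: extend it if needed.
                current_end = max(current_end, end)
            else:
                merged.append((seq_id, current_start, current_end))
                current_start, current_end = start, end
        merged.append((seq_id, current_start, current_end))
    return merged


if __name__ == "__main__":
    reads = [("chr1", 1, 50), ("chr1", 40, 90),
             ("chr1", 200, 250), ("chr2", 10, 20)]
    for region in merge_overlapping(reads):
        print(region)
```

This version keeps every interval in memory before sorting, so it does not
address the memory problem by itself; it is only meant to pin down what
"overlapping features come back as a single feature" means here.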

thanks for your time
Dan


From jason at bioperl.org  Wed Jan 20 01:35:24 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 19 Jan 2010 22:35:24 -0800
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>

Are you looking at the bowtie features file or the SAM?
-jason
On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote:

> Hi Chris (or others),
>
> I've been looking at ways to do large assemblies (really rnaseq/readseq
> comparisons for coverage) with maq/bowtie output, and it's clear that for
> the size of project that I'm working on the space complexity is too
> nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
> go.
>
> I was thinking: Bio::Tools::Run::Bowtie ~> Bio::SeqFeature::Generic ->
> Bio::Tools::GFF -> Bio::DB::GFF
>
> This depends on the behaviour of Bio::DB::GFF->features(-merge=>1). I've
> read through the docs, and it's not entirely clear (I'm hoping I've
> interpreted it the right way): does this return features such that
> overlapping features come back as a single feature while non-overlapping
> features come back separately? If so, it would satisfy my requirements
> perfectly.
>
> thanks for your time
> Dan

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From dan.kortschak at adelaide.edu.au  Wed Jan 20 02:19:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 17:49:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
	<C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>
Message-ID: <1263971945.4582.2.camel@epistle>

It doesn't really matter, they are largely inter-convertible. The
problem is not really the upstream processing, but the aggregation of
reads into read-assigned regions (unless I've misunderstood your
question).

Dan

On Tue, 2010-01-19 at 22:35 -0800, Jason Stajich wrote:
> Are you looking at the bowtie features file or the SAM?
> -jason
> On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote:
> 
> > Hi Chris (or others),
> >
> > I've been looking at ways to do large assemblies (really rnaseq/readseq
> > comparisons for coverage) with maq/bowtie output and it's clear that for
> > the size of project that I'm working on the space complexity is too
> > nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
> > go.
> >
> > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF
> >
> > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
> > read through the docs, and it's not entirely clear (I'm hoping I've
> > interpreted it the right way), but does this result in the return of
> > features such that overlapping features are returned as a single feature
> > while non-overlapping features come back separately? If this is the
> > case, it would satisfy my requirements perfectly.
> >
> > thanks for your time
> > Dan
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/

-- 
Dan Kortschak <dan.kortschak at adelaide.edu.au>


From ajmackey at gmail.com  Wed Jan 20 07:59:38 2010
From: ajmackey at gmail.com (Aaron Mackey)
Date: Wed, 20 Jan 2010 07:59:38 -0500
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>

I would advise using BEDtools or the R IRanges package for this kind of
aggregation/merging work, rather than trying to reinvent this particular
wheel.
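
For anyone unsure what this kind of merging amounts to, here is a minimal pure-Perl sketch. It is not how BEDtools or IRanges are implemented, and merge_intervals is a hypothetical helper name; it assumes closed [start, end] coordinates on a single reference sequence and that the intervals fit in memory.

```perl
use strict;
use warnings;

# Merge overlapping [start, end] intervals (closed coordinates, one
# reference sequence). merge_intervals is a hypothetical helper, not a
# BioPerl, BEDtools or IRanges function.
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $iv (@sorted) {
        if ( @merged && $iv->[0] <= $merged[-1][1] ) {
            # Overlaps the previous region: extend its end if needed.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            # No overlap: start a new region (copy, so the input is untouched).
            push @merged, [@$iv];
        }
    }
    return @merged;
}

# Made-up read coordinates, for illustration only.
my @reads   = ( [ 1, 50 ], [ 40, 90 ], [ 200, 250 ], [ 240, 260 ], [ 500, 550 ] );
my @regions = merge_intervals(@reads);
print join( "\t", @$_ ), "\n" for @regions;
```

Overlapping reads collapse into one region each, while the isolated read comes back on its own. For real data sets of the size discussed here, the dedicated tools are likely to be far more memory-efficient than this.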

-Aaron

On Wed, Jan 20, 2010 at 12:32 AM, Dan Kortschak <
dan.kortschak at adelaide.edu.au> wrote:

> Hi Chris (or others),
>
> I've been looking at ways to do large assemblies (really rnaseq/readseq
> comparisons for coverage) with maq/bowtie output and it's clear that for
> the size of project that I'm working on the space complexity is too
> nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
> go.
>
> I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF
>
> This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
> read through the docs, and it's not entirely clear (I'm hoping I've
> interpreted it the right way), but does this result in the return of
> features such that overlapping features are returned as a single feature
> while non-overlapping features come back separately. If this is the
> case, it would satisfy my requirements perfectly.
>
> thanks for your time
> Dan

From dan.kortschak at adelaide.edu.au  Wed Jan 20 16:16:39 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 21 Jan 2010 07:46:39 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
	<24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>
Message-ID: <1264022199.4688.29.camel@epistle>

Thanks for that, I'll look into those. BEDtools looks like what I want.

cheers
Dan

On Wed, 2010-01-20 at 07:59 -0500, Aaron Mackey wrote:
> I would advise using BEDtools or the R IRanges package for this kind
> of aggregation/merging work, rather than trying to reinvent this
> particular wheel.
> 
> -Aaron


From biopython at maubp.freeserve.co.uk  Thu Jan 21 07:33:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 21 Jan 2010 12:33:53 +0000
Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in
	BioSQL
Message-ID: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>

Hi all,

This is cross posted to try and ensure relevant people see it.
I suggest we continue the discussion on the BioSQL list
(for how to serialise structured annotation to BioSQL), and/or
the OpenBio list (for things like file format naming conventions).

I am hoping we (Bio*) can be consistent in how we parse and load
into BioSQL the SwissProt DE lines (known as "swiss" format in
both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
equivalent UniProt XML tags (which we are tentatively going to
call the "uniprot" format in Biopython's SeqIO - comments?).

Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
files and load them into BioSQL. Biopython currently treats the DE
comment lines as a long string, as BioPerl used to:

http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html

I understand that BioPerl now turns the SwissProt DE lines into a
TagTree, and for storing this in BioSQL this gets serialised as XML.
I would like Biopython to handle this the same way (although rather
than a Perl TagTree, we'd use a Python structure of course), and
would appreciate clarification of what exactly was implemented
(e.g. which bit of the BioPerl source code should we look at,
and could you show a worked example?).

Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
Open-Bio lists yet) has started work on parsing UniProt XML
files for Biopython. Here the DE comment lines are already
provided broken up with XML markup. Hopefully their nested
structure matches what BioPerl was doing with the SwissProt
DE lines.

Regards,

Peter

From cjfields at illinois.edu  Thu Jan 21 08:34:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 07:34:12 -0600
Subject: [Bioperl-l] [Open-bio-l] SwissProt DE lines and UniProt XML /
	TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <A6F5F623-2750-4BB0-91F7-5A87BABE367B@illinois.edu>

Peter,

The relevant code is in Bio::Annotation::TagTree in bioperl-live, which is a decorator for Data::Stag:

http://search.cpan.org/~cmungall/Data-Stag-0.11/Data/Stag.pm

This is where the text output is derived from.  It's a bit of a heavyweight solution to the problem, but it's capable of round-tripping the DE data and parses out the data in a way that's approachable.  We could probably abstract out the serialization backend there and allow a pure bioperl solution (or the current solution) as a fallback. 

If the plain-text DE info is represented in a hierarchy already in UniProt XML, we should probably conform as closely as possible to that (using a standard format like XML, JSON, etc.).  
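
As a rough illustration of what a hierarchical representation could look like, the sketch below holds a DE-style description as nested Perl data and serializes it to XML using only core Perl. The tag names follow the DE line labels (RecName, AltName, Full, Short), but the layout is an assumption and is not what Bio::Annotation::TagTree or Data::Stag actually emit; to_xml and escape_xml are hypothetical helpers, and the protein names are placeholders.

```perl
use strict;
use warnings;

# A node is [ tag => value ]; value is either a string (leaf) or an
# array ref of child nodes. The names below are placeholders, not real
# UniProt data.
my $de = [ description => [
    [ RecName => [
        [ Full  => 'Example protein 1' ],
        [ Short => 'EP1' ],
    ] ],
    [ AltName => [
        [ Full => 'Example alias' ],
    ] ],
] ];

# Escape the characters that are special in XML text content.
sub escape_xml {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Recursively serialize a node, indenting two spaces per level.
sub to_xml {
    my ( $node, $depth ) = @_;
    $depth ||= 0;
    my ( $tag, $value ) = @$node;
    my $pad = '  ' x $depth;
    return "$pad<$tag>" . escape_xml($value) . "</$tag>\n" unless ref $value;
    my $inner = join '', map { to_xml( $_, $depth + 1 ) } @$value;
    return "$pad<$tag>\n$inner$pad</$tag>\n";
}

print to_xml($de);
```

Using ordered pairs rather than a hash keeps repeated tags (several AltName entries, say) and their original order, which matters for round-tripping.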

chris

On Jan 21, 2010, at 6:33 AM, Peter wrote:

> Hi all,
> 
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> 
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> 
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
> 
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
> 
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> 
> Regards,
> 
> Peter


From sharmashalu.bio at gmail.com  Thu Jan 21 09:25:44 2010
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Thu, 21 Jan 2010 09:25:44 -0500
Subject: [Bioperl-l] sequence orientation
Message-ID: <465b5a661001210625j3d84a165u69d8c8d21d2fe7ac@mail.gmail.com>

Hi All,
         This is not a Perl/BioPerl query, but I thought this would be the
best place to ask.
I have some pyro reads (from CAMERA) and I want to find out their 5' and 3'
ends. Is there any way I can do this?

I would really appreciate it if anyone can help me out.

Thanks
Shalu

From rtbio.2009 at gmail.com  Thu Jan 21 13:28:43 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Thu, 21 Jan 2010 19:28:43 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <196889DF87964224ACDB948681BA7F86@NewLife>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<4C2E8133F916495B876628EF3E8FCBB2@NewLife>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
Message-ID: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>

Hello Mark,

This is Roopa again. I have another small problem, this time with remote
BLAST. The program runs, accesses the server and gets the output correctly.
However, when I collect the result sequences into an array, the first
sequence among the results is always missing. The code is:

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;
   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;
              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence, the
count of the array was 0 and nothing was printed. Likewise, when the output
contained 3 sequences, the count of the array was 2 and only two sequences
were printed.

Please help me in sorting out this problem.

Regards,
Roopa.
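
One thing that may be worth checking, sketched here with a stand-in object rather than a real BLAST result: next_hit behaves as an iterator, so the debugging line `print BLASTDEBUGFILE $result->next_hit();` would take the first hit before the `while` loop starts. That would match the counts described (3 hits giving 2, 1 hit giving 0). MockResult below is hypothetical and only imitates that iterator behaviour; whether this is the actual cause in the script above is an assumption.

```perl
use strict;
use warnings;

# MockResult is a hypothetical stand-in that only imitates the iterator
# behaviour of a search result's next_hit(); it is not a BioPerl class.
package MockResult;

sub new {
    my ( $class, @hits ) = @_;
    return bless { hits => [@hits], pos => 0 }, $class;
}

# Return the next hit and advance; undef once the hits are exhausted.
sub next_hit {
    my $self = shift;
    return $self->{hits}[ $self->{pos}++ ];
}

# Reset the iterator to the first hit.
sub rewind { $_[0]{pos} = 0 }

package main;

# Drain the iterator into a list, as the while loop in the script does.
sub collect {
    my ($result) = @_;
    my @names;
    while ( defined( my $hit = $result->next_hit ) ) {
        push @names, $hit;
    }
    return @names;
}

my $result = MockResult->new(qw(hitA hitB hitC));

# Debug "peek": this call consumes hitA.
my $peek       = $result->next_hit;
my @after_peek = collect($result);
print scalar(@after_peek), " hits after peeking\n";

# Resetting the iterator first yields every hit again.
$result->rewind;
my @all = collect($result);
print scalar(@all), " hits after rewind\n";
```

If this is the cause, removing the debugging call, or printing from inside the loop instead, should restore the missing sequence. BioPerl result objects also document a rewind method, but that is worth confirming against the installed version.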


On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen <maj at fortinbras.us> wrote:

>  Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
> *To:* Mark A. Jensen <maj at fortinbras.us>
> *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>>  There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from
>> looking for a member of an array called @organ. This would have shown up
>> if 'use strict;' had
>> been in place. Still don't know whether this would work precisely; can you
>> send me the query
>> sequence so I can reproduce your output?
>> thanks MAJ
>>
>>  ----- Original Message -----
>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes but still i got the same o/p with sequences
>> from different species.
>>
>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813
>> 0.0
>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622
>> 0.0
>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773
>> 0.0
>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749
>> 0.0
>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551
>> 3e-154
>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551
>> 3e-154
>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542
>> 2e-151
>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538
>> 2e-150
>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538
>> 2e-150
>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196
>> 3e-47
>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190
>> 1e-45
>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181
>> 7e-43
>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179
>> 2e-42
>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178
>> 8e-42
>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170
>> 1e-39
>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   169
>> 4e-39
>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167
>> 1e-38
>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150
>> 1e-33
>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145
>> 5e-32
>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143
>> 2e-31
>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143
>> 2e-31
>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138
>> 7e-30
>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138
>> 7e-30
>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136
>> 2e-29
>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136
>> 2e-29
>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134
>> 9e-29
>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132
>> 3e-28
>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132
>> 3e-28
>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132
>> 3e-28
>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131
>> 1e-27
>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131
>> 1e-27
>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129
>> 4e-27
>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129
>> 4e-27
>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129
>> 4e-27
>> ref|NM_001003470.1|  Danio rerio protein kinase, cAMP-dependen...   129
>> 4e-27
>> ref|XM_001141503.1|  PREDICTED: Pan troglodytes verus protein ...   127
>> 1e-26
>> ref|XM_001145269.1|  PREDICTED: Pan troglodytes protein kinase...   127
>> 1e-26
>> ref|XM_512434.2|  PREDICTED: Pan troglodytes cAMP-dependent pr...   127
>> 1e-26
>> ref|XM_001171457.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127
>> 1e-26
>> ref|XM_001171437.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127
>> 1e-26
>> ref|XM_847420.1|  PREDICTED: Canis familiaris similar to Serin...   127
>> 1e-26
>> ref|NM_207518.1|  Homo sapiens protein kinase, cAMP-dependent,...   127
>> 1e-26
>> ref|NM_002730.3|  Homo sapiens protein kinase, cAMP-dependent,...   127
>> 1e-26
>>
>>
>> Thanks in advance.
>>
>> Roopa.
>>
>> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>>  I understand you. Put in the double quotes and see what happens.
>>>
>>>  ----- Original Message -----
>>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>> *To:* Mark A. Jensen <maj at fortinbras.us>
>>>   *Sent:* Saturday, January 09, 2010 1:40 PM
>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>
>>> Hi Mark,
>>>
>>> Thanks for your reply. It was working when I specifically used the name of
>>> the organism, Trypanosoma brucei, in the code, but my idea is to introduce a
>>> $organ which takes the organism given by the user, i.e., it could be anything:
>>> Pseudomonas, Drosophila, Trypanosoma, Leishmania etc. I should get the
>>> sequences related to only those organisms.
>>>
>>> i.e., if the user enters Pseudomonas, the $organ parameter of the code
>>> takes Pseudomonas, does BLAST and returns only those sequences that produce
>>> significant alignments with Pseudomonas (only). But this is not what
>>> happens.
>>>
>>> Please help me in this regard.
>>>
>>> Thanks in advance
>>> Roopa
>>>
>>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>
>>>>  Hi Roopa-- You may get what you want if you make the change.
>>>> With single quotes, ENTREZ_QUERY is set to the literal string
>>>>
>>>>  $organ[ORGN]
>>>>
>>>> while, with double quotes, the variable value will be substituted,
>>>> and the parameter should be set to
>>>>
>>>>  Trypanosoma brucei[ORGN]
>>>>
>>>> I'm guessing that it worked because the database ignored the strange
>>>> parameter,
>>>> and returned all the matches. Try this and if it doesn't work I'll look
>>>> harder.
>>>> cheers,
>>>> Mark
>>>>
>>>>  ----- Original Message -----
>>>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>>>> *Sent:* Saturday, January 09, 2010 1:24 PM
>>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>>
>>>> hello Mark,
>>>>
>>>> Thanks for your reply. It was working without enclosing $organ[ORGN] in
>>>> double quotation marks, but I would like to have only those sequences
>>>> which are specific to my organism, i.e., I need sequences only from the
>>>> organism that I entered.
>>>>
>>>> When the organism is Trypanosoma brucei,I could get even Leishmania and
>>>> other species as the similar sequences. But I want to get only trypanosoma
>>>> brucei sequences.
>>>>
>>>> Could you please help me out in this regard?
>>>>
>>>> Roopa.
>>>>
>>>> My output
>>>>
>>>> I/P organism: Trypanosoma brucei
>>>>
>>>> O/P:-
>>>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...
>>>> 1813    0.0
>>>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...
>>>> 1622    0.0
>>>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...
>>>> 773    0.0
>>>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...
>>>> 749    0.0
>>>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...
>>>> 551    3e-154
>>>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...
>>>> 551    3e-154
>>>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...
>>>> 542    2e-151
>>>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...
>>>> 538    2e-150
>>>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...
>>>> 538    2e-150
>>>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...
>>>> 196    3e-47
>>>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...
>>>> 190    1e-45
>>>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...
>>>> 181    7e-43
>>>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...
>>>> 179    2e-42
>>>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...
>>>> 178    8e-42
>>>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...
>>>> 170    1e-39
>>>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...
>>>> 168    4e-39
>>>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...
>>>> 167    1e-38
>>>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...
>>>> 150    1e-33
>>>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...
>>>> 145    5e-32
>>>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...
>>>> 143    2e-31
>>>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...
>>>> 143    2e-31
>>>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...
>>>> 138    7e-30
>>>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...
>>>> 138    7e-30
>>>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...
>>>> 136    2e-29
>>>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...
>>>> 136    2e-29
>>>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...
>>>> 134    9e-29
>>>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...
>>>> 132    3e-28
>>>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...
>>>> 132    3e-28
>>>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...
>>>> 132    3e-28
>>>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...
>>>> 131    1e-27
>>>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...
>>>> 131    1e-27
>>>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...
>>>> 129    4e-27
>>>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...
>>>> 129    4e-27
>>>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...
>>>> 129    4e-27
>>>>
>>>> Roopa.
>>>>
>>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>>
>>>>> I see it immediately (from making same bug many times) :
>>>>>
>>>>>
>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>> =>
>>>>> - '$organ[ORGN]');
>>>>> +"$organ[ORGN]");
>>>>>
>>>>>
>>>>> MAJ
>>>>>
>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>> rtbio.2009 at gmail.com>
>>>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>>>> Cc: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Saturday, January 09, 2010 11:57 AM
>>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>>
>>>>>> Thanks a lot for your reply, Mark. It was working for Trypanosoma brucei
>>>>>> as
>>>>>> the organism parameter, but when I tried to use the organism parameter
>>>>>> from
>>>>>> the user, it was not working, i.e., I was unable to get the target
>>>>>> sequences.
>>>>>> Please help me in this regard. My code is
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>>
>>>>>> #path for extra camel module
>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>> use Roopablast;
>>>>>>
>>>>>>
>>>>>> use Bio::SearchIO;
>>>>>> use Bio::Search::Result::BlastResult;
>>>>>> use Bio::Perl;
>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::DB::GenBank;
>>>>>>
>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>> my $outstring ="";
>>>>>>
>>>>>> &parse_form;
>>>>>>
>>>>>> print "Content-type: text/html\n\n";
>>>>>> print "<HTML>\n";
>>>>>> print "<head><title>RNAi Result</title>";
>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>> print "</head>\n";
>>>>>> print "<body>\n";
>>>>>> print " Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>> print "</BODY>\n";
>>>>>> print "</HTML>\n";
>>>>>>
>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>> exit if $pid;
>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>
>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>
>>>>>> print OUTFILE "<HTML>\n
>>>>>> <head><title>RNAi Result</title>
>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>> </head>\n
>>>>>> <body>\n
>>>>>>  Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>> wait......<br>
>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>> </BODY>\n
>>>>>> </HTML>\n";
>>>>>>
>>>>>> close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
>>>>>>
>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>
>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>> $in{'Threshold'});
>>>>>>
>>>>>>
>>>>>> sub blastcode
>>>>>> {
>>>>>>
>>>>>> $inpu1= $_[0];
>>>>>>
>>>>>> $organ= $_[1];
>>>>>>
>>>>>> open(NUC,'>',$nuc);
>>>>>> print NUC $inpu1,"\n";
>>>>>> close(NUC);
>>>>>>
>>>>>> my $prog = 'blastn';
>>>>>> my $db   = 'refseq_rna';
>>>>>> my $e_val= '1e-10';
>>>>>> my $organism= $organ;
>>>>>>
>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>
>>>>>> my @params = ( '-prog' => $prog,
>>>>>>        '-data' => $db,
>>>>>>        '-expect' => $e_val,
>>>>>>        '-readmethod' => 'SearchIO',
>>>>>>       '-Organism'   => $organism );
>>>>>>
>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>            print OUTFILE $inpu1;
>>>>>>             close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>>> =>
>>>>>> '$organ[ORGN]');
>>>>>>
>>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>
>>>>>>  #change a parameter
>>>>>>
>>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>>>>> Brucei[ORGN]';
>>>>>>
>>>>>> #change a parameter
>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>> '$input2[ORGN]';
>>>>>>
>>>>>>  my $v = 1;
>>>>>>  #$v is just to turn on and off the messages
>>>>>>
>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>> '-organism' => $organ );
>>>>>>
>>>>>>
>>>>>> while (my $input = $str->next_seq())
>>>>>> {
>>>>>>  #Blast a sequence against a database:
>>>>>>   #Alternatively, you could  pass in a file with many
>>>>>>   #sequences rather than loop through sequence one at a time
>>>>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>   #and swap the two lines below for an example of that.
>>>>>>
>>>>>>            #open(OUTFILE,'>',$debugfile);
>>>>>>             # print OUTFILE $input;
>>>>>>             #close(OUTFILE);
>>>>>>
>>>>>>
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>
>>>>>>               open(OUTFILE,'>',$debugfile);
>>>>>>            #   print OUTFILE $r;
>>>>>>               close(OUTFILE);
>>>>>>
>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>  #   open(OUTFILE,'>',$debugfile);
>>>>>>   #           print OUTFILE "while entered";
>>>>>>    #         close(OUTFILE);
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>
>>>>>>     #         open(OUTFILE,'>',$debugfile);
>>>>>>      #        print OUTFILE "foreach entered";
>>>>>>       #      close(OUTFILE);
>>>>>>
>>>>>>       my $rc = $factory->retrieve_blast($rid);
>>>>>>
>>>>>>       if( !ref($rc) )
>>>>>>       {
>>>>>>       if( $rc < 0 )
>>>>>>       {
>>>>>>       $factory->remove_rid($rid);
>>>>>>       }
>>>>>>        open(OUTFILE,'>',$debugfile);
>>>>>>        #      print OUTFILE "if entered";
>>>>>>             close(OUTFILE);
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>       }
>>>>>>      else {
>>>>>>         #    open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "else entered";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>         my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>         print BLASTDEBUGFILE $result->next_hit();
>>>>>>         close(BLASTDEBUGFILE);
>>>>>>
>>>>>>       my $filename =
>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>
>>>>>>        # open(DEBUGFILE,'>',$debugfile);
>>>>>>        # open(new,'>',$filename);
>>>>>>        # @arra=<new>;
>>>>>>        # print DEBUGFILE @arra;
>>>>>>        # close(DEBUGFILE);
>>>>>>        # close(new);
>>>>>>
>>>>>>        $factory->save_output($filename);
>>>>>>  # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>      # close(BLASTDEBUGFILE);
>>>>>>
>>>>>>      $factory->remove_rid($rid);
>>>>>>
>>>>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>      print BLASTDEBUGFILE  $organism;
>>>>>>       close(BLASTDEBUGFILE);
>>>>>>
>>>>>>   # open(OUTFILE,'>',$outfile);
>>>>>>   # print OUTFILE "Test2 $result->database_name()";
>>>>>>   # close(OUTFILE);
>>>>>>
>>>>>> #$hit = $result->next_hit;
>>>>>> #open(new,'>',$debugfile);
>>>>>> #print $hit;
>>>>>> #close(new);
>>>>>>
>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>
>>>>>>           next unless ( $v > 0);
>>>>>>
>>>>>>         #     open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "$hit in while hits";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>                 push(@seqs,$dna);
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>  }
>>>>>>
>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>  #print OUTFILE $seqs[0];
>>>>>>  #close(OUTFILE);
>>>>>>
>>>>>> return(@seqs);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Roopa.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Roopa--
>>>>>>>
>>>>>>> I got your code to work with the following changes:
>>>>>>>
>>>>>>> +# the input should be a valid FASTA file...
>>>>>>> ...
>>>>>>> open(NUC,'>',$nuc);
>>>>>>> +print NUC ">seq (need a name line for valid fasta)\n";
>>>>>>> print NUC $inpu1, "\n";
>>>>>>> close(NUC);
>>>>>>> ...
>>>>>>>
>>>>>>> +# you can set these header parms in the call itself...
>>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>>>>     -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');
>>>>>>>
>>>>>>>  #change a paramter
>>>>>>> +# commented this out...
>>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>>
>>>>>>> MAJ
>>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <rtbio.2009 at gmail.com>
>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>> Sent: Friday, January 08, 2010 10:00 AM
>>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>>>>>>
>>>>>>>
>>>>>>>  Hello all,
>>>>>>>
>>>>>>>>
>>>>>>>> I was trying remote BLAST using BioPerl. My input data is a
>>>>>>>> Trypanosoma brucei sequence in FASTA format. When I tried to
>>>>>>>> submit it to BLAST using the step
>>>>>>>> $r=$factory->submit_blast($input)
>>>>>>>> it did not return anything, which I checked by debugging the code.
>>>>>>>> It is not blasting my input sequence even though I specified all
>>>>>>>> the parameters. I paste the code below.
>>>>>>>>
>>>>>>>> Please help me in solving this problem. It is very urgent.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Roopa.
>>>>>>>>
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #path for extra camel module
>>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>>>> use Roopablast;
>>>>>>>>
>>>>>>>>
>>>>>>>> use Bio::SearchIO;
>>>>>>>> use Bio::Search::Result::BlastResult;
>>>>>>>> use Bio::Perl;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use Bio::Seq;
>>>>>>>> use Bio::SeqIO;
>>>>>>>> use Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>> my $outstring ="";
>>>>>>>>
>>>>>>>> &parse_form;
>>>>>>>>
>>>>>>>> print "Content-type: text/html\n\n";
>>>>>>>> print "<HTML>\n";
>>>>>>>> print "<head><title>RNAi Result</title>";
>>>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>>>> print "</head>\n";
>>>>>>>> print "<body>\n";
>>>>>>>> print " Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>>>> print "</BODY>\n";
>>>>>>>> print "</HTML>\n";
>>>>>>>>
>>>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>>>> exit if $pid;
>>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>>>> </head>\n
>>>>>>>> <body>\n
>>>>>>>>  Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>>>> wait......<br>
>>>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>>>> </BODY>\n
>>>>>>>> </HTML>\n";
>>>>>>>>
>>>>>>>> close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> @compseqs = blastcode($in{'Inputseq'});
>>>>>>>>
>>>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>>>
>>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>>>> $in{'Threshold'});
>>>>>>>>
>>>>>>>>
>>>>>>>> sub blastcode
>>>>>>>> {
>>>>>>>>
>>>>>>>> $inpu1= $_[0];
>>>>>>>>
>>>>>>>> #$organ= $_[1];
>>>>>>>>
>>>>>>>> open(NUC,'>',$nuc);
>>>>>>>> print NUC $inpu1;
>>>>>>>> close(NUC);
>>>>>>>>
>>>>>>>> my $prog = 'blastn';
>>>>>>>> my $db   = 'refseq_rna';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> my $organism= 'Trypanosoma Brucei';
>>>>>>>>
>>>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>       '-data' => $db,
>>>>>>>>       '-expect' => $e_val,
>>>>>>>>       '-readmethod' => 'SearchIO',
>>>>>>>>       '-Organism'   => $organism );
>>>>>>>>
>>>>>>>>          # open(OUTFILE,'>',$debugfile);
>>>>>>>>           #  print OUTFILE @params;
>>>>>>>>           # close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>
>>>>>>>>  #change a paramter
>>>>>>>>
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>>>
>>>>>>>> #change a parameter
>>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>>>>>>>
>>>>>>>>  my $v = 1;
>>>>>>>>  #$v is just to turn on and off the messages
>>>>>>>>
>>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>>>> '-organism' => 'Trypanosoma Brucei' );
>>>>>>>>
>>>>>>>>
>>>>>>>> while (my $input = $str->next_seq())
>>>>>>>> {
>>>>>>>>  #Blast a sequence against a database:
>>>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>>>  #and swap the two lines below for an example of that.
>>>>>>>>
>>>>>>>>           open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE $input;
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>>>  # The program stops here: it does not return any value and it does
>>>>>>>>  # not enter the while loop. Please help me in this regard.
>>>>>>>>              open(OUTFILE,'>',$debugfile);
>>>>>>>>              print OUTFILE $r;
>>>>>>>>              close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>>>
>>>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>>>    open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "while entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>
>>>>>>>>             open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "foreach entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>>>
>>>>>>>>      if( !ref($rc) )
>>>>>>>>      {
>>>>>>>>      if( $rc < 0 )
>>>>>>>>      {
>>>>>>>>      $factory->remove_rid($rid);
>>>>>>>>      }
>>>>>>>>       open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "if entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>       sleep 5;
>>>>>>>>      }
>>>>>>>>     else {
>>>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "else entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>        my $result = $rc->next_result();
>>>>>>>>       #save the output
>>>>>>>>      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>        print BLASTDEBUGFILE $result->next_hit();
>>>>>>>>        close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>      my $filename =
>>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>>>
>>>>>>>>       # open(DEBUGFILE,'>',$debugfile);
>>>>>>>>       # open(new,'>',$filename);
>>>>>>>>       # @arra=<new>;
>>>>>>>>       # print DEBUGFILE @arra;
>>>>>>>>       # close(DEBUGFILE);
>>>>>>>>       # close(new);
>>>>>>>>
>>>>>>>>       $factory->save_output($filename);
>>>>>>>>
>>>>>>>>     # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>>>     # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>>>     # close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>     $factory->remove_rid($rid);
>>>>>>>>
>>>>>>>>     open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>     print BLASTDEBUGFILE  $organism;
>>>>>>>>      close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>  # open(OUTFILE,'>',$outfile);
>>>>>>>>  # print OUTFILE "Test2 $result->database_name()";
>>>>>>>>  # close(OUTFILE);
>>>>>>>>
>>>>>>>> #$hit = $result->next_hit;
>>>>>>>> #open(new,'>',$debugfile);
>>>>>>>> #print $hit;
>>>>>>>> #close(new);
>>>>>>>>
>>>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>>>
>>>>>>>>          next unless ( $v > 0);
>>>>>>>>
>>>>>>>>        #     open(OUTFILE,'>',$debugfile);
>>>>>>>>         #    print OUTFILE "$hit in while hits";
>>>>>>>>          #  close(OUTFILE);
>>>>>>>>
>>>>>>>>     my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>>>         my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>>>                push(@seqs,$dna);
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>>>  #print OUTFILE $seqs[0];
>>>>>>>>  #close(OUTFILE);
>>>>>>>>
>>>>>>>> return(@seqs);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile) || die ;
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>>>>>>> <body>\n
>>>>>>>> <p><font face=\"Courier, monospace font set\">
>>>>>>>> Inputsequence: <br>";
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      print OUTFILE " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      print OUTFILE "<br>\n";
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> print OUTFILE "</font> <p>";
>>>>>>>>
>>>>>>>> $z=@compseqs;
>>>>>>>>
>>>>>>>> for($k=1;$k<$z;$k++) {
>>>>>>>>  print OUTFILE "<font face=\"Courier, monospace font
>>>>>>>> set\"><p>Compare
>>>>>>>> Sequence: <br>";
>>>>>>>>
>>>>>>>>  for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>>>>>>
>>>>>>>>      print OUTFILE substr ($compseqs[$k], $i, 1);
>>>>>>>>
>>>>>>>>      if ( ($i+1)%10==0){
>>>>>>>>          print OUTFILE " ";
>>>>>>>>      }
>>>>>>>>      if ( ($i+1)%60==0){
>>>>>>>>          print OUTFILE "<br>\n";
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  print OUTFILE "<p></font>";
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<p>
>>>>>>>> Window: <br>$in{'Windowsize'}
>>>>>>>> <p>
>>>>>>>> <p>
>>>>>>>> Threshold: <br>$in{'Threshold'}
>>>>>>>> <p>";
>>>>>>>> my $j=0;
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>>>>>>      if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>>>>>>          $j=$in{'Windowsize'};
>>>>>>>>      }
>>>>>>>>      $height=$out[$i]->{similar}*5;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ($j>0) {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>      $j--;
>>>>>>>>  }
>>>>>>>>  else {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      $outstring .= " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      $outstring .= "<br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%800==0){
>>>>>>>>      print OUTFILE "<br><br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>>>>>>> set\">$outstring</font>";
>>>>>>>>
>>>>>>>> #foreach (@out) {
>>>>>>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar}
>>>>>>>> matchs<p>";
>>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>>>>>>
>>>>>>>> #    }
>>>>>>>> #}
>>>>>>>>
>>>>>>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>>>>>>
>>>>>>>> close OUTFILE;
>>>>>>>>
>>>>>>>> #nameprint();
>>>>>>>>
>>>>>>>> sub parse_form {
>>>>>>>>  local ($buffer, @pairs, $pair, $name, $value);
>>>>>>>>  # Read in text
>>>>>>>>  $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>>>>>>  if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>>>>>>  {
>>>>>>>>      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>>>>>>  }
>>>>>>>>  else
>>>>>>>>  {
>>>>>>>>      $buffer = $ENV{'QUERY_STRING'};
>>>>>>>>  }
>>>>>>>>  @pairs = split(/&/, $buffer);
>>>>>>>>  foreach $pair (@pairs)
>>>>>>>>  {
>>>>>>>>      ($name, $value) = split(/=/, $pair);
>>>>>>>>      $value =~ tr/+/ /;
>>>>>>>>      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>>>>>>      $in{$name} = $value;
>>>>>>>>  }
>>>>>>>> }
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bernd.web at gmail.com  Thu Jan 21 13:37:18 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 21 Jan 2010 19:37:18 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
	<c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
Message-ID: <716af09c1001211037p59b19a29l1967f1e514469e79@mail.gmail.com>

Hi,

Regarding RemoteBlast, may I add a question?
It seems that Bio::Tools::Run::RemoteBlast is sending each sequence
separately to NCBI (at least in BP 1.5.2).
This means that a RID has to be checked for each sequence. Is this
indeed the case?
The BLAST URL-API or batch interface supports sending multiple
sequences at once.

Regards,
Bernd

On Thu, Jan 21, 2010 at 7:28 PM, Roopa Raghuveer <rtbio.2009 at gmail.com> wrote:
> Hello Mark,
>
> This is Roopa again. I have a small problem again. I am working on Remote
> blast. The program works well. But the problem is this. The program
> accesses the server and gets the output correctly. I am trying to send the
> result sequences into an array and I found that always the first sequence
> among the Result sequences is missing. The code is
>
>  my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => "$organ\[ORGN]");


From cjfields at illinois.edu  Thu Jan 21 23:31:25 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 22:31:25 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
Message-ID: <BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>

Jay,

Did you want to release it to CPAN?  I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

chris

On Jan 18, 2010, at 6:22 PM, Jay Hannah wrote:

> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.
> 
>   http://github.com/jhannah/bio-broodcomb
> 
> It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. 
> 
> The first two functions I stuck in the framework:
> 
> Find subsequences (Bio::BroodComb::SubSeq):
> 
>   use Bio::BroodComb;
>   my $bc = Bio::BroodComb->new();
>   $bc->load_large_seq(file => "large_seq.fasta");
>   $bc->load_small_seq(file => "small_seq.fasta");
>   $bc->find_subseqs();
>   print $bc->subseq_report1;
> 
> In-silico PCR (Bio::BroodComb::PCR):
> 
>  use Bio::BroodComb;
>  my $bc = Bio::BroodComb->new();
>  $bc->load_large_seq(file => "large_seq.fasta");
>  $bc->add_primerset(
>     description    => "U5/R",   # however you want it reported
>     forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
>     reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
>  );
>  $bc->find_pcr_hits();
>  $bc->find_pcr_products();
>  print $bc->pcr_report1;
> 
> I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.
> 
> Suggestions, contributions welcome.   :)
> 
>   http://github.com/jhannah/bio-broodcomb
> 
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jason at bioperl.org  Fri Jan 22 01:17:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 21 Jan 2010 22:17:14 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
Message-ID: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>

I'm considering putting in an allowable initialization parameter (and
get/set) for Bio::AlignIO that would allow setting of the alphabet.
This is then passed to Bio::LocatableSeq creation so that
_guess_alphabet isn't called. This will allow removal of warnings about
empty sequences because _guess_alphabet won't be called on a sequence
if we have explicitly set the alphabet.

This worked great on my local install and tests pass.  Any objections  
or concerns?

Basically it means when you make an AlignIO you can specify the
alphabet, i.e.

my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
                           -file => 'genome.fasaln');

I have some alignments with empty sequences and I think turning off
the warnings is appropriate where I force the alphabet choice. It
should have a very modest speedup benefit too.

-jason
--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip

From rtbio.2009 at gmail.com  Fri Jan 22 04:54:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 22 Jan 2010 10:54:32 +0100
Subject: [Bioperl-l] Fwd:  Regarding blast in Bioperl
In-Reply-To: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
	<c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
Message-ID: <c7cac1601001220154r4f92651ejb79663898e0b8fc2@mail.gmail.com>

---------- Forwarded message ----------
From: Roopa Raghuveer <rtbio.2009 at gmail.com>
Date: Thu, Jan 21, 2010 at 7:28 PM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: bioperl-l at lists.open-bio.org


Hello Mark,

This is Roopa again. I have a small problem again. I am working on Remote
blast. The program works well. But the problem is this.  The program
accesses the server and gets the output correctly. I am trying to send the
result sequences into an array and I found that always the first sequence
among the Result sequences is missing. The code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was debugging by printing the count of the array and
the sequences themselves. When the BLAST output contained 1 sequence, the
count of the array was 0 and nothing was printed. Likewise, when the
output contained 3 sequences, the count of the array was 2 and only two
sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.
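
A likely cause, judging only from the code quoted above: next_hit behaves
as an iterator, so the debugging line
"print BLASTDEBUGFILE $result->next_hit();" consumes the first hit before
the "while ( my $hit = $result->next_hit )" loop starts. That would match
the counts reported (1 hit gives 0, 3 hits give 2). The sketch below does
not use BioPerl at all; make_result and collect_hits are hypothetical
stand-ins that only mimic the iterator behaviour of a result object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST result object: it hands out hits
# one at a time, like an iterator, and returns undef when exhausted.
sub make_result {
    my @hits = @_;
    my $pos  = 0;
    return {
        next_hit => sub { return $pos < @hits ? $hits[$pos++] : undef },
        rewind   => sub { $pos = 0 },
    };
}

# Collect every hit the iterator still has to offer.
sub collect_hits {
    my ($result) = @_;
    my @seen;
    while ( defined( my $hit = $result->{next_hit}->() ) ) {
        push @seen, $hit;
    }
    return @seen;
}

# Case 1: a debugging call to next_hit before the loop eats the first hit.
my $result  = make_result(qw(hit1 hit2 hit3));
my $peeked  = $result->{next_hit}->();    # plays the role of the debug print
my @missing = collect_hits($result);      # only hit2 and hit3 are left

# Case 2: after rewinding (or without the debug call) all hits come back.
$result->{rewind}->();
my @all = collect_hits($result);

print "after debug call: @missing\n";    # after debug call: hit2 hit3
print "after rewind:     @all\n";        # after rewind:     hit1 hit2 hit3
```

In the real script, removing the debug print of $result->next_hit() should
be enough; calling $result->rewind before the loop is an alternative,
assuming the result class in use provides it.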


On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen <maj at fortinbras.us> wrote:

>  Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
> *To:* Mark A. Jensen <maj at fortinbras.us>
>  *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>>  There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from
>> looking for a member of an array called @organ. This would have shown up
>> if 'use strict;' had
>> been in place. Still don't know whether this would work precisely; can you
>> send me the query
>> sequence so I can reproduce your output?
>> thanks MAJ
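
The quoting behaviour described in this exchange can be checked in
isolation. This is a minimal sketch in plain Perl with no BioPerl
involved; the value assigned to $organ is only an example, since in the
script it comes from the user's form input.

```perl
use strict;
use warnings;

my $organ = 'Trypanosoma brucei';    # example value only

# Single quotes: no interpolation, so the query is the literal text.
my $literal = '$organ[ORGN]';

# Double quotes with the bracket escaped: $organ is interpolated and
# "[ORGN]" stays plain text.
my $query = "$organ\[ORGN]";

# Without the backslash, "$organ[ORGN]" would be parsed as an element
# of an array @organ, which 'use strict' rejects at compile time.

print "$literal\n";    # $organ[ORGN]
print "$query\n";      # Trypanosoma brucei[ORGN]
```

Writing the string as "${organ}[ORGN]" is an equivalent way to keep Perl
from reading the bracket as an array subscript.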
>>
>>  ----- Original Message -----
>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes but still i got the same o/p with sequences
>> from different species.
>>
>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813    0.0
>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622    0.0
>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773    0.0
>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749    0.0
>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542    2e-151
>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196    3e-47
>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190    1e-45
>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181    7e-43
>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179    2e-42
>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178    8e-42
>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170    1e-39
>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   169    4e-39
>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167    1e-38
>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150    1e-33
>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145    5e-32
>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143    2e-31
>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143    2e-31
>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138    7e-30
>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138    7e-30
>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136    2e-29
>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136    2e-29
>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134    9e-29
>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132    3e-28
>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132    3e-28
>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132    3e-28
>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131    1e-27
>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131    1e-27
>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129    4e-27
>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129    4e-27
>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129    4e-27
>> ref|NM_001003470.1|  Danio rerio protein kinase, cAMP-dependen...   129    4e-27
>> ref|XM_001141503.1|  PREDICTED: Pan troglodytes verus protein ...   127    1e-26
>> ref|XM_001145269.1|  PREDICTED: Pan troglodytes protein kinase...   127    1e-26
>> ref|XM_512434.2|  PREDICTED: Pan troglodytes cAMP-dependent pr...   127    1e-26
>> ref|XM_001171457.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127    1e-26
>> ref|XM_001171437.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127    1e-26
>> ref|XM_847420.1|  PREDICTED: Canis familiaris similar to Serin...   127    1e-26
>> ref|NM_207518.1|  Homo sapiens protein kinase, cAMP-dependent,...   127    1e-26
>> ref|NM_002730.3|  Homo sapiens protein kinase, cAMP-dependent,...   127    1e-26
>>
>>
>> Thanks in advance.
>>
>> Roopa.
>>
>> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>>  I understand you. Put in the double quotes and see what happens.
>>>
>>>  ----- Original Message -----
>>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>> *To:* Mark A. Jensen <maj at fortinbras.us>
>>>   *Sent:* Saturday, January 09, 2010 1:40 PM
>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>
>>> Hi Mark,
>>>
>>> Thanks for your reply. It works when I hard-code the organism name as
>>> Trypanosoma brucei, but my idea is to introduce a variable $organ that
>>> takes whatever organism the user gives (Pseudomonas, Drosophila,
>>> Trypanosoma, Leishmania, etc.), so that I get sequences from only that
>>> organism.
>>>
>>> That is, if the user enters Pseudomonas, $organ takes the value
>>> Pseudomonas, the code runs BLAST, and it returns only those sequences
>>> from Pseudomonas that produce significant alignments. But this is not
>>> what happens.
>>>
>>> Please help me in this regard.
>>>
>>> Thanks in advance
>>> Roopa
>>>
>>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>
>>>>  Hi Roopa-- You may get what you want if you make the change.
>>>> With single quotes, ENTREZ_QUERY is set to the literal string
>>>>
>>>>  $organ[ORGN]
>>>>
>>>> while, with double quotes, the variable value will be substituted
>>>> (write "${organ}[ORGN]" so that Perl does not read [ORGN] as an array
>>>> subscript), and the parameter should be set to
>>>>
>>>>  Trypanosoma brucei[ORGN]
>>>>
>>>> I'm guessing that it worked because the database ignored the strange
>>>> parameter and returned all the matches. Try this, and if it doesn't
>>>> work I'll look harder.
>>>> cheers,
>>>> Mark
>>>>
>>>>  ----- Original Message -----
>>>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>>>> *Sent:* Saturday, January 09, 2010 1:24 PM
>>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>>
>>>> hello Mark,
>>>>
>>>> Thanks for your reply. It was working without enclosing $organ[ORGN] in
>>>> double quotes, but I would like to have only those sequences that are
>>>> specific to my organism, i.e., I need sequences only from the organism
>>>> that I entered.
>>>>
>>>> When the organism is Trypanosoma brucei, I also get Leishmania and other
>>>> species among the similar sequences. But I want to get only Trypanosoma
>>>> brucei sequences.
>>>>
>>>> Could you please help me out in this regard?
>>>>
>>>> Roopa.
>>>>
>>>> My output
>>>>
>>>> I/P organism: Trypanosoma brucei
>>>>
>>>> O/P:-
>>>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813    0.0
>>>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622    0.0
>>>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773    0.0
>>>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749    0.0
>>>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>>>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>>>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542    2e-151
>>>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>>>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>>>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196    3e-47
>>>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190    1e-45
>>>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181    7e-43
>>>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179    2e-42
>>>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178    8e-42
>>>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170    1e-39
>>>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   168    4e-39
>>>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167    1e-38
>>>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150    1e-33
>>>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145    5e-32
>>>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143    2e-31
>>>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143    2e-31
>>>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138    7e-30
>>>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138    7e-30
>>>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136    2e-29
>>>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136    2e-29
>>>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134    9e-29
>>>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132    3e-28
>>>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132    3e-28
>>>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132    3e-28
>>>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131    1e-27
>>>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131    1e-27
>>>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129    4e-27
>>>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129    4e-27
>>>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129    4e-27
>>>>
>>>> Roopa.
>>>>
>>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>>
>>>>> I see it immediately (from making same bug many times) :
>>>>>
>>>>>
>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>> =>
>>>>> - '$organ[ORGN]');
>>>>> +"${organ}[ORGN]");
>>>>>
>>>>>
>>>>> MAJ
>>>>>
>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>> rtbio.2009 at gmail.com>
>>>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>>>> Cc: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Saturday, January 09, 2010 11:57 AM
>>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>>
>>>>>> Thanks a lot for your reply, Mark. It works with Trypanosoma brucei as
>>>>>> the organism parameter, but when I tried to take the organism parameter
>>>>>> from the user, it did not work, i.e., I was unable to get the target
>>>>>> sequences. Please help me in this regard. My code is
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>>
>>>>>> #path for extra camel module
>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>> use Roopablast;
>>>>>>
>>>>>>
>>>>>> use Bio::SearchIO;
>>>>>> use Bio::Search::Result::BlastResult;
>>>>>> use Bio::Perl;
>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::DB::GenBank;
>>>>>>
>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>> my $outstring ="";
>>>>>>
>>>>>> &parse_form;
>>>>>>
>>>>>> print "Content-type: text/html\n\n";
>>>>>> print "<HTML>\n";
>>>>>> print "<head><title>RNAi Result</title>";
>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>> print "</head>\n";
>>>>>> print "<body>\n";
>>>>>> print " Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>> print "</BODY>\n";
>>>>>> print "</HTML>\n";
>>>>>>
>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>> exit if $pid;
>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>
>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>
>>>>>> print OUTFILE "<HTML>\n
>>>>>> <head><title>RNAi Result</title>
>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>> </head>\n
>>>>>> <body>\n
>>>>>>  Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>> wait......<br>
>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>> </BODY>\n
>>>>>> </HTML>\n";
>>>>>>
>>>>>> close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
>>>>>>
>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>
>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>> $in{'Threshold'});
>>>>>>
>>>>>>
>>>>>> sub blastcode
>>>>>> {
>>>>>>
>>>>>> $inpu1= $_[0];
>>>>>>
>>>>>> $organ= $_[1];
>>>>>>
>>>>>> open(NUC,'>',$nuc);
>>>>>> print NUC $inpu1,"\n";
>>>>>> close(NUC);
>>>>>>
>>>>>> my $prog = 'blastn';
>>>>>> my $db   = 'refseq_rna';
>>>>>> my $e_val= '1e-10';
>>>>>> my $organism= $organ;
>>>>>>
>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>
>>>>>> my @params = ( '-prog' => $prog,
>>>>>>        '-data' => $db,
>>>>>>        '-expect' => $e_val,
>>>>>>        '-readmethod' => 'SearchIO',
>>>>>>       '-Organism'   => $organism );
>>>>>>
>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>            print OUTFILE $inpu1;
>>>>>>             close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>>> =>
>>>>>> '$organ[ORGN]');
>>>>>>
>>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>
>>>>>>  #change a parameter
>>>>>>
>>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>> #    'Trypanosoma Brucei[ORGN]';
>>>>>>
>>>>>> #change a parameter
>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>> #    '$input2[ORGN]';
>>>>>>
>>>>>>  my $v = 1;
>>>>>>  #$v is just to turn on and off the messages
>>>>>>
>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>> '-organism' => $organ );
>>>>>>
>>>>>>
>>>>>> while (my $input = $str->next_seq())
>>>>>> {
>>>>>>  #Blast a sequence against a database:
>>>>>>   #Alternatively, you could  pass in a file with many
>>>>>>   #sequences rather than loop through sequence one at a time
>>>>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>   #and swap the two lines below for an example of that.
>>>>>>
>>>>>>            #open(OUTFILE,'>',$debugfile);
>>>>>>             # print OUTFILE $input;
>>>>>>             #close(OUTFILE);
>>>>>>
>>>>>>
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>
>>>>>>               open(OUTFILE,'>',$debugfile);
>>>>>>            #   print OUTFILE $r;
>>>>>>               close(OUTFILE);
>>>>>>
>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>  #   open(OUTFILE,'>',$debugfile);
>>>>>>   #           print OUTFILE "while entered";
>>>>>>    #         close(OUTFILE);
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>
>>>>>>     #         open(OUTFILE,'>',$debugfile);
>>>>>>      #        print OUTFILE "foreach entered";
>>>>>>       #      close(OUTFILE);
>>>>>>
>>>>>>       my $rc = $factory->retrieve_blast($rid);
>>>>>>
>>>>>>       if( !ref($rc) )
>>>>>>       {
>>>>>>       if( $rc < 0 )
>>>>>>       {
>>>>>>       $factory->remove_rid($rid);
>>>>>>       }
>>>>>>        open(OUTFILE,'>',$debugfile);
>>>>>>        #      print OUTFILE "if entered";
>>>>>>             close(OUTFILE);
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>       }
>>>>>>      else {
>>>>>>         #    open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "else entered";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>         my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>         print BLASTDEBUGFILE $result->next_hit();
>>>>>>         close(BLASTDEBUGFILE);
>>>>>>
>>>>>>       my $filename =
>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>
>>>>>>        # open(DEBUGFILE,'>',$debugfile);
>>>>>>        # open(new,'>',$filename);
>>>>>>        # @arra=<new>;
>>>>>>        # print DEBUGFILE @arra;
>>>>>>        # close(DEBUGFILE);
>>>>>>        # close(new);
>>>>>>
>>>>>>        $factory->save_output($filename);
>>>>>>  # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>      # close(BLASTDEBUGFILE);
>>>>>>
>>>>>>      $factory->remove_rid($rid);
>>>>>>
>>>>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>      print BLASTDEBUGFILE  $organism;
>>>>>>       close(BLASTDEBUGFILE);
>>>>>>
>>>>>>   # open(OUTFILE,'>',$outfile);
>>>>>>   # print OUTFILE "Test2 $result->database_name()";
>>>>>>   # close(OUTFILE);
>>>>>>
>>>>>> #$hit = $result->next_hit;
>>>>>> #open(new,'>',$debugfile);
>>>>>> #print $hit;
>>>>>> #close(new);
>>>>>>
>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>
>>>>>>           next unless ( $v > 0);
>>>>>>
>>>>>>         #     open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "$hit in while hits";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>                 push(@seqs,$dna);
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>  }
>>>>>>
>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>  #print OUTFILE $seqs[0];
>>>>>>  #close(OUTFILE);
>>>>>>
>>>>>> return(@seqs);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Roopa.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Roopa--
>>>>>>>
>>>>>>> I got your code to work with the following changes:
>>>>>>>
>>>>>>> +# the input should be a valid FASTA file...
>>>>>>> ...
>>>>>>> open(NUC,'>',$nuc);
>>>>>>> +print NUC ">seq (need a name line for valid fasta)\n";
>>>>>>> print NUC $inpu1, "\n";
>>>>>>> close(NUC);
>>>>>>> ...
>>>>>>>
>>>>>>> +# you can set these header params in the call itself...
>>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>>>> +     -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');
>>>>>>>
>>>>>>>  #change a parameter
>>>>>>> +# commented this out...
>>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>> +#     'Trypanosoma Brucei[ORGN]';
>>>>>>>
>>>>>>> MAJ
>>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>>>> rtbio.2009 at gmail.com
>>>>>>> >
>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>> Sent: Friday, January 08, 2010 10:00 AM
>>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>>>>>>
>>>>>>>
>>>>>>>  Hello all,
>>>>>>>
>>>>>>>>
>>>>>>>> I was trying remote BLAST using BioPerl. My input data is a
>>>>>>>> Trypanosoma brucei sequence in FASTA format. When I tried to submit
>>>>>>>> it to BLAST using the step
>>>>>>>> $r=$factory->submit_blast($input)
>>>>>>>> it returned nothing, which I checked by debugging the code. It is
>>>>>>>> not blasting my input sequence even though I set all the
>>>>>>>> parameters. I have pasted the code below.
>>>>>>>>
>>>>>>>> Please help me solve this problem. It is very urgent.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Roopa.
>>>>>>>>
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #path for extra camel module
>>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>>>> use Roopablast;
>>>>>>>>
>>>>>>>>
>>>>>>>> use Bio::SearchIO;
>>>>>>>> use Bio::Search::Result::BlastResult;
>>>>>>>> use Bio::Perl;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use Bio::Seq;
>>>>>>>> use Bio::SeqIO;
>>>>>>>> use Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>> my $outstring ="";
>>>>>>>>
>>>>>>>> &parse_form;
>>>>>>>>
>>>>>>>> print "Content-type: text/html\n\n";
>>>>>>>> print "<HTML>\n";
>>>>>>>> print "<head><title>RNAi Result</title>";
>>>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>>>> print "</head>\n";
>>>>>>>> print "<body>\n";
>>>>>>>> print " Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>>>> print "</BODY>\n";
>>>>>>>> print "</HTML>\n";
>>>>>>>>
>>>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>>>> exit if $pid;
>>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>>>> </head>\n
>>>>>>>> <body>\n
>>>>>>>>  Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>>>> wait......<br>
>>>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>>>> </BODY>\n
>>>>>>>> </HTML>\n";
>>>>>>>>
>>>>>>>> close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> @compseqs = blastcode($in{'Inputseq'});
>>>>>>>>
>>>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>>>
>>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>>>> $in{'Threshold'});
>>>>>>>>
>>>>>>>>
>>>>>>>> sub blastcode
>>>>>>>> {
>>>>>>>>
>>>>>>>> $inpu1= $_[0];
>>>>>>>>
>>>>>>>> #$organ= $_[1];
>>>>>>>>
>>>>>>>> open(NUC,'>',$nuc);
>>>>>>>> print NUC $inpu1;
>>>>>>>> close(NUC);
>>>>>>>>
>>>>>>>> my $prog = 'blastn';
>>>>>>>> my $db   = 'refseq_rna';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> my $organism= 'Trypanosoma Brucei';
>>>>>>>>
>>>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>       '-data' => $db,
>>>>>>>>       '-expect' => $e_val,
>>>>>>>>       '-readmethod' => 'SearchIO',
>>>>>>>>       '-Organism'   => $organism );
>>>>>>>>
>>>>>>>>          # open(OUTFILE,'>',$debugfile);
>>>>>>>>           #  print OUTFILE @params;
>>>>>>>>           # close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>
>>>>>>>>  #change a parameter
>>>>>>>>
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>>>     'Trypanosoma Brucei[ORGN]';
>>>>>>>>
>>>>>>>> #change a parameter
>>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>>> #     '$input2[ORGN]';
>>>>>>>>
>>>>>>>>  my $v = 1;
>>>>>>>>  #$v is just to turn on and off the messages
>>>>>>>>
>>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>>>> '-organism' => 'Trypanosoma Brucei' );
>>>>>>>>
>>>>>>>>
>>>>>>>> while (my $input = $str->next_seq())
>>>>>>>> {
>>>>>>>>  #Blast a sequence against a database:
>>>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>>>  #and swap the two lines below for an example of that.
>>>>>>>>
>>>>>>>>           open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE $input;
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  # The program stops here: submit_blast does not return any value
>>>>>>>>  # and the while loop below is never entered. Please help me in
>>>>>>>>  # this regard.
>>>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>>>              open(OUTFILE,'>',$debugfile);
>>>>>>>>              print OUTFILE $r;
>>>>>>>>              close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>>>
>>>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>>>    open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "while entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>
>>>>>>>>             open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "foreach entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>>>
>>>>>>>>      if( !ref($rc) )
>>>>>>>>      {
>>>>>>>>      if( $rc < 0 )
>>>>>>>>      {
>>>>>>>>      $factory->remove_rid($rid);
>>>>>>>>      }
>>>>>>>>       open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "if entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>       sleep 5;
>>>>>>>>      }
>>>>>>>>     else {
>>>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "else entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>        my $result = $rc->next_result();
>>>>>>>>       #save the output
>>>>>>>>      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>        print BLASTDEBUGFILE $result->next_hit();
>>>>>>>>        close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>      my $filename =
>>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>>>
>>>>>>>>       # open(DEBUGFILE,'>',$debugfile);
>>>>>>>>       # open(new,'>',$filename);
>>>>>>>>       # @arra=<new>;
>>>>>>>>       # print DEBUGFILE @arra;
>>>>>>>>       # close(DEBUGFILE);
>>>>>>>>       # close(new);
>>>>>>>>
>>>>>>>>       $factory->save_output($filename);
>>>>>>>>
>>>>>>>>     # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>>>     # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>>>     # close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>     $factory->remove_rid($rid);
>>>>>>>>
>>>>>>>>     open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>     print BLASTDEBUGFILE  $organism;
>>>>>>>>      close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>  # open(OUTFILE,'>',$outfile);
>>>>>>>>  # print OUTFILE "Test2 $result->database_name()";
>>>>>>>>  # close(OUTFILE);
>>>>>>>>
>>>>>>>> #$hit = $result->next_hit;
>>>>>>>> #open(new,'>',$debugfile);
>>>>>>>> #print $hit;
>>>>>>>> #close(new);
>>>>>>>>
>>>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>>>
>>>>>>>>          next unless ( $v > 0);
>>>>>>>>
>>>>>>>>        #     open(OUTFILE,'>',$debugfile);
>>>>>>>>         #    print OUTFILE "$hit in while hits";
>>>>>>>>          #  close(OUTFILE);
>>>>>>>>
>>>>>>>>     my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>>>         my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>>>                push(@seqs,$dna);
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>>>  #print OUTFILE $seqs[0];
>>>>>>>>  #close(OUTFILE);
>>>>>>>>
>>>>>>>> return(@seqs);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile) || die ;
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>>>>>>> <body>\n
>>>>>>>> <p><font face=\"Courier, monospace font set\">
>>>>>>>> Inputsequence: <br>";
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      print OUTFILE " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      print OUTFILE "<br>\n";
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> print OUTFILE "</font> <p>";
>>>>>>>>
>>>>>>>> $z=@compseqs;
>>>>>>>>
>>>>>>>> for($k=1;$k<$z;$k++) {
>>>>>>>>  print OUTFILE "<font face=\"Courier, monospace font
>>>>>>>> set\"><p>Compare
>>>>>>>> Sequence: <br>";
>>>>>>>>
>>>>>>>>  for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>>>>>>
>>>>>>>>      print OUTFILE substr ($compseqs[$k], $i, 1);
>>>>>>>>
>>>>>>>>      if ( ($i+1)%10==0){
>>>>>>>>          print OUTFILE " ";
>>>>>>>>      }
>>>>>>>>      if ( ($i+1)%60==0){
>>>>>>>>          print OUTFILE "<br>\n";
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  print OUTFILE "<p></font>";
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<p>
>>>>>>>> Window: <br>$in{'Windowsize'}
>>>>>>>> <p>
>>>>>>>> <p>
>>>>>>>> Threshold: <br>$in{'Threshold'}
>>>>>>>> <p>";
>>>>>>>> my $j=0;
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>>>>>>      if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>>>>>>          $j=$in{'Windowsize'};
>>>>>>>>      }
>>>>>>>>      $height=$out[$i]->{similar}*5;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ($j>0) {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>      $j--;
>>>>>>>>  }
>>>>>>>>  else {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      $outstring .= " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      $outstring .= "<br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%800==0){
>>>>>>>>      print OUTFILE "<br><br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>>>>>>> set\">$outstring</font>";
>>>>>>>>
>>>>>>>> #foreach (@out) {
>>>>>>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar}
>>>>>>>> matchs<p>";
>>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>>>>>>
>>>>>>>> #    }
>>>>>>>> #}
>>>>>>>>
>>>>>>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>>>>>>
>>>>>>>> close OUTFILE;
>>>>>>>>
>>>>>>>> #nameprint();
>>>>>>>>
>>>>>>>> sub parse_form {
>>>>>>>>  local ($buffer, @pairs, $pair, $name, $value);
>>>>>>>>  # Read in text
>>>>>>>>  $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>>>>>>  if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>>>>>>  {
>>>>>>>>      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>>>>>>  }
>>>>>>>>  else
>>>>>>>>  {
>>>>>>>>      $buffer = $ENV{'QUERY_STRING'};
>>>>>>>>  }
>>>>>>>>  @pairs = split(/&/, $buffer);
>>>>>>>>  foreach $pair (@pairs)
>>>>>>>>  {
>>>>>>>>      ($name, $value) = split(/=/, $pair);
>>>>>>>>      $value =~ tr/+/ /;
>>>>>>>>      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>>>>>>      $in{$name} = $value;
>>>>>>>>  }
>>>>>>>> }
>
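
On the quoting question in the thread above: inside a double-quoted Perl
string, $organ[ORGN] is parsed as an element of the array @organ, not as the
scalar $organ followed by the literal text [ORGN]. A minimal sketch of three
spellings that do build the intended Entrez term; the variable name comes from
the quoted script, and the organism value is only an example standing in for
user input:

```perl
use strict;
use warnings;

my $organ = 'Trypanosoma brucei';    # example value; normally user input

# Single quotes: no interpolation, so the query is the literal text.
my $literal = '$organ[ORGN]';

# "$organ[ORGN]" would be read as element ORGN of @organ, so it is
# left out here (under strict it does not even compile).

my $braced  = "${organ}[ORGN]";      # braces end the variable name
my $escaped = "$organ\[ORGN]";       # backslash keeps the [ literal
my $joined  = $organ . '[ORGN]';     # plain concatenation

print "$_\n" for $literal, $braced, $escaped, $joined;
```

Any of the last three strings could then be passed as the -ENTREZ_QUERY value
in the RemoteBlast constructor call quoted above.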

From maj at fortinbras.us  Fri Jan 22 07:34:59 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 07:34:59 -0500
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
Message-ID: <BB6A0E3FAC154E8FB690E5749375A1BC@NewLife>

I'm down with that.

----- Original Message ----- 
From: "Jason Stajich" <jason at bioperl.org>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 1:17 AM
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO


> I'm considering putting in an allowable initialization parameter (and
> get/set) for Bio::AlignIO that would allow setting of the alphabet.  This
> is then passed to Bio::LocatableSeq creation so that _guess_alphabet
> isn't called. This will allow removal of warnings about empty
> sequences because _guess_alphabet won't be called on a sequence if we
> have explicitly set the alphabet.
> 
> This worked great on my local install and tests pass.  Any objections  
> or concerns?
> 
> basically it means when you make an AlignIO you can specify the  
> alphabet i.e.
> 
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
>                            -file => 'genome.fasaln');
> 
> I have some alignments with empty sequences and I think turning off
> the warnings is appropriate where I force the alphabet choice. It
> should also give a very modest speedup.
> 
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip

From avilella at gmail.com  Fri Jan 22 08:07:26 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 13:07:26 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
Message-ID: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>

Hi,

I would like to write a script that merges fragments in a Bio::SimpleAlign
object on the basis of
some $seq->display_name rule.

I basically want to start with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.234     QWERTYU-------------------
seq2.345     ----------ASDFGH----------
seq2.456     -------------------ZXCVBNM

And end with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM

Can people suggest any Bio::SimpleAlign methods that would help here?

Cheers,

Albert.

From maj at fortinbras.us  Fri Jan 22 08:31:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 08:31:54 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
Message-ID: <EF1FEC1B43C146B6BBF827EA56171777@NewLife>

Here's one of my favorite tricks for this: XOR mask on gap symbol.
MAJ

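use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;

# Read the fragments as FASTA from the DATA section below.
my $seqio = Bio::SeqIO->new( -fh => \*DATA, -format => 'fasta' );

# XOR each fragment with a full-length gap mask so that gap columns become
# NUL bytes; XORing the masked fragments together then overlays them,
# provided no two fragments have a residue in the same column.
my $acc;
while ( my $seq = $seqio->next_seq ) {
    my $masked = $seq->seq ^ ( '-' x $seq->length );
    $acc = defined $acc ? $acc ^ $masked : $masked;
}

# Unmask: the remaining NUL bytes turn back into gap symbols.
my $mrg = Bio::Seq->new( -id  => 'merged',
                         -seq => $acc ^ ( '-' x length($acc) ) );
1;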


__END__
>seq2.234     
QWERTYU-------------------
>seq2.345     
----------ASDFGH----------
>seq2.456     
-------------------ZXCVBNM
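
The XOR trick above works on one set of fragments at a time. A minimal sketch
of applying it per group, under two assumptions: rows that belong together
share the part of the id before the first dot (a hypothetical naming rule),
and no two fragments of a group have a residue in the same column. Plain
strings stand in for the rows of a Bio::SimpleAlign, and merge_fragments is a
name made up for this example. The gap mask is repeated to the full row
length because Perl's string XOR pads the shorter operand with NUL bytes
instead of repeating it.

```perl
use strict;
use warnings;

# Overlay equal-length gapped fragments into one row.
# Assumes at most one fragment has a residue in any given column.
sub merge_fragments {
    my @frags = @_;
    my $len   = length( $frags[0] );
    my $mask  = '-' x $len;       # full-length gap mask
    my $acc   = "\0" x $len;      # all-NUL accumulator
    for my $f (@frags) {
        die "fragments differ in length\n" unless length($f) == $len;
        $acc ^= ( $f ^ $mask );   # gaps become NUL, residues are kept (masked)
    }
    return $acc ^ $mask;          # unmask: NUL columns turn back into gaps
}

my @rows = (
    [ 'seq1.123', 'QWERTYUIOPASDFGHJKLZXCVBNM' ],
    [ 'seq2.234', 'QWERTYU-------------------' ],
    [ 'seq2.345', '----------ASDFGH----------' ],
    [ 'seq2.456', '-------------------ZXCVBNM' ],
);

# Group rows by id prefix, remembering first-seen order.
my ( %group, @order );
for my $row (@rows) {
    ( my $prefix = $row->[0] ) =~ s/\..*//;
    push @order, $prefix unless $group{$prefix};
    push @{ $group{$prefix} }, $row;
}

# Keep single rows as they are; merge groups with several fragments.
my @merged;
for my $prefix (@order) {
    my @members = @{ $group{$prefix} };
    if ( @members == 1 ) {
        push @merged, $members[0];
    }
    else {
        push @merged,
            [ "$prefix.mrg", merge_fragments( map { $_->[1] } @members ) ];
    }
}

printf "%-12s %s\n", @$_ for @merged;
```

With real data the rows would come from $aln->each_seq, and each merged string
would go back into a new alignment as a Bio::LocatableSeq.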

----- Original Message ----- 
From: "Albert Vilella" <avilella at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 8:07 AM
Subject: [Bioperl-l] Merging fragments in a simplealign


> Hi,
> 
> I would like to write a script that merges fragments in a Bio::SimpleAlign
> object on the basis of
> some $seq->display_name rule.
> 
> I basically want to start with something like this:
> 
> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.234     QWERTYU-------------------
> seq2.345     ----------ASDFGH----------
> seq2.456     -------------------ZXCVBNM
> 
> And end with something like this:
> 
> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
> 
> Can people suggest any Bio::SimpleAlign methods that would help here?
> 
> Cheers,
> 
> Albert.

From cjfields at illinois.edu  Fri Jan 22 08:34:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:34:07 -0600
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
Message-ID: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>

Sounds good to me.  The warnings are a bit too tight on this module anyway.

I still think we have plans towards refactoring some of this, not sure how far along they are:

http://www.bioperl.org/wiki/Align_Refactor

chris

On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:

> I'm considering adding an initialization parameter (and get/set method) to Bio::AlignIO that would allow setting the alphabet.  This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This removes the warnings about empty sequences, because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet.
> 
> This worked great on my local install and tests pass.  Any objections or concerns?
> 
> Basically, it means that when you make an AlignIO you can specify the alphabet, e.g.:
> 
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln');
> 
> I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also give a very modest speedup.
> 
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip


From cjfields at illinois.edu  Fri Jan 22 08:40:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:40:57 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <EF1FEC1B43C146B6BBF827EA56171777@NewLife>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
Message-ID: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>

May be something for the cook/scrapbook?

chris

On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:

> Here's one of my favorite tricks for this: XOR mask on gap symbol.
> MAJ
> 


From holland at eaglegenomics.com  Fri Jan 22 05:51:52 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 22 Jan 2010 10:51:52 +0000
Subject: [Bioperl-l] [BioSQL-l] SwissProt DE lines and UniProt XML /
	TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <8FECCBDE-2DE1-40EE-B5A4-73BDAC893E2D@eaglegenomics.com>

Nice idea. Currently, BioJava just stores the complete section as a string without parsing it, but it provides a parser module for converting it into a useful tag/value format within a user's program (but not to be stored in BioSQL).

On 21 Jan 2010, at 12:33, Peter wrote:

> Hi all,
> 
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> 
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> 
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
> 
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
> 
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> 
> Regards,
> 
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andrea at biocomp.unibo.it  Fri Jan 22 07:18:32 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 22 Jan 2010 13:18:32 +0100 (CET)
Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML
	in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <2b6e30c4628585042366646a7b46386e.squirrel@lipid.biocomp.unibo.it>

I think the point here is a little broader, since the SwissProt DE lines are
not the only ones that carry complex, structured data.
Defining a common, language-independent way to store structured data in
the comment and *_qualifier_value tables of the current BioSQL schema could
be very useful.
XML looks like a good candidate to me, and the UniProt XML format can be
used as a reference or as a template to start from.
Each Bio* project would then parse and report this structured data in its
own programming language's data structures.

Andrea




From avilella at gmail.com  Fri Jan 22 11:04:13 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 16:04:13 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
Message-ID: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>

Is there/should be a 'have_pairwise_overlap' method similar to this?

# $seq1 and $seq3 have matching ids
my $seq1 = $aln->each_seq_by_id($seq1->display_id);
my $seq3 = $aln->each_seq_by_id($seq3->display_id);

my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
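
To make the question concrete, here is a rough sketch in plain Perl of what such a check could do. The name have_pairwise_overlap is only the one proposed above, written here as a standalone function rather than a Bio::SimpleAlign method. It assumes both arguments are aligned sequence strings of equal length and that '-' is the gap symbol unless another is passed.

```perl
use strict;
use warnings;

# Return 1 if two equal-length gapped strings both carry a residue
# (a non-gap character) in at least one column, 0 otherwise.
sub have_pairwise_overlap {
    my ( $s1, $s2, $gap ) = @_;
    $gap = '-' unless defined $gap;
    die "sequences differ in length\n" if length($s1) != length($s2);
    for my $i ( 0 .. length($s1) - 1 ) {
        return 1
            if substr( $s1, $i, 1 ) ne $gap
            && substr( $s2, $i, 1 ) ne $gap;
    }
    return 0;
}

# The first and last fragments from the example alignment share no column.
print have_pairwise_overlap(
    'QWERTYU-------------------',
    '-------------------ZXCVBNM'
) ? "overlap\n" : "no overlap\n";
```

With sequence objects taken from the alignment, one would pass $seq1->seq and $seq3->seq.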

On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu> wrote:

> May be something for the cook/scrapbook?
>
> chris
>

From maj at fortinbras.us  Fri Jan 22 11:02:55 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 11:02:55 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
Message-ID: <BE7957A2791345DAB092D997A4656AA8@NewLife>

http://www.bioperl.org/wiki/Merge_gapped_sequences_across_a_common_region
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Albert Vilella" <avilella at gmail.com>; <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 8:40 AM
Subject: Re: [Bioperl-l] Merging fragments in a simplealign


> May be something for the cook/scrapbook?
> 
> chris
> 

From avilella at gmail.com  Fri Jan 22 12:50:57 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 17:50:57 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
	<358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
Message-ID: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>

Or, to rephrase my question: what is the closest existing equivalent to the
code below?

On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:

> Is there/should be a 'have_pairwise_overlap' method similar to this?
>
> # $seq1 and $seq3 have matching ids
> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>
> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>
>

From jay at jays.net  Fri Jan 22 13:30:57 2010
From: jay at jays.net (Jay Hannah)
Date: Fri, 22 Jan 2010 12:30:57 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
	<BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>
Message-ID: <EAD0FFCE-6DDF-4723-8D08-70ECF157FAAA@jays.net>

On Jan 21, 2010, at 10:31 PM, Chris Fields wrote:
> Did you want to release it to CPAN?  I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

Yes, I was thinking I would. No one has (yet) told me it's the worst idea ever, so I'm feeling encouraged.  :)

Given smallish inputs / databases (up to a few million rows) where some lightweight schema + SQLite + BioPerl can get the job done, it's nice to have a little easy-to-run toolbox. New tables and Roles bolt on easily, so I'll be adding them as they surface at $work[1]. 

Thanks for your interest.   :)

Jay Hannah
http://github.com/jhannah/bio-broodcomb
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From dalalhina at gmail.com  Fri Jan 22 12:31:09 2010
From: dalalhina at gmail.com (hina dalal)
Date: Fri, 22 Jan 2010 17:31:09 +0000
Subject: [Bioperl-l] Bioperl installation failed
Message-ID: <425f75df1001220931t49f5c768j97d91d2dd1757f19@mail.gmail.com>

Hi


I have installed Perl from ActiveState and am now trying to install BioPerl, but
cannot do it, neither from PPM (it shows the error "Ppm install failed:
404 not found") nor from CPAN / manual installation. It is not allowing me
to download nmake, showing that "the version of this file is not compatible
with the version of windows you are running. Check your computer system
information to see whether you need 32 bit or 64 bit of this program." I am
using Windows Vista.


Please help.


Regards


Hina


From H.Dalal at sms.ed.ac.uk  Fri Jan 22 12:34:55 2010
From: H.Dalal at sms.ed.ac.uk (Hina Dalal)
Date: Fri, 22 Jan 2010 17:34:55 +0000
Subject: [Bioperl-l] BioPerl installation failed: please help
Message-ID: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>

Hi

I have installed Perl from ActiveState and am now trying to install
BioPerl, but cannot do it, neither from PPM (it shows the error "Ppm
install failed: 404 not found") nor from CPAN manual installation. It
is not allowing me to download nmake, showing that "the version of
this file is not compatible with the version of windows you are
running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program."

Please help.

Regards

Hina


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From jason at bioperl.org  Fri Jan 22 14:18:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 22 Jan 2010 11:18:30 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
	<55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
Message-ID: <59EC9331-FB2F-4338-AD58-2D501A528A18@bioperl.org>

Done, as of r16739. Look forward to the refactor work too.

-jason
On Jan 22, 2010, at 5:34 AM, Chris Fields wrote:

> Sounds good to me.  The warnings are a bit too tight on this module  
> anyway.
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From cjfields at illinois.edu  Fri Jan 22 14:22:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 13:22:43 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
	<358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
	<358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
Message-ID: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>

This could exist, but should go into a general Utilities module.  Part of the Align refactoring was to pull a good number of the methods into a general utilities module, so this would fit into that category.

chris

On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote:

> Or, to rephrase my question: what is the closest existing equivalent to the
> code below?
> 
> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:
> 
>> Is there/should be a 'have_pairwise_overlap' method similar to this?
>> 
>> # $seq1 and $seq3 have matching ids
>> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
>> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>> 
>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>> 
>> 


From maj at fortinbras.us  Fri Jan 22 14:29:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 14:29:07 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com><EF1FEC1B43C146B6BBF827EA56171777@NewLife><058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu><358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com><358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
	<14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>
Message-ID: <0F7B7E5FE70D4C5CB34B27045561823C@NewLife>

I'd recommend making an enhancement request via Bugzilla, so we don't forget-
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Albert Vilella" <avilella at gmail.com>
Cc: "bioperl-l" <Bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 2:22 PM
Subject: Re: [Bioperl-l] Merging fragments in a simplealign


> This could exist, but should go into a general Utilities module.  Part of the 
> Align refactoring was to pull a good number of the methods into a general 
> utilities module, so this would fit into that category.
>
> chris
>


From maj at fortinbras.us  Fri Jan 22 14:33:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 14:33:41 -0500
Subject: [Bioperl-l] BioPerl installation failed: please help
In-Reply-To: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>
References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>
Message-ID: <2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife>

Hina-- 
See the protocol at 
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
for ActiveState installation. If it doesn't work, please let us know at which 
step the failure happened.
cheers, MAJ


From maj at fortinbras.us  Fri Jan 22 15:13:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 15:13:15 -0500
Subject: [Bioperl-l] BioPerl installation failed: please help
In-Reply-To: <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk><2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife>
	<20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
Message-ID: <9E5DE384E2C8416B8373E390ABDB7DFE@NewLife>

Ok Hina,
I'm not seeing any issues with the presence or availability of
http://bioperl.org/DIST
from my machine. Can you access that URL in a browser? If not, the king of the
King's Buildings may not be allowing access. Also, can you do the following:

C:> ppm-shell
ppm> repo list

Note the number of the repo that corresponds to bioperl (if any) and do

ppm> repo describe n

where 'n' is that number, and send the output along.

cheers, MAJ

----- Original Message ----- 
From: "Hina Dalal" <H.Dalal at sms.ed.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Sent: Friday, January 22, 2010 3:01 PM
Subject: Re: [Bioperl-l] BioPerl installation failed: please help


Hi Mark

warm regards

I was following exactly that protocol. The problem is that when I try
it from PPM and reach the step "install BioPerl", it shows the error
"Ppm install failed: 404 not found" at the end. When I tried the
CPAN/manual installation, I couldn't download nmake; it shows that "the
version of this file is not compatible with the version of windows you are
running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program and then contact the software
publisher."


What should I do? Please help.

Regards

Hina


Quoting "Mark A. Jensen" <maj at fortinbras.us>:

> Hina-- See the protocol at
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
> for ActiveState installation. If it doesn't work, please let us know at
> which step the failure happened.
> cheers, MAJ
> ----- Original Message ----- From: "Hina Dalal" <H.Dalal at sms.ed.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 22, 2010 12:34 PM
> Subject: [Bioperl-l] BioPerl installation failed: please help
>
>
> Hi
>
> I have installed PERL from Activesate and now trying to install
> bioperl but can not do it . Neither from PPM (it is showing error "Ppm
> install failed: 404 not found") nor from CPAN manual installation. It
> is not allowing me to download nmake, showing that "the version of
> this file is not compatible with the version of windows you are
> running. Check your computer system information to see whether you
> need 32 bit or 64 bit of this program."
>
> Please help.
>
> Regards
>
> Hina
>
>
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From pengyu.ut at gmail.com  Sun Jan 24 20:29:59 2010
From: pengyu.ut at gmail.com (Peng Yu)
Date: Sun, 24 Jan 2010 19:29:59 -0600
Subject: [Bioperl-l] Transcribe in bioperl
Message-ID: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>

I found the function 'translate' in bioperl. But I don't find
'transcribe'. Is there such a function?

From jason at bioperl.org  Sun Jan 24 21:06:48 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 24 Jan 2010 18:06:48 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
Message-ID: <BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>

What exactly do you want to do?
spliced_seq for a feature would be the closest thing...

-jason
On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:

> I found the function 'translate' in bioperl. But I don't find
> 'transcribe'. Is there such a function?
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From pengyu.ut at gmail.com  Sun Jan 24 21:22:12 2010
From: pengyu.ut at gmail.com (Peng Yu)
Date: Sun, 24 Jan 2010 20:22:12 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
	<BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
Message-ID: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>

I want to convert T to U. I could use Perl's built-in function, but that
is semantically far from 'transcribe'. A function actually named
'transcribe' would be better.

On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
> What exactly do you want to do?
> spliced_seq for a feature would be the closest thing...
>
> -jason
> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>
>> I found the function 'translate' in bioperl. But I don't find
>> 'transcribe'. Is there such a function?
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
>
>

From maj at fortinbras.us  Sun Jan 24 21:48:33 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 24 Jan 2010 21:48:33 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
Message-ID: <FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>

Not a bad idea, a semantics-preserving/checking thing. 
transcribe() could return an object with alphabet == 'rna'
and the T's flipped, or bork if called against an object with alphabet != 'dna'.
I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to 
be stashed), if desired.
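
A minimal sketch of the semantics proposed above, in plain Perl. It uses a hypothetical hash-based stand-in for a sequence object (the keys 'seq' and 'alphabet' are assumptions for illustration) and is not the Bio::PrimarySeqI implementation:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a sequence object: a hash with 'seq' and 'alphabet'.
# This only illustrates the proposed behaviour, not BioPerl's actual code.
sub transcribe {
    my ($seq_obj) = @_;
    # Refuse anything that is not DNA, as suggested in the message above.
    die "transcribe() requires alphabet 'dna', got '$seq_obj->{alphabet}'\n"
        unless $seq_obj->{alphabet} eq 'dna';
    # Copy the sequence string, then flip T to U while preserving case.
    (my $rna = $seq_obj->{seq}) =~ tr/Tt/Uu/;
    return { seq => $rna, alphabet => 'rna' };
}

my $dna_obj = { seq => 'ATGTTTGA', alphabet => 'dna' };
my $rna_obj = transcribe($dna_obj);
print "$rna_obj->{seq} ($rna_obj->{alphabet})\n";    # prints "AUGUUUGA (rna)"

# Calling it on a non-DNA object fails loudly instead of silently mangling it.
my $ok = eval { transcribe({ seq => 'MFK', alphabet => 'protein' }); 1 };
print "protein input rejected\n" unless $ok;
```

The point of the alphabet check is that a bare tr/T/U/ would happily rewrite a protein or RNA string, whereas a named method can refuse input it does not make sense for.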

----- Original Message ----- 
From: "Peng Yu" <pengyu.ut at gmail.com>
To: "Jason Stajich" <jason at bioperl.org>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 24, 2010 9:22 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


> To convert from T to U. I could use perl's builtin function. But it is
> semantically far away from 'transcribe'. If there is a function with
> name 'transcribe', it will be better.
> 
> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
>> What exactly do you want to do?
>> spliced_seq for a feature would be the closest thing...
>>
>> -jason
>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>
>>> I found the function 'translate' in bioperl. But I don't find
>>> 'transcribe'. Is there such a function?
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>> http://twitter.com/hyphaltip
>>
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>

From cjfields at illinois.edu  Sun Jan 24 23:39:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 22:39:43 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
	<FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
Message-ID: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>

I think the main reason there hasn't been a transcribe() is that very few users ask for it.  Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() and/or translate() (i.e. they don't care about the intermediate mRNA).  I don't have a problem with adding a transcribe method to PrimarySeq, but (and Mark has already picked up on this) it should be constrained to DNA only and return RNA.  And there might be a case for adding the analogous reverse_translate().  

Also worth adding this to the proper interface class (PrimarySeqI, I think) so all Seq/PrimarySeq will have it (or have to implement their own).

chris

On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote:

> Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna'
> and the T's flipped, or bork if called against an object with alphbet != 'dna'.
> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired.
> 
> ----- Original Message ----- From: "Peng Yu" <pengyu.ut at gmail.com>
> To: "Jason Stajich" <jason at bioperl.org>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Sunday, January 24, 2010 9:22 PM
> Subject: Re: [Bioperl-l] Transcribe in bioperl
> 
> 
>> To convert from T to U. I could use perl's builtin function. But it is
>> semantically far away from 'transcribe'. If there is a function with
>> name 'transcribe', it will be better.
>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
>>> What exactly do you want to do?
>>> spliced_seq for a feature would be the closest thing...
>>> 
>>> -jason
>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>> 
>>>> I found the function 'translate' in bioperl. But I don't find
>>>> 'transcribe'. Is there such a function?
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> --
>>> Jason Stajich
>>> jason.stajich at gmail.com
>>> jason at bioperl.org
>>> http://fungalgenomes.org/
>>> http://twitter.com/hyphaltip
>>> 
>>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Sun Jan 24 23:43:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 22:43:07 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
	<FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
	<B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
Message-ID: <489E0B85-0BC3-45DB-8660-494CF69F35FF@illinois.edu>


On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:

> ...And there might be a case for adding the analogous reverse_translate().  

Bah.  Meant reverse_transcribe().  Ah well.

chris

From dan.kortschak at adelaide.edu.au  Mon Jan 25 00:33:28 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 25 Jan 2010 16:03:28 +1030
Subject: [Bioperl-l] BEDTools module
Message-ID: <1264397608.4898.9.camel@epistle>

Hi All,

A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
and Ira Hall is now available in the bioperl-run subversion repository
(bioperl-run/trunk r16754).

Using BEDTools you can, among other things:

      * Intersect two BED files in search of overlapping features.
      * Merge overlapping features.
      * Screen for paired-end (PE) overlaps between PE sequences and
        existing genomic features.
      * Calculate the depth and breadth of sequence coverage across
        defined "windows" in a genome.

(see <http://code.google.com/p/bedtools/> for manuals and downloads).
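
To make "intersecting" concrete, here is a small plain-Perl sketch of the idea. It calls neither BEDTools nor the wrapper module; the function name and the sample features are made up for illustration. It assumes the usual BED convention of 0-based, half-open coordinates:

```perl
use strict;
use warnings;

# BED intervals are 0-based and half-open: [start, end).
# Returns the overlapping region of two features, or nothing if they are disjoint.
sub bed_intersect {
    my ($x, $y) = @_;
    return unless $x->{chrom} eq $y->{chrom};
    my $start = $x->{start} > $y->{start} ? $x->{start} : $y->{start};
    my $end   = $x->{end}   < $y->{end}   ? $x->{end}   : $y->{end};
    return unless $start < $end;    # touching intervals do not overlap
    return { chrom => $x->{chrom}, start => $start, end => $end };
}

# Hypothetical feature sets standing in for the contents of two BED files.
my @genes = ( { chrom => 'chr1', start => 100, end => 200 },
              { chrom => 'chr1', start => 500, end => 600 } );
my @peaks = ( { chrom => 'chr1', start => 150, end => 520 },
              { chrom => 'chr2', start => 100, end => 200 } );

for my $g (@genes) {
    for my $p (@peaks) {
        my $hit = bed_intersect($g, $p) or next;
        print join("\t", @{$hit}{qw(chrom start end)}), "\n";
    }
}
# Two tab-separated lines are printed: chr1 150 200, then chr1 500 520.
```

The all-against-all loop is only for clarity; real tools avoid the quadratic comparison with sorted input or interval indexes.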

BEDTools is a suite of 17 command-line executables. The module attempts to
provide commands and options comprehensively and can return Bio::SeqIO or
Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
where specific handling has not been implemented; please give feedback
on desired features for this).

cheers
Dan


From cjfields at illinois.edu  Mon Jan 25 00:35:06 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 23:35:06 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
Message-ID: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>

Just a quick question for those using DNAStatistics.  I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:

>seq1
GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq2
GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq3
GGTACCAGCAGGTGGTCCGCCTA------------------------------
>seq4
--------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC

Since seq3 and seq4 don't overlap, the distance can't be calculated.  In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.  Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere?
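
For anyone wondering where the division by zero comes from, a rough plain-Perl sketch follows. It computes a simple uncorrected p-distance, which is an assumption for illustration and not the code in Bio::Align::DNAStatistics; the function name and the optional placeholder argument are also made up:

```perl
use strict;
use warnings;

# Uncorrected p-distance over columns where neither sequence has a gap.
# Returns $placeholder when the two sequences share no comparable columns,
# which is the case that would otherwise divide by zero.
sub p_distance {
    my ($s1, $s2, $placeholder) = @_;
    $placeholder = 'NA' unless defined $placeholder;
    die "aligned sequences must have equal length\n"
        unless length($s1) == length($s2);
    my ($compared, $diffs) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my ($c1, $c2) = (substr($s1, $i, 1), substr($s2, $i, 1));
        next if $c1 eq '-' or $c2 eq '-';    # skip columns with a gap
        $compared++;
        $diffs++ if uc $c1 ne uc $c2;
    }
    return $compared ? $diffs / $compared : $placeholder;
}

# seq3 and seq4 from the alignment above, with the gap runs built explicitly.
my $seq3 = 'GGTACCAGCAGGTGGTCCGCCTA' . ('-' x 30);
my $seq4 = ('-' x 26) . 'CGCACGCGCGTGTTTGCGGGCAGCCGC';

print p_distance($seq3, $seq4), "\n";        # prints "NA"
print p_distance($seq3, $seq4, -1), "\n";    # prints "-1"
```

Whatever placeholder is chosen, downstream code has to test for it before doing arithmetic, since a string in an otherwise numeric matrix will trip up anything that sums or sorts the values.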

chris

From jason at bioperl.org  Mon Jan 25 00:58:03 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 24 Jan 2010 21:58:03 -0800
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
In-Reply-To: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
Message-ID: <B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>

It could also return -1, which is used as a placeholder for NA in other
programs that generate distance matrices.
-jason
On Jan 24, 2010, at 9:35 PM, Chris Fields wrote:

> Just a quick question for those using DNAStatistics.  I just fixed a  
> bug in Bio::Align::DNAStatistics that failed with a div by zero  
> error (bug 2901) on this data:
>
>> seq1
> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>> seq2
> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>> seq3
> GGTACCAGCAGGTGGTCCGCCTA------------------------------
>> seq4
> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC
>
> Since seq3 and seq4 don't overlap, the distance can't be  
> calculated.  In our case, I replace the score with 'NA' as a  
> placeholder, but I'm worried about downstream app breakage.  Anyone  
> have an objection to using 'NA' here, or know of ways this may lead  
> to problems elsewhere?
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From maj at fortinbras.us  Mon Jan 25 08:17:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 08:17:54 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org><366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com><FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
	<B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
Message-ID: <ED0F320909EF4DB99FF0C91423F83209@NewLife>

transcribe() and rev_transcribe() added to Bio::PrimarySeqI, plus tests in
t/Seq.t, @ r16757
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Peng Yu" <pengyu.ut at gmail.com>
Sent: Sunday, January 24, 2010 11:39 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


>I think the main reason there hasn't been a transcribe() is that very few users 
>ask for it.  Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() 
>and/or translate() (i.e. they don't care about the intermediate mRNA).  I don't 
>have a problem with adding a transcribe method to PrimarySeq, but (and Mark has 
>already picked up on this) it should be constrained to DNA only and return RNA. 
>And there might be a case for adding the analogous reverse_translate().
>
> Also worth adding this to the proper interface class (PrimarySeqI, I think) so 
> all Seq/PrimarySeq will have it (or have to implement their own).
>
> chris
>
> On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote:
>
>> Not a bad idea, a semantics-preserving/checking thing. transcribe() could 
>> return an object with alphabet == 'rna'
>> and the T's flipped, or bork if called against an object with alphbet != 
>> 'dna'.
>> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to 
>> be stashed), if desired.
>>
>> ----- Original Message ----- From: "Peng Yu" <pengyu.ut at gmail.com>
>> To: "Jason Stajich" <jason at bioperl.org>
>> Cc: <bioperl-l at lists.open-bio.org>
>> Sent: Sunday, January 24, 2010 9:22 PM
>> Subject: Re: [Bioperl-l] Transcribe in bioperl
>>
>>
>>> To convert from T to U. I could use perl's builtin function. But it is
>>> semantically far away from 'transcribe'. If there is a function with
>>> name 'transcribe', it will be better.
>>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
>>>> What exactly do you want to do?
>>>> spliced_seq for a feature would be the closest thing...
>>>>
>>>> -jason
>>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>>>
>>>>> I found the function 'translate' in bioperl. But I don't find
>>>>> 'transcribe'. Is there such a function?
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> jason.stajich at gmail.com
>>>> jason at bioperl.org
>>>> http://fungalgenomes.org/
>>>> http://twitter.com/hyphaltip
>>>>
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From cjfields at illinois.edu  Mon Jan 25 08:23:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 07:23:12 -0600
Subject: [Bioperl-l] BEDTools module
In-Reply-To: <1264397608.4898.9.camel@epistle>
References: <1264397608.4898.9.camel@epistle>
Message-ID: <0F5CE93E-0E6C-4317-806B-A463A9B0917E@illinois.edu>

Great work Dan!  

chris

On Jan 24, 2010, at 11:33 PM, Dan Kortschak wrote:

> Hi All,
> 
> A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
> and Ira Hall is now available in the bioperl-run subversion repository
> (bioperl-run/trunk r16754).
> 
> Using BEDTools you can, among other things:
> 
>      * Intersecting two BED files in search of overlapping features.
>      * Merging overlapping features.
>      * Screening for paired-end (PE) overlaps between PE sequences and
>        existing genomic features.
>      * Calculating the depth and breadth of sequence coverage across
>        defined "windows" in a genome.
> 
> (see <http://code.google.com/p/bedtools/> for manuals and downloads).
> 
> BEDTools is a suite of 17 commandline executable. The module attempts to
> provide and options comprehensively and can return Bio::SeqIO or
> Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO
> where specific handling has not been implemented - please give feedback
> on desired features for this).
> 
> cheers
> Dan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Jan 25 08:27:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 07:27:26 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
In-Reply-To: <B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>
References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
	<B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>
Message-ID: <D46CA8B2-780B-4AA5-B9E3-07EADC0D79C1@illinois.edu>

That works for me, just want to ensure we're DTRT.  I'll change it over.

chris

On Jan 24, 2010, at 11:58 PM, Jason Stajich wrote:

> It could also return -1 which is used as place holder for NA in other programs that generate distance matrices.
> -jason
> On Jan 24, 2010, at 9:35 PM, Chris Fields wrote:
> 
>> Just a quick question for those using DNAStatistics.  I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:
>> 
>>> seq1
>> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>>> seq2
>> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>>> seq3
>> GGTACCAGCAGGTGGTCCGCCTA------------------------------
>>> seq4
>> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC
>> 
>> Since seq3 and seq4 don't overlap, the distance can't be calculated.  In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.  Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere?
>> 
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Mon Jan 25 08:41:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 08:41:38 -0500
Subject: [Bioperl-l] BEDTools module
In-Reply-To: <1264397608.4898.9.camel@epistle>
References: <1264397608.4898.9.camel@epistle>
Message-ID: <8D494783F87E4C32BD797008E260C3C2@NewLife>

Rock 'n' roll, Dan!
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 12:33 AM
Subject: [Bioperl-l] BEDTools module


> Hi All,
> 
> A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
> and Ira Hall is now available in the bioperl-run subversion repository
> (bioperl-run/trunk r16754).
> 
> Using BEDTools you can, among other things:
> 
>      * Intersecting two BED files in search of overlapping features.
>      * Merging overlapping features.
>      * Screening for paired-end (PE) overlaps between PE sequences and
>        existing genomic features.
>      * Calculating the depth and breadth of sequence coverage across
>        defined "windows" in a genome.
> 
> (see <http://code.google.com/p/bedtools/> for manuals and downloads).
> 
> BEDTools is a suite of 17 commandline executable. The module attempts to
> provide and options comprehensively and can return Bio::SeqIO or
> Bio::SeqFeature::Collection object where appropriate (or Bio::Root::IO
> where specific handling has not been implemented - please give feedback
> on desired features for this).
> 
> cheers
> Dan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>

From rtbio.2009 at gmail.com  Mon Jan 25 08:43:19 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 14:43:19 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>

Hello Mark, Chris and all,

This is Roopa again, with another small problem. I am working on remote
BLAST. The program accesses the server and gets the output correctly, but
when I collect the result sequences into an array, the first sequence
among the results is always missing. The code is:

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".
time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the
array and the sequences themselves. When the output data contained
1 sequence, the count of the array was 0 and printing the output sequence
gave nothing. Likewise, when the number of output sequences was 3, the
count of the array was 2 and only two sequences were printed.
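
A possible cause, offered as an assumption rather than a confirmed diagnosis: the script calls $result->next_hit() once in a debug print before the while loop. If next_hit() is a one-pass iterator, that call uses up the first hit. The plain-Perl sketch below reproduces the off-by-one pattern with a hypothetical closure iterator standing in for the result object (all function names here are made up):

```perl
use strict;
use warnings;

# Hypothetical stand-in for a result object whose next_hit() walks a list once.
sub make_hit_iterator {
    my @hits = @_;
    my $pos  = 0;
    return sub { return $pos < @hits ? $hits[$pos++] : undef };
}

# Pattern from the script above: one call to the iterator before the loop.
sub collect_after_debug_peek {
    my $next_hit = make_hit_iterator(@_);
    my $peeked   = $next_hit->();    # the debug print consumes this hit
    my @seqs;
    while (defined(my $hit = $next_hit->())) { push @seqs, $hit }
    return @seqs;
}

# Alternative: gather every hit first, then log and process the same list.
sub collect_all {
    my $next_hit = make_hit_iterator(@_);
    my @seqs;
    while (defined(my $hit = $next_hit->())) { push @seqs, $hit }
    return @seqs;
}

my @short = collect_after_debug_peek(qw(hit1 hit2 hit3));
my @full  = collect_all(qw(hit1 hit2 hit3));
print scalar(@short), " vs ", scalar(@full), "\n";    # prints "2 vs 3"
```

This matches the reported counts (1 hit gives 0, 3 hits give 2). If that is the cause, moving the debug print inside the loop, so it prints the same $hit that is being processed, would avoid losing the first hit.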

Please help me in sorting out this problem.

Regards,
Roopa.

From rtbio.2009 at gmail.com  Mon Jan 25 08:44:57 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 14:44:57 +0100
Subject: [Bioperl-l] remote blast bioperl
Message-ID: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>

Hello all,

I have a small problem again. I am working on remote BLAST. The program
accesses the server and gets the output correctly, but when I collect the
result sequences into an array, the first sequence among the results is
always missing. The code is:

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".
time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the array count was 0 and nothing was printed. Likewise, when the output
contained 3 sequences, the array count was 2 and only two sequences were
printed.

Please help me sort out this problem.

Regards,
Roopa.
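
One possible explanation for the symptom described above, offered as a sketch
rather than a confirmed diagnosis: the debug statement
"print BLASTDEBUGFILE $result->next_hit();" advances the hit iterator before
the "while ( my $hit = $result->next_hit )" loop starts, so the loop would
never see the first hit. The plain-Perl example below imitates that behaviour.
make_hit_iterator() is a hypothetical helper standing in for the result
object; it is not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST result: returns a closure that hands
# out one hit name per call and undef once the list is exhausted.
sub make_hit_iterator {
    my @hits = @_;
    return sub { return shift @hits };
}

# Collect hits the way the posted code does: one "debug" call to the
# iterator first, then the real loop.
sub collect_after_debug_peek {
    my $next_hit = make_hit_iterator(@_);
    my $peeked   = $next_hit->();    # the debug peek consumes the first hit
    my @seqs;
    while ( defined( my $hit = $next_hit->() ) ) {
        push @seqs, $hit;
    }
    return @seqs;
}

# Collect hits without touching the iterator beforehand.
sub collect_all {
    my $next_hit = make_hit_iterator(@_);
    my @seqs;
    while ( defined( my $hit = $next_hit->() ) ) {
        push @seqs, $hit;
    }
    return @seqs;
}

my @three_peeked = collect_after_debug_peek(qw(hitA hitB hitC));
my @one_peeked   = collect_after_debug_peek(qw(hitA));
my @three_full   = collect_all(qw(hitA hitB hitC));

printf "3 hits, with debug peek: %d collected\n", scalar @three_peeked;
printf "1 hit,  with debug peek: %d collected\n", scalar @one_peeked;
printf "3 hits, no debug peek:   %d collected\n", scalar @three_full;
```

If this is indeed the cause, removing the debug print (or keeping the peeked
hit and pushing it onto the array as well) should bring the first hit back.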

From cjfields at illinois.edu  Mon Jan 25 09:05:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 08:05:44 -0600
Subject: [Bioperl-l] remote blast bioperl
In-Reply-To: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>
References: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>
Message-ID: <7E402CC5-9C66-4315-B437-7C4EC2317371@illinois.edu>

Roopa,

We have received all 4+ of your posts.  There is absolutely no need for you to keep repeatedly posting the same thing to the list.  Be patient, we'll try to get to you as soon as we can!

chris

On Jan 25, 2010, at 7:44 AM, Roopa Raghuveer wrote:

> Hello all,
> 
> I have a small problem again. I am working on Remote blast. The program works well. But the problem is this.  The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is
> 
>  my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]");
> 
> 
> while (my $input = $str->next_seq())
> {
>    #Blast a sequence against a database:
>     #Alternatively, you could  pass in a file with many
>     #sequences rather than loop through sequence one at a time
>     #Remove the loop starting 'while (my $input = $str->next_seq())'
>     #and swap the two lines below for an example of that.
> 
>              open(OUTFILE,'>',$debugfile);
>                print OUTFILE $input;
>               close(OUTFILE);
> 
> 
>    my $r = $factory->submit_blast($input);
> 
>                 open(OUTFILE,'>',$debugfile);
>              #   print OUTFILE $r;
>                 close(OUTFILE);
> 
> 
>    print STDERR "waiting...." if($v>0);
> 
>   while ( my @rids = $factory->each_rid ) {
>       open(OUTFILE,'>',$debugfile);
>             #   print OUTFILE "while entered";
>               close(OUTFILE);
>      foreach my $rid ( @rids ) {
> 
>                open(OUTFILE,'>',$debugfile);
>  #  print OUTFILE "foreach entered";
>               close(OUTFILE);
> 
>         my $rc = $factory->retrieve_blast($rid);
> 
>         if( !ref($rc) )
>         {
>         if( $rc < 0 )
>         {
>         $factory->remove_rid($rid);
>         }
>          open(OUTFILE,'>',$debugfile);
>               # print OUTFILE "if entered";
>               close(OUTFILE);
>          print STDERR "." if ( $v > 0 );
>          sleep 5;
>         }
>        else {
>               open(OUTFILE,'>',$debugfile);
>               # print OUTFILE "else entered";
>               close(OUTFILE);
> 
>           my $result = $rc->next_result();
>          #save the output
>         $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
>           open(BLASTDEBUGFILE,'>',$blastdebugfile);
>           print BLASTDEBUGFILE $result->next_hit();
>           close(BLASTDEBUGFILE);
> 
>         my $filename = $serverpath."/blastdata_".
> time()."\.out";
> 
> 
>          # open(DEBUGFILE,'>',$debugfile);
>          # open(new,'>',$filename);
>          # @arra=<new>;
>          # print DEBUGFILE @arra;
>          # close(DEBUGFILE);
>          # close(new);
> 
>          $factory->save_output($filename);
> 
>        # open(BLASTDEBUGFILE,'>',$debugfile);
>        # print BLASTDEBUGFILE  "Hello $rid";
>        # close(BLASTDEBUGFILE);
> 
>        $factory->remove_rid($rid);
> 
>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>        print BLASTDEBUGFILE  $organism;
>         close(BLASTDEBUGFILE);
> 
>     # open(OUTFILE,'>',$outfile);
>     # print OUTFILE "Test2 $result->database_name()";
>     # close(OUTFILE);
> 
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
> $dummy=0;
> 
>    while ( my $hit = $result->next_hit ) {
> 
>             next unless ( $v >= 0);
> 
>           #     open(OUTFILE,'>',$debugfile);
>            #    print OUTFILE "$hit in while hits";
>             #  close(OUTFILE);
>  my $sequ = $gb->get_Seq_by_version($hit->name);
>            my $dna = $sequ->seq(); # get the sequence as a string
>         $dummy++;
>              open(OUTFILE,'>',$debugfile);
>           #     print OUTFILE $dummy;
>               close(OUTFILE);
>           push(@seqs,$dna);
>          }
>         }
>       }
>     }
>   }
> 
> $warum=@seqs;
>  open(OUTFILE,'>',$debugfile);
>              #  print OUTFILE $warum;
>                print OUTFILE @seqs;
> 
>               close(OUTFILE);
> return(@seqs);
> }
> 
> open(OUTFILE, '>',$outfile) || die ;
> 
> print OUTFILE "<HTML>\n
> <head><title>RNAi Result</title>
> <meta http-equiv=\"expires\" content=\"0\"></head>\n
> <body>\n
> <p><font face=\"Courier, monospace font set\">
> Inputsequence: <br>";
> 
> 
> Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was  3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences.
> 
> Please help me in sorting out this problem.
> 
> Regards,
> Roopa.


From jiann-jy at hotmail.com  Sun Jan 24 21:03:55 2010
From: jiann-jy at hotmail.com (JY)
Date: Sun, 24 Jan 2010 18:03:55 -0800 (PST)
Subject: [Bioperl-l] how to retrieve accession number by taxon id??
Message-ID: <4cef88b5-fa53-4e63-9167-30075c10a058@k19g2000yqc.googlegroups.com>

I need to retrieve accession numbers and sequences to complete one part of
my project. How can I retrieve accession numbers by taxon id?

From lpaulet at ual.es  Mon Jan 25 15:25:55 2010
From: lpaulet at ual.es (Lorenzo Carretero-Paulet)
Date: Mon, 25 Jan 2010 21:25:55 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <4B5DFE53.2000201@ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and 
returns the corresponding reports in txt, xml and html format. I'm 
experiencing problems with the latter, as the program returns the 
following error message:

"Can't call method "next_result" without a package or object reference 
at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                 -file   => ">$outputfilenameH");
while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo

From lpaulet at ual.es  Mon Jan 25 15:31:08 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 21:31:08 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and  
returns the corresponding reports in txt, xml and html format. I'm  
experiencing problems with the latter, as the program returns the  
following error message:

"Can't call method "next_result" without a package or object reference at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my $blast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e $E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                  -file   => ">$outputfilenameH");
while( my $result = $blast_report->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


From dan.kortschak at adelaide.edu.au  Mon Jan 25 16:00:37 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 07:30:37 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
Message-ID: <1264453237.4552.3.camel@epistle>

A reverse_translate to IUPAC degenerate codes is not a bad idea,
particularly for PCR primer design.

Dan

On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org
wrote:
> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:
> 
> > ...And there might be a case for adding the analogous
> reverse_translate().  
> 
> Bah.  Meant reverse_transcribe().  Ah well.
> 
> chris


From maj at fortinbras.us  Mon Jan 25 16:07:49 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 16:07:49 -0500
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
References: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
Message-ID: <F5772AAC495D475DBEEEF2311B16F941@NewLife>

Lorenzo--
your $blast_report is set to be (some of) the text returned
by a system call of a blast program; this isn't going to be
an object of any kind, and so no functions can be
called from it (as at "$blast_report->next_result"). You need
to parse the text generated by the blast call using Bio::SearchIO
to get a Bio::Search::Result::BlastResult object.
you could do

@blast_lines = qx/ ...your blast call... /;
open my $bf, '>', 'my.blast' or die "Can't write my.blast: $!";
print $bf @blast_lines;    # no comma after the filehandle
close $bf;
$blast_result = Bio::SearchIO->new(-file   => 'my.blast',
                                   -format => 'blast');

and carry on from there. But why not look at
Bio::Tools::Run::StandAloneBlast or
Bio::Tools::Run::StandAloneBlastPlus
to run your blasts within perl? These wrap the blast
programs and deliver BioPerl objects, rather than
plain text output.
cheers MAJ
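
The error message from the original post can be reproduced without BLAST or
BioPerl. This is a minimal sketch, assuming (the thread does not confirm it)
that the backtick call returned an empty string because "-o" sent the report
to a file instead of STDOUT:

```perl
use strict;
use warnings;

# Stand-in for the value returned by qx/.../ when the program writes its
# report to a file (-o) and prints nothing to STDOUT: an empty string.
my $blast_report = '';

# Calling a method on a plain string fails at run time; eval captures
# the error message instead of letting the script die.
my $ok    = eval { $blast_report->next_result; 1 };
my $error = $ok ? '' : $@;

print "method call failed: $error" unless $ok;
```

Only a blessed reference, such as the parser returned by Bio::SearchIO->new,
can respond to next_result.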
----- Original Message ----- 
From: <lpaulet at ual.es>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 3:31 PM
Subject: [Bioperl-l] HTMLResultWriter


Hi all,

I'm trying to generate a subroutine that performs a BLAST search and
returns the corresponding reports in txt, xml and html format. I'm
experiencing problems with the latter, as the program returns the
following error message:

"Can't call method "next_result" without a package or object reference at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my $blast_report =qx/$blast -p blastp -d $database -i $query -e
$E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e
$E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                  -file   => ">$outputfilenameH");
while( my $result = $blast_report->next_result ) { # get a result from
Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From David.Messina at sbc.su.se  Mon Jan 25 16:09:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 25 Jan 2010 22:09:24 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <4B5DFE53.2000201@ual.es>
References: <4B5DFE53.2000201@ual.es>
Message-ID: <FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>

> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;

> while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory


_$blast_report_ is not a valid variable name, as far as I know. Plus there's a space between report and the final '_' in the first of the above two lines.

Does this code compile?

Dave


From Russell.Smithies at agresearch.co.nz  Mon Jan 25 16:14:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 26 Jan 2010 10:14:15 +1300
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
References: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>

That's a fair mix of incomplete code you've supplied!!
Did you read the documentation for RemoteBlast? The example there will do 99% of what you want.
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm

I'm not entirely sure what you're trying to do (as you've left out a bit of your code) but I assume you're trying to retrieve and print the sequence for each hit.

Here's something that works, not sure exactly what/why you want to print but it should get you a bit further.

--Russell


================================
#!perl -w

use Bio::Tools::Run::RemoteBlast;
use Bio::SeqIO;
use Bio::DB::GenBank;

use CGI ':standard';

use strict;

my $q = new CGI;

my @params = (
               -prog         => 'blastn',
               -data         => 'nr',
               -expect       => '1e-30',
               -entrez_query => 'Homo sapiens [ORGN]',
               -readmethod   => 'SearchIO'
);

my $gb = Bio::DB::GenBank->new;

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#$v is just to turn on and off the messages
my $v = 1;

my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" );

while ( my $input = $str->next_seq() ) {

  my $r = $factory->submit_blast($input);

  print STDERR "waiting..." if ( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid (@rids) {
      my @seqs = ();
      my $rc   = $factory->retrieve_blast($rid);
      if ( !ref($rc) ) {
        if ( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      }
      else {
        my $result = $rc->next_result();

        #save the blast output
        my $filename = $result->query_accession . '.out';
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {

          # store the hit sequences
          push @seqs, $gb->get_Seq_by_version( $hit->name );

          next unless ( $v > 0 );
          print "\thit name is ", $hit->name, "\n";
          while ( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }

        ## print the seqs you've retrieved??
        open( OUTFILE, '>', $result->query_accession . '.htm' )
          or die "Can't write output file: $!";
        print OUTFILE $q->start_html('RNAi Result'),
          $q->h1('RNAi Result'),
          $q->h2('Input'),
          $q->pre( toString($input) ),
          $q->h2('Output');

        foreach (@seqs) {

          #there's probably a better way of printing the seq
          print OUTFILE $q->pre( toString($_) );
        }
        print OUTFILE $q->end_html;
        close OUTFILE;
      }
    }
  }
}

sub toString {
  my $s = shift;
  return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq;
}




From biopython at maubp.freeserve.co.uk  Mon Jan 25 16:24:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 25 Jan 2010 21:24:33 +0000
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
Message-ID: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>

On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak
<dan.kortschak at adelaide.edu.au> wrote:
> A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.

I would say it could be a bad idea. For any protein string there are
multiple possible back translations, and this cannot be captured
fully as a nucleotide string even using the IUPAC ambiguity chars.

We debated this back and forth for Biopython, and decided to leave it
out. It wasn't possible for a simple back translate to a simple string to
handle the use cases we considered, and other options like returning
a regular expression covering all possible back translations were too
complex (for a core sequence method/function).

Peter
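
To illustrate the point above with one commonly cited case (not necessarily
the one discussed for Biopython): in the standard genetic code, serine is
encoded by TCN and by AGY. Collapsing all six codons into a single IUPAC
codon, position by position, gives a pattern that also matches codons for
other amino acids. The sketch below uses plain Perl only; consensus_codon()
and expand_codon() are hypothetical helpers written for this example, not
BioPerl or Biopython functions.

```perl
use strict;
use warnings;

# The six serine codons of the standard genetic code (DNA alphabet).
my @ser_codons = qw(TCT TCC TCA TCG AGT AGC);

# IUPAC ambiguity codes keyed by the sorted set of bases they represent.
my %iupac = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AG  => 'R', CT  => 'Y', CG  => 'S', AT  => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);
my %bases_for = reverse %iupac;    # IUPAC code -> sorted base set

# Collapse a list of codons into one IUPAC codon, position by position.
sub consensus_codon {
    my @codons    = @_;
    my $consensus = '';
    for my $pos ( 0 .. 2 ) {
        my %seen;
        $seen{ substr( $_, $pos, 1 ) } = 1 for @codons;
        $consensus .= $iupac{ join '', sort keys %seen };
    }
    return $consensus;
}

# Expand an IUPAC codon back into every concrete codon it matches.
sub expand_codon {
    my ($codon) = @_;
    my @expanded = ('');
    for my $code ( split //, $codon ) {
        my @bases = split //, $bases_for{$code};
        my @next;
        for my $prefix (@expanded) {
            push @next, $prefix . $_ for @bases;
        }
        @expanded = @next;
    }
    return @expanded;
}

my $consensus = consensus_codon(@ser_codons);
my @matched   = expand_codon($consensus);

my %is_ser;
@is_ser{@ser_codons} = (1) x @ser_codons;
my @not_ser = grep { !$is_ser{$_} } @matched;

printf "consensus %s matches %d codons, %d of them not serine: %s\n",
  $consensus, scalar @matched, scalar @not_ser, join( ' ', sort @not_ser );
```

The consensus comes out as WSN, which matches 16 codons; ten of them (ACN,
TGN, AGA and AGG) do not encode serine. A single IUPAC string therefore
either over-generates or has to drop some valid back-translations.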

From jason at bioperl.org  Mon Jan 25 16:26:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 25 Jan 2010 13:26:55 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <98995830-DC7F-4404-A216-874EF5799DB6@bioperl.org>

It was already implemented several years ago: see revtranslate and
reverse_translate_all in Bio::Tools::CodonTable.


   my $myCodonTable = Bio::Tools::CodonTable->new();
   my $seqobj    = Bio::PrimarySeq->new(-seq => 'FHGERHEL');
   my $iupac_str = $myCodonTable->reverse_translate_all($seqobj);


Chris had meant to say reverse_transcribe of RNA -> DNA FWIW.

-jason
On Jan 25, 2010, at 1:24 PM, Peter wrote:

> On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak
> <dan.kortschak at adelaide.edu.au> wrote:
>> A reverse_translate to IUPAC degenerate codes is not a bad idea,
>> particularly for PCR primer design.
>
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.
>
> We debated this back and forth for Biopython, and decided to leave it
> out. It wasn't possible for a simple back translate to a simple  
> string to
> handle the use cases we considered, and other options like returning
> a regular expression covering all possible back translations were too
> complex (for a core sequence method/function).
>
> Peter

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From maj at fortinbras.us  Mon Jan 25 16:19:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 16:19:24 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
Message-ID: <72B106F0D5FF4F1E858CC9BD1EF33142@NewLife>

I think we have that functionality in Bio::Tools::SeqPattern, 
courtesy of Bruno V---
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 4:00 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


>A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.
> 
> Dan
> 
> On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org
> wrote:
>> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:
>> 
>> > ...And there might be a case for adding the analogous
>> reverse_translate().  
>> 
>> Bah.  Meant reverse_transcribe().  Ah well.
>> 
>> chris

From dan.kortschak at adelaide.edu.au  Mon Jan 25 16:38:44 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 08:08:44 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <1264455524.4552.23.camel@epistle>

Good to see that these ideas have been considered.

I'd be interested to see this discussion, or at least the point dealing
with the problems that might arise. I'm at a loss as to how ambiguity
codes can't completely describe all possible coding sequences for any
given codon table (via Bio::Tools::CodonTable - in fact this already has
the revtranslate that could be fitted into a Bio::PrimarySeq method - to
answer Mark and Jason's comments, I think that /if/ a reverse_translate
method exists, it makes logical sense to have it tied to a sequence
object, calling the B:T:CT method on the seq object itself rather than
only in Bio::Tools, 2c). Pete, can you provide an example of the
problems?

thanks
Dan

On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.


From lpaulet at ual.es  Mon Jan 25 16:53:07 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 22:53:07 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>
References: <4B5DFE53.2000201@ual.es>
	<FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>
Message-ID: <20100125225307.2zl2cn2hkcsgccso@webmail.ual.es>

Thanks Dave and Mark.

Quoting Dave Messina <David.Messina at sbc.su.se>:

>> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e   
>> $E_value -b 20000 -o $outputfilenameB/;
>
>> while( my $result = _$blast_report_->next_result ) { # get a result  
>>  from Bio::SearchIO parsing or build it up in memory
>
>
> _$blast_report_ is not a valid variable name, as far as I know. Plus  
>  there's a space between report and the final '_' in the first of  
> the  above two lines.
>
> Does this code compile?
>
> Dave


From rtbio.2009 at gmail.com  Mon Jan 25 17:35:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 23:35:32 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>
References: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>
Message-ID: <c7cac1601001251435k7b75ffbbj64cfa36faf8d89bb@mail.gmail.com>

Hello Russell,

Thank you very much for your reply. My problem is that Remote blast runs
fine with my code, and I get the .out file listing the sequences producing
significant alignments. But when I try to retrieve the sequences into an
array @seqs, I get all of them except the first hit. If the .out file
contains 3 hits, I can retrieve only 2 of them, i.e., I get only 2
sequences. If there is only one significant hit for my sequence, its name
and description appear in the .out file, but I am unable to get it into
the array: the array count shows 0 and the array contains no sequence.

I hope my problem is clearer now.

Here comes my code,

use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds.";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";


open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes <br>
 This page will automatically reload in 30 seconds  <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1,"\n";
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

            # open(OUTFILE,'>',$debugfile);
             #  print OUTFILE @params;
             # close(OUTFILE);


my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
"$organ\[ORGN]");

 #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a paramter

 #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a paramter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);
 my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {


        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
       print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output

      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);


       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dna;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=scalar(@seqs);
              open(OUTFILE,'>',$debugfile);
               print OUTFILE $warum;
             #  print OUTFILE @seqs;
              close(OUTFILE);
      return(@seqs);
}

open(OUTFILE, '>', $outfile) || die "Cannot open $outfile: $!";

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

        print OUTFILE substr ($in{'Inputseq'}, $i, 1);

        if ( ($i+1)%10==0){
                print OUTFILE " ";
        }
        if ( ($i+1)%60==0){
                print OUTFILE "<br>\n";
        }
}


print OUTFILE "</font> <p>";

$z=@compseqs;

for($k=0;$k<$z;$k++) {
        print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare Sequence: <br>";

        for ($i=0; $i<length ($compseqs[$k]); $i++) {

                print OUTFILE substr ($compseqs[$k], $i, 1);

                if ( ($i+1)%10==0){
                        print OUTFILE " ";
                }
                if ( ($i+1)%60==0){
                        print OUTFILE "<br>\n";
                }
        }
        print OUTFILE "<p></font>";
}

print OUTFILE "<p>
Window: <br>$in{'Windowsize'}
<p>
<p>
Threshold: <br>$in{'Threshold'}
<p>";
my $j=0;

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

        if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
                if ($out[$i]->{similar}<=$in{'Threshold'}){
                        $j=$in{'Windowsize'};
                }
                $height=$out[$i]->{similar}*5;
        }

        if ($j>0) {
                print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\" height=\"5\">";
                $outstring .= "<font color=\"green\">".substr($in{'Inputseq'}, $i, 1)."</font>";
                $j--;
        }
        else {
                print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\" height=\"5\">";
                $outstring .= "<font color=\"red\">".substr($in{'Inputseq'}, $i, 1)."</font>";
        }

        if ( ($i+1)%10==0){
                $outstring .= " ";
        }
        if ( ($i+1)%60==0){
                $outstring .= "<br>\n";

        }
        if ( ($i+1)%800==0){
                print OUTFILE "<br><br>\n";

        }
}

print OUTFILE "<br><br><font face=\"Courier, monospace font set\">$outstring</font>";

#foreach (@out) {
#print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
#if ($_->{similar}<=$in{'Threshold'}){

#       }
#}

print OUTFILE "</BODY>\n</HTML>\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}
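
The two sequence-printing loops above repeat the same grouping logic (a space after every 10 characters, a line break after every 60). A minimal sketch of how that could be pulled into one helper follows; the name format_seq_html is made up for this illustration and is not part of the script or of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper: return $seq as HTML text with a space after every
# 10 characters and a <br> after every 60, matching the loops above.
sub format_seq_html {
    my ($seq) = @_;
    my $html = '';
    for my $i ( 0 .. length($seq) - 1 ) {
        $html .= substr( $seq, $i, 1 );
        $html .= ' '      if ( $i + 1 ) % 10 == 0;
        $html .= "<br>\n" if ( $i + 1 ) % 60 == 0;
    }
    return $html;
}

# Example with a 64-base stand-in sequence.
print format_seq_html( 'ACGT' x 16 ), "\n";
```

With something like this, both loops could probably shrink to a single print of format_seq_html(...) for the input sequence and for each entry of @compseqs.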

Regards,
Roopa.


On Mon, Jan 25, 2010 at 10:14 PM, Smithies, Russell <
Russell.Smithies at agresearch.co.nz> wrote:

> That's a fair mix of incomplete code you've supplied!!
> Did you read the documentation for RemoteBlast? The example there will do
> 99% of what you want.
> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm
>
> I'm not entirely sure what you're trying to do (as you've left out a bit of
> your code) but I assume you're trying to retrieve and print the sequence for
> each hit.
>
> Here's something that works, not sure exactly what/why you want to print
> but it should get you a bit further.
>
> --Russell
>
>
> ================================
> #!perl -w
>
> use Bio::Tools::Run::RemoteBlast;
> use Bio::DB::GenBank;
> use Bio::SeqIO;
>
> use CGI ':standard';
>
> use strict;
>
> my $q = new CGI;
>
> my @params = (
>               -prog         => 'blastn',
>               -data         => 'nr',
>               -expect       => '1e-30',
>               -entrez_query => 'Homo sapiens [ORGN]',
>               -readmethod   => 'SearchIO'
> );
>
> my $gb = Bio::DB::GenBank->new;
>
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
> #$v is just to turn on and off the messages
> my $v = 1;
>
> my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" );
>
> while ( my $input = $str->next_seq() ) {
>
>   my $r = $factory->submit_blast($input);
>
>   print STDERR "waiting..." if ( $v > 0 );
>  while ( my @rids = $factory->each_rid ) {
>     foreach my $rid (@rids) {
>      my @seqs = ();
>       my $rc   = $factory->retrieve_blast($rid);
>      if ( !ref($rc) ) {
>        if ( $rc < 0 ) {
>          $factory->remove_rid($rid);
>        }
>         print STDERR "." if ( $v > 0 );
>        sleep 5;
>      }
>      else {
>         my $result = $rc->next_result();
>
>         #save the blast output
>        my $filename = $result->query_accession . '.out';
>        $factory->save_output($filename);
>        $factory->remove_rid($rid);
>        print "\nQuery Name: ", $result->query_name(), "\n";
>         while ( my $hit = $result->next_hit ) {
>
>           # store the hit sequences
>          push @seqs, $gb->get_Seq_by_version( $hit->name );
>
>          next unless ( $v > 0 );
>          print "\thit name is ", $hit->name, "\n";
>          while ( my $hsp = $hit->next_hsp ) {
>            print "\t\tscore is ", $hsp->score, "\n";
>          }
>        }
>
>        ## print the seqs you've retrieved??
>        open( OUTFILE, '>', $result->query_accession . '.htm' );
>        print OUTFILE $q->start_html('RNAi Result'),
>          $q->h1('RNAi Result'),
>          $q->h2('Input'),
>          $q->pre( toString($input) ),
>          $q->h2('Output');
>
>        foreach (@seqs) {
>
>          #there's probably a better way of printing the seq
>          print OUTFILE $q->pre( toString($_) );
>        }
>        print OUTFILE $q->end_html;
>        close OUTFILE;
>      }
>    }
>  }
> }
>
> sub toString {
>  my $s = shift;
>  return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq;
> }
>
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>

From ajmackey at gmail.com  Tue Jan 26 08:24:43 2010
From: ajmackey at gmail.com (Aaron Mackey)
Date: Tue, 26 Jan 2010 08:24:43 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264455524.4552.23.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org> 
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> 
	<1264455524.4552.23.camel@epistle>
Message-ID: <24c96eca1001260524s3d46e850hfdcc461e22210972@mail.gmail.com>

There's also Bio::Tools::IUPAC; given a sequence with IUPAC ambiguity codes,
it provides a SeqIO stream that enumerates all the possible unambiguous
realizations.  Not the right solution for every situation, but quite useful
when you need it.
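
To show what that enumeration amounts to, here is a minimal plain-Perl sketch of the idea. It does not use Bio::Tools::IUPAC itself; the %IUPAC table and the expand_iupac helper are names made up for this illustration.

```perl
use strict;
use warnings;

# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
my %IUPAC = (
    A => [qw(A)], C => [qw(C)], G => [qw(G)], T => [qw(T)],
    R => [qw(A G)], Y => [qw(C T)], S => [qw(C G)], W => [qw(A T)],
    K => [qw(G T)], M => [qw(A C)],
    B => [qw(C G T)], D => [qw(A G T)], H => [qw(A C T)], V => [qw(A C G)],
    N => [qw(A C G T)],
);

# Hypothetical helper: return every unambiguous sequence matching $seq.
sub expand_iupac {
    my ($seq) = @_;
    my @results = ('');
    for my $code ( split //, uc $seq ) {
        my $bases = $IUPAC{$code} or die "Unknown IUPAC code '$code'\n";
        # Extend every partial sequence built so far with each allowed base.
        @results = map { my $prefix = $_; map { $prefix . $_ } @$bases } @results;
    }
    return @results;
}

print "$_\n" for expand_iupac('ATGYTN');
```

The number of realizations multiplies at every ambiguous position, so this only stays practical for short or lightly ambiguous sequences. It may also be relevant to the back-translation question: under the standard genetic code, YTN covers the Leu codons CTN, TTA and TTG, but it expands to TTT and TTC (Phe) as well.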

-Aaron


On Mon, Jan 25, 2010 at 4:38 PM, Dan Kortschak <
dan.kortschak at adelaide.edu.au> wrote:

> Good to see that these ideas have been considered.
>
> I'd be interested to see this discussion, or at least the point dealing
> with the problems that might arise. I'm at a loss as to how ambiguity
> codes can't completely describe all possible coding sequences for any
> given codon table (via Bio::Tools::CodonTable - in fact this already has
> the revtranslate that could be fitted into a Bio::PrimarySeq method - to
> answer Mark and Jason's comments, I think that /if/ a reverse_translate
> method exists, it makes logical sense to have it tied to a sequence
> object, calling the B:T:CT method on the seq object itself rather than
> only in Bio::Tools, 2?). Pete, can you provide an example of the
> problems?
>
> thanks
> Dan
>
> On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> > I would say it could be a bad idea. For any protein string there are
> > multiple possible back translations, and this cannot be captured
> > fully as a nucleotide string even using the IUPAC ambiguity chars.
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From nml5566 at gmail.com  Tue Jan 26 16:10:54 2010
From: nml5566 at gmail.com (Nathan Liles)
Date: Tue, 26 Jan 2010 15:10:54 -0600
Subject: [Bioperl-l] SVN access
Message-ID: <4B5F5A5E.2070406@gmail.com>

Does anyone know who I need to talk to for getting developer access for 
the Bioperl SVN? I want to submit a patch to the genbank2gff3 converter.

Thanks,
Nathan

From Russell.Smithies at agresearch.co.nz  Tue Jan 26 20:40:40 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:40:40 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>

Grrrrrr, I hate eutils!!!!

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------


Nice error message though :-)
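
Since the failure arrives as a thrown exception, one possible workaround is to wrap the call in eval and retry after a pause. The sketch below is only an illustration under that assumption: with_retries is a made-up helper, and the flaky code reference stands in for a real call such as the get_ids shown in the stack trace.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code, retrying on die() up to $max_tries times
# with a growing pause between attempts.
sub with_retries {
    my ( $code, $max_tries, $base_delay ) = @_;
    my $last_error;
    for my $attempt ( 1 .. $max_tries ) {
        my @result;
        my $ok = eval { @result = $code->(); 1 };
        return wantarray ? @result : $result[0] if $ok;
        $last_error = $@ || 'unknown error';
        warn "Attempt $attempt of $max_tries failed: $last_error";
        sleep( $base_delay * $attempt ) if $attempt < $max_tries;
    }
    die "Giving up after $max_tries attempts: $last_error";
}

# Stand-in for a flaky remote call: fails twice, then succeeds.
my $calls = 0;
my @ids = with_retries(
    sub {
        die "Search Backend failed\n" if ++$calls < 3;
        return ( 101, 102, 103 );
    },
    5,    # maximum attempts
    0,    # base delay in seconds (0 here so the demo runs instantly)
);
print "Got @ids after $calls calls\n";
```

In real use the base delay would be a few seconds rather than 0, and whether retrying actually helps depends on how long the backend stays unavailable.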


--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Monday, 11 January 2010 10:05 a.m.
> To: 'Chris Fields'
> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> been finding that with large queries, chunks of the resulting data is
> missing.
> For example, before Xmas I was creating species-specific databases by
> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> the fasta sequences in chunks of 500.
> Very regularly, in the middle of the fasta there would be a message about
> resource unavailable eg.
>   >test_sequence_1
>   TACGATCATCGCTResource UnavailableTACGACTCTGCT
>   >test_sequence_2
>   TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> 
> Often this wasn't detected until formatdb complained about invalid
> characters.
> Inquiries to NCBI as to why this was happening and what to do about it
> returned stupid answers ("do each sequence manually thru the web
> interface", or "use eUtils").
> As we have a nice fast network connection, I now prefer to download very
> large gzip files (i.e. all of refseq) and extract what I need.
> 
> I can't help but think that NCBI could solve a lot of problems if they
> gzipped the output from eUtils queries - it's something I've requested
> regularly for the last 5 years or so!!
> 
> --Russell
> 
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Monday, 11 January 2010 9:50 a.m.
> > To: Smithies, Russell
> > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> >
> > One could also use Bio::DB::Taxonomy, which indexes the same files or
> > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > details).
> >
> > chris
> >
> > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >
> > > > An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> > > > lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> > > > files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> > > > do lookups.
> > > > In that same dir, taxdump.tar.gz contains a file called names.dmp which
> > > > lists taxids and descriptions (and synonyms)
> > > >
> > > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > > > could do this:
> > >
> > >   my $taxid  = $gi_taxid_nucl{$accession};
> > >   my $org_name = $names{$taxid};
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >> Bhakti,
> > >> The following example (using EUtilities) may serve your purpose:
> > >>
> > >> use Bio::DB::EUtilities;
> > >>
> > >> my (%taxa, @taxa);
> > >> my (%names, %idmap);
> > >>
> > > >> # these are protein ids; nuc ids will (probably) work by changing
> > > >> # -dbfrom => 'nucleotide'
> > >>
> > >> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>
> > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>                                       -db => 'taxonomy',
> > >>                                       -dbfrom => 'protein',
> > >>                                       -correspondence => 1,
> > >>                                       -id => \@ids);
> > >>
> > >> # iterate through the LinkSet objects
> > >> while (my $ds = $factory->next_LinkSet) {
> > >>    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >> }
> > >>
> > >> @taxa = @taxa{@ids};
> > >>
> > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>        -db    => 'taxonomy',
> > >>        -id    => \@taxa );
> > >>
> > >> while (local $_ = $factory->next_DocSum) {
> > >>    $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >> ($_->get_contents_by_name('ScientificName'))[0];
> > >> }
> > >>
> > >> foreach (@ids) {
> > >>    $idmap{$_} = $names{$taxa{$_}};
> > >> }
> > >>
> > >> # %idmap is
> > >> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >> #    68536103 => 'Corynebacterium jeikeium K411'
> > >> #    730439 => 'Bacillus caldolyticus'
> > >> #    89318838 => undef    (this record has been removed from the db)
> > >>
> > >> 1;
> > >>
> > >> You probably will need to break up your 30000 into chunks
> > >> (say, 1000-3000 each), and do the above on each chunk with a
> > >>
> > >> sleep 3;
> > >>
> > >> or so separating the queries.
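
The chunk-and-sleep advice can be sketched roughly as follows. process_in_chunks is a made-up helper and the handler body is a placeholder; in real use it would hold the elink/esummary calls from the example above, passing the current chunk as -id.

```perl
use strict;
use warnings;

# Hypothetical helper: call $handler on successive batches of at most
# $chunk_size ids, pausing $pause seconds between batches.
sub process_in_chunks {
    my ( $ids, $chunk_size, $pause, $handler ) = @_;
    my @queue   = @$ids;    # copy so the caller's list is left untouched
    my $batches = 0;
    while ( my @chunk = splice @queue, 0, $chunk_size ) {
        $handler->( \@chunk );
        $batches++;
        sleep $pause if @queue && $pause;
    }
    return $batches;
}

my @all_ids = ( 1 .. 2500 );    # stand-in for the real id list
my $n = process_in_chunks(
    \@all_ids,
    1000,    # ids per request
    0,       # pause in seconds (the thread suggests about 3; 0 keeps the demo quick)
    sub {
        my ($chunk) = @_;
        printf "batch of %d ids (%d..%d)\n", scalar @$chunk, $chunk->[0], $chunk->[-1];
    },
);
print "$n batches processed\n";
```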
> > >> MAJ
> > >> ----- Original Message -----
> > >> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > >> To: <bioperl-l at lists.open-bio.org>
> > >> Sent: Friday, December 25, 2009 9:46 PM
> > >> Subject: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > >>
> > >>
> > >>> Hi,
> > >>>
> > >>> Does anyone know how to retrieve the "Source" or the "Species name"
> > >> given
> > >>> the accession number using Bioperl.   I have these 30,000 accession
> > >> numbers
> > >>> for which I need to get the source organisms.  Any kind of help will
> > be
> > >>> appreciated.
> > >>>
> > >>> Thanks
> > >>>
> > >>> BD


From cjfields at illinois.edu  Tue Jan 26 20:46:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 19:46:26 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
Message-ID: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>

It's unfortunate but I have heard this problem popping up quite a bit more frequently lately.  Not to push too many buttons but NCBI isn't very forthcoming with help these days; they have become quite insular.  Not sure if they're short-staffed due to budget or if there are other issues.

chris

On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:

> Grrrrrr, I hate eutils!!!!
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
> 
> 
> Nice error message though :-)
> 
> 
> --Russell
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
>> Sent: Monday, 11 January 2010 10:05 a.m.
>> To: 'Chris Fields'
>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>> number?
>> 
>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
>> been finding that with large queries, chunks of the resulting data is
>> missing.
>> For example, before Xmas I was creating species-specific databases by
>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
>> the fasta sequences in chunks of 500.
>> Very regularly, in the middle of the fasta there would be a message about
>> resource unavailable eg.
>>  >test_sequence_1
>>  TACGATCATCGCTResource UnavailableTACGACTCTGCT
>>  >test_sequence_2
>>  TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>> 
>> Often this wasn't detected until formatdb complained about invalid
>> characters.
>> Inquiries to NCBI as to why this was happening and what to do about it
>> returned stupid answers ("do each sequence manually thru the web
>> interface", or "use eUtils").
>> As we have a nice fast network connection, I now prefer to download very
>> large gzip files (i.e. all of refseq) and extract what I need.
>> 
>> I can't help but think that NCBI could solve a lot of problems if they
>> gzipped the output from eUtils queries - it's something I've requested
>> regularly for the last 5 years or so!!
>> 
>> --Russell
>> 
>> 
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Monday, 11 January 2010 9:50 a.m.
>>> To: Smithies, Russell
>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>> number?
>>> 
>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
>>> details).
>>> 
>>> chris
>>> 
>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
>>> 
>>>> An alternate non-BioPerly way (that may be faster given NCBI's
>> flakiness
>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash
>> and
>>> do lookups.
>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
>> which
>>> lists taxids and descriptions (and synonyms)
>>>> 
>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
>>> could do this:
>>>> 
>>>>  my $taxid  = $gi_taxid_nucl{$accession};
>>>>  my $org_name = $names{$taxid};
>>>> 
>>>> --Russell
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>> number?
>>>>> 
>>>>> Bhakti,
>>>>> The following example (using EUtilities) may serve your purpose:
>>>>> 
>>>>> use Bio::DB::EUtilities;
>>>>> 
>>>>> my (%taxa, @taxa);
>>>>> my (%names, %idmap);
>>>>> 
>>>>> # these are protein ids; nuc ids will work by changing -dbfrom =>
>>>>> 'nucleotide',
>>>>> # (probably)
>>>>> 
>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>>>>> 
>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>>>>                                      -db => 'taxonomy',
>>>>>                                      -dbfrom => 'protein',
>>>>>                                      -correspondence => 1,
>>>>>                                      -id => \@ids);
>>>>> 
>>>>> # iterate through the LinkSet objects
>>>>> while (my $ds = $factory->next_LinkSet) {
>>>>>   $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>>>>> }
>>>>> 
>>>>> @taxa = @taxa{@ids};
>>>>> 
>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>>>>       -db    => 'taxonomy',
>>>>>       -id    => \@taxa );
>>>>> 
>>>>> while (local $_ = $factory->next_DocSum) {
>>>>>   $names{($_->get_contents_by_name('TaxId'))[0]} =
>>>>> ($_->get_contents_by_name('ScientificName'))[0];
>>>>> }
>>>>> 
>>>>> foreach (@ids) {
>>>>>   $idmap{$_} = $names{$taxa{$_}};
>>>>> }
>>>>> 
>>>>> # %idmap is
>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
>>>>> #    730439 => 'Bacillus caldolyticus'
>>>>> #    89318838 => undef    (this record has been removed from the db)
>>>>> 
>>>>> 1;
>>>>> 
>>>>> You probably will need to break up your 30000 into chunks
>>>>> (say, 1000-3000 each), and do the above on each chunk with a
>>>>> 
>>>>> sleep 3;
>>>>> 
>>>>> or so separating the queries.
>>>>> MAJ
>>>>> ----- Original Message -----
>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Friday, December 25, 2009 9:46 PM
>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
>>> number?
>>>>> 
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Does anyone know how to retrieve the "Source" or the "Species name"
>>>>> given
>>>>>> the accession number using Bioperl.   I have these 30,000 accession
>>>>> numbers
>>>>>> for which I need to get the source organisms.  Any kind of help will
>>> be
>>>>>> appreciated.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> BD


From Russell.Smithies at agresearch.co.nz  Tue Jan 26 20:59:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:59:15 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>

I've had a wide selection of errors lately:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------

And I never get a good explanation from NCBI or suggestions on how to avoid it.
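
Failures that throw are at least visible. The quieter case is the "Resource Unavailable" text that turned up inside FASTA downloads (see the quoted message below). A possible sanity check before running formatdb is sketched here, assuming nucleotide sequences; find_bad_fasta_records is a made-up helper name, and the sample data is the example from that message.

```perl
use strict;
use warnings;

# Hypothetical helper: scan FASTA text from a filehandle and report records
# whose sequence lines contain anything other than IUPAC nucleotide letters,
# '*' or '-'. Returns a hash ref of record id => first offending line number.
sub find_bad_fasta_records {
    my ($fh) = @_;
    my ( $id, %bad );
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line =~ /^\s*$/;
        if ( $line =~ /^>(\S+)/ ) {
            $id = $1;
            next;
        }
        die "Sequence data before first header at line $.\n" unless defined $id;
        if ( !exists $bad{$id} && $line =~ /[^ACGTUNRYSWKMBDHV*-]/i ) {
            $bad{$id} = $.;
        }
    }
    return \%bad;
}

my $fasta = <<'END';
>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
END

open my $fh, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
my $bad = find_bad_fasta_records($fh);
close $fh;
print "Bad record: $_ (line $bad->{$_})\n" for sort keys %$bad;
```

Any record flagged this way could then be fetched again instead of being discovered later by formatdb.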


--Russell
	

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 2:46 p.m.
> To: Smithies, Russell
> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> It's unfortunate but I have heard this problem popping up quite a bit more
> frequently lately.  Not to push too many buttons but NCBI isn't very
> forthcoming with help these days; they have become quite insular.  Not
> sure if they're short-staffed due to budget or if there are other issues.
> 
> chris
> 
> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> 
> > Grrrrrr, I hate eutils!!!!
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > STACK: get_desc.pl:32
> > -----------------------------------------------------------
> >
> >
> > Nice error message though :-)
> >
> >
> > --Russell
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> >> Sent: Monday, 11 January 2010 10:05 a.m.
> >> To: 'Chris Fields'
> >> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >> number?
> >>
> >> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> >> been finding that with large queries, chunks of the resulting data is
> >> missing.
> >> For example, before Xmas I was creating species-specific databases by
> >> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> >> the fasta sequences in chunks of 500.
> >> Very regularly, in the middle of the fasta there would be a message about
> >> resource unavailable eg.
> >>  >test_sequence_1
> >>  TACGATCATCGCTResource UnavailableTACGACTCTGCT
> >>  >test_sequence_2
> >>  TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> >>
> >> Often this wasn't detected until formatdb complained about invalid
> >> characters.
> >> Inquiries to NCBI as to why this was happening and what to do about it
> >> returned stupid answers ("do each sequence manually thru the web
> >> interface", or "use eUtils").
> >> As we have a nice fast network connection, I now prefer to download
> very
> >> large gzip files (i.e. all of refseq) and extract what I need.
> >>
> >> I can't help but think that NCBI could solve a lot of problems if they
> >> gzipped the output from eUtils queries - it's something I've requested
> >> regularly for the last 5 years or so!!
> >>
> >> --Russell
> >>
> >>
> >>> -----Original Message-----
> >>> From: Chris Fields [mailto:cjfields at illinois.edu]
> >>> Sent: Monday, 11 January 2010 9:50 a.m.
> >>> To: Smithies, Russell
> >>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> >>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >>> number?
> >>>
> >>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> >>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> >>> details).
> >>>
> >>> chris
> >>>
> >>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >>>
> >>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> >>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> >>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> >>>> do lookups.
> >>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
> >>>> lists taxids and descriptions (and synonyms).
> >>>>
> >>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> >>>> could do this:
> >>>>
> >>>>  # the gi_taxid files are keyed by GI number, not accession
> >>>>  my $taxid  = $gi_taxid_nucl{$gi};
> >>>>  my $org_name = $names{$taxid};
> >>>>
> >>>> --Russell
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> >>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> >>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> accession
> >>>>> number?
> >>>>>
> >>>>> Bhakti,
> >>>>> The following example (using EUtilities) may serve your purpose:
> >>>>>
> >>>>> use strict;
> >>>>> use warnings;
> >>>>> use Bio::DB::EUtilities;
> >>>>>
> >>>>> my (%taxa, @taxa);
> >>>>> my (%names, %idmap);
> >>>>>
> >>>>> # these are protein ids; nuc ids will (probably) work by changing
> >>>>> # -dbfrom => 'nucleotide'
> >>>>>
> >>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> >>>>>
> >>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> >>>>>                                      -db => 'taxonomy',
> >>>>>                                      -dbfrom => 'protein',
> >>>>>                                      -correspondence => 1,
> >>>>>                                      -id => \@ids);
> >>>>>
> >>>>> # iterate through the LinkSet objects, mapping submitted id => taxid
> >>>>> while (my $ds = $factory->next_LinkSet) {
> >>>>>   $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0];
> >>>>> }
> >>>>>
> >>>>> # drop ids that returned no taxid (e.g. removed records)
> >>>>> @taxa = grep { defined } @taxa{@ids};
> >>>>>
> >>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> >>>>>       -db    => 'taxonomy',
> >>>>>       -id    => \@taxa );
> >>>>>
> >>>>> # map taxid => scientific name
> >>>>> while (my $docsum = $factory->next_DocSum) {
> >>>>>   $names{($docsum->get_contents_by_name('TaxId'))[0]} =
> >>>>>     ($docsum->get_contents_by_name('ScientificName'))[0];
> >>>>> }
> >>>>>
> >>>>> foreach my $id (@ids) {
> >>>>>   $idmap{$id} = defined $taxa{$id} ? $names{$taxa{$id}} : undef;
> >>>>> }
> >>>>>
> >>>>> # %idmap is
> >>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> >>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> >>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> >>>>> #    730439 => 'Bacillus caldolyticus'
> >>>>> #    89318838 => undef    (this record has been removed from the db)
> >>>>>
> >>>>> 1;
> >>>>>
> >>>>> You probably will need to break up your 30000 into chunks
> >>>>> (say, 1000-3000 each), and do the above on each chunk with a
> >>>>>
> >>>>> sleep 3;
> >>>>>
> >>>>> or so separating the queries.
> >>>>> MAJ
> >>>>> ----- Original Message -----
> >>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> >>>>> To: <bioperl-l at lists.open-bio.org>
> >>>>> Sent: Friday, December 25, 2009 9:46 PM
> >>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
> >>> number?
> >>>>>
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
> >>>>>> the accession number using Bioperl?  I have these 30,000 accession numbers
> >>>>>> for which I need to get the source organisms.  Any kind of help will be
> >>>>>> appreciated.
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> BD
> >>>>>> _______________________________________________
> >>>>>> Bioperl-l mailing list
> >>>>>> Bioperl-l at lists.open-bio.org
> >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >> =======================================================================
> >>>> Attention: The information contained in this message and/or
> >> attachments
> >>>> from AgResearch Limited is intended only for the persons or entities
> >>>> to which it is addressed and may contain confidential and/or
> >> privileged
> >>>> material. Any review, retransmission, dissemination or other use of,
> >> or
> >>>> taking of any action in reliance upon, this information by persons or
> >>>> entities other than the intended recipients is prohibited by
> >> AgResearch
> >>>> Limited. If you have received this message in error, please notify
> the
> >>>> sender immediately.
> >>>>
> >> =======================================================================
> >>>>
> >>
> >>
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
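
For the corrupted FASTA problem described in this thread (an error string such as "Resource Unavailable" turning up inside a sequence), one option is to check each downloaded chunk before it goes anywhere near formatdb. A minimal sketch, assuming nucleotide sequences and plain Perl with no BioPerl calls; the sub name fasta_problems is made up for illustration:

```perl
use strict;
use warnings;

# Return a list of "header: line N: text" strings, one per suspicious
# sequence line found in a FASTA-formatted string.
# Assumption: nucleotide data, so only IUPAC nucleotide codes, '-' and '*'
# are accepted on sequence lines.
sub fasta_problems {
    my ($fasta) = @_;
    my @problems;
    my $header  = '(no header)';
    my $line_no = 0;
    for my $line (split /\r?\n/, $fasta) {
        $line_no++;
        (my $seq = $line) =~ s/^\s+|\s+$//g;    # ignore surrounding whitespace
        next if $seq eq '';
        if ($seq =~ /^>(.*)/) {                 # header line: remember it
            $header = $1;
            next;
        }
        if ($seq !~ /^[ACGTUNRYSWKMBDHV*-]+$/i) {
            push @problems, "$header: line $line_no: $seq";
        }
    }
    return @problems;
}
```

Run on the two-record example quoted in this thread, it flags test_sequence_1 and passes test_sequence_2, so a bad chunk can be fetched again instead of being discovered later.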


From cjfields at illinois.edu  Tue Jan 26 21:42:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 20:42:22 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
Message-ID: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>

Makes me wonder if they're pushing more users towards the SOAP-based services and away from eutils.

chris

On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:

> I've had a wide selection of errors lately:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
> 
> And I never get a good explanation from NCBI or suggestions on how to avoid it.
> 
> 
> --Russell
> 	
> 
>> -----Original Message-----
>> From: Chris Fields [mailto:cjfields at illinois.edu]
>> Sent: Wednesday, 27 January 2010 2:46 p.m.
>> To: Smithies, Russell
>> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>> number?
>> 
>> It's unfortunate but I have heard this problem popping up quite a bit more
>> frequently lately.  Not to push too many buttons but NCBI isn't very
>> forthcoming with help these days; they have become quite insular.  Not
>> sure if they're short-staffed due to budget or if there are other issues.
>> 
>> chris
>> 
>> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
>> 
>>> Grrrrrr, I hate eutils!!!!
>>> 
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>>> STACK: get_desc.pl:32
>>> -----------------------------------------------------------
>>> 
>>> 
>>> Nice error message though :-)
>>> 
>>> 
>>> --Russell
>>> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
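
Since the esearch failures quoted above look transient ("Resource temporarily unavailable", "Connection refused"), wrapping each batch in a retry is one way to cope with them. A rough sketch, not anything built into BioPerl: process_in_chunks is a hypothetical helper, and the code ref passed to it stands for whatever EUtilities call is made per chunk. It relies only on the exception being thrown with die, so that eval can catch it, and it pauses between chunks as suggested earlier in the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: run $query (a code ref taking an array ref of ids)
# on successive chunks of @$ids. A chunk whose query dies is retried up to
# $max_tries times, waiting a little longer before each new attempt.
sub process_in_chunks {
    my ($ids, $chunk_size, $max_tries, $pause, $query) = @_;
    my @pending = @$ids;                 # copy, so the caller's list is untouched
    my @results;
    while (my @chunk = splice(@pending, 0, $chunk_size)) {
        my $done = 0;
        for my $try (1 .. $max_tries) {
            my @got;
            my $ok = eval { @got = $query->(\@chunk); 1 };
            if (!$ok) {
                warn "chunk failed (try $try of $max_tries): $@";
                sleep $pause * $try;     # back off before retrying
                next;
            }
            push @results, @got;
            $done = 1;
            last;
        }
        die "giving up on a chunk after $max_tries tries\n" unless $done;
        sleep $pause if @pending;        # pause between successful queries
    }
    return @results;
}
```

The code ref would hold the Bio::DB::EUtilities call for one chunk and return its results; a chunk size of 500 and a pause of 3 seconds match the numbers mentioned in this thread, though neither is a documented limit.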


From Russell.Smithies at agresearch.co.nz  Tue Jan 26 21:45:58 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 15:45:58 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>

Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 3:42 p.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Makes me wonder if they're pushing more users towards the SOAP-based
> services and away from eutils.
> 
> chris
> 
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> 
> > I've had a wide selection of errors lately:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > STACK: get_desc.pl:32
> > -----------------------------------------------------------
> >
> > And I never get a good explanation from NCBI or suggestions on how to avoid it.
> >
> >
> > --Russell
> >
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
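
The flat-file lookup suggested earlier in this thread can be sketched as below. The assumptions, which should be checked against the downloaded files: the unzipped gi_taxid file has two tab-separated columns (GI, taxid), so the lookup key is a GI number rather than an accession, and names.dmp separates fields with tab, '|', tab, with the name class in the fourth field. The sub names are made up for illustration. Because the GI file is very large, this version keeps only the GIs of interest instead of loading the whole file into a hash.

```perl
use strict;
use warnings;

# Scan a gi_taxid dump (assumed: two tab-separated columns, GI then taxid)
# and keep only the GIs listed as keys of %$wanted.
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && exists $wanted->{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Parse names.dmp (assumed: fields separated by "\t|\t", lines ending in
# "\t|") and keep the entry whose name class is 'scientific name'.
sub load_scientific_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;              # strip the end-of-row marker
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $name_for{$taxid} = $name;
    }
    return \%name_for;
}
```

With file handles opened on the extracted dumps (assumed here to be named gi_taxid_nucl.dmp and names.dmp), the organism name for a GI is then $name_for->{ $taxid_for->{$gi} }, provided both lookups succeed.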


From maj at fortinbras.us  Wed Jan 27 10:14:22 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 27 Jan 2010 10:14:22 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com><1CE23DE1068C4FA2BD543D167A1AA901@NewLife><18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz><F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz><4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <C1C922A99DF24679955608955B2A73B1@NewLife>

Precisely the MO behind SoapEU...get the jump on 'em.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
Cc: <bioperl-l at lists.open-bio.org>; "'Mark A. Jensen'" <maj at fortinbras.us>
Sent: Tuesday, January 26, 2010 9:42 PM
Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?


> Makes me wonder if they're pushing more users towards the SOAP-based services 
> and away from eutils.
>
> chris
>
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
>
>> I've had a wide selection of errors lately:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>> STACK: get_desc.pl:32
>> -----------------------------------------------------------
>>
>> And I never get a good explanation from NCBI or suggestions on how to avoid it.
>>
>>
>> --Russell
>>
>>
>>>>>> -----Original Message-----
>>>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>>>>> Sent: Monday, 11 January 2010 9:50 a.m.
>>>>>> To: Smithies, Russell
>>>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>>> number?
>>>>>>
>>>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
>>>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
>>> the
>>>>>> details).
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
>>>>>>
>>>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
>>>>> flakiness
>>>>>> lately) would be to download the gi_taxid_nucl.zip or
>>> gi_taxid_prot.zip
>>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash
>>>>> and
>>>>>> do lookups.
>>>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
>>>>> which
>>>>>> lists taxids and descriptions (and synonyms)
>>>>>>>
>>>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
>>>>>> could do this:
>>>>>>>
>>>>>>> my $taxid  = $gi_taxid_nucl{$accession};
>>>>>>> my $org_name = $names{$taxid};
>>>>>>>
>>>>>>> --Russell
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>>>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
>>>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>>>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
>>> accession
>>>>>>>> number?
>>>>>>>>
>>>>>>>> Bhakti,
>>>>>>>> The following example (using EUtilities) may serve your purpose:
>>>>>>>>
>>>>>>>> use Bio::DB::EUtilities;
>>>>>>>>
>>>>>>>> my (%taxa, @taxa);
>>>>>>>> my (%names, %idmap);
>>>>>>>>
>>>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom =>
>>>>>>>> 'nucleotide',
>>>>>>>> # (probably)
>>>>>>>>
>>>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>>>>>>>>
>>>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>>>>>>>                                     -db => 'taxonomy',
>>>>>>>>                                     -dbfrom => 'protein',
>>>>>>>>                                     -correspondence => 1,
>>>>>>>>                                     -id => \@ids);
>>>>>>>>
>>>>>>>> # iterate through the LinkSet objects
>>>>>>>> while (my $ds = $factory->next_LinkSet) {
>>>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>>>>>>>> }
>>>>>>>>
>>>>>>>> @taxa = @taxa{@ids};
>>>>>>>>
>>>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>>>>>>>      -db    => 'taxonomy',
>>>>>>>>      -id    => \@taxa );
>>>>>>>>
>>>>>>>> while (local $_ = $factory->next_DocSum) {
>>>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
>>>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
>>>>>>>> }
>>>>>>>>
>>>>>>>> foreach (@ids) {
>>>>>>>>  $idmap{$_} = $names{$taxa{$_}};
>>>>>>>> }
>>>>>>>>
>>>>>>>> # %idmap is
>>>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>>>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>>>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
>>>>>>>> #    730439 => 'Bacillus caldolyticus'
>>>>>>>> #    89318838 => undef    (this record has been removed from the db)
>>>>>>>>
>>>>>>>> 1;
>>>>>>>>
>>>>>>>> You probably will need to break up your 30000 into chunks
>>>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
>>>>>>>>
>>>>>>>> sleep 3;
>>>>>>>>
>>>>>>>> or so separating the queries.
>>>>>>>> MAJ
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>>> Sent: Friday, December 25, 2009 9:46 PM
>>>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
>>>>>> number?
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name"
>>>>>>>> given
>>>>>>>>> the accession number using Bioperl.   I have these 30,000 accession
>>>>>>>> numbers
>>>>>>>>> for which I need to get the source organisms.  Any kind of help
>>> will
>>>>>> be
>>>>>>>>> appreciated.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> BD
>>>>>>>>> _______________________________________________
>>>>>>>>> Bioperl-l mailing list
>>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bhakti.dwivedi at gmail.com  Wed Jan 27 14:42:06 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Wed, 27 Jan 2010 14:42:06 -0500
Subject: [Bioperl-l] Designing primers from multiple sequence alignment of
	amino acid sequences
Message-ID: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>

Hi,

I have to design primers from multiple sequence alignments of amino acid
sequences.  The sequences I am working with are quite diverged, and the
available primer design programs (such as CODEHOP/iCODEHOP) often fail to
find any primer sets.  But when I look at the alignment manually, I can see
regions that I could use to make primers.

So I designed the degenerate primers the old-fashioned way: selecting the
conserved regions (6-10 aa long) from the alignment, translating the
selected regions to DNA using the appropriate codon usage table, and then
finally checking the primer sets (potential forward and reverse primers)
using tools like OLIGOANALYZER.  In the end, I did find a few good primer
sets, but whether they work in reality is something I will have to wait and
see.

While doing this process manually, I really felt the need to automate it (it
was not just one alignment; I worked with several).  I was wondering whether
there is any way bioperl can help me here, or whether writing a perl script
is the only way to go.

I would appreciate your suggestions/comments.  Thanks!  (Apologies for the
long email.)


Regards
Bhakti

From Kevin.M.Brown at asu.edu  Wed Jan 27 15:23:57 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 27 Jan 2010 13:23:57 -0700
Subject: [Bioperl-l] Designing primers from multiple sequence alignment
	of amino acid sequences
In-Reply-To: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>
References: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4068498DB@EX02.asurite.ad.asu.edu>

Bioperl is just a collection of tools, not a full-blown application.
Most of what you want can be done with the objects available from within
the toolkit, but the application (perl script) would still need to be
written to put the objects to use. You could use clustalw from within
perl to align the sequences (Bio::Tools::Run::Alignment::Clustalw), find
the conserved regions (Bio::SimpleAlign), reverse translate them
(Bio::Tools::CodonTable), then come up with an algorithm for primer
analysis and selection (or even use other apps like primer3
(Bio::Tools::Run::Primer3) from within perl).
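
As a rough illustration of the reverse-translation step, here is a minimal
standalone sketch in plain Perl. It does not use Bio::Tools::CodonTable; the
lookup tables and the names degenerate_primer and pool_size are hand-written
assumptions for the standard genetic code, and the peptide MWDEF is a made-up
conserved block:

```perl
use strict;
use warnings;

# Hand-written IUPAC-degenerate codons for the standard genetic code.
# L, R and S each have six codons split over two codon families, so the
# single codes below are over-inclusive (e.g. YTN also matches TTY = Phe).
my %DEGENERATE_CODON = (
    A => 'GCN', C => 'TGY', D => 'GAY', E => 'GAR', F => 'TTY',
    G => 'GGN', H => 'CAY', I => 'ATH', K => 'AAR', L => 'YTN',
    M => 'ATG', N => 'AAY', P => 'CCN', Q => 'CAR', R => 'MGN',
    S => 'WSN', T => 'ACN', V => 'GTN', W => 'TGG', Y => 'TAY',
);

# Number of concrete bases each IUPAC nucleotide code stands for.
my %IUPAC_SIZE = (
    A => 1, C => 1, G => 1, T => 1,
    R => 2, Y => 2, M => 2, K => 2, S => 2, W => 2,
    B => 3, D => 3, H => 3, V => 3,
    N => 4,
);

# Reverse-translate a peptide into one degenerate DNA string.
sub degenerate_primer {
    my ($peptide) = @_;
    my $dna = '';
    for my $aa (split //, uc $peptide) {
        my $codon = $DEGENERATE_CODON{$aa}
            or die "No codon for residue '$aa'\n";
        $dna .= $codon;
    }
    return $dna;
}

# Pool size: how many distinct oligos the degenerate string expands to.
sub pool_size {
    my ($dna) = @_;
    my $size = 1;
    $size *= $IUPAC_SIZE{$_} for split //, $dna;
    return $size;
}

my $primer = degenerate_primer('MWDEF');
printf "%s (pool of %d oligos)\n", $primer, pool_size($primer);
```

A large pool size is one simple reason to reject a candidate region before
handing it to a primer analysis tool.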

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  



From mike.stubbington at bbsrc.ac.uk  Thu Jan 28 10:41:49 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 15:41:49 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
Message-ID: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>

Dear all,

I am attempting to blast some primers against the mouse genome. I have created a local mouse genome blast database and I can search against it using 'blastn' at the command line. 

I have perl code that creates an array of bioperl sequence objects called @primers

I then create a StandAloneBlastPlus factory using the following code...

	my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
		-db_dir => '/Users/stubbing/localBlast/',
		-db_name => 'MouseGenome'
	);

and then attempt to blast my primers using this...

	my @shortPrimers;
	my $count=1;
	foreach (@primers) {
		my $currentSeq = $_;
		print "Checking primer $count/$primerNumber ";
		if ($_->length < 40) {
			push(@shortPrimers,$_);
			print "Too short!\n";
		}
		else {
			print "BLASTing...";
			my $blastResult = $blastFactory->blastn(-query => $currentSeq);
		}
		$count++;
	}

This fails with the following error...

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA> line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike


From maj at fortinbras.us  Thu Jan 28 10:56:14 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 10:56:14 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
Message-ID: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>

Mike - please try updating your bioperl-live (the core) to the latest code 
(revision 16761 or so).
CommandExts is a work in progress; from the stack errors it looks like you've 
got an older version.
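
One quick way to see which copy of a module perl actually picks up (useful
if more than one BioPerl install is on the machine) is to look at %INC after
loading it. A minimal sketch; module_location is a hypothetical helper name,
and the two Bio:: module names are simply the ones that appear in the stack
trace:

```perl
use strict;
use warnings;

# Return the file a module is loaded from, or undef if it cannot be loaded.
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

for my $module ('Bio::Tools::Run::WrapperBase',
                'Bio::Tools::Run::StandAloneBlastPlus',
                'File::Spec') {
    my $path = module_location($module);
    printf "%-40s %s\n", $module, defined $path ? $path : '(not found)';
}
```

If the path printed is not the tree that was just updated, the old code is
still the one being run.
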
Try it then ping us back, if you would--
Thanks
Mark


From mike.stubbington at bbsrc.ac.uk  Thu Jan 28 11:18:12 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 16:18:12 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
In-Reply-To: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
	<56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
Message-ID: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>

Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:

> Mike - please try updating your bioperl-live (the core) to the latest code 
> (revision 16761 or so).
> CommandExts is a work in progress; from the stack errors it looks like you've 
> got an older version.
> Try it then ping us back, if you would--
> Thanks
> Mark


From maj at fortinbras.us  Thu Jan 28 11:28:52 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 11:28:52 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
	<56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID: <C7FF329BCA044F19B3D690FE67319192@NewLife>

Thanks Mike-- will have a look asap- cheers MAJ


From cjfields at illinois.edu  Thu Jan 28 13:26:27 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 12:26:27 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
Message-ID: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>

Russell,

Just curious, but have you tried setting the return email parameter
(-email)?  NCBI recently stated that all queries would eventually
require a return email of some sort (not sure if it's validated or not).
I think that was set for around late spring.  I'm changing the code in
svn to require it for that very purpose.
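
Separately from the -email setting, transient failures like the "Resource
temporarily unavailable" errors quoted below can be retried on the client
side, in the spirit of Mark's earlier "sleep 3" suggestion. A minimal sketch
in plain Perl; with_retries and its arguments are hypothetical names (not
part of BioPerl), and the flaky sub only stands in for a real remote call
such as get_ids on a Bio::DB::EUtilities object:

```perl
use strict;
use warnings;

# Run $code, retrying up to $max_tries times when it dies.
# $delay is the pause in seconds between attempts.
sub with_retries {
    my ($code, $max_tries, $delay) = @_;
    my $last_error;
    for my $attempt (1 .. $max_tries) {
        my @result;
        my $ok = eval { @result = $code->(); 1 };
        return @result if $ok;
        $last_error = $@;
        warn "Attempt $attempt failed: $last_error";
        sleep $delay if $attempt < $max_tries;
    }
    die "Giving up after $max_tries attempts: $last_error";
}

# Stand-in for a remote call: fails twice, then returns some IDs.
my $calls = 0;
my $flaky = sub {
    $calls++;
    die "Resource temporarily unavailable\n" if $calls < 3;
    return (1621261, 20807972);
};

# Delay of 0 only so the demo finishes at once; use a few seconds for real.
my @ids = with_retries($flaky, 5, 0);
print "Got @ids after $calls calls\n";
```

This does nothing about responses that arrive with an error message embedded
in the data, so the returned records still need checking.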

chris  


On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).
> 
> --Russell
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > 
> > Makes me wonder if they're pushing more users towards the SOAP-based
> > services and away from eutils.
> > 
> > chris
> > 
> > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > 
> > > I've had a wide selection of errors lately:
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource
> > temporarily unavailable)
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > STACK: Bio::Tools::EUtilities::parse_data
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > STACK: Bio::Tools::EUtilities::get_ids
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > STACK: Bio::DB::EUtilities::get_ids
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > STACK: get_desc.pl:32
> > > -----------------------------------------------------------
> > >
> > > And I never get a good explanation from NCBI or suggestions on how to
> > avoid it.
> > >
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > >> To: Smithies, Russell
> > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >> It's unfortunate but I have heard this problem popping up quite a bit
> > more
> > >> frequently lately.  Not to push too many buttons but NCBI isn't very
> > >> forthcoming with help these days; they have become quite insular.  Not
> > >> sure if they're short-staffed due to budget or if there are other
> > issues.
> > >>
> > >> chris
> > >>
> > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > >>
> > >>> Grrrrrr, I hate eutils!!!!
> > >>>
> > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111
> > >> (Connection refused)
> > >>> STACK: Error::throw
> > >>> STACK: Bio::Root::Root::throw
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > >>> STACK: Bio::Tools::EUtilities::parse_data
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > >>> STACK: Bio::Tools::EUtilities::get_ids
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > >>> STACK: Bio::DB::EUtilities::get_ids
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > >>> STACK: get_desc.pl:32
> > >>> -----------------------------------------------------------
> > >>>
> > >>>
> > >>> Nice error message though :-)
> > >>>
> > >>>
> > >>> --Russell
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > >>>> To: 'Chris Fields'
> > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-
> > bio.org'
> > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >>>> number?
> > >>>>
> > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> > >>>> been finding that with large queries, chunks of the resulting data are
> > >>>> missing.
> > >>>> For example, before Xmas I was creating species-specific databases by
> > >>>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> > >>>> the fasta sequences in chunks of 500.
> > >>>> Very regularly, in the middle of the fasta there would be a message about
> > >>>> resource unavailable, e.g.
> > >>>> >test_sequence_1
> > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > >>>> >test_sequence_2
> > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > >>>>
> > >>>> Often this wasn't detected until formatdb complained about invalid
> > >>>> characters.
> > >>>> Inquiries to NCBI as to why this was happening and what to do about it
> > >>>> returned stupid answers ("do each sequence manually thru the web
> > >>>> interface", or "use eUtils").
> > >>>> As we have a nice fast network connection, I now prefer to download very
> > >>>> large gzip files (i.e. all of refseq) and extract what I need.
> > >>>>
> > >>>> I can't help but think that NCBI could solve a lot of problems if they
> > >>>> gzipped the output from eUtils queries - it's something I've requested
> > >>>> regularly for the last 5 years or so!!
> > >>>>
> > >>>> --Russell
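A corrupted download like the one described above can be caught before formatdb ever sees it. The following is a minimal sketch, assuming nucleotide FASTA and the IUPAC alphabet; check_fasta is a hypothetical helper name, not a BioPerl function.

```perl
use strict;
use warnings;

# Scan a FASTA filehandle and report the first invalid character found on
# each sequence line. check_fasta() is a hypothetical helper, not part of
# BioPerl. Allowed characters: IUPAC nucleotide codes (either case),
# '-' and '*'.
sub check_fasta {
    my ($fh) = @_;
    my ($header, @problems) = ('(no header)');
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>(.*)/) {           # remember the current record
            $header = $1;
            next;
        }
        if ($line =~ /([^ACGTUNRYSWKMBDHVacgtunryswkmbdhv\-*])/) {
            my $column = $-[1] + 1;        # 1-based position of the match
            push @problems,
                "$header: invalid character '$1' at column $column";
        }
    }
    return @problems;
}
```

Running this over each retrieved chunk, and refetching any chunk that reports a problem, would flag stray text such as "Resource Unavailable" at download time rather than at database-build time.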
> > >>>>
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > >>>>> To: Smithies, Russell
> > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>>
> > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > >>>>> details).
> > >>>>>
> > >>>>> chris
> > >>>>>
> > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > >>>>>
> > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> > >>>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> > >>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> > >>>>>> do lookups.
> > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
> > >>>>>> lists taxids and descriptions (and synonyms).
> > >>>>>>
> > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > >>>>>> could do this:
> > >>>>>>
> > >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> > >>>>>> my $org_name = $names{$taxid};
> > >>>>>>
> > >>>>>> --Russell
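The two-hash lookup suggested above could be set up roughly as follows. This is a sketch that assumes the unpacked dump files use their usual layouts (gi_taxid_nucl.dmp: GI and taxid separated by a tab; names.dmp: fields separated by tab-pipe-tab, each row ending in tab-pipe). The sub names load_gi_taxid and load_names are hypothetical. Note that the keys of the first hash are GI numbers rather than accessions.

```perl
use strict;
use warnings;

# Build GI -> taxid from a gi_taxid_nucl.dmp-style handle
# (two tab-separated columns: GI, taxid). Hypothetical helper.
sub load_gi_taxid {
    my ($fh) = @_;
    my %gi_taxid;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;        # skip malformed rows
        $gi_taxid{$gi} = $taxid;
    }
    return \%gi_taxid;
}

# Build taxid -> scientific name from a names.dmp-style handle.
# Only rows whose name class is "scientific name" are kept, so
# synonyms and common names are skipped. Hypothetical helper.
sub load_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                # drop the row terminator
        my ($taxid, $name, $unique, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;
    }
    return \%names;
}
```

With both hashes loaded, a lookup is `$names->{ $gi_taxid->{$gi} }`. The full nucleotide dump is very large, so holding it all in memory is an assumption that may not fit every machine.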
> > >>>>>>
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>>>>
> > >>>>>>> Bhakti,
> > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > >>>>>>>
> > >>>>>>> use Bio::DB::EUtilities;
> > >>>>>>>
> > >>>>>>> my (%taxa, @taxa);
> > >>>>>>> my (%names, %idmap);
> > >>>>>>>
> > >>>>>>> # these are protein ids; nuc ids will work by changing
> > >>>>>>> # -dbfrom => 'nucleotide' (probably)
> > >>>>>>>
> > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>>>>>>
> > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>>>>>>                                     -db => 'taxonomy',
> > >>>>>>>                                     -dbfrom => 'protein',
> > >>>>>>>                                     -correspondence => 1,
> > >>>>>>>                                     -id => \@ids);
> > >>>>>>>
> > >>>>>>> # iterate through the LinkSet objects
> > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> @taxa = @taxa{@ids};
> > >>>>>>>
> > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>>>>>>      -db    => 'taxonomy',
> > >>>>>>>      -id    => \@taxa );
> > >>>>>>>
> > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > >>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> foreach (@ids) {
> > >>>>>>>  $idmap{$_} = $names{$taxa{$_}};
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> # %idmap is
> > >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> > >>>>>>> #    730439 => 'Bacillus caldolyticus'
> > >>>>>>> #    89318838 => undef    (this record has been removed from the db)
> > >>>>>>>
> > >>>>>>> 1;
> > >>>>>>>
> > >>>>>>> You probably will need to break up your 30000 into chunks
> > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > >>>>>>>
> > >>>>>>> sleep 3;
> > >>>>>>>
> > >>>>>>> or so separating the queries.
> > >>>>>>> MAJ
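The chunk-and-sleep advice above can be sketched in plain Perl. The helper names chunk_ids and for_each_chunk are hypothetical; the callback is where the elink/esummary calls from the example would go.

```perl
use strict;
use warnings;

# Split a list of ids into batches of at most $size ids each.
# Hypothetical helper, not part of BioPerl.
sub chunk_ids {
    my ($ids, $size) = @_;
    die "chunk size must be positive\n" unless $size && $size > 0;
    my @queue = @$ids;                     # copy; caller's array is untouched
    my @chunks;
    push @chunks, [ splice @queue, 0, $size ] while @queue;
    return @chunks;
}

# Run $callback on each batch, pausing $pause seconds between batches
# (no pause after the last one). Returns the number of batches.
sub for_each_chunk {
    my ($ids, $size, $pause, $callback) = @_;
    my @chunks = chunk_ids($ids, $size);
    for my $i (0 .. $#chunks) {
        $callback->($chunks[$i]);
        sleep $pause if $pause && $i < $#chunks;
    }
    return scalar @chunks;
}
```

For 30,000 ids, something like `for_each_chunk(\@ids, 1000, 3, sub { my ($batch) = @_; ... })` would issue 30 batches with a three-second gap between them.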
> > >>>>>>> ----- Original Message -----
> > >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > >>>>>>> To: <bioperl-l at lists.open-bio.org>
> > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession number?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
> > >>>>>>>> the accession number using Bioperl?  I have these 30,000 accession numbers
> > >>>>>>>> for which I need to get the source organisms.  Any kind of help will be
> > >>>>>>>> appreciated.
> > >>>>>>>>
> > >>>>>>>> Thanks
> > >>>>>>>>
> > >>>>>>>> BD
> > >>>>>>>> _______________________________________________
> > >>>>>>>> Bioperl-l mailing list
> > >>>>>>>> Bioperl-l at lists.open-bio.org
> > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Thu Jan 28 13:47:04 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 13:47:04 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID: <FD6E9A89F6034CCB856E22553ED893D7@NewLife>

Hi Mike,
Believe I found the real bug causing the problem (was not accounting for
the db_dir parameter). Crashes should now also throw much more helpful
errors. Please try the code at r16774, and shout back.
thanks --
MAJ
----- Original Message ----- 
From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 28, 2010 11:18 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
error running blastn


Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
/usr/local/ncbi/blast/bin/blastn : Illegal seek at 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> 
line 532.

STACK Bio::Tools::Run::WrapperBase::_run 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:

> Mike - please try updating your bioperl-live (the core) to the latest code
> (revision 16761 or so).
> CommandExts is a work in progress; from the stack errors it looks like you've
> got an older version.
> Try it then ping us back, if you would--
> Thanks
> Mark
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 10:41 AM
> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error
> running blastn
>
>
> Dear all,
>
> I am attempting to blast some primers against the mouse genome. I have created a
> local mouse genome blast database and I can search against it using 'blastn' at
> the command line.
>
> I have perl code that creates an array of bioperl sequence objects called
> @primers
>
> I then create a StandAloneBlastPlus factory using the following code...
>
> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
> -db_dir => '/Users/stubbing/localBlast/',
> -db_name => 'MouseGenome'
> );
>
> and then attempt to blast my primers using this...
>
> my @shortPrimers;
> my $count=1;
> foreach (@primers) {
>     my $currentSeq = $_;
>     print "Checking primer $count/$primerNumber ";
>     if ($_->length < 40) {
>         push(@shortPrimers,$_);
>         print "Too short!\n";
>     }
>     else {
>         print "BLASTing...";
>         my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>     }
>     $count++;
> }
>
> This fails with the following error...
>
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running
> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA>
> line 532.
>
> STACK Bio::Tools::Run::WrapperBase::_run
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
>
> Line 63 in my code is (as you might expect) the one that calls blastn on my
> factory object.
>
> I'd appreciate any help you might be able to provide to shed light on this.
>
> Thanks in advance,
>
> Mike
>


From cjfields at illinois.edu  Thu Jan 28 14:00:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:00:26 -0600
Subject: [Bioperl-l] EUtilities policy change
Message-ID: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>

All,

Per NCBI's recent change in eutils user policy (effective June 1):

http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html

Both the tool and email parameters ('-tool', '-email') are now required
when making requests.  Note this will significantly break all modules
requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
and Taxonomy stuff as well, IIRC).  This also applies to web services
(SOAP-based access).  Mark, not sure how this affects your SOAP-based
modules.

I have reconfigured Bio::DB::EUtilities to follow this policy; the
default tool setting has been 'bioperl' and will remain that way.
However, there has been no default email, therefore setting this is now
required for future requests unless we (the bioperl devs) decide there
is a safe default email to utilize.  My gut tells me, however, that
falling back to a default email opens up a can of worms for the devs and
is very likely a 'BAD IDEA'(TM).  

Regardless, be aware that, after June 1, NCBI will very likely exclude
requests with no email and will notify users who are considered to be
violating their policies.

I will likely make further changes to Bio::DB::EUtilities in the
meantime to ensure that using the tools by default will not violate
NCBI's policy (i.e., override this at your own risk).

chris


From maj at fortinbras.us  Thu Jan 28 14:05:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:05:43 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
Message-ID: <8F49B5ED151143FA86E977B4D4F44265@NewLife>

Thanks Chris-- 
The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
I agree that a default email is a bad idea (tm) (unless maybe it's 
hilmar's...?). I'd say a warning on unset email parameters is a responsible
"there be dragons" sort of treatment.
MAJ

From cjfields at illinois.edu  Thu Jan 28 14:18:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:18:22 -0600
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <8F49B5ED151143FA86E977B4D4F44265@NewLife>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
	<8F49B5ED151143FA86E977B4D4F44265@NewLife>
Message-ID: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>

I think warning is fine for now.  I've reimplemented that so it occurs
lazily (warns only when a request is actually made).

Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
We'll obviously have to address this in the test suite as well in some
way, maybe ask for an email if network tests are requested.

chris 
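The lazy warning described above could look roughly like this. It is only a sketch of the idea, not the code committed to Bio::DB::EUtilities: the package name My::EutilsRequest, its methods, and the base URL are assumptions made for illustration. The tool and email query parameters are the ones NCBI's policy refers to.

```perl
use strict;
use warnings;

package My::EutilsRequest;   # hypothetical name, not a BioPerl module

# Assumed base URL for the NCBI E-utilities
my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils';

sub new {
    my ($class, %args) = @_;
    # No check here: building the object never warns.
    return bless { tool => 'BioPerl', %args }, $class;
}

# Percent-encode everything except URL-unreserved characters.
sub _escape {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf '%%%02X', ord $1/ge;
    return $s;
}

# Build the request URL. The missing-email warning fires only here,
# i.e. when a request is actually about to be made.
sub url {
    my ($self, $eutil, %params) = @_;
    warn "No email set; NCBI may reject or block this request\n"
        unless defined $self->{email} && length $self->{email};
    my %q = (%params, tool => $self->{tool});
    $q{email} = $self->{email} if defined $self->{email};
    my $query = join '&', map { "$_=" . _escape($q{$_}) } sort keys %q;
    return "$BASE/$eutil.fcgi?$query";
}

package main;
```

Deferring the check to url() means scripts that only construct objects, such as offline tests, stay quiet, while anything that would actually reach NCBI gets the reminder.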



From Russell.Smithies at agresearch.co.nz  Thu Jan 28 14:25:38 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 29 Jan 2010 08:25:38 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
	<1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>

Yes, I usually set the 'tool' and 'email' parameters.
I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Friday, 29 January 2010 7:26 a.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Russell,
> 
> Just curious, but have you tried setting the return email parameter
> (-email)?  NCBI recently stated that all queries would eventually
> require a return email of some sort (not sure if it's validated or not).
> I think that was set for around late spring.  I'm changing the code in
> svn to require it for that very purpose.
> 
> chris
> 
> 
> On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
> > still works if you don't mind a bit of manual button clicking. It's
> > handling chunks of 100,000 records OK (today).
> >
> > --Russell
> >
> > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at illinois.edu]
> > > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > > To: Smithies, Russell
> > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > number?
> > >
> > > Makes me wonder if they're pushing more users towards the SOAP-based
> > > services and away from eutils.
> > >
> > > chris
> > >
> > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > >
> > > > I've had a wide selection of errors lately:
> > > >
> > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > > STACK: Error::throw
> > > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > STACK: get_desc.pl:32
> > > > -----------------------------------------------------------
> > > >
> > > > And I never get a good explanation from NCBI or suggestions on how to
> > > > avoid it.
> > > >
> > > >
> > > > --Russell
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > > >> To: Smithies, Russell
> > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from
> accession
> > > >> number?
> > > >>
> > > >> It's unfortunate but I have heard this problem popping up quite a bit more
> > > >> frequently lately.  Not to push too many buttons but NCBI isn't very
> > > >> forthcoming with help these days; they have become quite insular.  Not
> > > >> sure if they're short-staffed due to budget or if there are other issues.
> > > >>
> > > >> chris
> > > >>


From cjfields at illinois.edu  Thu Jan 28 14:30:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:30:12 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
	<1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
Message-ID: <1264707012.5473.51.camel@cjfields.igb.uiuc.edu>

Russell,

Okay, just wanted to make sure.  The email/tool requirements weren't
actually enforced up until now, which is forcing us to do a bit of
re-work on the various tools that don't set them by default (or at
least to warn users who are unaware of the requirement).

And I agree, gzipped archives would be nice!

chris
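
Archive note: as an illustration of the tool/email requirement discussed
here, the sketch below builds an esearch request URL by hand and refuses to
do so unless both values are supplied. The helper names esearch_url and
escape_param, and the example query, are assumptions made for this note;
they are not BioPerl code, and the escaper is a simplified one for ASCII
input.

```perl
use strict;
use warnings;

# Base URL of NCBI's E-utilities esearch endpoint.
my $ESEARCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

# Percent-encode everything outside the URI "unreserved" set (ASCII only).
sub escape_param {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Hypothetical helper: build the URL only when tool and email are present.
sub esearch_url {
    my (%arg) = @_;
    for my $required (qw(db term tool email)) {
        die "missing required parameter '$required'\n"
            unless defined $arg{$required} && length $arg{$required};
    }
    return $ESEARCH . '?' . join '&',
        map { sprintf('%s=%s', $_, escape_param($arg{$_})) }
        qw(db term tool email);
}
```

With Bio::DB::EUtilities itself, the same information is passed through the
-tool and -email arguments mentioned in this thread.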

On Fri, 2010-01-29 at 08:25 +1300, Smithies, Russell wrote:
> Yes, I usually set the 'tool' and 'email' parameters.
> I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...
> 
> --Russell
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Friday, 29 January 2010 7:26 a.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > 
> > Russell,
> > 
> > Just curious, but have you tried setting the return email parameter
> > (-email)?  NCBI recently stated that all queries would eventually
> > require a return email of some sort (not sure if it's validated or not).
> > I think that was set for around late spring.  I'm changing the code in
> > svn to require it for that very purpose.
> > 
> > chris
> > 
> > 
> > > On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> > > > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
> > > > still works if you don't mind a bit of manual button clicking. It's
> > > > handling chunks of 100,000 records OK (today).
> > >
> > > --Russell
> > >
> > > > -----Original Message-----
> > > > From: Chris Fields [mailto:cjfields at illinois.edu]
> > > > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > > > To: Smithies, Russell
> > > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > > number?
> > > >
> > > > Makes me wonder if they're pushing more users towards the SOAP-based
> > > > services and away from eutils.
> > > >
> > > > chris
> > > >
> > > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > > >
> > > > > I've had a wide selection of errors lately:
> > > > >
> > > > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > > > > STACK: Error::throw
> > > > > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > > > STACK: get_desc.pl:32
> > > > > > -----------------------------------------------------------
> > > > >
> > > > > And I never get a good explanation from NCBI or suggestions on how
> > to
> > > > avoid it.
> > > > >
> > > > >
> > > > > --Russell
> > > > >
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > > > >> To: Smithies, Russell
> > > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > accession
> > > > >> number?
> > > > >>
> > > > >> It's unfortunate but I have heard this problem popping up quite a
> > bit
> > > > more
> > > > >> frequently lately.  Not to push too many buttons but NCBI isn't
> > very
> > > > >> forthcoming with help these days; they have become quite insular.
> > Not
> > > > >> sure if they're short-staffed due to budget or if there are other
> > > > issues.
> > > > >>
> > > > >> chris
> > > > >>
> > > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > > > >>
> > > > >>> Grrrrrr, I hate eutils!!!!
> > > > >>>
> > > > > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > > > > >>> STACK: Error::throw
> > > > > >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > > >>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > > >>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > > >>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > > >>> STACK: get_desc.pl:32
> > > > > >>> -----------------------------------------------------------
> > > > >>>
> > > > >>>
> > > > >>> Nice error message though :-)
> > > > >>>
> > > > >>>
> > > > >>> --Russell
> > > > >>>
> > > > >>>> -----Original Message-----
> > > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > > > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > > > >>>> To: 'Chris Fields'
> > > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-
> > > > bio.org'
> > > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > accession
> > > > >>>> number?
> > > > >>>>
> > > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as
> > I've
> > > > >> often
> > > > >>>> been finding that with large queries, chunks of the resulting
> > data is
> > > > >>>> missing.
> > > > >>>> For example, before Xmas I was creating species-specific
> > databases by
> > > > >>>> using eUtils to get a list of GI numbers back for a taxid, then
> > > > >> retrieving
> > > > >>>> the fasta sequences in chunks of 500.
> > > > >>>> Very regularly, in the middle of the fasta there would be a
> > message
> > > > >> about
> > > > >>>> resource unavailable eg.
> > > > > >>>> >test_sequence_1
> > > > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > > > > >>>> >test_sequence_2
> > > > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > > > >>>>
> > > > >>>> Often this wasn't detected until formatdb complained about
> > invalid
> > > > >>>> characters.
> > > > >>>> Inquiries to NCBI as to why this was happening and what to do
> > about
> > > > it
> > > > >>>> returned stupid answers ("do each sequence manually thru the web
> > > > >>>> interface", or "use eUtils").
> > > > >>>> As we have a nice fast network connection, I now prefer to
> > download
> > > > >> very
> > > > >>>> large gzip files (i.e. all of refseq) and extract what I need.
> > > > >>>>
> > > > >>>> I can't help but think that NCBI could solve a lot of problems if
> > > > they
> > > > >>>> gzipped the output from eUtils queries - it's something I've
> > > > requested
> > > > >>>> regularly for the last 5 years or so!!
> > > > >>>>
> > > > >>>> --Russell
> > > > >>>>
> > > > >>>>
> > > > >>>>> -----Original Message-----
> > > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > > > >>>>> To: Smithies, Russell
> > > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-
> > > > bio.org'
> > > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > > > accession
> > > > >>>>> number?
> > > > >>>>>
> > > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same
> > files
> > > > or
> > > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD
> > for
> > > > >> the
> > > > >>>>> details).
> > > > >>>>>
> > > > >>>>> chris
> > > > >>>>>
> > > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > > > >>>>>
> > > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
> > > > >>>> flakiness
> > > > >>>>> lately) would be to download the gi_taxid_nucl.zip or
> > > > >> gi_taxid_prot.zip
> > > > >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into
> > a
> > > > hash
> > > > >>>> and
> > > > >>>>> do lookups.
> > > > >>>>>> In that same dir, taxdump.tar.gz contains a file called
> > names.dmp
> > > > >>>> which
> > > > >>>>> lists taxids and descriptions (and synonyms)
> > > > >>>>>>
> > > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes
> > so
> > > > I
> > > > >>>>> could do this:
> > > > >>>>>>
> > > > >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> > > > >>>>>> my $org_name = $names{$taxid};
> > > > >>>>>>
> > > > >>>>>> --Russell
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>> -----Original Message-----
> > > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > > > >> accession
> > > > >>>>>>> number?
> > > > >>>>>>>
> > > > >>>>>>> Bhakti,
> > > > >>>>>>> The following example (using EUtilities) may serve your
> > purpose:
> > > > >>>>>>>
> > > > >>>>>>> use Bio::DB::EUtilities;
> > > > >>>>>>>
> > > > >>>>>>> my (%taxa, @taxa);
> > > > >>>>>>> my (%names, %idmap);
> > > > >>>>>>>
> > > > > >>>>>>> # these are protein ids; nuc ids will work by changing
> > > > > >>>>>>> # -dbfrom => 'nucleotide' (probably)
> > > > >>>>>>>
> > > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > > > >>>>>>>
> > > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > > > >>>>>>>                                     -db => 'taxonomy',
> > > > >>>>>>>                                     -dbfrom => 'protein',
> > > > >>>>>>>                                     -correspondence => 1,
> > > > >>>>>>>                                     -id => \@ids);
> > > > >>>>>>>
> > > > >>>>>>> # iterate through the LinkSet objects
> > > > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > > > >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > > > >>>>>>> }
> > > > >>>>>>>
> > > > >>>>>>> @taxa = @taxa{@ids};
> > > > >>>>>>>
> > > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > > > >>>>>>>      -db    => 'taxonomy',
> > > > >>>>>>>      -id    => \@taxa );
> > > > >>>>>>>
> > > > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > > > >>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
> > > > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
> > > > >>>>>>> }
> > > > >>>>>>>
> > > > >>>>>>> foreach (@ids) {
> > > > >>>>>>>  $idmap{$_} = $names{$taxa{$_}};
> > > > >>>>>>> }
> > > > >>>>>>>
> > > > >>>>>>> # %idmap is
> > > > >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > > > >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > > > >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> > > > >>>>>>> #    730439 => 'Bacillus caldolyticus'
> > > > >>>>>>> #    89318838 => undef    (this record has been removed from
> > the
> > > > db)
> > > > >>>>>>>
> > > > >>>>>>> 1;
> > > > >>>>>>>
> > > > >>>>>>> You probably will need to break up your 30000 into chunks
> > > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > > > >>>>>>>
> > > > >>>>>>> sleep 3;
> > > > >>>>>>>
> > > > >>>>>>> or so separating the queries.
> > > > >>>>>>> MAJ
> > > > >>>>>>> ----- Original Message -----
> > > > >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > > > >>>>>>> To: <bioperl-l at lists.open-bio.org>
> > > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from
> > accession
> > > > >>>>> number?
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>> Hi,
> > > > >>>>>>>>
> > > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species
> > > > name"
> > > > >>>>>>> given
> > > > >>>>>>>> the accession number using Bioperl.   I have these 30,000
> > > > accession
> > > > >>>>>>> numbers
> > > > >>>>>>>> for which I need to get the source organisms.  Any kind of
> > help
> > > > >> will
> > > > >>>>> be
> > > > >>>>>>>> appreciated.
> > > > >>>>>>>>
> > > > >>>>>>>> Thanks
> > > > >>>>>>>>
> > > > >>>>>>>> BD
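
Archive note: the offline lookup quoted above (gi_taxid_nucl plus names.dmp
loaded into hashes) could be sketched as below. Two details are worth
stating as assumptions of this note: the gi_taxid files are keyed by GI
number rather than by accession, and names.dmp lists synonyms as well, so
only rows with the name class "scientific name" are kept. The sub names are
hypothetical, and both subs read from an already opened filehandle.

```perl
use strict;
use warnings;

# Read "gi<TAB>taxid" lines (the layout of the gi_taxid dump) into a hash.
sub load_gi_taxid {
    my ($fh) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        $taxid_for{$gi} = $taxid if defined $taxid;
    }
    return \%taxid_for;
}

# Read names.dmp, keeping only rows whose name class is "scientific name".
# Fields are separated by "\t|\t" and each row ends with "\t|".
sub load_scientific_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        $name_for{$taxid} = $name
            if defined $class && $class eq 'scientific name';
    }
    return \%name_for;
}
```

The organism name for a GI is then $name_for->{ $taxid_for->{$gi} }, which
is the two-step lookup from the quoted message.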
> 
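
Archive note: the corrupted downloads described earlier in this thread (an
error message embedded in the middle of a FASTA sequence, noticed only when
formatdb complained) could be caught with a check along these lines. This is
a minimal sketch for nucleotide data; the sub name suspect_records is
hypothetical, and the allowed alphabet is assumed to be the IUPAC nucleotide
codes plus the gap character.

```perl
use strict;
use warnings;

# Return the IDs of FASTA records whose sequence lines contain anything
# other than IUPAC nucleotide codes or '-' (each ID is reported once).
sub suspect_records {
    my ($fh) = @_;
    my ($id, %bad, @order);
    while (my $line = <$fh>) {
        chomp $line;
        next unless length $line;
        if ($line =~ /^>(\S*)/) {       # header line: remember the record ID
            $id = $1;
            next;
        }
        next unless defined $id;        # ignore text before the first header
        if ($line =~ /[^ACGTURYSWKMBDHVN-]/i && !$bad{$id}++) {
            push @order, $id;
        }
    }
    return @order;
}
```

Run over each downloaded chunk before concatenation, this flags the records
that need to be fetched again.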


From maj at fortinbras.us  Thu Jan 28 14:55:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:55:31 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu><8F49B5ED151143FA86E977B4D4F44265@NewLife>
	<1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
Message-ID: <CD70565A9D3F44C4A0D7BA6462E021E0@NewLife>

Ok, SoapEU now warns on no email; passes email onto the fetch stage
during autofetch -- cheers MAJ
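
Archive note: the "warn lazily" behaviour discussed in this exchange (no
warning at construction, a warning only when a request is actually made)
could look like the sketch below. The class My::EutilClient and its methods
are hypothetical and are not the Bio::DB::EUtilities or SoapEUtilities code.

```perl
package My::EutilClient;    # hypothetical class, not part of BioPerl
use strict;
use warnings;

sub new {
    my ($class, %arg) = @_;
    # Construction never warns; a missing email only matters once a
    # request is about to be sent.
    my $self = {
        tool   => defined $arg{tool} ? $arg{tool} : 'BioPerl',
        email  => $arg{email},
        warned => 0,
    };
    return bless $self, $class;
}

# Parameters for an outgoing request; warns once if no email is set.
sub request_params {
    my ($self) = @_;
    if (!defined $self->{email} && !$self->{warned}++) {
        warn "No email set; NCBI may reject requests without one\n";
    }
    my %param = (tool => $self->{tool});
    $param{email} = $self->{email} if defined $self->{email};
    return %param;
}

package main;
```

Warning at request time rather than in the constructor keeps objects that
never touch the network quiet.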
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl-l" <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 28, 2010 2:18 PM
Subject: Re: [Bioperl-l] EUtilities policy change


>I think warning is fine for now.  I've reimplemented that so it occurs
> lazily (warns only when a request is actually made).
> 
> Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
> We'll obviously have to address this in the test suite as well in some
> way, maybe ask for an email if network tests are requested.
> 
> chris 
> 
> On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote:
>> Thanks Chris-- 
>> The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
>> I agree that a default email is a bad idea (tm) (unless maybe it's 
>> hilmar's...?). I'd say a warning on unset email parameters is a responsible
>> "there be dragons" sort of treatment.
>> MAJ
>> ----- Original Message ----- 
>> From: "Chris Fields" <cjfields at illinois.edu>
>> To: "BioPerl-l" <bioperl-l at lists.open-bio.org>
>> Cc: "Mark A. Jensen" <maj at fortinbras.us>
>> Sent: Thursday, January 28, 2010 2:00 PM
>> Subject: EUtilities policy change
>> 
>> 
>> > All,
>> > 
>> > Per NCBI's recent change in eutils user policy (effective June 1):
>> > 
>> > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html
>> > 
>> > Both the tool and email parameters ('-tool', '-email') are now required
>> > when making requests.  Note this will significantly break all modules
>> > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
>> > and Taxonomy stuff as well, IIRC).  This also applies to web services
>> > (SOAP-based access).  Mark, not sure how this affects your SOAP-based
>> > modules.
>> > 
>> > I have reconfigured Bio::DB::EUtilities to follow this policy; the
>> > default tool setting has been 'bioperl' and will remain that way.
>> > However, there has been no default email, therefore setting this is now
>> > required for future requests unless we (the bioperl devs) decide there
>> > is a safe default email to utilize.  My gut tells me, however, that
>> > falling back to a default email opens up a can of worms for the devs and
>> > is very likely a 'BAD IDEA'(TM).  
>> > 
>> > Regardless, be aware that, after June 1, NCBI will very likely exclude
>> > requests with no email and will notify users who are considered to be
>> > violating their policies.
>> > 
>> > I will likely make further changes to Bio::DB::EUtilities in the
>> > meantime to ensure that using the tools by default will not violate
>> > NCBI's policy (e.g. override this at your own risk).  
>> > 
>> > chris
>> > 
>> > 
>> >

From chapmanb at 50mail.com  Thu Jan 28 15:35:05 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Jan 2010 15:35:05 -0500
Subject: [Bioperl-l] OpenBio solution challenge: Project updates at BOSC 2010
Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu>

Hello all;
The BOSC 2010 organizing committee is hard at work getting prepared for this
July's meeting in Boston:

http://www.open-bio.org/wiki/BOSC_2010

One of the items we've traditionally had at the conference is a project 
update from each of the OpenBio affiliated groups. This year, we're thinking
about organizing these talks around a central theme: the OpenBio solution
challenge. We start with a biological question of general interest, and each
of the project talks would focus around how you would solve that problem 
using your toolkit and programming language.

This is meant to provide a challenge for OpenBio contributors, a nice tutorial
style overview of various projects and approaches for other programmers, and a
fun opportunity to compete and learn from other projects. Conference attendees
will vote on their favorite solution, with the winner receiving fame and
fortune (warning: fortune not guaranteed).

For this to be successful, it of course requires interest and enthusiasm from
y'all fine folks involved with the projects. Specifically:

- Is there interest from your group in participating in the challenge? You'll
  want at least a few people to work on it, and someone to give a presentation 
  at BOSC.

- Do you have suggestions on a good theme or specific biological problem to
  tackle? We'll hope to pick something in a sweet spot that is challenging 
  enough to be of interest, yet reasonable for presentation and preparation.

Let's discuss ideas and get this together. Since the schedule for BOSC is
developing rapidly, please give us an idea if you're interested by
February 12th, and copy responses to the BOSC mailing list as a central 
place for discussion.

bosc at open-bio.org

Thanks,
Brad, Michael, and the BOSC organizing committee

From markw at illuminae.com  Thu Jan 28 16:17:44 2010
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 28 Jan 2010 13:17:44 -0800
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project
 updates at BOSC 2010
In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
Message-ID: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>


Brad, this sounds exciting!

One thing strikes me, though - by asking for the sub-projects to propose
the "grand challenge" themselves the one thing you can guarantee is that
the "grand challenge" is solvable (or more likely, already solved!)

Other "grand challenge" kinds of meetings have an independent third party
pose the problem that has to be solved, and then all groups work toward a
solution and compare their results.  This would, IMO, be more revealing of
the "state of the art" in each Open-Bio project, and point out where the
weaknesses are that we should be focusing on...  Someone (for example,
you!) could act as the moderator to ensure that the "grand challenge" was
at least a reasonable one, within the scope of what an Open-Bio project
*should* be able to solve...

Just my CAD $0.02

Mark




-- 
Mark D Wilkinson, PI Bioinformatics
Assistant Professor, Medical Genetics
The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research
Providence Heart + Lung Institute
University of British Columbia - St. Paul's Hospital
Vancouver, BC, Canada

From HWillis at scripps.edu  Thu Jan 28 20:03:10 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 Jan 2010 20:03:10 -0500
Subject: [Bioperl-l] [Biojava-dev] [MOBY-dev] OpenBio solution
 challenge: Project updates at BOSC 2010
In-Reply-To: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
	<op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu>

Brad

I agree with Mark that a particular problem may be biased towards a toolkit/language. Another approach would be to list a collection of problems and have each group pick one to present. That could be a little more interesting to the audience, as you are exposed to different problems and the various strengths of each toolkit. It could also help guide future development in the other toolkits, as you would benefit from learning about the API and/or programming language. Each group would register the problem that they are going to present. The pool of problems not picked then becomes the surprise challenge, where each group has 24 hours to put together either a presentation or an actual solution.

Scooter




From biopython at maubp.freeserve.co.uk  Fri Jan 29 05:36:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 29 Jan 2010 10:36:40 +0000
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project
	updates at BOSC 2010
In-Reply-To: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
	<op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com>

Hi all,

This is a great topic, but should we continue it on just one mailing list?
Is there a suitable BOSC list, or how about the general Open Bio list?

On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson <markw at illuminae.com> wrote:
>
> Brad, this sounds exciting!
>
> One thing strikes me, though - by asking for the sub-projects to propose
> the "grand challenge" themselves the one thing you can guarantee is that
> the "grand challenge" is solvable (or more likely, already solved!)
>
> Other "grand challenge" kinds of meetings have an independent third party
> pose the problem that has to be solved, and then all groups work toward a
> solution and compare their results. This would, IMO, be more revealing of
> the "state of the art" in each Open-Bio project, and point out where the
> weaknesses are that we should be focusing on... Someone (for example,
> you!) could act as the moderator to ensure that the "grand challenge" was
> at least a reasonable one, within the scope of what an Open-Bio project
> *should* be able to solve...
>
> Just my CAD $0.02
>
> Mark

One possible problem with having Brad act as moderator is his ties to
Biopython (plus it would be a shame if we'd be one man down for trying
to solve the challenges - grin). Having a project representative "sign off"
on the challenge might work - or simply the whole of the BOSC committee
which is quite balanced. Alternatively some kind of panel of challenges does
seem a good way to reduce individual project bias (as suggested by Scooter),
but there will still need to be a judging committee.

I'm curious what kind of challenges the BOSC committee had in mind -
would something like taking a newly sequenced bacterium and producing
an automated annotation as a GenBank, EMBL, or GFF file be too
ambitious for example? There are already several major projects
to do this e.g. RAST http://rast.nmpdr.org/

Peter
(@Biopython)


From mike.stubbington at bbsrc.ac.uk  Fri Jan 29 08:25:25 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Fri, 29 Jan 2010 13:25:25 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
In-Reply-To: <FD6E9A89F6034CCB856E22553ED893D7@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
	<FD6E9A89F6034CCB856E22553ED893D7@NewLife>
Message-ID: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>

Hi Mark,

Thanks for your continued help.

It now fails with this:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::]

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

If I change the factory creation to:
my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
	-db_name => '/Users/stubbing/localBlast/MouseGenome'
);

it fails with 

------------- EXCEPTION -------------
MSG: DB name not valid
STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516
STACK toplevel ./5CTest.pl:45
-------------------------------------

However I can run the following successfully from the command line:

blastn -db  /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta

Is there something wrong with how I'm referring to the blast database when I construct my factory?
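
For anyone following along, here is a minimal sketch of one possible stopgap. It splits the full database path into the directory and bare name that the -db_dir / -db_name pair expects, and also exports the directory through the BLASTDB environment variable, which BLAST+ consults when it searches for a database. The helper name split_blast_db_path is made up for this example, and whether the wrapper honours either route at this revision is an assumption, not something confirmed in this thread.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical helper: split a full BLAST database path into the
# directory part and the bare database name.
sub split_blast_db_path {
    my ($path) = @_;
    my ($name, $dir) = fileparse($path);
    return ($dir, $name);
}

my ($db_dir, $db_name) =
    split_blast_db_path('/Users/stubbing/localBlast/MouseGenome');
print "db_dir:  $db_dir\n";
print "db_name: $db_name\n";

# BLAST+ adds the BLASTDB directory to its database search path.
# Assumption: blastn is started as a child process and so inherits
# this variable from the running script.
$ENV{BLASTDB} = $db_dir;

# The factory call from this thread would then look like the lines
# below. They are commented out because they need BioPerl-run.
# my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
#     -db_dir  => $db_dir,
#     -db_name => $db_name,
# );
```

The error above lists only the working directory in its search path, which is why setting BLASTDB may be worth a try while the db_dir handling is being fixed.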

Thanks again,

M


On 28 Jan 2010, at 18:47, Mark A. Jensen wrote:

> Hi Mike,
> Believe I found the real bug causing the problem (was not accounting for
> the db_dir parameter). Crashes should now also throw much more helpful
> errors. Please try the code at r16774, and shout back.
> thanks --
> MAJ
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 11:18 AM
> Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
> error running blastn
> 
> 
> Hi,
> 
> Thanks for the suggestion. Unfortunately it still fails - error as follows:
> 
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> line 532.
> 
> STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
> 
> M
> 
> On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:
> 
>> Mike - please try updating your bioperl-live (the core) to the latest code
>> (revision 16761 or so).
>> CommandExts is a work in progress; from the stack errors it looks like you've
>> got an older version.
>> Try it then ping us back, if you would--
>> Thanks
>> Mark
>> ----- Original Message ----- 
>> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Thursday, January 28, 2010 10:41 AM
>> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error
>> running blastn
>> 
>> 
>> Dear all,
>> 
>> I am attempting to blast some primers against the mouse genome. I have created
>> a local mouse genome blast database and I can search against it using 'blastn'
>> at the command line.
>> 
>> I have perl code that creates an array of bioperl sequence objects called
>> @primers
>> 
>> I then create a StandAloneBlastPlus factory using the following code...
>> 
>> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
>>     -db_dir => '/Users/stubbing/localBlast/',
>>     -db_name => 'MouseGenome'
>> );
>> 
>> and then attempt to blast my primers using this...
>> 
>> my @shortPrimers;
>> my $count=1;
>> foreach (@primers) {
>>     my $currentSeq = $_;
>>     print "Checking primer $count/$primerNumber ";
>>     if ($_->length < 40) {
>>         push(@shortPrimers,$_);
>>         print "Too short!\n";
>>     }
>>     else {
>>         print "BLASTing...";
>>         my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>>     }
>>     $count++;
>> }
>> 
>> This fails with the following error...
>> 
>> ------------- EXCEPTION -------------
>> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA> line 532.
>> 
>> STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
>> STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
>> STACK toplevel ./5CTest.pl:63
>> -------------------------------------
>> 
>> Line 63 in my code is (as you might expect) the one that calls blastn on my
>> factory object.
>> 
>> I'd appreciate any help you might be able to provide to shed light on this.
>> 
>> Thanks in advance,
>> 
>> Mike
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 


From maj at fortinbras.us  Fri Jan 29 08:36:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:36:54 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
	<FD6E9A89F6034CCB856E22553ED893D7@NewLife>
	<ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
Message-ID: <DF05D2C7E8CC4CF18E6AE56077EB738A@NewLife>

Hi Mike-
Well, at least we're getting more informative errors. I think it's
still my bad; will look again. Both of your calls should work.
(thanks for the positive control too)
Thanks for your patience and the help--
MAJ


From maj at fortinbras.us  Fri Jan 29 08:47:48 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:47:48 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife><05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk><FD6E9A89F6034CCB856E22553ED893D7@NewLife>
	<ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
Message-ID: <2B7BF6CD46AE441AB24203E169D9C503@NewLife>

Mike et al--
I've entered this as Bug #3003 on http://bugzilla.bioperl.org;
we'll do further ping-pongs on this issue via the comment facility
there--
cheers MAJ


From help at gmod.org  Fri Jan 29 17:03:48 2010
From: help at gmod.org (Dave Clements, GMOD Help Desk)
Date: Fri, 29 Jan 2010 14:03:48 -0800
Subject: [Bioperl-l] 2010 GMOD Summer School - Americas
In-Reply-To: <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
References: <71ee57c71001291351q47994b82w10dffb390dbf2837@mail.gmail.com>
	<71ee57c71001291354m68548823s3e3fbd2e49e9b332@mail.gmail.com>
	<71ee57c71001291356p5e7f1aadi2bf437c93014a393@mail.gmail.com>
	<71ee57c71001291357h67112e2fkcf835687e59f66ae@mail.gmail.com>
	<71ee57c71001291358k74781b08n232534d8895c5ec1@mail.gmail.com>
	<71ee57c71001291400y28e40eb6i112ea91df977dc67@mail.gmail.com>
	<71ee57c71001291400n6133982eh3a02293ff741900b@mail.gmail.com>
	<71ee57c71001291401y505b56baic61c11754d88a444@mail.gmail.com>
	<71ee57c71001291402s23e3f2e9w2562d6acf85bd4ae@mail.gmail.com>
	<71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
Message-ID: <71ee57c71001291403s19be18f3s3a1d5a314c74def@mail.gmail.com>

Hello all,

I am pleased to announce that we are now accepting applications for:

  2010 GMOD Summer School - Americas
    6-9 May 2010
    NESCent, Durham, NC, USA
    http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

This will be a hands-on multi-day course aimed at teaching new GMOD
users/administrators how to get GMOD Components up and running. The
course will introduce participants to the GMOD project and then focus
on installation, configuration and integration of popular GMOD
Components. The course will be held May 6-9, at NESCent in Durham, NC.

These components will be covered:
   * Apollo - genome annotation editor
   * Chado - a modular and extensible database schema
   * Galaxy - workflow system
   * GBrowse - the Generic Genome Browser
   * GBrowse_syn - A generic synteny browser
   * JBrowse - genome browser
   * MAKER - genome annotation pipeline
   * Tripal - web front end for Chado

The deadline for applying is the end of Friday, February 22. Admission
is competitive and is based on the strength of the application
(especially the statement of interest). In 2009 there were over 50
applications for the 25 slots. Any applications received after the
deadline will be placed on the waiting list.

See the course page for details and an application link:
 http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

Thanks,

Dave Clements
GMOD Help Desk

PS: We are also investigating holding a GMOD course in the
Asia/Pacific region, sometime this fall. Watch the GMOD mailing lists
and the GMOD News page/RSS feed for updates.
--
Please keep responses on the list!
http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas
http://gmod.org/wiki/GMOD_News
Was this helpful? http://gmod.org/wiki/Help_Desk_Feedback


From bhakti.dwivedi at gmail.com  Sat Jan 30 17:38:40 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sat, 30 Jan 2010 17:38:40 -0500
Subject: [Bioperl-l] how to map blast results on to the genome?
Message-ID: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>

Does anyone know how I can graphically map the blast results (-m 8 tabular
format) to the genome using BioPerl?

Thanks

Bhakti

From jason at bioperl.org  Sat Jan 30 18:56:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 30 Jan 2010 15:56:14 -0800
Subject: [Bioperl-l] how to map blast results on to the genome?
In-Reply-To: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>
References: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>
Message-ID: <68937A7D-291F-419A-9ED7-7A87D9B4C78A@bioperl.org>

Did you try Bio::Graphics and read the HOWTO on it? http://bioperl.org/wiki/HOWTO:Graphics
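
As a starting point, here is a minimal sketch of the parsing half of the job: turning -m 8 tabular lines into hit records with genome coordinates. It assumes the genome is the subject (database) side of the search, so it uses the subject start/end columns. The sub name parse_m8_line and the two sample lines are invented for illustration; only the column order comes from the tabular format itself.

```perl
use strict;
use warnings;

# Column order of legacy BLAST "-m 8" tabular output.
my @COLUMNS = qw(query subject pct_identity aln_length mismatches gap_opens
                 q_start q_end s_start s_end evalue bit_score);

# Hypothetical helper: parse one tabular line into a hash reference.
# Returns nothing for comments, blank lines, or malformed lines.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ or $line !~ /\S/;
    my @fields = split /\t/, $line;
    return unless @fields == @COLUMNS;

    my %hit;
    @hit{@COLUMNS} = @fields;

    # Minus-strand hits are reported with s_start > s_end, so record the
    # strand and store the coordinates in ascending order for drawing.
    my ($start, $end) = @hit{qw(s_start s_end)};
    $hit{strand} = $start <= $end ? 1 : -1;
    @hit{qw(start end)} = $start <= $end ? ($start, $end) : ($end, $start);
    return \%hit;
}

# Invented sample data, not real BLAST output.
my @lines = (
    "primer1\tchr1\t100.00\t40\t0\t0\t1\t40\t1500\t1539\t2e-15\t79.8",
    "primer2\tchr1\t97.50\t40\t1\t0\t1\t40\t9040\t9001\t1e-12\t71.9",
);

my @hits = map { parse_m8_line($_) } @lines;
for my $hit (@hits) {
    printf "%s\t%s\t%d..%d\t%+d\n",
        @{$hit}{qw(query subject start end strand)};
}
```

Each record could then be wrapped in a Bio::SeqFeature::Generic (-start, -end, -strand, -score) and added to a track on a Bio::Graphics::Panel, as the HOWTO describes. That part needs BioPerl and GD installed, so it is left out here.
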
On Jan 30, 2010, at 2:38 PM, Bhakti Dwivedi wrote:

> Does anyone know how I can graphically map the blast results (-m 8 tabular
> format) to the genome using BioPerl?
>
> Thanks
>
> Bhakti

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From David.Messina at sbc.su.se  Sun Jan 31 12:43:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 31 Jan 2010 18:43:52 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
	<18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
Message-ID: <BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet.

I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave


From bluecurio at gmail.com  Sun Jan 31 22:22:37 2010
From: bluecurio at gmail.com (Daniel Renfro)
Date: Sun, 31 Jan 2010 21:22:37 -0600
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects
Message-ID: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>

Hello all,

A colleague and I have been working on a (Bio)Perl package to compare two
Seq objects. This is in response to a need we found in our lab -- we wanted
to see the changes to GenBank files through time, but wanted an automated
way to do this. This led to what I'm calling the SeqDiff.pm package. I
thought it would be a good idea to inform the community and get some
feedback.

The package takes two Seq objects as arguments, arbitrarily called "old" and
"new." It then matches the features from the old object with the new object.
This is done based on some criteria -- in our case we decided the features
must be of the same type (have the same primary_tag) and have at least one
matching database cross-reference (db_xref) in common.  The left-over
features (ones that did not have a match) are dropped into arrays called
"lost" and "gained." The matching is done in about N log N time, as each
matching pair is removed from subsequent searches.
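
To make the matching rule concrete, here is a rough sketch of it. This is not the SeqDiff code: plain hashes stand in for feature objects, the sub name match_features and the sample IDs are invented, and a hash index is swapped in for whatever search the package actually performs.

```perl
use strict;
use warnings;

# Plain hashes stand in for Bio::SeqFeatureI objects (hypothetical layout:
# primary_tag => string, db_xref => array of strings).
sub match_features {
    my ($old, $new) = @_;

    # Index the new features by "type|db_xref" so each lookup is one hash hit.
    my %index;
    for my $i (0 .. $#$new) {
        my $f = $new->[$i];
        push @{ $index{"$f->{primary_tag}|$_"} }, $i for @{ $f->{db_xref} };
    }

    my (%taken, @matched, @lost);
    OLD: for my $o (@$old) {
        for my $xref (@{ $o->{db_xref} }) {
            for my $i (@{ $index{"$o->{primary_tag}|$xref"} || [] }) {
                next if $taken{$i}++;    # already paired with another feature
                push @matched, [ $o, $new->[$i] ];
                next OLD;
            }
        }
        push @lost, $o;                  # no partner with same type and db_xref
    }

    # Whatever was never paired on the new side counts as gained.
    my @gained = map { $new->[$_] } grep { !$taken{$_} } 0 .. $#$new;
    return (\@matched, \@lost, \@gained);
}

# Invented sample features.
my @old = (
    { primary_tag => 'gene', db_xref => ['GeneID:1'] },
    { primary_tag => 'CDS',  db_xref => ['GI:20'] },
);
my @new = (
    { primary_tag => 'gene', db_xref => ['GeneID:1', 'EcoGene:EG1'] },
    { primary_tag => 'tRNA', db_xref => ['GeneID:3'] },
);

my ($matched, $lost, $gained) = match_features(\@old, \@new);
printf "matched=%d lost=%d gained=%d\n",
    scalar @$matched, scalar @$lost, scalar @$gained;
```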

The matched features are then iterated through and the differences are
calculated. Each feature is examined recursively and any differences are
reported. Optionally you can give the new() method a flag so that everything
is returned (differences and similarities.) You can set callbacks for
different types of objects (like anything that isa('Bio::LocationI')) if you
want a custom comparison for specific BioPerl objects. This comparison step
is the computationally slow part, and currently everything is held in
memory. I think it'd be better to do this piece-meal, using the BioPerl-ish
next() and last() methods.
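
The recursive comparison with per-class callbacks could look something like the sketch below. Again, this is an illustration and not the SeqDiff implementation: diff_data, the My::Location class, and the report strings are all invented, and the callback is assumed to return true when the two objects are equal.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Walk two data structures in parallel and collect differences as strings.
# %$callbacks maps a class name to a sub returning true when two objects
# of that class should be treated as equal.
sub diff_data {
    my ($old, $new, $path, $callbacks, $out) = @_;
    $out ||= [];

    # A registered callback takes over for objects of its class.
    if (blessed($old) && blessed($new)) {
        for my $class (sort keys %$callbacks) {
            next unless $old->isa($class) && $new->isa($class);
            push @$out, "${path}: changed"
                unless $callbacks->{$class}->($old, $new);
            return $out;
        }
    }

    my $old_type = reftype($old) || '';
    my $new_type = reftype($new) || '';
    if ($old_type eq 'HASH' && $new_type eq 'HASH') {
        my %keys = map { $_ => 1 } keys %$old, keys %$new;
        for my $k (sort keys %keys) {
            if    (!exists $new->{$k}) { push @$out, "${path}/${k}: lost" }
            elsif (!exists $old->{$k}) { push @$out, "${path}/${k}: gained" }
            else { diff_data($old->{$k}, $new->{$k}, "${path}/${k}", $callbacks, $out) }
        }
    }
    elsif ($old_type eq 'ARRAY' && $new_type eq 'ARRAY') {
        my $last = @$old > @$new ? $#$old : $#$new;
        for my $i (0 .. $last) {
            if    ($i > $#$new) { push @$out, "${path}/${i}: lost" }
            elsif ($i > $#$old) { push @$out, "${path}/${i}: gained" }
            else { diff_data($old->[$i], $new->[$i], "${path}/${i}", $callbacks, $out) }
        }
    }
    else {
        # Leaves: compare as strings, treating undef as empty.
        my ($o, $n) = map { defined $_ ? "$_" : '' } $old, $new;
        push @$out, "${path}: '${o}' -> '${n}'" if $o ne $n;
    }
    return $out;
}

# Hypothetical stand-in for a Bio::LocationI implementation.
{
    package My::Location;
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
}

my %callbacks = (
    'My::Location' => sub {
        $_[0]{start} == $_[1]{start} && $_[0]{end} == $_[1]{end};
    },
);

# Invented sample features.
my $old = {
    primary_tag => 'CDS',
    location    => My::Location->new(start => 100, end => 400),
    tags        => { product => ['hypothetical protein'], note => ['old note'] },
};
my $new = {
    primary_tag => 'CDS',
    location    => My::Location->new(start => 100, end => 430),
    tags        => { product => ['DNA polymerase'], gene => ['polA'] },
};

my $diffs = diff_data($old, $new, '', \%callbacks);
print "$_\n" for @$diffs;
```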

Maybe this was a little verbose, but that is the SeqDiff package in a
nutshell. I hope to soon release v1.0. If you have any questions or comments
I'd love to hear them.

-Daniel Renfro

Hu Lab Research Associate
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4055

From maj at fortinbras.us  Sun Jan 31 22:47:05 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 31 Jan 2010 22:47:05 -0500
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects
In-Reply-To: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>
References: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>
Message-ID: <5DC96D65B6A447C3802AF5D745FF4AA4@NewLife>

Daniel-- this sounds interesting and useful, I +1 it. Your intuition about
in-memory vs streaming sounds correct to me; features can be many, and
diffing many (MANY) sequences may bork. Maybe our feature-rich users
can chime in. (...however, I did just hear about a magic spell called
'File::Map', might check that out on CPAN.)
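
On the streaming side, one way to avoid holding every result in memory is a closure that computes the diff for one matched pair per call. The sketch below is only an illustration of that idea; diff_stream, the comparison sub, and the sample pairs are all invented and are not part of SeqDiff.

```perl
use strict;
use warnings;

# Returns an iterator. Each call compares pairs until it finds one that
# differs, returns a report for it, and returns nothing once the pairs
# run out. Only one report exists at a time.
sub diff_stream {
    my ($pairs, $compare) = @_;
    my $i = 0;
    return sub {
        while ($i < @$pairs) {
            my ($old, $new) = @{ $pairs->[ $i++ ] };
            my @changes = $compare->($old, $new);
            return { old => $old, new => $new, changes => \@changes }
                if @changes;
        }
        return;
    };
}

# Invented sample pairs of matched features.
my @pairs = (
    [ { product => 'hypothetical protein' }, { product => 'DNA polymerase I' } ],
    [ { product => 'ribosomal protein L2' }, { product => 'ribosomal protein L2' } ],
    [ { product => 'unknown' },              { product => 'kinase' } ],
);

# Hypothetical comparison: returns a list of change descriptions.
my $compare = sub {
    my ($old, $new) = @_;
    return if $old->{product} eq $new->{product};
    return ("product: '$old->{product}' -> '$new->{product}'");
};

my $next = diff_stream(\@pairs, $compare);
my @reports;
while (my $diff = $next->()) {
    push @reports, @{ $diff->{changes} };
    print "$_\n" for @{ $diff->{changes} };
}
```

The same shape would work with the pairs themselves coming from an iterator, so that neither the input nor the output has to be fully materialised.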
cheers- MAJ
----- Original Message ----- 
From: "Daniel Renfro" <bluecurio at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 31, 2010 10:22 PM
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects


> Hello all,
>
> A colleague and I have been working on a (Bio)Perl package to compare two
> Seq objects. This is in response to a need we found in our lab -- we wanted
> to see the changes to GenBank files through time, but wanted an automated
> way to do this. This led to what I'm calling the SeqDiff.pm package. I
> thought it would be a good idea to inform the community and get some
> feedback.
>
> The package takes two Seq objects as arguments, arbitrarily called "old" and
> "new." It then matches the features from the old object with the new object.
> This is done based on some criteria -- in our case we decided the features
> must be of the same type (have the same primary_tag) and have at least one
> matching database cross-reference (db_xref) in common.  The left-over
> features (ones that did not have a match) are dropped into arrays called
> "lost" and "gained." The matching is done in about NlogN time, as each
> matching pair is removed from subsequent searches.
>
> The matched features are iterated through and the differences are
> calculated. Each feature is examined recursively and any differences are
> reported. Optionally you can give the new() method a flag so that everything
> is returned (differences and similarities.) You can set callbacks for
> different types of objects (like anything that isa('Bio::LocationI')) if you
> want a custom comparison for specific BioPerl objects. This comparison step
> is the computationally slow part, and currently everything is held in
> memory. I think it'd be better to do this piece-meal, using the BioPerl-ish
> next() and last() methods.
>
> Maybe this was a little verbose, but that is the SeqDiff package in a
> nutshell. I hope to soon release v1.0. If you have any questions or comments
> I'd love to hear them.
>
> -Daniel Renfro
>
> Hu Lab Research Associate
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4055
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From rui.faria at upf.edu  Sun Jan 31 12:17:09 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 18:17:09 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
Message-ID: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>

Hi Dave,

We reported the codeml bug (errors when the user supplies their own tree file) some time ago. Have you had a chance to look at it?

We mainly wanted your opinion on where the problem might be, since we are not the most experienced "perlers" on the planet :)

I'm asking because we have to deal with it right now. If someone could check where the problem is, and whether it has an easy solution, that would be a great help.

Best,

Rui


-----Original Message-----
From Dave Messina <David.Messina at sbc.su.se>
Sent Thu 31/12/2009 11:55 AM
To Rui Faria <rui.faria at upf.edu>
Cc Jason Stajich <jason at bioperl.org>; sandraneto_ at hotmail.com; bioperl-l List <bioperl-l at lists.open-bio.org>
Subject Re: question about a PAML module

Hi Rui and Sandra,

Could you file this as a bug report at 

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

?

Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:
- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this.

There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon.


Dave


From rui.faria at upf.edu  Sun Jan 31 13:56:56 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 19:56:56 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
	<18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
	<BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>
Message-ID: <11398434.1264964216856.JavaMail.oracle@rif1.s.upf.edu>

Many thanks!

We hope that one day, once we become experts, we can return the favor!

Rui

-----Original Message-----
From Dave Messina <David.Messina at sbc.su.se>
Sent Sun 31/01/2010 06:43 PM
To Rui Faria <rui.faria at upf.edu>
Cc Jason Stajich <jason at bioperl.org>; sandraneto_ at hotmail.com; bioperl-l List <bioperl-l at lists.open-bio.org>
Subject Re: question about a PAML module

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet.

I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave


From avilella at gmail.com  Sat Jan  2 03:57:28 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sat, 2 Jan 2010 08:57:28 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
Message-ID: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>

Hi all and happy 2010 for those that follow the Gregorian calendar,

A question that is a bit in between bioperl and NCBI. I would like to use
bioperl to download sequences from dbEST. For that, my idea is to use
Bio::DB::GenBank and get the sequences by gi id.

Now, I want my script to download sequences for a given NCBI taxonomy clade.

For example, if I want to download all fish (clupeocephala) sequences in dbEST,
I can browse it around with the dbEST webpage using "clupeocephala[taxonomy]",
so I am thinking there should be a way to do it programmatically.

How can I query NCBI dbEST through bioperl to give me the list of GI ids I am
looking for given a taxon id?

Thanks in advance,

Albert.


From jason at bioperl.org  Sat Jan  2 11:35:22 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Jan 2010 08:35:22 -0800
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
Message-ID: <D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>

Did you try Bio::DB::Query::GenBank?
You'd want to use -db => 'nucest' and then just put in an Entrez
query as per the example. You can include dates in the query so you
can do updates to your locally retrieved data in a script that runs
periodically.
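
A minimal sketch of that approach, assuming BioPerl is installed and NCBI is reachable. The helper names entrez_query and fetch_ids are made up for illustration, the [PDAT] date window is just one way to put dates in the query, and the 'nucest' database name is the one given above (the sketch does not check that NCBI accepts it). The ids() call returns the matching identifiers without fetching the sequences.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Query::GenBank;

# Compose an Entrez query for a taxon, optionally limited to a
# publication-date window such as 2009/01/01 to 2010/01/01.
sub entrez_query {
    my ($taxon, $from, $to) = @_;
    my $query = sprintf '%s[Organism]', $taxon;
    $query .= sprintf ' AND %s:%s[PDAT]', $from, $to
        if defined $from && defined $to;
    return $query;
}

# Run the query against dbEST and return the matching IDs.
# This contacts NCBI, so it needs network access.
sub fetch_ids {
    my ($query_string) = @_;
    my $query = Bio::DB::Query::GenBank->new(
        -db    => 'nucest',
        -query => $query_string,
    );
    return $query->ids;
}

# Example: perl est_ids.pl Clupeocephala 2009/01/01 2010/01/01
if (@ARGV) {
    my $query_string = entrez_query(@ARGV);
    print STDERR "Query: $query_string\n";
    print "$_\n" for fetch_ids($query_string);
}
```

For a periodic update, the date window would be moved forward on each run so that only newly published records are listed.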

-jason
On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:

> Hi all and happy 2010 for those that follow the Gregorian calendar,
>
> A question that is a bit in between bioperl and NCBI. I would like  
> to use
> bioperl to download sequences from dbEST. For that, my idea is to use
> Bio::DB::GenBank and get the sequences by gi id.
>
> Now, I want my script to download sequences for a given NCBI  
> taxonomy clade.
>
> For example, if I want to download all fish (clupeocephala)  
> sequences in dbEST,
> I can browse it around with the dbEST webpage using  
> "clupeocephala[taxonomy]",
> so I am thinking there should be a way to do it programmatically.
>
> How can I query NCBI dbEST through bioperl to give me the list of GI  
> ids I am
> looking for given a taxon id?
>
> Thanks in advance,
>
> Albert.

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From avilella at gmail.com  Sun Jan  3 04:08:33 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 3 Jan 2010 09:08:33 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
	<D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>
Message-ID: <358f4d651001030108p6a92fb27k5fa39be6bebb3a9c@mail.gmail.com>

Thanks Jason!
For the sake of completion, here is the script I needed:

---------------------
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Getopt::Long;

my $keyword_type = 'EST';
my $outdir = '.';
my $taxon_name = undef;
my $db_type = 'nucest';

GetOptions('keyword_type:s' => \$keyword_type,
           't|taxon_name:s' => \$taxon_name,
           'db_type:s' => \$db_type,
           'outdir:s' => \$outdir);

# A taxon name is required to build the Entrez query
die "usage: $0 -t TAXON_NAME [--keyword_type EST] [--db_type nucest] [--outdir DIR]\n"
  unless defined $taxon_name && length $taxon_name;

my $query_string = $taxon_name ."[Organism] AND ". $keyword_type ."[Keyword]";
my $db = Bio::DB::Query::GenBank->new
  (-db => $db_type,
   -query => $query_string,
   -mindate => '2007',
   -maxdate => '2010');

# Name the output file after the taxon, with spaces replaced by underscores
(my $taxon_name_string = $taxon_name) =~ s/ /_/g;
my $outfile = $outdir . "/" . $taxon_name_string . ".". $db_type . ".fasta";
my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

# Report how many records match, then stream them from GenBank
print $db->count,"\n";
my $gb = Bio::DB::GenBank->new();
my $stream = $gb->get_Stream_by_query($db);
while (my $seq = $stream->next_seq) {
  # Keep only reads longer than 800 bases
  next unless ($seq->length > 800);
  $out->write_seq($seq);
}
$out->close;
---------------------

On Sat, Jan 2, 2010 at 4:35 PM, Jason Stajich <jason at bioperl.org> wrote:
> Did you try Bio::DB::Query::GenBank?
> You'd want to use -db => 'nucest' and then you just put in an Entrez query
> as per the example. You can include dates in the query so you can do
> updates to your locally retrieved data in a script that runs periodically.
>
> -jason
> On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:
>
>> Hi all and happy 2010 for those that follow the Gregorian calendar,
>>
>> A question that is a bit in between bioperl and NCBI. I would like to use
>> bioperl to download sequences from dbEST. For that, my idea is to use
>> Bio::DB::GenBank and get the sequences by gi id.
>>
>> Now, I want my script to download sequences for a given NCBI taxonomy
>> clade.
>>
>> For example, if I want to download all fish (clupeocephala) sequences in
>> dbEST,
>> I can browse it around with the dbEST webpage using
>> "clupeocephala[taxonomy]",
>> so I am thinking there should be a way to do it programmatically.
>>
>> How can I query NCBI dbEST through bioperl to give me the list of GI ids I
>> am
>> looking for given a taxon id?
>>
>> Thanks in advance,
>>
>> Albert.
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>


From Jean-Marc.Frigerio at pierroton.inra.fr  Mon Jan  4 09:12:18 2010
From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA)
Date: Mon, 04 Jan 2010 15:12:18 +0100
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
Message-ID: <4B41F742.2030209@pierroton.inra.fr>

> Message: 1
> Date: Thu, 31 Dec 2009 11:26:45 +1800
> From: Peng Yu <pengyu.ut at gmail.com>
> Subject: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: bioperl-l at lists.open-bio.org
> Message-ID:
> 	<366c6f340912300926k5af5cc88nc3c3babda541fd1 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 30 Dec 2009 13:04:53 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: "bioperl-l at lists.open-bio.org" <bioperl-l at lists.open-bio.org>
> Message-ID:
> 	<264855a00912301004t396e0d4fwf9d291c5d82c3fb9 at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and return
>> me an object?
> 
> In perl, you can use an array to store the records.  You could also
> use a hash if you have reasonable keys for the entries.
> 
> Sean
> 
> 
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 30 Dec 2009 11:58:54 -0800
> From: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: BioPerl List <bioperl-l at lists.open-bio.org>
> Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B at bioperl.org>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> 
> or use a database object so you can retrieve sequences that have a  
> particular id. See Bio::DB::Fasta
> On Dec 30, 2009, at 10:04 AM, Sean Davis wrote:
> 
>> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>> With Bio::SeqIO, I can only read in the records in a fasta file one  
>>> by
>>> one. This is preferable if there are many records in a file.
>>>
>>> But I also want to read all the records in. I could use a while loop
>>> to read all records in. But could somebody let me know if there is a
>>> function in bioperl that can read in all the record at once and  
>>> return
>>> me an object?
>> In perl, you can use an array to store the records.  You could also
>> use a hash if you have reasonable keys for the entries.
>>
>> Sean
>>
>>
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Wed, 30 Dec 2009 16:20:31 -0500
> From: "Mark A. Jensen" <maj at fortinbras.us>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: "Peng Yu" <pengyu.ut at gmail.com>, <bioperl-l at lists.open-bio.org>
> Message-ID: <2646F627E6D14AADB412A6E6B51E24DA at NewLife>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
> 
> I think you might want Bio::AlignIO:
> 
> $alnio = Bio::AlignIO->new(-file => 'my.fas', -format => 'fasta');
> $aln = $alnio->next_aln;
> @seqs = $aln->each_seq;
> 
> MAJ
> ----- Original Message ----- 
> From: "Peng Yu" <pengyu.ut at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, December 30, 2009 12:26 PM
> Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
> 
> 
>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and return
>> me an object?
>>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO


Hi,

I wrote and currently use a module I named Bio::SeqIO::multifasta, which 
is basically a copy of Bio::SeqIO::fasta plus a few methods:
get_by_id(), get_by_order(), first_seq() and previous_seq()
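
For files that fit in memory, similar by-ID and by-order access can be sketched with stock Bio::SeqIO by reading every record into an array and a hash. This is not the multifasta module mentioned above (its code is not shown in this thread); the name slurp_fasta is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

# Read every record from a FASTA file into memory, keeping both the
# original order (array) and an ID lookup (hash).
sub slurp_fasta {
    my ($file) = @_;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my (@by_order, %by_id);
    while (my $seq = $in->next_seq) {
        push @by_order, $seq;
        $by_id{ $seq->display_id } = $seq;    # a repeated ID overwrites the earlier one
    }
    return (\@by_order, \%by_id);
}

# Example: perl slurp.pl seqs.fasta
if (@ARGV) {
    my ($by_order, $by_id) = slurp_fasta($ARGV[0]);
    if (@$by_order) {
        printf "%d records; first is %s, last is %s\n",
            scalar @$by_order,
            $by_order->[0]->display_id,
            $by_order->[-1]->display_id;
    }
}
```

With both structures in hand, "first" and "previous" are plain array indexing, at the cost of holding every sequence in RAM.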

It would need review, validation etc. Do I submit it to Bugzilla ?

	-- jmf


From jason at bioperl.org  Mon Jan  4 11:03:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 4 Jan 2010 08:03:45 -0800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
	<4B41F742.2030209@pierroton.inra.fr>
Message-ID: <16D7C8C1-E4BE-406F-9D60-379876178CAB@bioperl.org>

We typically think of SeqIO as parsing a stream of data rather than
relying on the input being a file, which I think is what these methods
would imply. This sounds a lot like a database. Does Bio::DB::Fasta
not provide some of the functionality you need from these methods? I
realize there isn't a by_order(), but lookup by ID is implemented to
allow random access.
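
A minimal sketch of that random access with Bio::DB::Fasta, assuming BioPerl is installed. The wrapper name lookup is made up for illustration; new() and get_Seq_by_id() are the module's own calls.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Fetch sequences by ID from an indexed FASTA file. Bio::DB::Fasta builds
# an index beside the file on first use and reuses it afterwards.
sub lookup {
    my ($fasta, @ids) = @_;
    my $db = Bio::DB::Fasta->new($fasta);
    my %found;
    for my $id (@ids) {
        my $seq = $db->get_Seq_by_id($id);    # undef if the ID is unknown
        $found{$id} = $seq->seq if $seq;
    }
    return \%found;
}

# Example: perl lookup.pl seqs.fasta id1 id2
if (@ARGV >= 2) {
    my $found = lookup(@ARGV);
    print ">$_\n$found->{$_}\n" for sort keys %$found;
}
```

Only the requested records are read from disk, so this scales to files that would not fit in memory.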

-jason

>
> Hi,
>
> I wrote and currently use a module I named Bio::SeqIO::multifasta,  
> which is basically a copy of Bio::SeqIO::fasta plus a few methods:
> get_by_id(), get_by_order(), first_seq() and previous_seq()
>
> It would need review, validation etc. Do I submit it to Bugzilla ?
>
> 	-- jmf

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From avilella at gmail.com  Mon Jan  4 15:00:24 2010
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 4 Jan 2010 20:00:24 +0000
Subject: [Bioperl-l] indexed fastq files
Message-ID: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>

Hi all,

What is the best way to index fastq files, so that once clustered, I
can provide a list of seq_ids and get
them back in fastq format from the indexed db?

Cheers,

Albert.


From cjfields at illinois.edu  Mon Jan  4 16:59:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Jan 2010 15:59:50 -0600
Subject: [Bioperl-l] indexed fastq files
In-Reply-To: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>
References: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>
Message-ID: <07EBA105-6A34-490C-B0B9-4772DF386CBA@illinois.edu>

Bio::Index::Fastq, maybe?  To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work.
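
A minimal sketch of how that could look, assuming BioPerl is installed with a working DBM backend. The helper names build_index and write_records are made up for illustration, and, as noted above, the module has not been checked here against the refactored FASTQ parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Index::Fastq;
use Bio::SeqIO;

# Build (or extend) an index over one or more FASTQ files.
sub build_index {
    my ($index_file, @fastq_files) = @_;
    my $index = Bio::Index::Fastq->new(
        -filename   => $index_file,
        -write_flag => 1,
    );
    $index->make_index(@fastq_files);
    return;
}

# Write the records for the given IDs, in FASTQ format, to a filehandle.
sub write_records {
    my ($index, $fh, @ids) = @_;
    my $out = Bio::SeqIO->new(-fh => $fh, -format => 'fastq');
    for my $id (@ids) {
        my $seq = $index->fetch($id);    # a Bio::Seq::Quality object
        if ($seq) { $out->write_seq($seq) }
        else      { warn "ID '$id' not found in index\n" }
    }
    return;
}

# Example: perl fastq_index.pl reads.idx reads.fastq id1 id2
if (@ARGV >= 3) {
    my ($index_file, $fastq, @ids) = @ARGV;
    build_index($index_file, $fastq);
    my $index = Bio::Index::Fastq->new(-filename => $index_file);    # reopen read-only
    write_records($index, \*STDOUT, @ids);
}
```

In practice the index would be built once and then reopened read-only for each list of clustered IDs.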

chris

On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote:

> Hi all,
> 
> What is the best way to index fastq files, so that once clustered, I
> can provide a list of seq_ids and get
> them back in fastq format from the indexed db?
> 
> Cheers,
> 
> Albert.


From cjfields at illinois.edu  Mon Jan  4 22:54:03 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Jan 2010 21:54:03 -0600
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
	<4B41F742.2030209@pierroton.inra.fr>
Message-ID: <1BAE5508-0DB7-41B4-92E3-49256582131F@illinois.edu>

Jean-Marc,

You can do that, yes.  Just curious, but have you looked at the various flat file indexing modules for FASTA?  Bio::DB::Fasta and Bio::Index::Fasta are commonly used and allow lookups by primary ID (and I think in some cases secondary IDs).

chris

On Jan 4, 2010, at 8:12 AM, Jean-Marc Frigerio INRA wrote:

> ...
> 
> Hi,
> 
> I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods:
> get_by_id(), get_by_order(), first_seq() and previous_seq()
> 
> It would need review, validation etc. Do I submit it to Bugzilla ?
> 
> 	-- jmf


From fs5 at sanger.ac.uk  Wed Jan  6 17:16:13 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 06 Jan 2010 22:16:13 +0000
Subject: [Bioperl-l] Bio::DB::Sam strange behaviour for read pairs
Message-ID: <4B450BAD.3050807@sanger.ac.uk>

I'm trying to extract paired reads from a BAM file that span a given 
region. I would then like to get the two read ends of the sequenced 
clone that spans the region.
I use Bio::DB::Sam->get_features_by_location for this and it does give
me the correct read pairs as a region match, but it doesn't give me both
mates of the pair in all cases.

Here is the script:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Sam;

my $usage = "usage: $0 BAMFILE CHROMOSOME STARTPOS ENDPOS\n" ;
my ($bam_file,$chrom,$start,$end) = @ARGV ;
die $usage unless $bam_file && $chrom && $start && $end;

my $bam = Bio::DB::Sam->new(-bam => $bam_file);

my @pairs = $bam->get_features_by_location(
    -type   => 'read_pair',
    -seq_id => $chrom,
    -start  => $start,
    -end    => $end);

print "region: $chrom:$start..$end\n" ;
foreach my $pair (@pairs) {
  print "  pair: id: ".$pair->id.", start".$pair->start.', end:'.$pair->end."\n";
  my ($first_mate,$second_mate) = $pair->get_SeqFeatures;
  print "    first_mate: start:".$first_mate->start.', end:'.$first_mate->end."\n";
  if ($second_mate){
    print "    second_mate: start:".$second_mate->start.', end:'.$second_mate->end."\n";
  } else {
    print "    no second mate\n";
  }
}

And here are the matching pairs that it produces with one of my files 
for the region tal12:22479..29232:
region: tal12:22479..29232
  pair: id: tal-2446c08, start17496, end:29423
    first_mate: start:28540, end:29423
    no second mate
  pair: id: tal-2463d10, start23534, end:31363
    first_mate: start:23534, end:24448
    no second mate
  pair: id: tal-2371c09, start20860, end:28230
    first_mate: start:27604, end:28230
    no second mate
  pair: id: tal-2440b06, start19232, end:27099
    first_mate: start:26025, end:27099
    no second mate
  pair: id: tal-2327g09, start18909, end:26129
    first_mate: start:25354, end:26129
    no second mate
  pair: id: tal-2381b05, start25658, end:35054
    first_mate: start:25658, end:26295
    no second mate
  pair: id: tal-2377c11, start20898, end:28230
    first_mate: start:27473, end:28230
    no second mate
  pair: id: tal-2426e12, start21975, end:27562
    first_mate: start:21975, end:23008
    second_mate: start:26396, end:27562
  pair: id: tal-2365h10, start22843, end:31944
    first_mate: start:22843, end:23184
    no second mate
  pair: id: tal-2388h09, start19016, end:28238
    first_mate: start:27475, end:28238
    no second mate


So it finds a lot of pairs that span the region, and the start/end of
each pair is also correct, but it only gives me both individual mates in
one case:
  pair: id: tal-2426e12, start21975, end:27562
    first_mate: start:21975, end:23008
    second_mate: start:26396, end:27562

In this case, both mates are actually inside the query region (at least
partially), whereas in the other cases one of the mates is not inside,
e.g. this one:

  pair: id: tal-2388h09, start19016, end:28238
    first_mate: start:27475, end:28238
    no second mate
  
 > get this read pair from the BAM file:
$ samtools view clones.bam | grep tal-2388h09

tal-2388h09    99      tal12  19016   205     
36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M      =       
27475   9223    
CTTTGGATGAAATAGTTTTTAAATAATACTTATTAAATATTAAATATATAACACATAAATAAGTATTGATGCAAATTTTAAAGTATTATAGAAAACTAGGTTTGATTATATTGTTATACTGTACTTTAAGAGGAGAGAGATAAGATATCTTTGCTCTTTTAATATATAAATTTAGATAAATATTCGTTAAATTTTCTACATAGTTATTTTTTATCTTATATATTATACTGCTATAGTTATCAATGTATATACATTCAAATAATTTATTAAAAATTCTATATTATATTAATTCTATGATAAAATAATCCTGTTTGTGATTTAAAAAATGATGATTCAATAAAAACTAATAATATAATACGAGTTAATATGGAATAATAAAATGGCATTTAACATGAATTTAGTCTTTAACCTTTTCTTTGTTTGTCAAGTTTTTTAAAACATAAAACCACACATTTCAAAATGGATTTTTAGCAAATATATAAAAATTATACATTTATAATGTATTGTTATGCGTCTTTTCGATAGAATCAATATTTAATTATATGAAGTTTCCACAATAAAATAATATTTAATATTATTTATTAGTAGAGTATTTGATTATATATATAGGCATATAATAATAACTCTAGTTCTATCTACCATATTATTTATAATTATTATAACAAAATGTGATATGAAATTTTATTATATACTTATATTATTTTTTTAACTATTTTAAAATATATTTATTTATACCTCAAAACTATAAAATTGAAATTATTAATAATAATCTAATATATACCTTTATAAAAATAAACGTATAAACTAAT   
 ><:4/+1+*)+4>BEH=9-,,66IIIIIIIIEDA>>>>A at DDFFIHHHHHITIIIIIHIIHHHHHHIYYYYYTTTYDDDHDDDDDDDIIINNTNHHHHHIYYYYIIIIIINNNNTTIIIIIIIIIIITTNTTTTTYYYTTTTTTYYNNNNNNLLLLLLLLLLLNNNNNNTTTTTTTTTTTTTNNTNNNTTTYYTLLLLLLTTTTTTTTYTTTNNNNNNTTTTTTTNNTTNNTTTTTTTTTTYYTTTTTTTNNNNNNTTTTTTTYYTTTTTTYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTNNINTTYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYTTTTTOOOIFFFIFIIOICC>>II@>>>>>>C>>>>>>CIBECCCHIIOOOOOOOOTTTIIFDDEIQQA:55839AA>99>@IIIIII>>::;;I;>>CC>>>>>@III<::=>AAA<>>>>I>:>>99:>842225006824855;5>68844//.//00:>::338:99<:/-+*-./0)((((+00+..,++(((+-()(*((((()*)***))3)''')*..+*++((*1++--+*''''((+/)*42.((***)+,+('*'''*((''''((,'%%''''''''(     
AS:i:614        MS:i:50
tal-2388h09    147     tal12  27475   205     1H764M40H       =       
19016   -9223   
ATTAAATCGGTATCGCCAACACAATGAGTATAATCATTGTCAAATATGCGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTAATTTTTGTGAGTATAATCATTGGAACG  
(((0))*,-1-../2((())03---03266300271+*.-0-*''''+*,+/+))*-05330+)..4>7=77273911**((+20+03688633:93036<8;::5:<99379>>::>>>:57:<:7--)))1435::333228>::>II>::>A>>3/.958677AA=AA:>:==IIII8338<>A>>>>IIIIIIIIYYYYYKKYYYMIFFFFEIIIMI::4..8AIIC>9>=EIQQQMCAAAAAACIIIIAICIIIOOYTIIIMOQQMIIIIC>>AAABCCCCCEAI>C>>IQQIIIIIIIIIIKKYYYYYYYYYYYYYYYYYYYYYTIIIIIIYYYYTNINNNTYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYSSYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTYYYYYTTTTTTYYYTTNNNNTTYYYYYYYYYTTTTLLTTNNTTTTYTTTTTTYYYYYYYYYYYTTOOKKKLKOOTYYYYYYYYYYYYYYYYTNNNNNNNNNTTTNYNNNNTNNNNTTYYYYYYYYTTNNNNTTYNNNNNITTTTTYYYYYYYYYYTTNNIIIIIDIIIIHTNNNNTTYYYYTNNNIIIIIITTTINIIIINNNNTTTYYYYIHHHDDHHDDIHDDGDFFFTIIINTTYYYYTTTTHHHHCCIIIHIHHHHCAI9:++**1168>ACCIIDDDDDDI>>>>>?NNN  
AS:i:688        MS:i:50

So the read in the first line starts before the start of the query 
region and is not accessible via $pair->get_SeqFeatures although this is 
a valid pair.
Am I doing something wrong, is this the desired behaviour or is it a bug?
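
For reference, the FLAG values of the two records above (99 and 147) can be decoded with a few lines of plain Perl to confirm that both are marked as members of a proper pair. This is only a sketch: the bit meanings are those of the SAM format's FLAG field, and the name decode_flag is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bit meanings of the SAM FLAG field (subset relevant to paired reads).
my @FLAG_BITS = (
    [ 0x1  => 'paired' ],
    [ 0x2  => 'proper_pair' ],
    [ 0x4  => 'unmapped' ],
    [ 0x8  => 'mate_unmapped' ],
    [ 0x10 => 'reverse_strand' ],
    [ 0x20 => 'mate_reverse_strand' ],
    [ 0x40 => 'first_in_pair' ],
    [ 0x80 => 'second_in_pair' ],
);

# Return the names of all bits set in a FLAG value.
sub decode_flag {
    my ($flag) = @_;
    return map { $_->[1] } grep { $flag & $_->[0] } @FLAG_BITS;
}

for my $flag (99, 147) {
    print "$flag: ", join(', ', decode_flag($flag)), "\n";
}
```

Both flags have the paired and proper_pair bits set, with 99 being the first mate and 147 the second mate on the reverse strand.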

Thanks for your help!




From hlapp at drycafe.net  Thu Jan  7 11:55:00 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 7 Jan 2010 11:55:00 -0500
Subject: [Bioperl-l] Data missing into Annotation object using
	Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr>
	<29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <240F198A-83FA-4304-ACA8-80A702A68D8C@drycafe.net>

I don't know to what extent this was followed up on further and I  
guess it's too long ago to be of much help, but if it hasn't been  
mentioned before I wanted to point out  
Bio::SeqFeature::AnnotationAdaptor which integrates tag/value  
annotation and Bio::Annotation annotation into one  
AnnotationCollection, so it doesn't matter whether something is  
attached as a tag or as an annotation object.
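
As a minimal sketch of the adaptor approach (assuming BioPerl 1.6.x is installed; the feature, its tag values, and the helper first_value() are made up for illustration), tag/value pairs can be read back through the AnnotationCollection interface like this:

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Bio::SeqFeature::AnnotationAdaptor;

# Illustrative feature; in a real script it would come from
# $seq->get_SeqFeatures after parsing a GenBank file with Bio::SeqIO.
my $feat = Bio::SeqFeature::Generic->new(
    -primary => 'CDS',
    -start   => 1,
    -end     => 90,
    -tag     => { locus_tag => 'XYZ_0001', product => 'hypothetical protein' },
);

# The adaptor exposes tag/value pairs and Bio::Annotation objects together.
my $anncoll = Bio::SeqFeature::AnnotationAdaptor->new(-feature => $feat);

# Hypothetical helper: first value stored under a key, or undef if absent.
sub first_value {
    my ($coll, $key) = @_;
    my ($ann) = $coll->get_Annotations($key);
    return defined $ann ? $ann->value : undef;
}

my $locus   = first_value($anncoll, 'locus_tag');
my $gene    = first_value($anncoll, 'gene');
my $product = first_value($anncoll, 'product');
$gene = $locus unless defined $gene;   # fall back to the locus tag

print join('#', $locus, $gene, $product), "\n";
```

Tags arrive wrapped as Bio::Annotation::SimpleValue objects, so the string is read with value().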

	-hilmar

On Dec 16, 2009, at 10:09 AM, Chris Fields wrote:

> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags  
> as Bio::Annotation.  The way this was implemented was considered  
> unsatisfactory for various reasons, so we reverted to using simple  
> tag-value pairs as the default.  You can get at the data this way  
> (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>    print "primary tag: ", $feat_object->primary_tag, "\n";
>    for my $tag ($feat_object->get_all_tags) {
>        print "  tag: ", $tag, "\n";
>        for my $value ($feat_object->get_tag_values($tag)) {
>            print "    value: ", $value, "\n";
>        }
>    }
> }
>
> You can also convert all the tag-value data into a  
> Bio::Annotation::Collection using the  
> Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
> On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:
>
>> Hi,
>>
>> I wrote a small GenBank parser a few months ago, before BioPerl  
>> release 1.6.0.
>> I tried to use my code once again, but now the output of my parser  
>> is empty.
>> It looks like the Annotation of seqfeatures is not filled anymore.
>>
>> Here is the code I used previously:
>>
>> while(my $seq = $streamer->next_seq()){
>>
>>   #We only want to retrieve CDS features...
>>   foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
>>       print $ofh join("#",
>>                       $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
>>                       $feat->annotation()->get_Annotations('gene')
>>                         ? $feat->annotation()->get_Annotations('gene')      # Gene name
>>                         : $feat->annotation()->get_Annotations('locus_tag'),
>>                       $feat->annotation()->get_Annotations('product'),      # Description
>>                      ),"\n";
>>   }
>> }
>>
>> $feat is a Bio::SeqFeature::Generic object
>>
>> If I print Dumper($feat->annotation()) here is the output :
>>
>> $VAR1 = bless( {
>>                '_typemap' => bless( {
>>                                       '_type' => {
>>                                                    'comment' => 'Bio::Annotation::Comment',
>>                                                    'reference' => 'Bio::Annotation::Reference',
>>                                                    'dblink' => 'Bio::Annotation::DBLink'
>>                                                  }
>>                                     }, 'Bio::Annotation::TypeManager' ),
>>                '_annotation' => {}
>>              }, 'Bio::Annotation::Collection' );
>>
>> Have some changes been made to the way the annotation object is  
>> populated?
>>
>> Thanks for any clue and sorry if my question looks stupid
>>
>> Regards
>>
>> Emmanuel
>>
>> -- 
>> -------------------------
>> Emmanuel Quevillon
>> Biological Software and Databases Group
>> Institut Pasteur
>> +33 1 44 38 95 98
>> tuco at_ pasteur dot fr
>> -------------------------
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From rtbio.2009 at gmail.com  Fri Jan  8 10:00:21 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 8 Jan 2010 16:00:21 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>

Hello all,

I was trying remote BLAST using BioPerl. My input data is a Trypanosoma
brucei sequence in FASTA format. When I tried to submit it to BLAST using
the step
$r=$factory->submit_blast($input)
it did not return anything, which I checked by debugging the code. It is
not blasting my input sequence even though I specified all the parameters. I
have pasted the code below.

Please help me in solving this problem. It is very urgent.

Regards
Roopa.

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;


use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds. Roopa";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";


open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
 This page will automatically reload in 30 seconds Roopa <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

#$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= 'Trypanosoma Brucei';

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

            # open(OUTFILE,'>',$debugfile);
             #  print OUTFILE @params;
             # close(OUTFILE);


 my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter

 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => 'Trypanosoma Brucei' );


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);
   # The program stops here: it does not return any value and it does not
   # enter the while loop. Please help me in this regard.
                open(OUTFILE,'>',$debugfile);
                print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
               print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
               print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
               print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
               print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);

}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

    print OUTFILE substr ($in{'Inputseq'}, $i, 1);

    if ( ($i+1)%10==0){
        print OUTFILE " ";
    }
    if ( ($i+1)%60==0){
        print OUTFILE "<br>\n";
    }
}


print OUTFILE "</font> <p>";

$z=@compseqs;

for($k=1;$k<$z;$k++) {
    print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
Sequence: <br>";

    for ($i=0; $i<length ($compseqs[$k]); $i++) {

        print OUTFILE substr ($compseqs[$k], $i, 1);

        if ( ($i+1)%10==0){
            print OUTFILE " ";
        }
        if ( ($i+1)%60==0){
            print OUTFILE "<br>\n";
        }
    }
    print OUTFILE "<p></font>";
}

print OUTFILE "<p>
Window: <br>$in{'Windowsize'}
<p>
<p>
Threshold: <br>$in{'Threshold'}
<p>";
my $j=0;

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

    if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
        if ($out[$i]->{similar}<=$in{'Threshold'}){
            $j=$in{'Windowsize'};
        }
        $height=$out[$i]->{similar}*5;
    }

    if ($j>0) {
        print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
height=\"5\">";
        $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
1)."</font>";
        $j--;
    }
    else {
        print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
height=\"5\">";
        $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
1)."</font>";
    }

    if ( ($i+1)%10==0){
        $outstring .= " ";
    }
    if ( ($i+1)%60==0){
        $outstring .= "<br>\n";

    }
    if ( ($i+1)%800==0){
        print OUTFILE "<br><br>\n";

    }
}

print OUTFILE "<br><br><font face=\"Courier, monospace font
set\">$outstring</font>";

#foreach (@out) {
#print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
#if ($_->{similar}<=$in{'Threshold'}){

#    }
#}

print OUTFILE "</BODY>\n</HTML>\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}
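
One thing worth checking in the script above: time() is called separately for $outfile and for each link printed in the HTML, so the file name and the URL can differ if the clock ticks over between calls. A minimal sketch, assuming the same paths as the script, that computes the timestamp once and reuses it ($stamp and $result_name are illustrative names):

```perl
use strict;
use warnings;

my $serverpath = '/srv/www/htdocs/rain/RNAi';
my $serverurl  = 'http://141.84.66.66/rain/RNAi';

# Take the timestamp once so the file on disk and the link always agree.
my $stamp       = time();
my $result_name = "rnairesult_$stamp.html";
my $outfile     = "$serverpath/$result_name";   # where the result is written
my $result_url  = "$serverurl/$result_name";    # what the refresh link points to

print "file: $outfile\n";
print "url:  $result_url\n";
```

Whether this is related to the submit_blast problem is not established here; it only removes one source of mismatched file names.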


From maj at fortinbras.us  Fri Jan  8 10:36:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 8 Jan 2010 10:36:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
Message-ID: <F19004692A4A4350856B23DF25E09074@NewLife>

Hi Roopa--

I got your code to work with the following changes:

+# the input should be a valid FASTA file...
 ...
 open(NUC,'>',$nuc);
+print NUC ">seq (need a name line for valid fasta)\n";
 print NUC $inpu1, "\n";
 close(NUC);
...

+# you can set these header parms in the call itself...
- my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
+ my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');

  #change a parameter
+# commented this out...
+# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
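
The first change (a name line for valid FASTA) can also be made conditional, so that user input that already has a header is left alone. A minimal sketch in plain Perl; ensure_fasta_header() and the default name "seq" are hypothetical, not part of BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper: return $text as a FASTA record, adding a name line
# only when the input does not already start with one.
sub ensure_fasta_header {
    my ($text, $name) = @_;
    $name = 'seq' unless defined $name && length $name;
    $text =~ s/^\s+//;      # drop leading whitespace
    $text =~ s/\s+\z//;     # drop trailing whitespace
    $text = ">$name\n$text" unless $text =~ /^>/;
    return "$text\n";
}

print ensure_fasta_header("ACGTACGT");          # header is added
print ensure_fasta_header(">myseq\nACGT\n");    # header is kept as-is
```

The returned string can then be written to the file that Bio::SeqIO reads, in place of the raw form input.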

MAJ


From julian.onions at gmail.com  Fri Jan  8 11:53:50 2010
From: julian.onions at gmail.com (Julian Onions)
Date: Fri, 8 Jan 2010 16:53:50 +0000
Subject: [Bioperl-l] Cladogram construction
Message-ID: <cbeabfd41001080853m50c75779q4155cd02af17670a@mail.gmail.com>

Does anyone have any sample code for building cladograms from Pars (one
of the PHYLIP tools) format input, or from any other format?
I've got something sort of working, but I get no weights on the tree:
everything appears as nan. I'd also like to set one of the species to be an
outgroup. This is the closest sample I've found so far.


#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::Tree::DistanceFactory;
use Bio::Align::ProteinStatistics;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;
my $alnfile = shift @ARGV || die "need a file to run";

my $input= Bio::AlignIO->new(-format => 'fasta',
    -file    => $alnfile);

if( my $aln = $input->next_aln ) {
 my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
 my $stats = Bio::Align::ProteinStatistics->new;
 my $distmat = $stats->distance(-align => $aln,
         -method => 'Kimura');
 my $treeout = Bio::TreeIO->new(-format => 'newick');
 my $tree = $dfactory->make_tree($distmat);
 $treeout->write_tree($tree);
  my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree    => $tree,
                                             -compact => 0);
  $obj1->print(-file => "tree.eps");
} else {
 die "could not find any alignments in the file $alnfile";
}


Pars input looks like
3 4
Robin   101
Blackbird 100
Sparrow 100


Thanks,
Julian.


From rtbio.2009 at gmail.com  Sat Jan  9 11:57:09 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sat, 9 Jan 2010 17:57:09 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <F19004692A4A4350856B23DF25E09074@NewLife>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<F19004692A4A4350856B23DF25E09074@NewLife>
Message-ID: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>

Hello all,

Thanks a lot for your reply, Mark. It works with Trypanosoma brucei hard-coded
as the organism, but when I take the Organism parameter from the user it does
not work, i.e., I am unable to get the target sequences.
Please help me in this regard. My code is

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;


use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds. Roopa";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";

open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
 This page will automatically reload in 30 seconds Roopa <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1,"\n";
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
        '-Organism'   => $organism );

             open(OUTFILE,'>',$debugfile);
             print OUTFILE $inpu1;
              close(OUTFILE);


my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
'$organ[ORGN]');

 #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter

 #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => $organ );


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             #open(OUTFILE,'>',$debugfile);
              # print OUTFILE $input;
              #close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);

   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
   #   open(OUTFILE,'>',$debugfile);
    #           print OUTFILE "while entered";
     #         close(OUTFILE);
     foreach my $rid ( @rids ) {

      #         open(OUTFILE,'>',$debugfile);
       #        print OUTFILE "foreach entered";
        #      close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
         #      print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          #    open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "else entered";
            #  close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);
  # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);

}

Regards,
Roopa.


On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> Hi Roopa--
>
> I got your code to work with the following changes:
>
> +# the input should be a valid FASTA file...
> ...
> open(NUC,'>',$nuc);
> +print NUC ">seq (need a name line for valid fasta)\n";
> print NUC $inpu1, "\n";
> close(NUC);
> ...
>
> +# you can set these header parms in the call itself...
> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
> 'Trypanosoma Brucei[ORGN]');
>
>  #change a parameter
> +# commented this out...
> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>
> MAJ
> ----- Original Message ----- From: "Roopa Raghuveer" <rtbio.2009 at gmail.com
> >
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 08, 2010 10:00 AM
> Subject: [Bioperl-l] Regarding blast in Bioperl
>
>
>  Hello all,
>>
>> I was trying remote BLAST using BioPerl. My input data is a Trypanosoma
>> brucei sequence in FASTA format. When I submit it to BLAST using the step
>> $r=$factory->submit_blast($input)
>> it does not return anything, which I checked by debugging the code. It is
>> not blasting my input sequence even though I supplied all the parameters.
>> I have pasted the code below.
>>
>> Please help me in solving this problem. It is very urgent.
>>
>> Regards
>> Roopa.
>>
>> #!/usr/bin/perl
>>
>> #path for extra camel module
>> use lib "/srv/www/htdocs/rain/RNAi/";
>> use Roopablast;
>>
>>
>> use Bio::SearchIO;
>> use Bio::Search::Result::BlastResult;
>> use Bio::Perl;
>> use Bio::Tools::Run::RemoteBlast;
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::DB::GenBank;
>>
>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>> $serverurl = "http://141.84.66.66/rain/RNAi";
>> $outfile = $serverpath."/rnairesult_".time().".html";
>> $nuc = $serverpath."/nuc".time().".txt";
>> $debugfile = $serverpath."/debug_".time().".txt";
>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>
>> my $outstring ="";
>>
>> &parse_form;
>>
>> print "Content-type: text/html\n\n";
>> print "<HTML>\n";
>> print "<head><title>RNAi Result</title>";
>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>> print "</head>\n";
>> print "<body>\n";
>> print " Your results will appear <a
>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>> print " Please be patient, runtime can be up to 5 minutes<br>";
>> print " This page will automatically reload in 30 seconds. Roopa";
>> print "</BODY>\n";
>> print "</HTML>\n";
>>
>> defined(my $pid = fork) or die "Can't fork: $!";
>> exit if $pid;
>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>
>>
>>
>> open(OUTFILE, '>',$outfile);
>>
>> print OUTFILE "<HTML>\n
>> <head><title>RNAi Result</title>
>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>> URL=$serverurl//rnairesult_".time().".html\"> \n
>> <meta http-equiv=\"expires\" content=\"0\">
>> </head>\n
>> <body>\n
>>  Your results will appear <a
>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>  Please be patient, runtime can be up to 5 minutes wait wait
>> wait......<br>
>> This page will automatically reload in 30 seconds Roopa <br>
>> </BODY>\n
>> </HTML>\n";
>>
>> close(OUTFILE);
>>
>>
>> @compseqs = blastcode($in{'Inputseq'});
>>
>> $in{'Inputseq'} =~ s/>.*$//m;
>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>
>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>> $in{'Threshold'});
>>
>>
>> sub blastcode
>> {
>>
>> $inpu1= $_[0];
>>
>> #$organ= $_[1];
>>
>> open(NUC,'>',$nuc);
>> print NUC $inpu1;
>> close(NUC);
>>
>> my $prog = 'blastn';
>> my $db   = 'refseq_rna';
>> my $e_val= '1e-10';
>> my $organism= 'Trypanosoma Brucei';
>>
>> $gb = new Bio::DB::GenBank;
>>
>> my @params = ( '-prog' => $prog,
>>        '-data' => $db,
>>        '-expect' => $e_val,
>>        '-readmethod' => 'SearchIO',
>>        '-Organism'   => $organism );
>>
>>           # open(OUTFILE,'>',$debugfile);
>>            #  print OUTFILE @params;
>>            # close(OUTFILE);
>>
>>
>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>
>>  #change a paramter
>>
>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>> Brucei[ORGN]';
>>
>> #change a paramter
>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>
>>  my $v = 1;
>>  #$v is just to turn on and off the messages
>>
>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>> '-organism' => 'Trypanosoma Brucei' );
>>
>>
>> while (my $input = $str->next_seq())
>> {
>>  #Blast a sequence against a database:
>>   #Alternatively, you could  pass in a file with many
>>   #sequences rather than loop through sequence one at a time
>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>   #and swap the two lines below for an example of that.
>>
>>            open(OUTFILE,'>',$debugfile);
>>              print OUTFILE $input;
>>             close(OUTFILE);
>>
>>
>>  # The program stops here: it does not return any value and it does not
>>  # enter the while loop. Please help me in this regard.
>>  my $r = $factory->submit_blast($input);
>>               open(OUTFILE,'>',$debugfile);
>>               print OUTFILE $r;
>>               close(OUTFILE);
>>
>>
>>  print STDERR "waiting...." if($v>0);
>>
>>  while ( my @rids = $factory->each_rid ) {
>>     open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "while entered";
>>             close(OUTFILE);
>>    foreach my $rid ( @rids ) {
>>
>>              open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "foreach entered";
>>             close(OUTFILE);
>>
>>       my $rc = $factory->retrieve_blast($rid);
>>
>>       if( !ref($rc) )
>>       {
>>       if( $rc < 0 )
>>       {
>>       $factory->remove_rid($rid);
>>       }
>>        open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "if entered";
>>             close(OUTFILE);
>>        print STDERR "." if ( $v > 0 );
>>        sleep 5;
>>       }
>>      else {
>>             open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "else entered";
>>             close(OUTFILE);
>>
>>         my $result = $rc->next_result();
>>        #save the output
>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>
>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>         print BLASTDEBUGFILE $result->next_hit();
>>         close(BLASTDEBUGFILE);
>>
>>       my $filename =
>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>
>>        # open(DEBUGFILE,'>',$debugfile);
>>        # open(new,'>',$filename);
>>        # @arra=<new>;
>>        # print DEBUGFILE @arra;
>>        # close(DEBUGFILE);
>>        # close(new);
>>
>>        $factory->save_output($filename);
>>
>>      # open(BLASTDEBUGFILE,'>',$debugfile);
>>      # print BLASTDEBUGFILE  "Hello $rid";
>>      # close(BLASTDEBUGFILE);
>>
>>      $factory->remove_rid($rid);
>>
>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>      print BLASTDEBUGFILE  $organism;
>>       close(BLASTDEBUGFILE);
>>
>>   # open(OUTFILE,'>',$outfile);
>>   # print OUTFILE "Test2 $result->database_name()";
>>   # close(OUTFILE);
>>
>> #$hit = $result->next_hit;
>> #open(new,'>',$debugfile);
>> #print $hit;
>> #close(new);
>>
>>  while ( my $hit = $result->next_hit ) {
>>
>>           next unless ( $v > 0);
>>
>>         #     open(OUTFILE,'>',$debugfile);
>>          #    print OUTFILE "$hit in while hits";
>>           #  close(OUTFILE);
>>
>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>          my $dna = $sequ->seq();        # get the sequence as a string
>>                 push(@seqs,$dna);
>>         }
>>       }
>>     }
>>   }
>>  }
>>
>>  #open(OUTFILE,'>',$debugfile);
>>  #print OUTFILE $seqs[0];
>>  #close(OUTFILE);
>>
>> return(@seqs);
>>
>> }
>>
>> open(OUTFILE, '>',$outfile) || die ;
>>
>> print OUTFILE "<HTML>\n
>> <head><title>RNAi Result</title>
>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>> <body>\n
>> <p><font face=\"Courier, monospace font set\">
>> Inputsequence: <br>";
>>
>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>
>>   print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>
>>   if ( ($i+1)%10==0){
>>       print OUTFILE " ";
>>   }
>>   if ( ($i+1)%60==0){
>>       print OUTFILE "<br>\n";
>>   }
>> }
>>
>>
>>
>> print OUTFILE "</font> <p>";
>>
>> $z=@compseqs;
>>
>> for($k=1;$k<$z;$k++) {
>>   print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
>> Sequence: <br>";
>>
>>   for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>
>>       print OUTFILE substr ($compseqs[$k], $i, 1);
>>
>>       if ( ($i+1)%10==0){
>>           print OUTFILE " ";
>>       }
>>       if ( ($i+1)%60==0){
>>           print OUTFILE "<br>\n";
>>       }
>>   }
>>   print OUTFILE "<p></font>";
>> }
>>
>> print OUTFILE "<p>
>> Window: <br>$in{'Windowsize'}
>> <p>
>> <p>
>> Threshold: <br>$in{'Threshold'}
>> <p>";
>> my $j=0;
>>
>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>
>>   if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>       if ($out[$i]->{similar}<=$in{'Threshold'}){
>>           $j=$in{'Windowsize'};
>>       }
>>       $height=$out[$i]->{similar}*5;
>>   }
>>
>>   if ($j>0) {
>>       print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>> height=\"5\">";
>>       $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
>> 1)."</font>";
>>       $j--;
>>   }
>>   else {
>>       print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>> height=\"5\">";
>>       $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
>> 1)."</font>";
>>   }
>>
>>   if ( ($i+1)%10==0){
>>       $outstring .= " ";
>>   }
>>   if ( ($i+1)%60==0){
>>       $outstring .= "<br>\n";
>>
>>   }
>>   if ( ($i+1)%800==0){
>>       print OUTFILE "<br><br>\n";
>>
>>   }
>> }
>>
>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>> set\">$outstring</font>";
>>
>> #foreach (@out) {
>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
>> #if ($_->{similar}<=$in{'Threshold'}){
>>
>> #    }
>> #}
>>
>> print OUTFILE "</BODY>\n</HTML>\n";
>>
>> close OUTFILE;
>>
>> #nameprint();
>>
>> sub parse_form {
>>   local ($buffer, @pairs, $pair, $name, $value);
>>   # Read in text
>>   $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>   if ($ENV{'REQUEST_METHOD'} eq "POST")
>>   {
>>       read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>   }
>>   else
>>   {
>>       $buffer = $ENV{'QUERY_STRING'};
>>   }
>>   @pairs = split(/&/, $buffer);
>>   foreach $pair (@pairs)
>>   {
>>       ($name, $value) = split(/=/, $pair);
>>       $value =~ tr/+/ /;
>>       $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>       $in{$name} = $value;
>>   }
>> }
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>>
>


From maj at fortinbras.us  Sat Jan  9 13:05:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 9 Jan 2010 13:05:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com><F19004692A4A4350856B23DF25E09074@NewLife>
	<c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
Message-ID: <4C2E8133F916495B876628EF3E8FCBB2@NewLife>

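I see it immediately (from making the same bug many times): single quotes
do not interpolate $organ, so the query is sent literally. Use double
quotes, with braces so that [ORGN] is not parsed as an array subscript:

 my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
- '$organ[ORGN]');
+ "${organ}[ORGN]");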
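
A small sketch of the quoting issue, independent of BioPerl. One caveat
when switching to double quotes: inside a double-quoted string Perl reads
$organ[ORGN] as an element of an array @organ, not as the scalar $organ
followed by the text "[ORGN]", so the variable name should be delimited
with braces (or the two pieces concatenated). The helper name
entrez_organism_query is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): build an Entrez organism filter.
sub entrez_organism_query {
    my ($organism) = @_;
    # The braces end the variable name, so "[ORGN]" stays literal text.
    return "${organism}[ORGN]";
}

my $organ = 'Trypanosoma brucei';

# Single quotes: no interpolation, the dollar sign is sent as-is.
my $literal = '$organ[ORGN]';

# Double quotes with a delimited variable name: the value is substituted.
my $query = entrez_organism_query($organ);

# Note: "$organ[ORGN]" (no braces) would be parsed as an element of @organ;
# under "use strict" it does not even compile here, since no @organ exists.

print "single-quoted: $literal\n";   # single-quoted: $organ[ORGN]
print "braced:        $query\n";     # braced:        Trypanosoma brucei[ORGN]
```

The resulting string could then be passed as the -ENTREZ_QUERY value in
the call shown above; $organ . '[ORGN]' is an equivalent alternative.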

MAJ



From robert.bradbury at gmail.com  Sat Jan  9 14:52:53 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 14:52:53 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<F19004692A4A4350856B23DF25E09074@NewLife>
	<c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
Message-ID: <deaa866a1001091152u4e85b1eboc99feb52a5b45b5@mail.gmail.com>

Roopa,

Mark is correct: you have to be very careful with single vs. double quotes in
Perl.  As I currently understand it, double-quoted strings are interpolated
while single-quoted strings are taken literally.
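
A minimal sketch of the quoting difference, using only core Perl.  The
variable name and its value are made up for illustration and are not taken
from Roopa's script:

```perl
use strict;
use warnings;

# Hypothetical value, purely for illustration.
my $db = 'nr';

# Double quotes interpolate variables (and escapes such as \n).
my $interpolated = "database: $db";

# Single quotes keep the text literally; $db is NOT expanded.
my $literal = 'database: $db';

print "$interpolated\n";    # prints: database: nr
print "$literal\n";         # prints: database: $db
```

This matters when building command lines or file names: a variable inside
single quotes reaches the external program as the literal text "$db".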

I tried to run your script (with fixes) but without the supporting files it
appears to be impossible.

What I am curious about is what the script is trying to do.  I was
particularly intrigued by some apparent efforts to parse BLAST results into
color-enhanced HTML, and without thinking about the code in detail it seems
easier to simply ask: what are you trying to do?  I find "classical" BLAST
results particularly tedious, and I long for BLAST results that display
information as concisely as the NCBI HomoloGene cross-species comparisons do.
Unfortunately NCBI has deemed their methods (I have asked them) "too complex
to disclose".  (For a person comfortable dealing with assembly language, or
even gate-level electronics, "too complex" is a very relative concept.)

One has the option of using NCBI, with a limited number of species but good
display methodologies, or Ensembl, with many more species but less desirable
display methodologies (phylogenetic trees derived from cross-species
comparisons).  For the WRN protein, which may play a key role in aging
(through the activity of its exonuclease domain mutating DNA sequences and
inducing microdeletions and microinsertions), this gets important because it
appears that the *C. elegans* genome is missing the exonuclease domain (so
it may be useless from the perspective of studying aging), and the other 4
nematode species which have been sequenced aren't in either the NCBI or the
Ensembl databases.  Needless to say, if we manage in the near future, given
the drop in sequencing costs, to sequence the nematodes which are
freeze/thaw tolerant (freezing and thawing induces DSBs that have to be
repaired), those genomes are unlikely to be in the NCBI/Ensembl databases
either.  So the user needs to develop the ability to mix and match
public and obscure databases in creative ways to provide easy-to-interpret
information.

Robert Bradbury


From robert.bradbury at gmail.com  Sat Jan  9 15:27:54 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 15:27:54 -0500
Subject: [Bioperl-l] Ensembl problems
Message-ID: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>

I am trying to get the examples provided by EMBL/Ensembl to work and am
encountering problems.

For example, about 1/3 of the way through the Compara API tutorial [1] there
is what is supposed to be a completely functional script.  It does not
work.  This is in contrast to some of the earlier simple scripts (listing
the species in Ensembl, etc.), which do work on my machine, so I appear to
have all the libraries installed correctly.

It is very poor form to document scripts which do not function on a properly
set-up system.

I have modified my invocation of the script slightly:
  Align.pl --set_of_species \
"Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus
scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"

which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on
an undefined value at ./Align.pl line 132." (Align.pl is my slightly
modified version of the Compara tutorial code.)
As these are slightly modified Perl scripts from the documentation, the line
numbers may vary.

The call fails in Align.pl even though, just before it, I dumped
"genome_dbs" and got back a list of some 25 genomes (hash tables), as
expected.  I believe this occurs whether one is comparing "human:mouse" or
the more complex species set I have outlined above.
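
For readers hitting the same message: Perl reports "Can't call method ... on
an undefined value" when the variable the method is called on is undef, so
here it is the adaptor object, not the genome list, that was never set.  A
minimal sketch of failing early with a clearer message follows.  The
`get_adaptor` lookup and the `Toy::` package are hypothetical stand-ins and
are NOT the Ensembl API:

```perl
use strict;
use warnings;

# Toy stand-in for an adaptor registry (hypothetical, not Ensembl code).
# Only one adaptor type is registered, so other lookups return undef.
my %adaptors = ( GenomeDB => bless( {}, 'Toy::GenomeDBAdaptor' ) );

sub Toy::GenomeDBAdaptor::fetch_all { return ('toy genome') }

sub get_adaptor {
    my ($type) = @_;
    return $adaptors{$type};    # undef when the type is unknown
}

# Check the lookup result before calling any method on it.
sub require_adaptor {
    my ($type) = @_;
    my $adaptor = get_adaptor($type);
    die "No adaptor returned for '$type'; check the database/version settings\n"
        unless defined $adaptor;
    return $adaptor;
}

my @genomes = require_adaptor('GenomeDB')->fetch_all;
print "Fetched: @genomes\n";

# This lookup fails, and the message now names the missing adaptor.
eval { require_adaptor('MethodLinkSpeciesSet') };
print "Error: $@" if $@;
```

Checking each lookup this way narrows down which step returned nothing,
which also helps when preparing a minimal failing script for a bug report.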


Has anyone else attempted to run the code documented in the Ensembl API
Tutorial?
Any suggestions as to what direction to go in would be appreciated (when
one is trying to copy code out of a tutorial and it fails, it's kind of hard
to know where to go).

There do appear to be some problems in specifying a Compara
version/database, and there don't appear to be many resources informing
one of what is currently available.

Robert


1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html


From ak at ebi.ac.uk  Sat Jan  9 17:01:21 2010
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Sat, 9 Jan 2010 22:01:21 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
Message-ID: <20100109220121.GA9521@quux.windows.ebi.ac.uk>

On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
> I am trying to get the examples provided by EMBL/Ensembl to work and am
> encountering problems.

Hi Robert,

The ensembl-dev list is the appropriate forum for this type of question,
as it has nothing to do with bioperl.

There is also the Ensembl helpdesk.  If you send your problem to
<helpdesk at ensembl.org> I'm sure that it will be picked up by the
appropriate people (I myself do not know enough about the Compara API to
be able to diagnose this problem straight away, I'm afraid).

Be sure to submit a minimal script that still exhibits the problem, and
information about what version of the APIs you're using (we will assume
that you're not mixing a newer version of the API with older databases or
vice versa).

We are generally very happy to have bugs in documentation or code
pointed out to us, and will correct errors as we are made aware of them.


Kind regards,
Andreas


-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom


From cjfields at illinois.edu  Sat Jan  9 17:01:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Jan 2010 16:01:19 -0600
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
Message-ID: <743C998D-BBB5-4832-BA25-24D7D7288F78@illinois.edu>

Robert,

Ensembl errors probably should be redirected to the ensembl mail list.  I can't speak to the problems with it (they appear specific to the Ensembl tool set).

chris



From robert.bradbury at gmail.com  Sun Jan 10 14:47:00 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sun, 10 Jan 2010 14:47:00 -0500
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <20100109220121.GA9521@quux.windows.ebi.ac.uk>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
	<20100109220121.GA9521@quux.windows.ebi.ac.uk>
Message-ID: <deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>

As it turns out, the example from the file I cited (the Compara API
tutorial) does work.  The code that I started with may have been from
an MS Word document distributed with the documentation (which could
quite well be out of date).

But even the corrected code does not work for various uncommon
comparisons between species (which they may not have archived in
Ensembl).  I also don't yet understand enough about the functions to know
whether they are comparing the same regions from the same
chromosomes that just happen to be identical or whether they are
comparing a region with a homologous region on a different
chromosome (i.e. conserved genes).  I'm going to have to dig into this
some more to figure out what is going on.

Thanks for the pointers, I'll refer future questions to the Ensembl
list/help-desk.

However, if anyone knows Ensembl very well: the database already has in it
some of these interspecies comparisons.  They are accessed
when one builds a phylogenetic tree for a specific gene (and generally for
a highly conserved gene you will get a tree that includes nearly all 50
species in the database).  As I don't think they are computed
on the fly, the information must be precomputed and stored someplace
in the database.  I would very much like to know how to access this
information.

Thanks,
Robert




From Russell.Smithies at agresearch.co.nz  Sun Jan 10 15:34:39 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 09:34:39 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>

An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups.
In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms).

If it were me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this:

   my $taxid  = $gi_taxid_nucl{$accession};
   my $org_name = $names{$taxid};
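
A rough sketch of how those two hashes might be built, using only core
Perl.  It assumes the uncompressed dump layouts: two tab-separated columns
(GI, taxid) in gi_taxid_nucl.dmp, and "tab, pipe, tab" separated fields in
names.dmp with the name class in the fourth field.  Note that the first
file is keyed by GI number, so accessions would need mapping to GIs first.
The function names, the optional `$wanted` filter, and the toy rows are
illustrative assumptions:

```perl
use strict;
use warnings;

# Build GI => taxid from a gi_taxid_nucl.dmp-style filehandle.
# $wanted (optional hashref of GIs) keeps memory down on the real,
# very large file by storing only the GIs of interest.
sub load_gi_taxid {
    my ( $fh, $wanted ) = @_;
    my %gi_taxid;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $gi, $taxid ) = split /\t/, $line;
        next unless defined $taxid;
        next if $wanted && !$wanted->{$gi};
        $gi_taxid{$gi} = $taxid;
    }
    return \%gi_taxid;
}

# Build taxid => scientific name from a names.dmp-style filehandle,
# skipping synonyms and common names.
sub load_names {
    my ($fh) = @_;
    my %names;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\t\|$//;    # strip the end-of-row marker
        my ( $taxid, $name, undef, $class ) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;
    }
    return \%names;
}

# Toy rows in the two layouts (made-up GI numbers, for illustration only).
my $gi_rows   = "101\t9913\n102\t9606\n";
my $name_rows = "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
              . "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n"
              . "9913\t|\tBos taurus\t|\t\t|\tscientific name\t|\n";

open my $gi_fh,   '<', \$gi_rows   or die "cannot open GI rows: $!";
open my $name_fh, '<', \$name_rows or die "cannot open name rows: $!";

my $gi_taxid = load_gi_taxid($gi_fh);
my $names    = load_names($name_fh);

my $taxid    = $gi_taxid->{101};
my $org_name = $names->{$taxid};
print "$org_name\n";    # prints: Bos taurus
```

With real data the in-memory strings would be replaced by filehandles on
the unpacked dump files; the lookups themselves stay the same.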

--Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Saturday, 26 December 2009 4:52 p.m.
> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Bhakti,
> The following example (using EUtilities) may serve your purpose:
> 
> use Bio::DB::EUtilities;
> 
> my (%taxa, @taxa);
> my (%names, %idmap);
> 
> # these are protein ids; nuc ids will work by changing -dbfrom =>
> 'nucleotide',
> # (probably)
> 
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> 
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
> 
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
> 
> @taxa = @taxa{@ids};
> 
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>         -db    => 'taxonomy',
>         -id    => \@taxa );
> 
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> ($_->get_contents_by_name('ScientificName'))[0];
> }
> 
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
> 
> # %idmap is
> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> #    68536103 => 'Corynebacterium jeikeium K411'
> #    730439 => 'Bacillus caldolyticus'
> #    89318838 => undef    (this record has been removed from the db)
> 
> 1;
> 
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
> 
> sleep 3;
> 
> or so separating the queries.
> MAJ
> ----- Original Message -----
> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, December 25, 2009 9:46 PM
> Subject: [Bioperl-l] how to retrieve organism name from accession number?
> 
> 
> > Hi,
> >
> > Does anyone know how to retrieve the "Source" or the "Species name"
> given
> > the accession number using Bioperl.   I have these 30,000 accession
> numbers
> > for which I need to get the source organisms.  Any kind of help will be
> > appreciated.
> >
> > Thanks
> >
> > BD
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> >
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Sun Jan 10 15:49:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 14:49:40 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
Message-ID: <F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>

One could also use Bio::DB::Taxonomy, which indexes the same files or (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the details).

chris



From Russell.Smithies at agresearch.co.nz  Sun Jan 10 16:05:06 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 10:05:06 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>

I've started to go off eUtils recently (not BioPerl's fault) as I've often been finding that with large queries, chunks of the resulting data are missing.
For example, before Xmas I was creating species-specific databases by using eUtils to get a list of GI numbers back for a taxid, then retrieving the fasta sequences in chunks of 500.
Very regularly, in the middle of the fasta there would be a message about a resource being unavailable, e.g.
  >test_sequence_1
  TACGATCATCGCTResource UnavailableTACGACTCTGCT
  >test_sequence_2
  TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT

Often this wasn't detected until formatdb complained about invalid characters.
Inquiries to NCBI as to why this was happening and what to do about it returned stupid answers ("do each sequence manually through the web interface", or "use eUtils").
As we have a nice fast network connection, I now prefer to download very large gzip files (e.g. all of refseq) and extract what I need.
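
One way to catch an embedded message like the "Resource Unavailable"
example above before formatdb does is to scan each downloaded chunk for
characters outside the expected alphabet.  The sketch below uses only core
Perl and assumes nucleotide FASTA restricted to IUPAC codes plus the gap
character; the function name and return layout are illustrative:

```perl
use strict;
use warnings;

# Return one [record id, line number, text] entry for every sequence
# line containing something other than IUPAC nucleotide codes or '-'.
sub find_bad_fasta_lines {
    my ($fh) = @_;
    my ( $id, @bad );
    while ( my $line = <$fh> ) {
        chomp $line;
        if ( $line =~ /^>(\S*)/ ) {    # header line: remember the id
            $id = $1;
            next;
        }
        next if $line eq '';
        push @bad, [ $id, $., $line ]
            if $line =~ /[^ACGTURYKMSWBDHVN-]/i;
    }
    return @bad;
}

# The two records from the example above.
my $fasta = ">test_sequence_1\n"
          . "TACGATCATCGCTResource UnavailableTACGACTCTGCT\n"
          . ">test_sequence_2\n"
          . "TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT\n";

open my $fh, '<', \$fasta or die "cannot open FASTA text: $!";
my @bad = find_bad_fasta_lines($fh);

for my $hit (@bad) {
    my ( $id, $line_no, $text ) = @$hit;
    print "Suspicious line $line_no in $id: $text\n";
}
```

A chunk that reports any hits could then be re-requested rather than
written to the database file.  Protein FASTA would need a different
character class.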

I can't help but think that NCBI could solve a lot of problems if they gzipped the output from eUtils queries - it's something I've requested regularly for the last 5 years or so!!

--Russell


> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Monday, 11 January 2010 9:50 a.m.
> To: Smithies, Russell
> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> One could also use Bio::DB::Taxonomy, which indexes the same files or
> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> details).
> 
> chris
> 
> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> 
> > An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> do lookups.
> > In that same dir, taxdump.tar.gz contains a file called names.dmp which
> lists taxids and descriptions (and synonyms)
> >
> > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> could do this:
> >
> >   my $taxid  = $gi_taxid_nucl{$accession};
> >   my $org_name = $names{$taxid};
> >
> > --Russell
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> >> Sent: Saturday, 26 December 2009 4:52 p.m.
> >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >> number?
> >>
> >> Bhakti,
> >> The following example (using EUtilities) may serve your purpose:
> >>
> >> use Bio::DB::EUtilities;
> >>
> >> my (%taxa, @taxa);
> >> my (%names, %idmap);
> >>
> >> # these are protein ids; nuc ids will work by changing -dbfrom =>
> >> 'nucleotide',
> >> # (probably)
> >>
> >> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> >>
> >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> >>                                       -db => 'taxonomy',
> >>                                       -dbfrom => 'protein',
> >>                                       -correspondence => 1,
> >>                                       -id => \@ids);
> >>
> >> # iterate through the LinkSet objects
> >> while (my $ds = $factory->next_LinkSet) {
> >>    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> >> }
> >>
> >> @taxa = @taxa{@ids};
> >>
> >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> >>        -db    => 'taxonomy',
> >>        -id    => \@taxa );
> >>
> >> while (local $_ = $factory->next_DocSum) {
> >>    $names{($_->get_contents_by_name('TaxId'))[0]} =
> >> ($_->get_contents_by_name('ScientificName'))[0];
> >> }
> >>
> >> foreach (@ids) {
> >>    $idmap{$_} = $names{$taxa{$_}};
> >> }
> >>
> >> # %idmap is
> >> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> >> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> >> #    68536103 => 'Corynebacterium jeikeium K411'
> >> #    730439 => 'Bacillus caldolyticus'
> >> #    89318838 => undef    (this record has been removed from the db)
> >>
> >> 1;
> >>
> >> You probably will need to break up your 30000 into chunks
> >> (say, 1000-3000 each), and do the above on each chunk with a
> >>
> >> sleep 3;
> >>
> >> or so separating the queries.
> >> MAJ
> >> ----- Original Message -----
> >> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> >> To: <bioperl-l at lists.open-bio.org>
> >> Sent: Friday, December 25, 2009 9:46 PM
> >> Subject: [Bioperl-l] how to retrieve organism name from accession
> number?
> >>
> >>
> >>> Hi,
> >>>
> >>> Does anyone know how to retrieve the "Source" or the "Species name"
> >> given
> >>> the accession number using Bioperl.   I have these 30,000 accession
> >> numbers
> >>> for which I need to get the source organisms.  Any kind of help will
> be
> >>> appreciated.
> >>>
> >>> Thanks
> >>>
> >>> BD
> >>> _______________________________________________
> >>> Bioperl-l mailing list
> >>> Bioperl-l at lists.open-bio.org
> >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>
> >>>
> >>
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
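
The batching advice quoted above (split the 30,000 IDs into chunks of
1000-3000 and sleep a few seconds between queries) could be sketched
roughly as follows. This is only a minimal illustration in core Perl:
chunk_ids and for_each_batch are hypothetical helper names, not BioPerl
functions, and the callback is where the elink/esummary code from the
quoted example would go.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into batches of at most $size.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "batch size must be positive\n" if $size < 1;
    my @batches;
    push @batches, [ splice(@ids, 0, $size) ] while @ids;
    return @batches;
}

# Hypothetical helper: call $callback on each batch (passed as an array
# reference), sleeping $pause seconds between batches but not after the
# last one. Returns the number of batches processed.
sub for_each_batch {
    my ($size, $pause, $callback, @ids) = @_;
    my @batches = chunk_ids($size, @ids);
    for my $i (0 .. $#batches) {
        $callback->($batches[$i]);
        sleep $pause if $pause && $i < $#batches;
    }
    return scalar @batches;
}
```

For example, for_each_batch(1000, 3, sub { my $ids = shift; ... }, @all_ids)
would hand 1000 IDs at a time to the callback with a 3 second pause in
between; the batch size and pause are assumptions to be tuned against
whatever limits NCBI currently asks for.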


From avilella at gmail.com  Sun Jan 10 16:05:13 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 10 Jan 2010 21:05:13 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
	<20100109220121.GA9521@quux.windows.ebi.ac.uk>
	<deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>
Message-ID: <358f4d651001101305q1b75cfe3q558a245ab1ab1238@mail.gmail.com>

> However, if anyone knows Ensembl very well, the database has in it
> some of these interspecies comparisons already.  They are accessed
> when one does a phylogeny tree for specific genes (and generally for
> a highly conserved gene you will get a tree that includes nearly all 50
> species in the database).  As I don't think they are computed
> on-the-fly, the information must be precomputed and stored someplace
> in the database.  I would very much like to know how to access this
> information.

Yes, they are. You can access the data programmatically by installing
the ensembl and ensembl-compara Perl APIs.
There are a few example scripts for the GeneTrees:

ensembl-compara/scripts/examples/homology*.pl

Cheers,

Albert.

> Thanks,
> Robert
>
>
>
>
> On 1/9/10, Andreas Kähäri <ak at ebi.ac.uk> wrote:
>> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
>>> I am trying to get the examples provided by EMBL/Ensembl to work and am
>>> encountering problems.
>>
>> Hi Robert,
>>
>> The ensembl-dev list is the appropriate forum for this type of question
>> as it has nothing to do with bioperl.
>>
>> There is also the Ensembl helpdesk.  If you send your problem to
>> <helpdesk at ensembl.org> I'm sure that it will be picked up by the
>> appropriate people (I do myself not know enough about the Compara API to
>> be able to diagnose this problem straight away I'm afraid).
>>
>> Be sure to submit a minimal script that still exhibits the problem, and
>> information about what version of the APIs you're using (we will assume
>> that you're not mixing newer versions of the API with older databases or
>> vice versa).
>>
>> We are generally very happy to have bugs in documentation or code
>> pointed out to us, and will correct errors as we are made aware of them.
>>
>>
>> Kind regards,
>> Andreas
>>
>>> For example, about 1/3 of the way through the Compara API tutorial [1]
>>> there
>>> is what is supposed to be a completely functional script.  It does not
>>> work.  This is in contrast to some of the earlier simple scripts (listing
>>> the species in Ensembl etc.) which do work on my machine (so I have all
>>> the
>>> libraries etc. installed correctly).
>>>
>>> Very poor form to document scripts which do not function on a properly
>>> setup
>>> system.
>>>
>>> I have modified my invocation of the script slightly:
>>>   Align.pl --set_of_species \
>>> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
>>> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
>>> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis
>>> familiaris:Sus
>>> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
>>> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
>>> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"
>>>
>>> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs"
>>> on
>>> an undefined value at ./Align.pl line 132." (Align.pl is my slightly
>>> modified example of the Compara Tutorial code.)
>>> As these are slightly modified perl scripts from the documentation, the
>>> line
>>> numbers may be variable.
>>>
>>> I can print out the genome_dbs, and it gives me a list of genome names
>>> (hash
>>> tables) though it appears that is problematic in the Align.pl script.
>>> in spite of the fact that just previously to that call I dumped
>>> "genome_dbs"
>>> and got back some 25 hash tables (expected).  I believe this occurs
>>> whether
>>> one is comparing "human:mouse" or the more complex species set I have
>>> outlined above.
>>>
>>>
>>>
>>> Has anyone else attempted to run the code documented in the Ensembl API
>>> Tutorial?
>>> Any suggestions as to what direction to go in would be appreciated (when
>>> one is trying to copy code out of a tutorial and it fails, it's kind of hard
>>> to know where to go).
>>>
>>> There do appear to be some problems in the specifications of a Compara
>>> version/database and there don't appear to be a lot of resources informing
>>> one of what resources are currently available.
>>>
>>> Robert
>>>
>>>
>>> 1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html
>>>
>>
>> --
>> Andreas Kähäri, Ensembl Software Developer
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge CB10 1SD, United Kingdom
>>
>
>


From alessandra.bilardi at gmail.com  Sun Jan 10 18:21:12 2010
From: alessandra.bilardi at gmail.com (Alessandra)
Date: Mon, 11 Jan 2010 00:21:12 +0100
Subject: [Bioperl-l] GBrowse.org project
In-Reply-To: <e0996aca1001101515q8121c87o9b90310691fcd640@mail.gmail.com>
References: <e0996aca1001101515q8121c87o9b90310691fcd640@mail.gmail.com>
Message-ID: <e0996aca1001101521p30b46829p93ee75dd797829b1@mail.gmail.com>

 Hi all,

   I'm Alessandra and I run GBrowse.org.
GBrowse.org is a resource for using and setting up GBrowse genome
browsers. The site provides one location where biologists and
bioinformaticians can find:

  1. Genome browser web sites for any organism that has them. If a
species has a genome browser anywhere on the web, then we aim to link
to it.
  2. Links to sequence and annotation files that are available online.
  3. Links to genome browser configuration files, when available
  4. An FTP site containing genome annotation and configuration files
for each annotated genome that does not have its own web site.

GBrowse.org emphasizes the GBrowse genome browser in its organization,
but also links to sites that use other browser packages such as UCSC,
Ensembl, and JBrowse.

Also, we are currently conducting a survey seeking input on future
project direction. Please take a few minutes now to provide your
feedback.

   Survey link: http://gbrowse.org/survey/index.php?sid=64264&lang=en
   GBrowse.org introduction link:
http://gmod.org/wiki/August_2009_GMOD_Meeting#GBrowse.org

   Thank you for your help,

   Alessandra Bilardi.
   http://gbrowse.org/
   CRIBI Genomics, University of Padua
   http://genomics.cribi.unipd.it/


From cjfields at illinois.edu  Sun Jan 10 22:04:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 21:04:13 -0600
Subject: [Bioperl-l] GMOD BioPerl Meeting
Message-ID: <7D72ECC2-E856-4C09-B67A-62AFFB59B377@illinois.edu>

Just a quick reminder that we're having a BioPerl satellite meeting after the PAG Conference (just prior to the GMOD Meeting).  The meeting is this Wednesday, Jan. 13, starting at 11:30am, at the Best Western Seven Seas in San Diego.  I will update the relevant BioPerl and GMOD pages with more details as they become available.  At the moment, we will be meeting in the hotel lobby prior to starting the meeting and possible hackathon.  

http://www.bioperl.org/wiki/GMOD_2010_Meeting
http://gmod.org/wiki/January_2010_GMOD_Meeting#Satellite_Meetings

Thanks!

chris


From bernd.jagla at pasteur.fr  Mon Jan 11 05:11:16 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:11:16 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
Message-ID: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>

Hi,

 
First off, I am not sure if this is supposed to be addressed to the Bioperl
or Gbrowse mailing list, so apologies if this is the wrong list and please
let me know.

 
I am writing a program in Java that needs to access genome annotation data.
Since I am using Gbrowse already I was thinking that I could combine both
approaches making life eventually easier for me. I am mainly interested in
getting a gene/feature name for a given position. The position is stored in
the feature table and through linking typelist, locationlist, (maybe
sequence), and feature I can get all the information I need. Unfortunately
it seems that the feature name is stored in the object blob of the feature
table. 

 
That is a bit suspicious to me because I don't understand why searching for
a name can be so fast if it is not indexed through mysql when searching
using GBrowse.

 
So my question is how do I parse the Bio::DB::SeqFeature object in Java
correctly to get the name of the feature and possibly also any further
information.

 
Any suggestions are greatly appreciated. Maybe there is a better solution
than parsing Perl code with Java?

 
Thanks a lot,

 
Bernd


From biopython at maubp.freeserve.co.uk  Mon Jan 11 05:48:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 10:48:52 +0000
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
Message-ID: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>

On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla <bernd.jagla at pasteur.fr> wrote:
> Hi,
>
> First off, I am not sure if this is supposed to be addressed to the Bioperl
> or Gbrowse mailing list, so apologies if this is the wrong list and please
> let me know.
>
> I am writing a program in Java that needs to access genome annotation data.
> Since I am using Gbrowse already I was thinking that I could combine both
> approaches making life eventually easier for me. I am mainly interested in
> getting a gene/feature name for a given position. The position is stored in
> the feature table and through linking typelist, locationlist, (maybe
> sequence), and feature I can get all the information I need. Unfortunately
> it seems that the feature name is stored in the object blob of the feature
> table.

How are you storing the data in Gbrowse? There are several back ends,
and this will make a big difference for accessing the raw data.

One option would be to use Gbrowse with BioSQL as the backend.
You can then use BioJava (or BioPerl, or BioPython, etc) to access the
database. The only downside is Gbrowse isn't working 100% on top
of BioSQL right now (I'd like to see this fixed, but I don't know Perl).
There is an open bug on this [ gmod-Bugs-2168597 ].

Peter


From bernd.jagla at pasteur.fr  Mon Jan 11 05:53:20 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:53:20 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
	<320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
Message-ID: <9056164A8A744A77B6CD1E8E4E20B104@zillumina>

I am using bp_seqfeature_load.pl to load my features. That is using
Bio::DB::SeqFeature::Store with MySQL as a backend... That's all I
understood...

B

> -----Original Message-----
> From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On
> Behalf Of Peter
> Sent: Monday, January 11, 2010 11:49 AM
> To: Bernd Jagla
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
> 
> On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla <bernd.jagla at pasteur.fr>
> wrote:
> > Hi,
> >
> > First off, I am not sure if this is supposed to be addressed to the
> Bioperl
> > or Gbrowse mailing list, so apologies if this is the wrong list and
> please
> > let me know.
> >
> > I am writing a program in Java that needs to access genome annotation
> data.
> > Since I am using Gbrowse already I was thinking that I could combine
> both
> > approaches making life eventually easier for me. I am mainly interested
> in
> > getting a gene/feature name for a given position. The position is stored
> in
> > the feature table and through linking typelist, locationlist, (maybe
> > sequence), and feature I can get all the information I need.
> Unfortunately
> > it seems that the feature name is stored in the object blob of the
> feature
> > table.
> 
> How are you storing the data in Gbrowse? There are several back ends,
> and this will make a big difference for accessing the raw data.
> 
> One option would be to use Gbrowse with BioSQL as the backend.
> You can then use BioJava (or BioPerl, or BioPython, etc) to access the
> database. The only downside is Gbrowse isn't working 100% on top
> of BioSQL right now (I'd like to see this fixed, but I don't know Perl).
> There is an open bug on this [ gmod-Bugs-2168597 ].
> 
> Peter


From awitney at sgul.ac.uk  Mon Jan 11 07:21:07 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 12:21:07 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
Message-ID: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>

Hi,

I am writing a script to automate the running of Phylip Pars. In the process i have to create a Bio::AlignIO object from a set of data that i have in a hash.

I could write the hash data into a phylip file and then load the Bio::AlignIO from that file, but i wondered if i could skip the writing and then reading of a temporary file ?

thanks for any help

adam


From roy.chaudhuri at gmail.com  Mon Jan 11 08:54:25 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:54:25 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2A51.9040602@gmail.com>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com>
Message-ID: <4B4B2D91.70906@gmail.com>

Actually, I guess some sample code would be more helpful:

use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;
my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, 
-end=>4);
my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, 
-end=>3);
my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, 
-end=>5);
my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);

Cheers,
Roy.
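
P.S. Since the original data is in a hash, one possible way to get from
there to the constructor calls in the sample code above is sketched
below. This is only an illustration under assumptions: the hash is taken
to map sequence ids to gapped strings, '-' and '.' are treated as gap
characters, and locatable_seq_args is a hypothetical helper name, not
part of BioPerl. It computes -end as the number of non-gap characters,
matching the sample code above ('AT-CG' ends at 4).

```perl
use strict;
use warnings;

# Hypothetical helper: turn a hash of id => gapped sequence into argument
# lists suitable for Bio::LocatableSeq->new. Ids are processed in sorted
# order so the result is reproducible.
sub locatable_seq_args {
    my (%gapped) = @_;
    my @args;
    for my $id (sort keys %gapped) {
        my $seq = $gapped{$id};
        # Count the non-gap characters ('-' and '.' assumed to be gaps).
        my $residues = () = $seq =~ /[^-.]/g;
        push @args, [ -id => $id, -seq => $seq, -start => 1, -end => $residues ];
    }
    return @args;
}

# With BioPerl installed, the argument lists can be passed to the same
# constructors used in the sample code above:
#
#   my @seqs = map { Bio::LocatableSeq->new(@$_) } locatable_seq_args(%data);
#   my $aln  = Bio::SimpleAlign->new(-seqs => \@seqs);
#   Bio::AlignIO->new(-format => 'phylip')->write_aln($aln);
```

The BioPerl calls are left as comments so the helper itself runs with
core Perl only; no temporary file is involved at any point.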


On 11/01/2010 13:40, Roy Chaudhuri wrote:
> Hi Adam,
>
> I'm guessing you actually want to create a Bio::SimpleAlign object
> (representing an alignment), rather than a Bio::AlignIO object (which is
> just for reading/writing alignment files). Bio::SimpleAlign has a
> documented new method that allows you to construct an alignment from
> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but
> include gaps and start/end coordinates to describe their relationship to
> other sequences in the alignment.
>
> Roy.
>
> On 11/01/2010 12:21, Adam Witney wrote:
>> Hi,
>>
>> I am writing a script to automate the running of Phylip Pars. In the
>> process i have to create a Bio::AlignIO object from a set of data
>> that i have in a hash.
>>
>> I could write the hash data into a phylip file and then load the
>> Bio::AlignIO from that file, but i wondered if i could skip the
>> writing and then reading of a temporary file ?
>>
>> thanks for any help
>>
>> adam
>


From roy.chaudhuri at gmail.com  Mon Jan 11 08:40:33 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:40:33 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
Message-ID: <4B4B2A51.9040602@gmail.com>

Hi Adam,

I'm guessing you actually want to create a Bio::SimpleAlign object 
(representing an alignment), rather than a Bio::AlignIO object (which is 
just for reading/writing alignment files). Bio::SimpleAlign has a 
documented new method that allows you to construct an alignment from 
Bio::LocatableSeq objects, which are similar to Bio::Seq objects but 
include gaps and start/end coordinates to describe their relationship to 
other sequences in the alignment.

Roy.

On 11/01/2010 12:21, Adam Witney wrote:
> Hi,
>
> I am writing a script to automate the running of Phylip Pars. In the
> process i have to create a Bio::AlignIO object from a set of data
> that i have in a hash.
>
> I could write the hash data into a phylip file and then load the
> Bio::AlignIO from that file, but i wondered if i could skip the
> writing and then reading of a temporary file ?
>
> thanks for any help
>
> adam


From biopython at maubp.freeserve.co.uk  Mon Jan 11 09:16:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 14:16:45 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
Message-ID: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>

Hi,

I'm running bioperl-live from SVN, just updated to revision 16648.

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069

I am trying to get Bio::SeqIO to convert a multiple record EMBL
file into GenBank format, piping the data via stdin/stdout using
the following trivial Perl script:

#!/usr/bin/env perl
use Bio::SeqIO;
my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
my $out = Bio::SeqIO->new(-format => 'genbank');
while (my $seq = $in->next_seq) { $out->write_seq($seq) };

This only seems to find the first EMBL record in my example
files. For example, this simple file has just two contig records:
http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl

This is just the first two records taken from a much larger EMBL file
rel_con_hum_01_r102.dat downloaded and uncompressed from:
ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

Trying both these examples as input, BioPerl just gives a single
GenBank record as output (the first EMBL entry in the input).

Is this a BioPerl bug, or am I missing something?

Peter


From maj at fortinbras.us  Mon Jan 11 10:04:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 11 Jan 2010 10:04:00 -0500
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>

Hi Peter, 
I found the issue-- there are no SQ lines in the data, and 
having them is a key stop condition in the parser (line 438 embl.pm).
We evidently need to be more liberal in what we accept, even as we 
are strict in what we emit. Could you make a bug report?
thanks for the heads-up--
MAJ
----- Original Message ----- 
From: "Peter" <biopython at maubp.freeserve.co.uk>
To: "bioperl-l list" <bioperl-l at lists.open-bio.org>
Sent: Monday, January 11, 2010 9:16 AM
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records


> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz
> 
> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter
> 
>


From biopython at maubp.freeserve.co.uk  Mon Jan 11 10:17:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 15:17:37 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
Message-ID: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>

On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
> them is a key stop condition in the parser (line 438 embl.pm).
> We evidently need to be more liberal in what we accept, even as we are
> strict in what we emit. Could you make a bug report?
> thanks for the heads-up--
> MAJ

Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982

These are EMBL contig records, so they don't have SQ lines,
but instead CO lines.

Peter
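
P.S. For anyone wanting to check which entries in a file are affected,
here is a rough sketch in core Perl. It is not the BioPerl parser and
makes no claim about how embl.pm works internally; it simply assumes
each entry ends with a "//" line, reads entries on that terminator, and
reports whether each one has an SQ line or a CO line.
classify_embl_records is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: read EMBL-style entries from a filehandle, using
# the "//" terminator line as the record separator, and note for each
# entry whether it carries an SQ line (sequence) or a CO line (contig).
sub classify_embl_records {
    my ($fh) = @_;
    local $/ = "//\n";
    my @summary;
    while (my $record = <$fh>) {
        # Skip anything without an ID line (e.g. trailing whitespace).
        next unless $record =~ /^ID\s+(\S+?);?\s/m;
        my $id = $1;
        push @summary, {
            id     => $id,
            has_sq => ($record =~ /^SQ\s/m ? 1 : 0),
            has_co => ($record =~ /^CO\s/m ? 1 : 0),
        };
    }
    return @summary;
}
```

Run over STDIN, entries with has_co set and has_sq unset would be the
CON-class records that trigger the problem described above.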


From cjfields at illinois.edu  Mon Jan 11 10:24:24 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 09:24:24 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
	<320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
Message-ID: <CDB3F40D-0298-410B-9814-3D9721380EBA@illinois.edu>


On Jan 11, 2010, at 9:17 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>> 
>> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
>> them is a key stop condition in the parser (line 438 embl.pm).
>> We evidently need to be more liberal in what we accept, even as we are
>> strict in what we emit. Could you make a bug report?
>> thanks for the heads-up--
>> MAJ
> 
> Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982
> 
> These are EMBL contig records, so they don't have SQ lines,
> but instead CO lines.
> 
> Peter

Peter, 

Just curious, but have you tried the experimental EMBL parser 'embldriver'?  I don't think it's bound to the same strictures, but I may be mistaken.

chris


From cjfields at illinois.edu  Mon Jan 11 10:23:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 09:23:00 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <0D0D9DB5-56FA-414E-8D1D-3FE18198F7EC@illinois.edu>

Just saw that Mark responded, so if possible submit a bug.  We may be doing a mini-hackathon this Wednesday, so we can probably tackle it in the process (possibly along with a few other pressing issues).

chris

On Jan 11, 2010, at 8:16 AM, Peter wrote:

> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz
> 
> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter


From biopython at maubp.freeserve.co.uk  Mon Jan 11 10:55:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 15:55:26 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <C771056E.6204%hrh@fmi.ch>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
Message-ID: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>

On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>
> These entries form the CON data class, see:
> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
> and they don't contain any sequence information.

I know - GenBank files have a similar system with CONTIG
lines instead of sequences. I was expecting BioPerl to be
able to convert these EMBL files with CO lines into GenBank
files with CONTIG lines.

> If you take the 'expanded' entries from
> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
> your script will work.

That's a useful tip - thanks.

Peter


From hrh at fmi.ch  Mon Jan 11 10:42:22 2010
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Mon, 11 Jan 2010 16:42:22 +0100
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <C771056E.6204%hrh@fmi.ch>


On 1/11/10 3:16 PM, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

These entries form the CON data class, see:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
and they don't contain any sequence information.

If you take the 'expanded' entries from
ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
your script will work.


Hans


> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter


From awitney at sgul.ac.uk  Mon Jan 11 11:27:15 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 16:27:15 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2D91.70906@gmail.com>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
Message-ID: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>


Ah excellent, thanks Roy. I was indeed thinking about it the wrong way.

In the process of writing this i have created a 

Bio::Tools::Run::Phylo::Phylip::Pars class

which is essentially just a modified copy of ProtPars. I have also fixed a few typos and possible bugs in

Bio/Tools/Run/Phylo/Phylip/Base.pm
Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm
Bio/AlignIO/phylip.pm
Bio/Tools/Run/Alignment/Clustalw.pm

I am of course happy to send these back in to the project... how would i best do this?

Cheers

adam


On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote:

> Actually, I guess some sample code would be more helpful:
> 
> use Bio::LocatableSeq;
> use Bio::SimpleAlign;
> use Bio::AlignIO;
> my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4);
> my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3);
> my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5);
> my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);
> 
> Cheers,
> Roy.
> 
> 
> On 11/01/2010 13:40, Roy Chaudhuri wrote:
>> Hi Adam,
>> 
>> I'm guessing you actually want to create a Bio::SimpleAlign object
>> (representing an alignment), rather than a Bio::AlignIO object (which is
>> just for reading/writing alignment files). Bio::SimpleAlign has a
>> documented new method that allows you to construct an alignment from
>> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but
>> include gaps and start/end coordinates to describe their relationship to
>> other sequences in the alignment.
>> 
>> Roy.
>> 
>> On 11/01/2010 12:21, Adam Witney wrote:
>>> Hi,
>>> 
>>> I am writing a script to automate the running of Phylip Pars. In the
>>> process i have to create a Bio::AlignIO object from a set of data
>>> that i have in a hash.
>>> 
>>> I could write the hash data into a phylip file and then load the
>>> Bio::AlignIO from that file, but i wondered if i could skip the
>>> writing and then reading of a temporary file ?
>>> 
>>> thanks for any help
>>> 
>>> adam
>> 
> 
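
For the original question (sequences held in a hash), Roy's sample above
generalises along these lines. A sketch only: the hash %aligned and its
contents are invented for illustration, the strings are assumed to be
already aligned (equal length, '-' for gaps), and it uses add_seq rather
than the -seqs constructor argument:

```perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Illustrative data: id => gapped, already-aligned sequence string.
my %aligned = (
    one   => 'AT-CG',
    two   => 'A--CG',
    three => 'ATTCG',
);

my $aln = Bio::SimpleAlign->new();
for my $id (sort keys %aligned) {
    my $gapped = $aligned{$id};
    # -end is the residue count, i.e. the length without gap characters.
    (my $ungapped = $gapped) =~ s/[-.]//g;
    $aln->add_seq(
        Bio::LocatableSeq->new(
            -id    => $id,
            -seq   => $gapped,
            -start => 1,
            -end   => length($ungapped),
        )
    );
}

# As in Roy's sample, no -file or -fh is given, so output goes to STDOUT.
Bio::AlignIO->new(-format => 'phylip')->write_aln($aln);
```

No temporary file is needed; $aln is an ordinary Bio::SimpleAlign object
and can be passed on from here.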


From Russell.Smithies at agresearch.co.nz  Mon Jan 11 22:41:02 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 12 Jan 2010 16:41:02 +1300
Subject: [Bioperl-l] BioPerl version?
In-Reply-To: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>

Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ?

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Mon Jan 11 22:59:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 21:59:44 -0600
Subject: [Bioperl-l] BioPerl version?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
	<18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>
Message-ID: <795BD926-4AE9-4478-AAD5-E36558350745@illinois.edu>

Not dumb, but a frequently asked one: it's in the FAQ ;>

http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
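
Inside a script, the same check can be wrapped so that a missing
installation is reported instead of dying. A minimal sketch: the helper
name bioperl_version is made up for illustration, and only
$Bio::Root::Version::VERSION comes from the one-liner above:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): returns the installed BioPerl
# version string, or undef if Bio::Root::Version cannot be loaded.
sub bioperl_version {
    my $loaded = eval { require Bio::Root::Version; 1 };
    return undef unless $loaded;
    no warnings 'once';
    return $Bio::Root::Version::VERSION;
}

my $version = bioperl_version();
my $message = defined $version
    ? "BioPerl version: $version"
    : "BioPerl not found (could not load Bio::Root::Version)";
print "$message\n";
```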

chris

On Jan 11, 2010, at 9:41 PM, Smithies, Russell wrote:

> Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ?
> 
> --Russell


From cjfields at illinois.edu  Tue Jan 12 11:02:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 10:02:02 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
	<320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
Message-ID: <ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>

On Jan 11, 2010, at 9:55 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>> 
>> These entries form the CON data class, see:
>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>> and they don't contain any sequence information.
> 
> I know - GenBank files have a similar system with CONTIG
> lines instead of sequences. I was expecting BioPerl to be
> able to convert these EMBL files with CO lines into GenBank
> files with CONTIG lines.

IIRC the contig information for GenBank is stored in annotation.  We can try to ensure the data is carried over to EMBL properly.

>> If you take the 'expanded' entries from
>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>> your script will work.
> 
> That's a useful tip - thanks.
> 
> Peter

NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence).

chris


From biopython at maubp.freeserve.co.uk  Tue Jan 12 11:19:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 16:19:32 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
	<320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
	<ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>
Message-ID: <320fb6e01001120819u50e73fa8k9bde8aa1abdf942d@mail.gmail.com>

On Tue, Jan 12, 2010 at 4:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Jan 11, 2010, at 9:55 AM, Peter wrote:
>
>> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>>>
>>> These entries form the CON data class, see:
>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>>> and they don't contain any sequence information.
>>
>> I know - GenBank files have a similar system with CONTIG
>> lines instead of sequences. I was expecting BioPerl to be
>> able to convert these EMBL files with CO lines into GenBank
>> files with CONTIG lines.
>
> IIRC the contig information for GenBank is stored in annotation.
> We can try to ensure the data is carried over to EMBL properly.

For contig records (where there is no sequence) I think we just
need to map the GenBank CONTIG lines to the EMBL CO lines,
and vice versa. At least, that's what Biopython now does (trunk
code, not yet released).

>>> If you take the 'expanded' entries from
>>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>>> your script will work.
>>
>> That's a useful tip - thanks.
>>
>> Peter
>
> NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence).

Indeed. This is a useful workaround for when a parser can't
cope with the contig version of a GenBank file for some reason, e.g.
http://bugzilla.open-bio.org/show_bug.cgi?id=2745

Peter


From maj at fortinbras.us  Tue Jan 12 12:33:30 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 12:33:30 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
Message-ID: <231A8D9473704E7697F7A486A0CDA86A@NewLife>

Hi All--

The beta of Bio::DB::SoapEUtilities is now available in the
bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
service. The system is fully WSDL based, and all eutils are
available. The best part (IMHO) is the set of result adaptors, which
convert SOAP results into BioPerl objects and let you iterate over
them. Have a look:

 use Bio::DB::EUtilities;
 my $fac = Bio::DB::EUtilities->new(); # step 1
 my $seqio = $fac->esearch(
       -db => 'nucleotide', 
       -term => 'HIV1 and CCR5 and Brazil'
    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
 # yes, it's already done the efetch under the hood...
 while ( my $seq = $seqio->next_seq ) { # step 4
  # do something with $seq, a Bio::Seq object...
 }

or this:

 my $links = $fac->elink( -db => 'protein', 
                          -dbfrom => 'nucleotide',
                          -id => \@nucids )->run( -auto_adapt => 1 );
 
 # maybe more than one associated id...
 my @prot_0 = $links->id_map( $nucids[0] );
   
 while ( my $ls = $links->next_linkset ) {
    @ids = $ls->ids;
    @submitted_ids = $ls->submitted_ids;
    # etc.
 }

and much, much more. See

http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service

and of course, the POD, for all the details, including 
download/installation. Tests in bioperl-run/t.

cheers, 
MAJ

-- No new dependencies were added or animals mistreated 
-- during the making of these modules.


From sheldon.mckay at gmail.com  Tue Jan 12 13:02:53 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 12 Jan 2010 10:02:53 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
Message-ID: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>

Hi all,

I keep timing out trying to do an svn checkout of bioperl-live from
code.open-bio.org.  Any suggestions?

Thanks,
Sheldon

----
Sheldon McKay, PhD
Lead, iPlant Tree of Life Engagement Team;
Research Investigator
Cold Spring Harbor Laboratory
http://mckay.cshl.edu
Google Voice:  (203) 701-9204


On Tue, Nov 3, 2009 at 9:09 AM, Aaron Mackey <amackey at virginia.edu> wrote:
> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous svn co
> failed with the same message:
>
> [...]
> A    bioperl-live/Bio/Structure/StructureI.pm
> A    bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command:
> svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From biopython at maubp.freeserve.co.uk  Tue Jan 12 13:12:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 18:12:46 +0000
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
Message-ID: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>

On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
> Hi all,
>
> I keep timing out trying to do an svn checkout of bioperl-live from
> code.open-bio.org.  Any suggestions?
>
> Thanks,
> Sheldon

The OBF team know about this (it's being discussed on root-l);
hopefully they'll have it fixed before too long.

Peter


From cjfields at illinois.edu  Tue Jan 12 13:18:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 12:18:45 -0600
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
	<320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
Message-ID: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>

On Jan 12, 2010, at 12:12 PM, Peter wrote:

> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
>> Hi all,
>> 
>> I keep timing out trying to do an svn checkout of bioperl-live from
>> code.open-bio.org.  Any suggestions?
>> 
>> Thanks,
>> Sheldon
> 
> The OBF team know about this (its being discussed on root-l),
> hopefully they'll have it fixed before too long.
> 
> Peter

We probably need to set up some automatic syncing of our read-only code.google.com repo as a backup.  Jason had originally set that up, hopefully he'll respond.

chris


From jason at bioperl.org  Tue Jan 12 13:27:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 12 Jan 2010 10:27:55 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
	<320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
	<8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
Message-ID: <C9DDBB08-DB88-4596-AED3-B3FD89893C55@bioperl.org>

Hi - I had set up the Google Code sync, but then came the unfortunate
realization that the revision numbers are shared between the wiki and
the code SVN (it's all one repo), so when I added a wiki page on the
site I screwed up the numbering and it wasn't possible to sync anymore
(that I could figure out) without resetting it, and I haven't gone back
to that. Sorry - I wasn't sure if we had figured out what we wanted to
do for repositories, so I sort of stopped worrying about it.


-jason
On Jan 12, 2010, at 10:18 AM, Chris Fields wrote:

> On Jan 12, 2010, at 12:12 PM, Peter wrote:
>
>> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com 
>> > wrote:
>>> Hi all,
>>>
>>> I keep timing out trying to do an svn checkout of bioperl-live from
>>> code.open-bio.org.  Any suggestions?
>>>
>>> Thanks,
>>> Sheldon
>>
>> The OBF team know about this (its being discussed on root-l),
>> hopefully they'll have it fixed before too long.
>>
>> Peter
>
> We probably need to set up some automatic syncing of our read-only  
> code.google.com repo as a backup.  Jason had originally set that up,  
> hopefully he'll respond.
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From virajj at gmail.com  Wed Jan  6 13:20:39 2010
From: virajj at gmail.com (Vijayaraj Nagarajan)
Date: Wed, 6 Jan 2010 13:20:39 -0500
Subject: [Bioperl-l] targetp request
Message-ID: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>

Hi,

I am trying to use TargetP in BioPerl.
The documentation at the BioPerl site is a bit confusing to me...

I would appreciate it if you could give a very small example of how to use
"Bio::Tools::TargetP" to predict the localization of a protein sequence that
I have stored as a string.

Thanks,
Vijay


From cjfields at illinois.edu  Tue Jan 12 18:36:53 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 17:36:53 -0600
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
Message-ID: <D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>

Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API conflict with the current EUtilities tools.

chris

On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:

> Hi All--
> 
> The beta of Bio::DB::SoapEUtilities is now available in the
> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
> service. The system is fully WSDL based, and all eutils are
> available. The best thing (IMHO) are the result adaptors, which
> provide conversion and iteration of SOAP results into BioPerl
> objects. Schau, mal:
> 
> use Bio::DB::EUtilities;
> my $fac = Bio::DB::EUtilities->new(); # step 1
> my $seqio = $fac->esearch(
>       -db => 'nucleotide', 
>       -term => 'HIV1 and CCR5 and Brazil'
>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
> # yes, it's already done the efetch under the hood...
> while ( my $seq = $seqio->next_seq ) { # step 4
>  # do something with $seq, a Bio::Seq object...
> }
> 
> or this:
> 
> my $links = $fac->elink( -db => 'protein', 
>                          -dbfrom => 'nucleotide',
>                          -id => \@nucids )->run( -auto_adapt => 1 );
> 
> # maybe more than one associated id...
> my @prot_0 = $links->id_map( $nucids[0] );
> 
> while ( my $ls = $links->next_linkset ) {
>    @ids = $ls->ids;
>    @submitted_ids = $ls->submitted_ids;
>    # etc.
> }
> 
> and much, much more. See
> 
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
> 
> and of course, the POD, for all the details, including 
> download/installation. Tests in bioperl-run/t.
> 
> cheers, 
> MAJ
> 
> -- No new dependencies were added or animals mistreated 
> -- during the making of these modules.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Tue Jan 12 19:22:10 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 18:22:10 -0600
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
	<D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
	<5AD210CB0C444A57881BBDD34DE99149@NewLife>
Message-ID: <B536964F-8F2F-4E07-9FD3-B7D0A945253E@illinois.edu>

Okay, just making sure (I was getting a bit paranoid).  Great work on the SOAP interface, BTW!

chris

On Jan 12, 2010, at 6:08 PM, Mark A. Jensen wrote:

> Um, yeah.
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "BioPerl List" <bioperl-l at bioperl.org>
> Sent: Tuesday, January 12, 2010 6:36 PM
> Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service
> 
> 
> Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API conflict with the current EUtilities tools.
> 
> chris
> 
> On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:
> 
>> Hi All--
>> 
>> The beta of Bio::DB::SoapEUtilities is now available in the
>> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
>> service. The system is fully WSDL based, and all eutils are
>> available. The best thing (IMHO) are the result adaptors, which
>> provide conversion and iteration of SOAP results into BioPerl
>> objects. Schau, mal:
>> 
>> use Bio::DB::EUtilities;
>> my $fac = Bio::DB::EUtilities->new(); # step 1
>> my $seqio = $fac->esearch(
>>      -db => 'nucleotide',
>>      -term => 'HIV1 and CCR5 and Brazil'
>>   )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
>> # yes, it's already done the efetch under the hood...
>> while ( my $seq = $seqio->next_seq ) { # step 4
>> # do something with $seq, a Bio::Seq object...
>> }
>> 
>> or this:
>> 
>> my $links = $fac->elink( -db => 'protein',
>>                         -dbfrom => 'nucleotide',
>>                         -id => \@nucids )->run( -auto_adapt => 1 );
>> 
>> # maybe more than one associated id...
>> my @prot_0 = $links->id_map( $nucids[0] );
>> 
>> while ( my $ls = $links->next_linkset ) {
>>   @ids = $ls->ids;
>>   @submitted_ids = $ls->submitted_ids;
>>   # etc.
>> }
>> 
>> and much, much more. See
>> 
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>> 
>> and of course, the POD, for all the details, including
>> download/installation. Tests in bioperl-run/t.
>> 
>> cheers,
>> MAJ
>> 
>> -- No new dependencies were added or animals mistreated
>> -- during the making of these modules.
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From maj at fortinbras.us  Tue Jan 12 19:08:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 19:08:12 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
	<D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
Message-ID: <5AD210CB0C444A57881BBDD34DE99149@NewLife>

Um, yeah.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Tuesday, January 12, 2010 6:36 PM
Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web 
service


Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's 
Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API 
conflict with the current EUtilities tools.

chris

On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:

> Hi All--
>
> The beta of Bio::DB::SoapEUtilities is now available in the
> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
> service. The system is fully WSDL based, and all eutils are
> available. The best thing (IMHO) are the result adaptors, which
> provide conversion and iteration of SOAP results into BioPerl
> objects. Schau, mal:
>
> use Bio::DB::EUtilities;
> my $fac = Bio::DB::EUtilities->new(); # step 1
> my $seqio = $fac->esearch(
>       -db => 'nucleotide',
>       -term => 'HIV1 and CCR5 and Brazil'
>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
> # yes, it's already done the efetch under the hood...
> while ( my $seq = $seqio->next_seq ) { # step 4
>  # do something with $seq, a Bio::Seq object...
> }
>
> or this:
>
> my $links = $fac->elink( -db => 'protein',
>                          -dbfrom => 'nucleotide',
>                          -id => \@nucids )->run( -auto_adapt => 1 );
>
> # maybe more than one associated id...
> my @prot_0 = $links->id_map( $nucids[0] );
>
> while ( my $ls = $links->next_linkset ) {
>    @ids = $ls->ids;
>    @submitted_ids = $ls->submitted_ids;
>    # etc.
> }
>
> and much, much more. See
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>
> and of course, the POD, for all the details, including
> download/installation. Tests in bioperl-run/t.
>
> cheers,
> MAJ
>
> -- No new dependencies were added or animals mistreated
> -- during the making of these modules.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Tue Jan 12 20:09:28 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 20:09:28 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP
	webservice
In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife><D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
	<5AD210CB0C444A57881BBDD34DE99149@NewLife>
Message-ID: <A5829F72FD6F469D9CBCC94FC69C068F@NewLife>

corrected:

use Bio::DB::SoapEUtilities;
my $fac = Bio::DB::SoapEUtilities->new(); # step 1
my $seqio = $fac->esearch(
    -db   => 'nucleotide',
    -term => 'HIV1 and CCR5 and Brazil'
)->run(-auto_adapt => 1, -rettype => 'fasta'); # steps 2, 3
# yes, it's already done the efetch under the hood...
while ( my $seq = $seqio->next_seq ) { # step 4
    # do something with $seq, a Bio::Seq object...
}

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Tuesday, January 12, 2010 7:08 PM
Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP 
webservice


> Um, yeah.
> ----- Original Message ----- 
> From: "Chris Fields" <cjfields at illinois.edu>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "BioPerl List" <bioperl-l at bioperl.org>
> Sent: Tuesday, January 12, 2010 6:36 PM
> Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web 
> service
>
>
> Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's 
> Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API 
> conflict with the current EUtilities tools.
>
> chris
>
> On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:
>
>> Hi All--
>>
>> The beta of Bio::DB::SoapEUtilities is now available in the
>> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
>> service. The system is fully WSDL based, and all eutils are
>> available. The best thing (IMHO) are the result adaptors, which
>> provide conversion and iteration of SOAP results into BioPerl
>> objects. Schau, mal:
>>
>> use Bio::DB::EUtilities;
>> my $fac = Bio::DB::EUtilities->new(); # step 1
>> my $seqio = $fac->esearch(
>>       -db => 'nucleotide',
>>       -term => 'HIV1 and CCR5 and Brazil'
>>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
>> # yes, it's already done the efetch under the hood...
>> while ( my $seq = $seqio->next_seq ) { # step 4
>>  # do something with $seq, a Bio::Seq object...
>> }
>>
>> or this:
>>
>> my $links = $fac->elink( -db => 'protein',
>>                          -dbfrom => 'nucleotide',
>>                          -id => \@nucids )->run( -auto_adapt => 1 );
>>
>> # maybe more than one associated id...
>> my @prot_0 = $links->id_map( $nucids[0] );
>>
>> while ( my $ls = $links->next_linkset ) {
>>    @ids = $ls->ids;
>>    @submitted_ids = $ls->submitted_ids;
>>    # etc.
>> }
>>
>> and much, much more. See
>>
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>>
>> and of course, the POD, for all the details, including
>> download/installation. Tests in bioperl-run/t.
>>
>> cheers,
>> MAJ
>>
>> -- No new dependencies were added or animals mistreated
>> -- during the making of these modules.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From tuco at pasteur.fr  Wed Jan 13 05:24:34 2010
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 13 Jan 2010 11:24:34 +0100
Subject: [Bioperl-l] targetp request
In-Reply-To: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
Message-ID: <4B4D9F62.5010306@pasteur.fr>

On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> Hi,
>
> I am trying to use targetP in bioperl.
> the documentation at the bioperl site is a bit confusing to me...
>
> I would appreciate if you could give a very small example, as to how to use
> "Bio::Tools::TargetP" to predict the localization of a protein sequence that
> i have stored as a string.
>
> Thanks,
> Vijay
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Dear Vijay,

Bio::Tools::TargetP is not intended to run targetp on a sequence but to
read and parse the results of a targetp run.

 From the Pod doc:

DESCRIPTION
        TargetP modules will provides parsed informations about protein
        localization.  It reads in a targetp output file.  It parses the
        results, and returns a Bio::SeqFeature::Generic object for each
        sequences found to have a subcellular localization


So to analyze your sequence, you'll first need to run targetp on your
sequence file to create a targetp output file. Then use the
Bio::Tools::TargetP module to parse this file and extract only the
information you need, as shown in the SYNOPSIS of the module's Pod
documentation.
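
A minimal sketch of that second step, assuming targetp has already been
run and its output saved to a file. The sub name print_predictions is
made up for illustration; next_prediction is taken from the module's
SYNOPSIS. No tag names are assumed: the loop prints whatever tags each
feature carries.

```perl
use strict;
use warnings;
use Bio::Tools::TargetP;

# Hypothetical helper: print every prediction found in an open
# targetp output filehandle and return how many were seen.
sub print_predictions {
    my ($fh) = @_;
    my $parser = Bio::Tools::TargetP->new(-fh => $fh);
    my $count  = 0;
    # Each prediction comes back as a Bio::SeqFeature::Generic object.
    while (my $feat = $parser->next_prediction) {
        my $id = $feat->seq_id;
        print 'Sequence: ', (defined $id ? $id : '(unknown)'), "\n";
        for my $tag ($feat->get_all_tags) {
            print "  $tag: ", join(', ', $feat->get_tag_values($tag)), "\n";
        }
        $count++;
    }
    return $count;
}

# Usage: perl parse_targetp.pl targetp_output.txt
if (@ARGV) {
    my $result_file = $ARGV[0];
    open my $fh, '<', $result_file
        or die "Cannot open $result_file: $!";
    my $n = print_predictions($fh);
    close $fh;
    print "$n prediction(s) parsed\n";
}
```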

HTH

Regards

Emmanuel


From roy.chaudhuri at gmail.com  Wed Jan 13 07:52:58 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 13 Jan 2010 12:52:58 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
Message-ID: <4B4DC22A.8080701@gmail.com>

Upload them to Bugzilla as patches, and one of the devs will review your 
changes and incorporate them into bioperl-live:
http://www.bioperl.org/wiki/HOWTO:SubmitPatch

Roy.

On 11/01/2010 16:27, Adam Witney wrote:
>
> Ah excellent, thanks Roy. I was indeed thinking about it the wrong
> way.
>
> In the process of writing this i have created a
>
> Bio::Tools::Run::Phylo::Phylip::Pars class
>
> which is essentially just a modified copy of ProtPars. I have also
> fixed a few typos and possible bugs in
>
> Bio/Tools/Run/Phylo/Phylip/Base.pm
> Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm
> Bio/AlignIO/phylip.pm
> Bio/Tools/Run/Alignment/Clustalw.pm
>
> I am of course happy to send these back in to the project... how
> would i best do this?
>
> Cheers
>
> adam
>
>
> On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote:
>
>> Actually, I guess some sample code would be more helpful:
>>
>> use Bio::LocatableSeq;
>> use Bio::SimpleAlign;
>> use Bio::AlignIO;
>> my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4);
>> my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3);
>> my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5);
>> my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
>> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);
>>
>> Cheers, Roy.
>>
>>
>> On 11/01/2010 13:40, Roy Chaudhuri wrote:
>>> Hi Adam,
>>>
>>> I'm guessing you actually want to create a Bio::SimpleAlign
>>> object (representing an alignment), rather than a Bio::AlignIO
>>> object (which is just for reading/writing alignment files).
>>> Bio::SimpleAlign has a documented new method that allows you to
>>> construct an alignment from Bio::LocatableSeq objects, which are
>>> similar to Bio::Seq objects but include gaps and start/end
>>> coordinates to describe their relationship to other sequences in
>>> the alignment.
>>>
>>> Roy.
>>>
>>> On 11/01/2010 12:21, Adam Witney wrote:
>>>> Hi,
>>>>
>>>> I am writing a script to automate the running of Phylip Pars.
>>>> In the process i have to create a Bio::AlignIO object from a
>>>> set of data that i have in a hash.
>>>>
>>>> I could write the hash data into a phylip file and then load
>>>> the Bio::AlignIO from that file, but i wondered if i could skip
>>>> the writing and then reading of a temporary file ?
>>>>
>>>> thanks for any help
>>>>
>>>> adam
>>>
>>
>


From marcelo011982 at gmail.com  Wed Jan 13 13:12:04 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Wed, 13 Jan 2010 16:12:04 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
Message-ID: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>

Hi,
I have a simple BLAST result, such as one from blastn.
Is there a script in BioPerl to convert such a result to ClustalW format
(.aln)?

Thanks for any help.


From Kevin.M.Brown at asu.edu  Wed Jan 13 13:01:42 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 13 Jan 2010 11:01:42 -0700
Subject: [Bioperl-l] targetp request
In-Reply-To: <4B4D9F62.5010306@pasteur.fr>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
	<4B4D9F62.5010306@pasteur.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B4067C133E@EX02.asurite.ad.asu.edu>

Sounds like this module might be in the wrong place, then; it sounds more
like a SeqIO or AlignIO module, heheh. It also looks like the docs might
need to be cleaned up a bit for English readability (at least that
initial sentence).

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Emmanuel Quevillon
> Sent: Wednesday, January 13, 2010 3:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] targetp request
> 
> On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> > Hi,
> >
> > I am trying to use targetP in bioperl.
> > the documentation at the bioperl site is a bit confusing to me...
> >
> > I would appreciate if you could give a very small example, 
> as to how to use
> > "Bio::Tools::TargetP" to predict the localization of a 
> protein sequence that
> > i have stored as a string.
> >
> > Thanks,
> > Vijay
> 
> Dear Vijay,
> 
> Bio::Tools::TargetP is not intended to run targetp on a sequence, but to
> read and parse the results of a targetp run.
> 
> From the Pod doc:
> 
> DESCRIPTION
>         TargetP modules will provides parsed informations about protein
>         localization.  It reads in a targetp output file.  It parses the
>         results, and returns a Bio::SeqFeature::Generic object for each
>         sequences found to have a subcellular localization
> 
> So to analyze your sequence, you'll first need to run targetp on your
> sequence file to create a targetp output file. Then use the
> Bio::Tools::TargetP module to parse that file and extract only the
> information you need, as shown in the SYNOPSIS of the module's Pod
> documentation.
> 
> HTH
> 
> Regards
> 
> Emmanuel
> 
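
For anyone finding this thread later, the parse step Emmanuel describes
might look roughly like the sketch below. It assumes targetp has already
been run and its output saved to a file; the file name 'targetp.out' is
made up for illustration, and next_prediction() is the iterator as I
remember it from the module's SYNOPSIS, so check the Pod for your BioPerl
version before relying on it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::Tools::TargetP;

# Hypothetical file name: the saved output of an earlier targetp run.
my $targetp_out = shift @ARGV || 'targetp.out';

# The parser only reads results; it does not run targetp itself.
my $parser = Bio::Tools::TargetP->new( -file => $targetp_out );

# Each prediction comes back as a Bio::SeqFeature::Generic object.
# next_prediction() is assumed from the module's SYNOPSIS.
while ( my $feat = $parser->next_prediction ) {
    print 'sequence: ', ( $feat->seq_id // '(no seq_id)' ), "\n";

    # Dump whatever tags the parser attached, without assuming their names.
    for my $tag ( $feat->get_all_tags ) {
        print "  $tag: ", join( ', ', $feat->get_tag_values($tag) ), "\n";
    }
}
```

Since the sequence in question is held in a string, it would first have to
be written to a FASTA file (for example with Bio::SeqIO) so that targetp
has something to read.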


From maj at fortinbras.us  Wed Jan 13 13:44:36 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 13 Jan 2010 13:44:36 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
Message-ID: <C85EC8A05E884B328AFDAA055341E9E2@NewLife>

Marcelo-
Yes-- look at the code snip at
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
combined with the snip at 
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
(using -format => 'clustalw')
cheers MAJ


From dan.kortschak at adelaide.edu.au  Wed Jan 13 23:26:46 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 14:56:46 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
Message-ID: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>

Hi All,

I'm having a stupid problem that for some reason I just can't figure
out. I'm putting together a B:A:IO:bowtie module to wrap around the
B:A:IO:sam module so bowtie output can be used as an assembly start
point.

For some reason that is escaping me I can't create tempfiles!

What should be the relevant code in the module:

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );


and the line (there are a couple of others that are likely to fail in the
same way, but I've not got that far)

my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
$self->tempdir(), -suffix => '.sam' );

Which dies with:
Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.

Relevant environment vars:
  DB<10> x @ISA 
0  'Bio::Root::Root'
1  'Bio::Root::IO'
2  'Bio::Assembly::IO'

DB<11> x $self
0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
   '_no_head' => undef
   '_no_sq' => undef
   '_root_verbose' => 0


Can someone suggest what I'm missing?

cheers
Dan


From maj at fortinbras.us  Thu Jan 14 00:11:01 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:11:01 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <84196F01FF584C64A79B89FECE2DD86F@NewLife>

Hey Dan-- what does your constructor look like? I wonder if something's getting 
lost in new() and _initialize() chaining spaghetti- MAJ


From dan.kortschak at adelaide.edu.au  Thu Jan 14 00:35:35 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:35 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <84196F01FF584C64A79B89FECE2DD86F@NewLife>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
Message-ID: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>

Thanks Mark, I'm not sure about that, since @ISA still includes
Bio::Root::IO at the point of the call, but it might be.

cheers
Dan

Here is the entirety of the code (it's reasonably short):

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );

our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
our $PG = "\@PG\tID=Bowtie\n";

our $HAVE_IO_UNCOMPRESS;
BEGIN {
# check requirements
    unless ( eval "require Bio::Tools::Run::Bowtie;") {
	Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
    }
    unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
	Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
    }
}

sub new {
	my $class = shift;
	my @args = @_;
	my $self = $class->SUPER::new(@args);
	my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
	$file =~ s/^<//;
	$self->{'_no_head'} = $no_head;
	$self->{'_no_sq'} = $no_sq;
	# get the sequence so samtools can work with it
	my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
	my $refdb = $inspector->run($index);
	my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));
	my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );
	return $sam;
}

sub _bowtie_to_sam {
	my ($self, $file, $refdb) = @_;

	$self->throw("'$file' does not exist or is not readable.")
		unless ( -e $file && -r $file );
	my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
	$self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;

	my %SQ;
	my $mapq = 255;
	my $in_pair;
	my @mate_line;
	my $mlen;

	if ($file =~ m/\.gz[^.]*$/) {
		unless ($HAVE_IO_UNCOMPRESS) {
			croak( "IO::Uncompress::Gunzip not available, can't expand '$file'" );
		}
		my ($tfh, $tf) = $self->io->tempfile;
		my $z = IO::Uncompress::Gunzip->new($file);
		while (<$z>) { print $tfh $_ }
		close $tfh;
		$file = $tf;
	}

        open(my $fh, $file) or
		$self->throw("Can not open '$file' for reading: $!");
            
	# create temp file for working
	my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );
	
	while (<$fh>) {
		chomp;
		my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
		$SQ{$rname} = 1;
		
		my $paired_f =  ($qname =~ m#/[12]#) ? 0x03 : 0;
		my $strand_f = ($strand eq '-') ? 0x10 : 0;
		my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
		my $first_f =  ($qname =~ m#/1#) ? 0x40 : 0;
		my $second_f = ($qname =~ m#/2#) ? 0x80 : 0;
		my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;

		$pos++;
		my $len = length $seq;
		die unless $len == length $qual;
		my $cigar = $len.'M';
		my @detail = split(',',$details);
		my $dist = 'NM:i:'.scalar @detail;
		
		my @mismatch;
		my $last_pos = 0;
		for (@detail) {
			m/(\d+):(\w)>\w/;
			my $err = ($1-$last_pos);
			$last_pos = $1+1;
			push @mismatch,($err,$2);
		}
		push @mismatch, $len-$last_pos;
		@mismatch = reverse @mismatch if $strand eq '-';
		my $mismatch = join('',('MD:Z:',@mismatch));

		if ($paired_f) {
			my $mrnm = '=';
			if ($in_pair) {
				my $mpos = $mate_line[3];
				$mate_line[7] = $pos;
				my $isize = $mpos-$pos-$len;
				$mate_line[8] = -$isize;
				print $sam_tmp_h join("\t",@mate_line),"\n";
				print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
				$in_pair = 0;
			} else {
				$mlen = $len;
				@mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
				$in_pair = 1;
			}
		} else {
			my $mrnm = '*';
			my $mpos = 0;
			my $isize = 0;
			print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
		}
	}

	close($fh);
	$sam_tmp_h->close;
	
	return $sam_tmp_f if $self->{'_no_head'};

	my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

	# print header
	print $samh $HD;
	
	# print sequence dictionary
	unless ($self->{'_no_sq'}) {
		my $db  = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
		while ( my $seq = $db->next_seq() ) {
			$SQ{$seq->id} = $seq->length if $SQ{$seq->id};
		}
	
		map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
	}
	
	# print program
	print $samh $PG;
	
	open($sam_tmp_h, $sam_tmp_f) or
		$self->throw("Can not open '$sam_tmp_f' for reading: $!");

	print $samh $_ while (<$sam_tmp_h>);
	
	close($sam_tmp_h);
	$samh->close;
	
	return $samf;
}

sub _make_bam {
	my ($self, $file) = @_;
	
	$self->throw("'$file' does not exist or is not readable")
		unless ( -e $file && -r $file );

	# make a sorted bam file from a sam file input
	my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
	my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
	$_->close for ($bamh, $srth);
	
	my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
						   -sam_input => 1,
						   -bam_output => 1 );

	$samt->run( -bam => $file, -out => $bamf );

	$samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );

	$samt->run( -bam => $bamf, -pfx => $srtf);

	return $srtf.'.bam'
}

1;


On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
> Hey Dan-- what does your constructor look like? I wonder if
> something's getting lost in new() and _initialize() chaining spaghetti- MAJ
> 


From dan.kortschak at adelaide.edu.au  Thu Jan 14 00:35:48 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:48 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
	<B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
Message-ID: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>

I've had a bit of a play with that, but no luck.

Dan

On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
> I've found that rearranging the items in the 'use base' array can
> sometimes recover lost methods. I don't know enough of the arcana to
> know why it works. (Sometimes, java starts looking pretty good from
> here...)
> 


From maj at fortinbras.us  Thu Jan 14 00:38:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:38:00 -0500
Subject: [Bioperl-l] Fw:  not able to use Bio::Root::IO method
Message-ID: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>

up to list
----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
Sent: Thursday, January 14, 2010 12:36 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method


> Aha-- check out the pod for Bio::Root::IO:
> 
> "This module provides methods that will usually be needed for any sort
> of file- or stream-related input/output, e.g., keeping track of a file
> handle, transient printing and reading from the file handle, a close
> method, automatically closing the handle on garbage collection, etc.
> 
> To use this for your own code you will either want to inherit from
> this module, or instantiate an object for every file or stream you are
> dealing with. In the first case this module will most likely not be
> the first class off which your class inherits; therefore you need to
> call _initialize_io() with the named parameters in order to set file
> handle, open file, etc automatically."
> 
> I think you're wanting a call to $self->_initialize_io(). (There is no io() 
> method explicitly defined in any of the base classes.)
> MAJ


From maj at fortinbras.us  Thu Jan 14 00:50:11 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:50:11 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
	<B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
	<1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <82BFF47099684EF496DB3875D39DCA14@NewLife>

For the benefit of the list, I categorically deny ever making the 
statement about java below....
MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 14, 2010 12:35 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method


> I've had a bit of a play with that, but no luck.
> 
> Dan
> 
> On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
>> I've found that rearranging the items in the 'use base' array can
>> sometimes recover lost methods. I don't know enough of the arcana to
>> know why it works. (Sometimes, java starts looking pretty good from
>> here...)
>> 
> 
>


From cjfields at illinois.edu  Thu Jan 14 02:23:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:23:41 -0600
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>

You can remove separate 'use' directives if they are declared with 'use base' (they will be imported then).  Also, Bio::Root::IO inherits Bio::Root::Root, and Bio::Assembly::IO should inherit from Bio::Root::IO, so the only base module you should need is Bio::Assembly::IO.  It's possible having all three is confusing the interpreter.

chris
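
To make the point about base-class order concrete, here is a small sketch
that needs only core Perl. The My::* packages are made-up stand-ins for
Bio::Root::Root, Bio::Root::IO and Bio::Assembly::IO, not the real
modules. With Perl's default depth-first method resolution, listing the
root class first means SUPER::new() reaches the root constructor before
any of the IO constructors. Whether that is what happened in the posted
module is an assumption on my part, but it would be consistent with an
object that carries only root-level keys.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Stand-in classes (hypothetical names) that mirror the shape of
# Bio::Root::Root <- Bio::Root::IO <- Bio::Assembly::IO.
package My::Root;
sub new { my $class = shift; return bless { built_by => 'My::Root' }, $class }

package My::IO;
our @ISA = ('My::Root');
sub new {
    my $class = shift;
    my $self  = $class->SUPER::new(@_);
    $self->{built_by} = 'My::IO';    # stands in for the I/O setup step
    return $self;
}

package My::AssemblyIO;
our @ISA = ('My::IO');

# All three bases listed, root first, as in the posted module.
package My::Bowtie::ThreeBases;
our @ISA = ( 'My::Root', 'My::IO', 'My::AssemblyIO' );
sub new { my $class = shift; return $class->SUPER::new(@_) }

# Only the most derived base listed.
package My::Bowtie::OneBase;
our @ISA = ('My::AssemblyIO');
sub new { my $class = shift; return $class->SUPER::new(@_) }

package main;

for my $class (qw(My::Bowtie::ThreeBases My::Bowtie::OneBase)) {
    my $order = mro::get_linear_isa($class);    # array ref of class names
    my $obj   = $class->new;
    print "$class\n";
    print "  lookup order:   @$order\n";
    print "  constructed by: $obj->{built_by}\n";
}
```

In the three-base case the root constructor wins and the I/O setup step
never runs; with the single base the lookup walks the chain in the order
the classes were designed for.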



From cjfields at illinois.edu  Thu Jan 14 02:25:05 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:25:05 -0600
Subject: [Bioperl-l] Fw:  not able to use Bio::Root::IO method
In-Reply-To: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
References: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
Message-ID: <1DB926E1-9C6F-4B96-8D7E-28317DD7DE42@illinois.edu>

Yes, that's true.  The call to an io() is a Bio::Tools::Run::WrapperBase thing (the io() is a Bio::Root::IO instance).

chris
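
The "has an io()" arrangement can be sketched without BioPerl as follows.
My::IOHelper and My::Wrapper are made-up names, and the core File::Temp
module stands in for Bio::Root::IO. The lazy creation of the helper is
just for brevity here, not a claim about how WrapperBase does it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper standing in for a Bio::Root::IO instance.
package My::IOHelper;
use File::Temp ();
sub new { return bless {}, shift }

sub tempfile {
    my ( $self, %opt ) = @_;
    # File::Temp::tempfile returns (filehandle, filename) in list context.
    return File::Temp::tempfile( 'sketchXXXXXX', TMPDIR => 1, UNLINK => 1, %opt );
}

# Hypothetical wrapper that *has* an I/O helper rather than *being* one.
package My::Wrapper;
sub new { return bless {}, shift }

sub io {
    my $self = shift;
    $self->{_io} ||= My::IOHelper->new;    # created on first use, then reused
    return $self->{_io};
}

package main;

my $wrapper = My::Wrapper->new;
my ( $fh, $name ) = $wrapper->io->tempfile( SUFFIX => '.sam' );
print {$fh} "\@HD\tVN:1.0\tSO:unsorted\n";
close $fh or die "close failed: $!";
print "wrote header to $name\n";
```

A class built this way answers $obj->io->tempfile but not $obj->tempfile.
A class that inherits from the I/O class is the other way round, which is
why the chained call fails on an object that never defined io().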

On Jan 13, 2010, at 11:38 PM, Mark A. Jensen wrote:

> up to list
> ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
> To: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
> Sent: Thursday, January 14, 2010 12:36 AM
> Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method
> 
> 
>> Aha-- check out the pod for Bio::Root::IO:
>> "This module provides methods that will usually be needed for any sort
>> of file- or stream-related input/output, e.g., keeping track of a file
>> handle, transient printing and reading from the file handle, a close
>> method, automatically closing the handle on garbage collection, etc.
>> To use this for your own code you will either want to inherit from
>> this module, or instantiate an object for every file or stream you are
>> dealing with. In the first case this module will most likely not be
>> the first class off which your class inherits; therefore you need to
>> call _initialize_io() with the named parameters in order to set file
>> handle, open file, etc automatically."
>> I think you're wanting a call to $self->_initialize_io(). (There is no io() method explicitly defined in any of the base classes.)
>> MAJ


From dan.kortschak at adelaide.edu.au  Thu Jan 14 02:59:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 18:29:20 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
	<B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>
Message-ID: <1263455960.4630.3.camel@epistle>

Thanks Chris,

I've done that, and since the inheritance is direct (rather than being a
constructed attribute in the object hash) the calls are $obj->temp*
rather than the $obj->io->temp* that I was using.

It works now, and the module is much clearer with most of the
declarations gone.

cheers
Dan

On Thu, 2010-01-14 at 01:23 -0600, Chris Fields wrote:
> You can remove separate 'use' directives if they are declared with
> 'use base' (they will be imported then).  Also, Bio::Root::IO inherits
> Bio::Root::Root, and Bio::Assembly::IO should inherit from
> Bio::Root::IO, so the only base module you should need is
> Bio::Assembly::IO.  It's possible having all three is confusing the
> interpreter.
> 
> chris
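
For the archive, the shape of the fix can be sketched with core Perl
only. My::IOBase and My::Bowtie are made-up names, File::Temp stands in
for the temp-file methods of Bio::Root::IO, and the option names are
File::Temp's own rather than BioPerl's dashed style.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class standing in for Bio::Root::IO; it wraps the core
# File::Temp functions so the sketch runs without BioPerl installed.
package My::IOBase;
use File::Temp ();
sub new     { return bless {}, shift }
sub tempdir { shift; return File::Temp::tempdir( CLEANUP => 1 ) }

sub tempfile {
    my ( $self, %opt ) = @_;
    return File::Temp::tempfile( 'sketchXXXXXX', UNLINK => 1, %opt );
}

# Hypothetical subclass: one base class, no io() accessor anywhere.
package My::Bowtie;
our @ISA = ('My::IOBase');

sub sam_tempfile {
    my $self = shift;
    # Inherited methods are called on the object itself.
    return $self->tempfile( DIR => $self->tempdir, SUFFIX => '.sam' );
}

package main;

my $obj = My::Bowtie->new;
my ( $fh, $name ) = $obj->sam_tempfile;
close $fh or die "close failed: $!";
print "temp file: $name\n";
print "has tempfile(): ", ( $obj->can('tempfile') ? 'yes' : 'no' ), "\n";
print "has io():       ", ( $obj->can('io')       ? 'yes' : 'no' ), "\n";
```

Calling can() on the object, as in the last two lines, is a quick way to
see which of the two call styles a given class supports before a method
call dies at run time.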


From marcelo011982 at gmail.com  Thu Jan 14 08:44:25 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:44:25 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <C85EC8A05E884B328AFDAA055341E9E2@NewLife>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
	<C85EC8A05E884B328AFDAA055341E9E2@NewLife>
Message-ID: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>

Thanks Mark.
Most of you probably already know this, but I'll post it for new users:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

# Open the BLAST report; the path is a single string with no embedded
# line break or padding spaces.
my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
# Every HSP alignment is written to hsp.aln in ClustalW format.
my $alnIO = Bio::AlignIO->new(-format => 'clustalw', -file => '>hsp.aln');
while ( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while ( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while ( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      my $aln = $hsp->get_aln;
      $alnIO->write_aln($aln);
    }
  }
}


On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> Marcelo-
> Yes-- look at the code snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
> combined with the snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> (using -format => 'clustalw')
> cheers MAJ
> ----- Original Message ----- From: "Marcelo Iwata" <
> marcelo011982 at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, January 13, 2010 1:12 PM
> Subject: [Bioperl-l] Blast to Clustalw Format
>
>
>  Hi..
>> I have an simple Blast result, such as blastn.
>> Is there an  scrip  to transform such result to Clustalw format in Bioperl
>> ?(.aln)
>>
>> Thanx for any help.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>


From marcelo011982 at gmail.com  Thu Jan 14 08:46:21 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:46:21 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
	<C85EC8A05E884B328AFDAA055341E9E2@NewLife>
	<1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
Message-ID: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>

Sorry, the correct code is:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

# Open the BLAST report; the path is a single string with no embedded
# line break or padding spaces.
my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
# Every HSP alignment is written to hsp.aln in ClustalW format.
my $alnIO = Bio::AlignIO->new(-format => 'clustalw', -file => '>hsp.aln');
while ( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while ( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while ( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      my $aln = $hsp->get_aln;
      $alnIO->write_aln($aln);
    }
  }
}




From maj at fortinbras.us  Thu Jan 14 08:54:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 08:54:31 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com><C85EC8A05E884B328AFDAA055341E9E2@NewLife><1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
	<1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
Message-ID: <1B8891488AA746F49BCAAB531FBE4D0B@NewLife>

Thanks Marcelo-- code snips always appreciated! MAJ


From sidd.basu at gmail.com  Thu Jan 14 14:15:04 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 13:15:04 -0600
Subject: [Bioperl-l] reading blast report
Message-ID: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>

Hi, 
I have a script that reads a tblastn report (13,000 records) and loads it
into a Chado database (via the Bio::Chado::Schema module); however, the
machine runs out of memory. Setting the database loading aside, I am
trying to work out whether reading the report with the SearchIO module
could itself consume a lot of memory. So, when I am reading a BLAST file
and getting the result object ....

while (my $result = $searchio->next_result)

* Does the SearchIO object load a huge chunk of the file into memory, or
  does it read only part of the report on each iteration?

* Would indexing the BLAST report and then reading from the index be much
  faster, and why? And is there any way I could iterate through each
  record in the index? Would that be helpful?

-siddhartha


From jason at bioperl.org  Thu Jan 14 14:53:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 11:53:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
Message-ID: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>

What aspects of the report are you loading?  You might consider
producing the BLAST report as tab-delimited (-m 8 format) if you are
only interested in start/end positions and scores of alignments; it is a
simpler, reduced dataset that gives the parser a lower memory footprint.
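
A minimal sketch of the tab-delimited route, in plain Perl with no
BioPerl dependency. It assumes the standard 12-column -m 8 layout (query,
subject, % identity, alignment length, mismatches, gap openings, query
start/end, subject start/end, e-value, bit score); the sub names and the
sample line are invented for illustration. The file is read one line at a
time, so only the current HSP is held in memory.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Column names for BLAST tabular (-m 8) output, in file order.
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Turn one tabular line into a hash ref; returns nothing for comment
# lines (as written by -m 9), blank lines, or lines with too few columns.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;
    my @cols = split /\t/, $line;
    return unless @cols >= @FIELDS;
    my %hsp;
    @hsp{@FIELDS} = @cols[0 .. $#FIELDS];
    return \%hsp;
}

# Stream a filehandle line by line, handing each parsed HSP to a callback.
sub each_m8_hsp {
    my ($fh, $callback) = @_;
    while ( my $line = <$fh> ) {
        my $hsp = parse_m8_line($line) or next;
        $callback->($hsp);
    }
    return;
}

# Invented sample record, read through an in-memory filehandle.
my $report = join "\t", qw(q1 contig7 98.50 200 3 0 1 200 1000 1199 1e-50 380);
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
my @kept;
each_m8_hsp($fh, sub {
    my ($hsp) = @_;
    # Keep only the subject start/end and the score.
    push @kept, [ @{$hsp}{qw(s_start s_end bit_score)} ];
});
close $fh;
print join("\t", @$_), "\n" for @kept;
```

In a real loader the callback would insert each row into the database
instead of collecting it in @kept.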

SearchIO's default is -format => 'blast'; you can try -format =>
'blast_pull' instead, which parses lazily to create objects and will
reduce memory consumption.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From sidd.basu at gmail.com  Thu Jan 14 15:15:45 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 14:15:45 -0600
Subject: [Bioperl-l]  Re: reading blast report
In-Reply-To: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
Message-ID: <4b4f7b74.5744f10a.7087.4813@mx.google.com>

On Thu, 14 Jan 2010, Jason Stajich wrote:

> What aspects of the report are you loading?  You might consider the blast 
> report as tab-delimited (-m 8 format) if you only are interested in 
> start/end positions and scores of alignments which is a simpler and reduced 
> dataset that has lower memory footprint by the parser.

I think this would be the better approach; I am mostly interested in
start/end/score data only.

>
> Searchio (default) -format => blast - you can try the BLAST -format => 
> blast_pull instead which lazy parses to create objects and will reduce 
> memory consumption.

It's another good option, though. But just out of curiosity: does the
regular blast parser load the entire file into memory, given that the
output consists of multiple results concatenated together into a
single file? Could anybody clarify?

thanks, 
-siddhartha




From jason at bioperl.org  Thu Jan 14 16:28:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 13:28:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
Message-ID: <CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>


On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
>
>> What aspects of the report are you loading?  You might consider the  
>> blast
>> report as tab-delimited (-m 8 format) if you only are interested in
>> start/end positions and scores of alignments which is a simpler and  
>> reduced
>> dataset that has lower memory footprint by the parser.
>
> I think this would be a better approach i am mostly interested in
> start/end/score data only.
>
>>
>> Searchio (default) -format => blast - you can try the BLAST -format  
>> =>
>> blast_pull instead which lazy parses to create objects and will  
>> reduce
>> memory consumption.
>
> It's another good option though. But just out of curiosity,  so the
> regular blast parser do load the entire file in the memory consider  
> the
> output consist of multiple Results concatenated together into a
> single file. Could anybody clarify.
>
> thanks,
> -siddhartha

With the standard (non-pull) approach, each result is parsed (1 result
per query), and all of its hits and HSPs are parsed and brought into
memory.
SearchIO iterates at the level of the result - that is why you call
next_result, which parses one result at a time.
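
A minimal sketch of that iteration pattern, assuming BioPerl is installed.
The sub name hsp_counts is invented. Only one result object is live per
pass through the outer loop, and the same loop should work with either
-format => 'blast' or -format => 'blast_pull', since both hand back
objects with next_hit and next_hsp.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count HSPs per query while holding only one result at a time.
# $format is 'blast' (standard parser) or 'blast_pull' (lazy parser).
sub hsp_counts {
    my ($file, $format) = @_;
    require Bio::SearchIO;    # loaded on first call; needs BioPerl installed
    my $in = Bio::SearchIO->new(-format => $format, -file => $file);
    my %count;
    while ( my $result = $in->next_result ) {
        my $n = 0;
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) { $n++ }
        }
        $count{ $result->query_name } = $n;
        # $result is replaced on the next pass, so the previous result's
        # hits and HSPs are no longer referenced from this loop.
    }
    return \%count;
}
```

A call would look like hsp_counts('output.blast', 'blast_pull'), with the
file name being whatever report you have on disk. If memory still grows,
check that nothing in the loop body keeps references to earlier results.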


--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From sidd.basu at gmail.com  Thu Jan 14 16:40:42 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 15:40:42 -0600
Subject: [Bioperl-l]  Re: reading blast report
In-Reply-To: <CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
	<CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>
Message-ID: <4b4f8f5d.5644f10a.2be2.47dc@mx.google.com>

Thanks, Jason, for the clarification.



From SMarkel at accelrys.com  Thu Jan 14 17:58:06 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 14 Jan 2010 14:58:06 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>

We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
from our customers.  Due to network irregularities (not sure what else
to call them), users find that retrieving remote BLAST results is
somewhat random.  When results come back the hits are fine, but
sometimes no information comes back at all.  Retrying helps.

In looking at RemoteBlast.pm there are four "return -1" cases.

* $status eq 'ERROR'      (return on line 614)
* $line =~ /ERROR/i       (return on line 628)
* !$got_content           (return on line 648)
* !$response->is_success  (return on line 655)

In the case of no content we'd like to retry remote BLAST.  We're happy
to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
module, but we only want to retry in that case, not the other three.

What would happen if that third "return -1" changed to a different
return value?

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


From nickjd at gmail.com  Wed Jan 13 08:18:12 2010
From: nickjd at gmail.com (NickJD)
Date: Wed, 13 Jan 2010 05:18:12 -0800 (PST)
Subject: [Bioperl-l] Parsing PSI-BLAST results with SearchIO
Message-ID: <65554589-081b-4297-ab68-9ddfbd3d9944@c34g2000yqn.googlegroups.com>

I am trying to parse PSI-BLAST results using SearchIO and some very
basic code just to read the number of hits, number of HSPs, etc.  I
have done 10 rounds on 1 input sequence and parsed the output, but it
seems to treat each round as a separate result, so round/iteration is
always 1 and new_hits is always the total list, not the ones that are
new to that round.  Does anyone have any experience with this?

Thanks,

Nick


From dsidote at waksman.rutgers.edu  Wed Jan 13 10:08:48 2010
From: dsidote at waksman.rutgers.edu (David J Sidote)
Date: Wed, 13 Jan 2010 10:08:48 -0500
Subject: [Bioperl-l] Bioinformatician position - Waksman Institute
Message-ID: <4b42af671001130708i703ecce0u47348484321714f@mail.gmail.com>

Bioinformatician - Research Assistant Professor


The Waksman Institute of Microbiology located on the New Brunswick campus of
Rutgers University is seeking a highly motivated and talented bioinformatics
scientist for a Research Assistant Professor appointment.  The successful
candidate will analyze genome, transcriptome, and epigenome data generated
on the Life Sciences 454, Illumina, and AB SOLiD high-throughput sequencing
platforms. Excellent communication and teamwork skills are essential as the
successful candidate will work closely with individual research groups to
develop software to facilitate the visualization, quantification, and
interpretation of the data. The successful candidate will be expected to
contribute to the publication of scientific literature and to present at
seminars and conferences.


Qualifications:


-       PhD in molecular biology, genetics, bioinformatics, systems biology
or other related fields; candidates with a PhD in physics, mathematics, or
computer science with some working knowledge of biology and experience are
encouraged to apply.

-       Demonstrated scientific track record

-       Highly proficient in perl, python, or ruby programming, linux/unix
scripting, and SQL.

-       Experience with R is desirable but not required

-       Experience with high-throughput sequencing, microarrays, or other
high-throughput biological platforms

-       Excellent communication and organizational skills


How to Apply:


Please send a cover letter stating your current research interests, why you
are interested in this position, and how your skill set complements this
position along with a curriculum vitae, and the names and contact
information of three references to hr at waksman.rutgers.edu. Please include
"Bioinformatics Assistant Research Professor" in the subject line. Rutgers
is an equal opportunity employer.


For more information about this position please contact:

Dr. David Sidote (dsidote at waksman.rutgers.edu)


From albezg at gmail.com  Wed Jan 13 20:57:27 2010
From: albezg at gmail.com (albezg)
Date: Wed, 13 Jan 2010 20:57:27 -0500
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with
 negative PDB ranges
In-Reply-To: <49C405F0.5050100@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com>
Message-ID: <4B4E7A07.7070805@gmail.com>

Hi all,

I have a problem using AlignIO to read the Pfam database:
ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
The database is in STOCKHOLM 1.0 format. AlignIO reads the alignments
fine until it reaches PF00331.13. There it crashes with the following
message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: '1-344' is not an integer.

STACK: Error::throw
STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
STACK: /home/albezg/scripts/pfam2fasta.pl:22
-----------------------------------------------------------

It appears this is caused by this entry:
#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;

I don't care about residues in PDB, so I have just removed minus signs 
from the ranges. This seems to have fixed the crashing.
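
The message "'1-344' is not an integer" suggests that the range was cut
at its first hyphen, leaving an empty start and "1-344" as the end; that
is an inference from the error text, not something confirmed against the
parser source. A minimal sketch of a range split that tolerates signed
coordinates follows. The sub name parse_signed_range is invented and is
not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split "START-END" where either coordinate may carry a leading minus
# sign. Returns (start, end), or an empty list if the text does not match.
sub parse_signed_range {
    my ($range) = @_;
    return unless defined $range
        && $range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*$/;
    return ($1, $2);
}

for my $text ('263-608', '-1-344') {
    my ($start, $end) = parse_signed_range($text);
    print "$text => start $start, end $end\n";
}
```

Unlike deleting the minus signs from the file, this keeps the negative
PDB residue number intact.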

Is it a known problem? Is there a solution for it?

Thanks,
Alexandr


On 03/20/2009 05:09 PM, albezg wrote:
>
> I'm trying to change FASTA header(display_id) for a sequence in an
> alignment(SimpleAlign).
>
> There are no issues when I print it, however when I use AlignIO to write
> the alignment to a FASTA file, it does not work. Is this behavior intended?
>
> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>
> The error:
> ------------- EXCEPTION -------------
> MSG: No sequence with name [1/1-11]
> STACK Bio::SimpleAlign::displayname
> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
> STACK Bio::AlignIO::fasta::write_aln
> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
> STACK toplevel ./demo.pl:14
> -------------------------------------
>
> Alexandr


From mitch_skinner at berkeley.edu  Thu Jan 14 17:10:53 2010
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Thu, 14 Jan 2010 14:10:53 -0800
Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory
Message-ID: <4B4F966D.3030300@berkeley.edu>

Hi,

Some people haven't been getting all of the features in their GFF3 into 
JBrowse, and a nice test case that James Casbon posted to the list 
helped me track it down.

Here's an example of the behavior I was seeing with BioPerl 1.6.1 (using 
Devel::REPL):

==============
$ use Bio::DB::SeqFeature::Store

$ my $db = Bio::DB::SeqFeature::Store->new(-adaptor=>"memory", 
-dsn=>"casbon.gff3")
$Bio_DB_SeqFeature_Store_memory1 = 
Bio::DB::SeqFeature::Store::memory=HASH(0xa27ceec);

$ $db->features(-seq_id=>"CYP2C8")
$ARRAY1 = [
             Feature:src(41),
             region(CYP2C8),
             Feature:src(37),
             Feature:src(39),
             Feature:src(42),
             Feature:src(40),
             Feature:src(38)
           ];
==============

I expected to also see the features with IDs 43 and 44 (the gff3 file is 
attached).

I think there's a problem in the filter_by_location method.  If start 
and end parameters aren't passed to the method, it sets default start 
and end values that lead it to examine all of the bins in its index.  
But the end value that it creates is at the beginning of the last bin, 
and I think it should be at the end of the last bin instead.  The 
attached patch changes it to be at the end of the last bin.
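
A toy illustration of that off-by-one-bin effect. This is not the
attached patch and not the module's code; BINSIZE, the sub names, and the
two features are all invented. It only shows how a default end placed at
the first base of the last bin drops a feature that starts further inside
that bin.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use constant BINSIZE => 10_000;    # hypothetical bin width

# Bin number for a position.
sub bin_of { my ($pos) = @_; return int($pos / BINSIZE) }

# Two candidate defaults when no end is given:
# the first base of the last bin versus the last base of the last bin.
sub default_end_start_of_bin { my ($last_bin) = @_; return $last_bin * BINSIZE }
sub default_end_end_of_bin   { my ($last_bin) = @_; return ($last_bin + 1) * BINSIZE - 1 }

# Keep features that overlap [start, end].
sub features_in_range {
    my ($features, $start, $end) = @_;
    return grep { $_->{end} >= $start && $_->{start} <= $end } @$features;
}

my @features = (
    { id => 'f1', start => 100,    end => 900    },
    { id => 'f2', start => 10_500, end => 10_900 },   # starts inside the last bin
);
my $last_bin = bin_of( $features[-1]{end} );

my @short = features_in_range(\@features, 0, default_end_start_of_bin($last_bin));
my @full  = features_in_range(\@features, 0, default_end_end_of_bin($last_bin));
printf "start-of-bin default: %d feature(s); end-of-bin default: %d feature(s)\n",
    scalar @short, scalar @full;
```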

Regards,
Mitch
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: casbon.gff3
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100114/8494aaa7/attachment-0004.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bdsfsm-filter_by_location.patch
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100114/8494aaa7/attachment-0005.pl>

From jason at bioperl.org  Thu Jan 14 19:20:43 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 16:20:43 -0800
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <4B4E7A07.7070805@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
Message-ID: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>

Seems like improper data really -- "-1" is an improper coordinate as
far as the parser is concerned. You may want to tell Pfam that there
is a possible error in their dumper, since that was the only record
that had this problem?

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
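
A minimal sketch of the workaround described in the quoted message: strip the minus signs from the residue range on "#=GS ... DR PDB;" lines before handing the file to AlignIO. The subroutine name strip_negative_pdb_ranges is hypothetical, and the rewrite throws away the negative PDB coordinate, so it only makes sense if, as above, the PDB residue numbers are not needed.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite one line of a Stockholm file.
# Only "#=GS <seq> DR PDB; ..." lines are touched; every other
# line is returned unchanged.
sub strip_negative_pdb_ranges {
    my ($line) = @_;
    return $line unless $line =~ /^#=GS\s+\S+\s+DR\s+PDB;/;
    # The residue range is the last ';'-terminated field, e.g. "-1-344;".
    # Drop a leading '-' from either end; keep any trailing whitespace.
    $line =~ s/;\s*-?(\d+)\s*-\s*-?(\d+)\s*;(\s*)$/; $1-$2;$3/;
    return $line;
}

# The entry from the report above, before and after cleaning.
my $in = "#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;\n";
print $in;
print strip_negative_pdb_ranges($in);
```

Applied line by line to Pfam-A.seed, this would leave alignment and non-PDB annotation lines as they are.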


From maj at fortinbras.us  Thu Jan 14 21:00:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 21:00:31 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
Message-ID: <CD613D33411040F8921DE3098FD6DF41@NewLife>

How about returning 1, 2, 4 for the non-zero cases, with some
error constants set for convenience? MAJ
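
A minimal sketch of what such constants could look like from the caller's side. The constant names, the grouping of the failure cases under three codes, and the fetch_with_retry helper are all hypothetical and not part of Bio::Tools::Run::RemoteBlast; the fetch step is passed in as a code reference so the retry logic can be shown without a network call.

```perl
use strict;
use warnings;

# Hypothetical error constants; distinct bits so they can be masked.
use constant {
    RB_OK         => 0,
    RB_ERROR      => 1,   # server reported an ERROR
    RB_NO_CONTENT => 2,   # response came back empty
    RB_HTTP_FAIL  => 4,   # the HTTP request itself failed
};

# Call $fetch until it returns something other than RB_NO_CONTENT,
# or until $max_tries attempts have been made.
sub fetch_with_retry {
    my ($fetch, $max_tries) = @_;
    my $code = RB_NO_CONTENT;
    for my $try (1 .. $max_tries) {
        $code = $fetch->();
        last unless $code == RB_NO_CONTENT;
    }
    return $code;
}

# Stand-in for a flaky remote call: empty twice, then fine.
my @replies = (RB_NO_CONTENT, RB_NO_CONTENT, RB_OK);
my $code = fetch_with_retry(sub { shift @replies }, 5);
if ($code == RB_OK) {
    print "got results\n";
}
else {
    print "gave up with code $code\n";
}
```

Only the no-content code triggers another attempt; the other codes are handed straight back to the caller.
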
----- Original Message ----- 
From: "Scott Markel" <SMarkel at accelrys.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Thursday, January 14, 2010 5:58 PM
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes


> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
> from our customers.  Due to network irregularities (not sure what else
> to call it) users see the getting of remote BLAST results as somewhat
> random.  When results come back the hits are fine, but sometimes no
> information comes back at all.  Retrying helps.
> 
> In looking at RemoteBlast.pm there are four "return -1" cases.
> 
> * $status eq 'ERROR'      (return on line 614)
> * $line =~ /ERROR/I       (return on line 628)
> * !$got_content           (return on line 648)
> * !$response->is_success  (return on line 655)
> 
> In the case of no content we'd like to retry remote BLAST.  We're happy
> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
> module, but we only want to retry in that case, not the other three.
> 
> What would happen if that third "return -1" changed to a different
> return value?
> 
> Scott
> 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
> 
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>    International Society for Computational Biology
> Chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
> 
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From cjfields at illinois.edu  Thu Jan 14 19:42:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 18:42:31 -0600
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
Message-ID: <0B76CCA7-C37C-4E24-BBDF-C8FD805DBBF2@illinois.edu>


On Jan 14, 2010, at 2:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
> 
>> What aspects of the report are you loading?  You might consider the blast 
>> report as tab-delimited (-m 8 format) if you are only interested in 
>> start/end positions and scores of alignments, which is a simpler and reduced 
>> dataset that has a lower memory footprint in the parser.
> 
> I think this would be a better approach, as I am mostly interested in
> start/end/score data only.
> 
>> SearchIO (default) -format => blast - you can try -format => 
>> blast_pull instead, which lazily parses to create objects and will reduce 
>> memory consumption.
> 
> That's another good option. But just out of curiosity: does the
> regular blast parser load the entire file into memory, considering the
> output consists of multiple Results concatenated together into a
> single file? Could anybody clarify?

Yes, the original SearchIO parsers all load the data into objects.  This was based on the presumption that one wouldn't want very large BLAST reports, but this assumption probably doesn't hold today.  The pull parser is one answer to that, in that it pulls the data only upon request (creating the objects on the fly), so it should be more amenable to parsing very large BLAST reports.

> thanks, 
> -siddhartha
> 
>> -jason

chris
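
If only coordinates and scores are needed, the tab-delimited (-m 8) report can be read one line at a time, so memory use stays flat however many results are concatenated in the file. A minimal sketch in plain Perl follows; the parse_m8_line name and the hash keys are illustrative, the sample line is made up, and the column order is the standard 12-column -m 8 layout. BioPerl's own route for this format is Bio::SearchIO with -format => 'blasttable'.

```perl
use strict;
use warnings;

# Standard column order of legacy BLAST -m 8 output.
my @M8_FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                   q_start q_end s_start s_end evalue bit_score);

# Turn one tab-delimited line into a hash ref; returns undef for blank
# lines, '#' comment lines (as produced by -m 9) and malformed lines.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^\s*(#|$)/;
    my @cols = split /\t/, $line;
    return undef unless @cols == @M8_FIELDS;
    my %hit;
    @hit{@M8_FIELDS} = @cols;
    return \%hit;
}

# Made-up sample report, read through a filehandle the same way a real
# file would be: only the current line is held in memory.
my $report = "q1\ts1\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t365\n";
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
while (my $line = <$fh>) {
    my $hit = parse_m8_line($line) or next;
    print join("\t", @$hit{qw(query subject q_start q_end s_start s_end bit_score)}), "\n";
}
close $fh;
```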


From cjfields at illinois.edu  Fri Jan 15 01:33:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 00:33:50 -0600
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
Message-ID: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>

Scott,

I think this is fine (to change the third condition and retry with a specific code).  The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance).
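
The exception route could look roughly like this from the caller's side. This is a sketch in plain Perl: the My::Blast::NoContent and My::Blast::ServerError class names and the retry_on_no_content helper are hypothetical, and the fetch step is a stub rather than a real RemoteBlast call.

```perl
use strict;
use warnings;

# Hypothetical exception classes, one per failure mode.
{ package My::Blast::NoContent;   sub new { return bless {}, shift } }
{ package My::Blast::ServerError; sub new { return bless {}, shift } }

use Scalar::Util qw(blessed);

# Run $fetch inside eval; retry only when it died with NoContent.
# Any other exception is rethrown to the caller unchanged.
sub retry_on_no_content {
    my ($fetch, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
        my $result;
        my $ok = eval { $result = $fetch->(); 1 };
        return $result if $ok;
        my $err = $@;
        die $err unless blessed($err) && $err->isa('My::Blast::NoContent');
    }
    die My::Blast::NoContent->new;   # still empty after all attempts
}

# Stub fetch: no content on the first two calls, a report on the third.
my $attempts = 0;
my $report = retry_on_no_content(sub {
    die My::Blast::NoContent->new if ++$attempts < 3;
    return "report text";
}, 5);
print "succeeded after $attempts attempts\n";
```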

One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3.  Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.

chris

PS - BTW, nice to finally meet you at GMOD!

On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:

> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
> from our customers.  Due to network irregularities (not sure what else
> to call it) users see the getting of remote BLAST results as somewhat
> random.  When results come back the hits are fine, but sometimes no
> information comes back at all.  Retrying helps.
> 
> In looking at RemoteBlast.pm there are four "return -1" cases.
> 
> * $status eq 'ERROR'      (return on line 614)
> * $line =~ /ERROR/I       (return on line 628)
> * !$got_content           (return on line 648)
> * !$response->is_success  (return on line 655)
> 
> In the case of no content we'd like to retry remote BLAST.  We're happy
> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
> module, but we only want to retry in that case, not the other three.
> 
> What would happen if that third "return -1" changed to a different
> return value?
> 
> Scott
> 


From cjfields1 at gmail.com  Fri Jan 15 01:35:35 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Fri, 15 Jan 2010 00:35:35 -0600
Subject: [Bioperl-l] filter_by_location in
	Bio::DB::SeqFeature::Store::memory
In-Reply-To: <4B4F966D.3030300@berkeley.edu>
References: <4B4F966D.3030300@berkeley.edu>
Message-ID: <992796AC-B85B-4555-88A1-36000C0A2002@gmail.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100115/b772ee67/attachment-0002.html>

From David.Messina at sbc.su.se  Fri Jan 15 10:17:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 16:17:14 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
Message-ID: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>

Hi everybody,

I'm having a little trouble with names in Bio::Species objects.

According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method:

my $my_species_obj = Bio::Species->new();
$my_species_obj->species('Homo sapiens');

print $my_species_obj->species;     # 'Homo sapiens'


That works fine if I create the Bio::Species object myself.

But if I try to get that string back out from a Bio::Species object created by SeqIO from a genbank file, I get just 'sapiens' back:

my $io = Bio::SeqIO->new('-format' => 'genbank',
                         '-file'   => 'hoxa2.gb');
my $seq_obj = $io->next_seq;
my $io_species_obj = $seq_obj->species;

print $io_species_obj->species;     # 'sapiens'


I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom, phylum, order, etc.). So the genus is stored separately.

Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object:

print $my_species_obj->binomial;    # 'Homosapiens'
print $io_species_obj->binomial;    # 'Homo sapiens'


I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way?

If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite.


Thanks,
Dave


From maj at fortinbras.us  Fri Jan 15 10:31:16 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 10:31:16 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>

I'm not that familiar with Bio::Species either, but this looks
like conflicting semantics between Bio::Species and Bio::SeqIO.
Bio::SeqIO sets the species accessor to the 'species' element of
the lineage array, I believe.
FWIW, I'd prefer "binomial" = "genus" . "species"
MAJ


From maj at fortinbras.us  Fri Jan 15 10:24:06 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 10:24:06 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
	<E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
Message-ID: <F1C8FA379C5746FB8987C1D41905C3F3@NewLife>

True-- blast+ allows remote dbs. I just committed a patch that makes
this easy in StandAloneBlastPlus: specify '-remote => 1' in the
factory, and downstream command calls will take care of it-
MAJ

# ex...
use Bio::Tools::Run::StandAloneBlastPlus;
use Bio::Seq;

$ENV{BLASTPLUSDIR} = $where_it_is;
my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'wgs',
    -remote => 1
    );
my $result = $fac->blastn(
    -query => Bio::Seq->new(
        -seq => 'ggcaacaaacctggtaaagaagacggcaacaagcctggtaaagaagatggcaacaagcct',
        -id  => "proteinA")
    );


1;



From SMarkel at accelrys.com  Fri Jan 15 10:40:31 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 07:40:31 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
	<E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>

Chris,

It was nice meeting you and Scott C., too.  And seeing Jason again.

If you and Mark

> How about returning 1, 2, 4 for the non-zero cases, with some
> error constants set for convenience? MAJ

are okay with adding more return values, that works best for us in
Pipeline Pilot.

I'll add a Bugzilla entry.

Scott




From cjfields at illinois.edu  Fri Jan 15 11:00:21 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 10:00:21 -0600
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
	<C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
Message-ID: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>

> FWIW, I'd prefer "binomial" = "genus" . "species"


That's the way Bio::Species is supposed to work, at least when it was refactored by Sendu.  But just a note: Bio::Species was considered deprecated (scheduled for the 1.7 release IIRC) for many very good reasons in favor of Bio::Taxon.  First and foremost among these is the fact that we cannot consistently parse out the genus/species/strain/variant/etc. for every organism in GenBank w/o knowing its full lineage, which means including some taxonomic information.  And even then it's highly problematic.

We've had several heated discussions on list about how to handle this in a somewhat backwards-compatible way, and the main solution was to forgo compatibility and eventually deprecate Bio::Species altogether in favor of Bio::Taxon, a class that doesn't make the same assumptions.  Bio::Species, in the interim, is-a Bio::Taxon.  You'll note that a minimal Bio::DB::Taxonomy instance is constructed from the classification scheme in some instances, but if one had a proper DB link one could link to Entrez Taxonomy or a local indexed flat-file DB and grab the info.  Bio::Taxon (correct me if I'm wrong on this Sendu, if you're out there) eschews various methods (species, etc.) for simpler, consistent ones based on Taxonomy, and doesn't force us to handle every exception to getting the genus/species out of a name.  That is left up to the user, at their peril.

For either one, if you are reproducing the fully qualified name, you probably should use something like node_name() for consistency.  Bio::Species also has scientific_name().  With a true Bio::Taxon one would need to check that this is performed on the species node.

chris
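
To make the parsing problem concrete, here is a small sketch in plain Perl of the naive approach of splitting a name on the first space. The naive_binomial helper is hypothetical and is not how Bio::Species or Bio::Taxon work internally; it only illustrates why a name string alone is not enough.

```perl
use strict;
use warnings;

# Hypothetical and deliberately naive: the first word is taken as the
# genus, everything after it as the species.
sub naive_binomial {
    my ($name) = @_;
    my ($genus, $species) = split ' ', $name, 2;
    return ($genus, $species);
}

for my $name ('Homo sapiens',
              'Escherichia coli str. K-12 substr. MG1655',
              'uncultured bacterium') {
    my ($genus, $species) = naive_binomial($name);
    print "genus='$genus' species='$species'\n";
}
```

Only the first name splits into a real genus/species pair; the other two are the kind of case that needs lineage information to resolve.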



From SMarkel at accelrys.com  Fri Jan 15 11:10:34 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 08:10:34 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <FE85CD2526044E8797D5A1A248AF6866@NewLife>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net><E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
	<5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
	<FE85CD2526044E8797D5A1A248AF6866@NewLife>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B30A7@EXCH1-COLO.accelrys.net>

Mark,

Thank you.

Scott




From maj at fortinbras.us  Fri Jan 15 11:09:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:09:38 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net><E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
	<5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
Message-ID: <FE85CD2526044E8797D5A1A248AF6866@NewLife>

can do Scott-- cheers MAJ
----- Original Message ----- 
From: "Scott Markel" <SMarkel at accelrys.com>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: <Bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 10:40 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes


> Chris,
>
> It was nice meeting you and Scott C., too.  And seeing Jason again.
>
> If you and Mark
>
>> How about returning 1, 2, 4 for the non-zero cases, with some
>> error constants set for convenience? MAJ
>
> are okay with adding more return values, that works best for us in
> Pipeline Pilot.
>
> I'll add a Bugzilla entry.
>
> Scott
>
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thursday, 14 January 2010 10:34 PM
> To: Scott Markel
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
>
> Scott,
>
> I think this is fine (to change the third condition and retry with a specific 
> code).  The other possibility is to simply throw different exceptions under 
> each of these circumstances, which can be caught via eval to allow a retry 
> under only certain conditions (no content, for instance).
>
> One interesting bit: I think (though I'm not sure) the new BLAST+ allows 
> remote BLAST queries from command line, similar to the legacy blastcl3.  Mark 
> just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.
>
> chris
>
> PS - BTW, nice to finally meet you at GMOD!
>
> On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:
>
>> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
>> from our customers.  Due to network irregularities (not sure what else
>> to call it) users see the getting of remote BLAST results as somewhat
>> random.  When results come back the hits are fine, but sometimes no
>> information comes back at all.  Retrying helps.
>>
>> In looking at RemoteBlast.pm there are four "return -1" cases.
>>
>> * $status eq 'ERROR'      (return on line 614)
>> * $line =~ /ERROR/I       (return on line 628)
>> * !$got_content           (return on line 648)
>> * !$response->is_success  (return on line 655)
>>
>> In the case of no content we'd like to retry remote BLAST.  We're happy
>> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
>> module, but we only want to retry in that case, not the other three.
>>
>> What would happen if that third "return -1" changed to a different
>> return value?
>>
>> Scott
>>
>> Scott Markel, Ph.D.
>> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
>> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
>> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
>> San Diego, CA 92121                 fax:    +1 858 799 5222
>> USA                                 web:    http://www.accelrys.com
>>
>> http://www.linkedin.com/in/smarkel
>> Vice President, Board of Directors:
>>    International Society for Computational Biology
>> Chair: ISCB Publications Committee
>> Associate Editor: PLoS Computational Biology
>> Editorial Board: Briefings in Bioinformatics
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
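
For reference, the scheme discussed above (distinct non-zero return values with named constants, and a retry only in the "no content" case) might look roughly like this from the caller's side. This is only a sketch in plain Perl: the constant names, the mapping of 1, 2, 4 to particular failures, and fetch_result() are all assumptions made for illustration, not the RemoteBlast API.

```perl
use strict;
use warnings;

# Hypothetical error constants. The values 1, 2, 4 follow the suggestion
# in the thread; which value maps to which case is an assumption.
use constant {
    OK             => 0,
    ERR_STATUS     => 1,    # server reported an error status
    ERR_NO_CONTENT => 2,    # request went through but nothing came back
    ERR_HTTP       => 4,    # the HTTP request itself failed
};

# Stand-in for the remote retrieval step (NOT the RemoteBlast API):
# reports "no content" on the first two calls, then succeeds.
my $calls = 0;
sub fetch_result {
    $calls++;
    return ERR_NO_CONTENT if $calls < 3;
    return OK;
}

# Retry only when the failure is "no content"; return at once otherwise.
sub fetch_with_retry {
    my ($fetch, $max_tries) = @_;
    my $code;
    for my $try (1 .. $max_tries) {
        $code = $fetch->();
        return $code unless $code == ERR_NO_CONTENT;
        # a real caller would probably sleep here before the next attempt
    }
    return $code;
}

my $final = fetch_with_retry(\&fetch_result, 5);
print "final code: $final after $calls calls\n";
```

Using powers of two leaves room to combine codes later, though throwing typed exceptions and catching them with eval, as Chris suggests, would serve the same purpose.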


From maj at fortinbras.us  Fri Jan 15 11:10:02 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:10:02 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se><C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
	<16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
Message-ID: <C4C0A0697FCE4CFD897AD58FA7FD58AA@NewLife>

excellent summary--thanks!!
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 11:00 AM
Subject: Re: [Bioperl-l] getting/setting species names with Bio::Species


>> FWIW, I'd prefer "binomial" = "genus" . "species"
>
>
> That's the way Bio::Species is supposed to work, at least when it was 
> refactored by Sendu.  But just a note: Bio::Species was considered deprecated 
> (scheduled for the 1.7 release IIRC) for many very good reasons in favor of 
> Bio::Taxon.  First and foremost among these is the fact we cannot consistently 
> parse out the genus/species/strain/variant/etc for every organism in GenBank 
> w/o knowing its full lineage, which means including some taxonomic 
> information.  And even then it's highly problematic.
>
> We've had several heated discussions on list about how to handle this in a 
> somewhat backwards-compatible way, and the main solution was to forego 
> compatibility issues altogether and eventually deprecate Bio::Species 
> altogether in favor of Bio::Taxon, a class that doesn't make the same 
> assumptions.  Bio::Species, in the interim, is-a Bio::Taxon.  You'll note that 
> a minimal Bio::DB::Taxonomy instance is constructed from the classification 
> scheme in some instances, but if one had a proper DB link one could link to 
> Entrez Taxonomy or a local flat file indexes DB and grab the info.  Bio::Taxon 
> (correct me if I'm wrong on this Sendu, if you're out there) eschews various 
> methods (species, etc) for simpler consistent ones based on Taxonomy, and 
> doesn't force us to handle every exception to getting the genus/species out of 
> a name.  That is left up to the user, at their peril.
>
> For either one, if you are reproducing the fully qualified name, you probably 
> should use something like node_name() for consistency.  Bio::Species also has 
> scientific_name().  With a true Bio::Taxon one would need to check that this is 
> performed on the species node.
>
> chris
>
> On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote:
>
>> I'm not that familiar with Bio::Species either, but this looks
>> like conflicting semantics between Bio::Species and Bio::SeqIO.
>> Bio::SeqIO sets the species accessor to the 'species' element of
>> the lineage array, I believe.
>> FWIW, I'd prefer "binomial" = "genus" . "species"
>> MAJ
>> ----- Original Message ----- From: "Dave Messina" <David.Messina at sbc.su.se>
>> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 15, 2010 10:17 AM
>> Subject: [Bioperl-l] getting/setting species names with Bio::Species
>>
>>
>>> Hi everybody,
>>>
>>> I'm having a little trouble with names in Bio::Species objects.
>>>
>>> According to the Bio::Species documentation, if I have a species name as a 
>>> string, like "Homo sapiens", I can get and set that using the species 
>>> method:
>>>
>>> my $my_species_obj = Bio::Species->new();
>>> $my_species_obj->species('Homo sapiens');
>>>
>>> print $my_species_obj->species;     # 'Homo sapiens'
>>>
>>>
>>> That works fine if I create the Bio::Species object myself.
>>>
>>> But if I try to get that string back out from a Bio::Species object created 
>>> by SeqIO from a genbank file, I get just 'sapiens' back:
>>>
>>> my $io = Bio::SeqIO->new('-format' => 'genbank',
>>>                        '-file'   => 'hoxa2.gb');
>>> my $seq_obj = $io->next_seq;
>>> my $io_species_obj = $seq_obj->species;
>>>
>>> print $io_species_obj->species;     # 'sapiens'
>>>
>>>
>>> I think that happens because genbank records have more taxonomic info about 
>>> the species name, like the genus (and in fact the whole taxonomic 
>>> categorization: kingdom, phylum, order, etc.). So the genus is stored 
>>> separately.
>>>
>>> Poking around a bit more in Bio::Species, I turned up the method 'binomial', 
>>> which appears to do the right thing, returning genus and species in both 
>>> cases. Except, as you can see, the space is stripped out for my 
>>> species-name-is-just-a-string object:
>>>
>>> print $my_species_obj->binomial;    # 'Homosapiens'
>>> print $io_species_obj->binomial;    # 'Homo sapiens'
>>>
>>>
>>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I 
>>> using it correctly above, or is there a better way?
>>>
>>> If not, this kinda looks like a bug to me. I've got a patch which works and 
>>> passes the BioPerl test suite.
>>>
>>>
>>> Thanks,
>>> Dave
>>>


From hlapp at drycafe.net  Fri Jan 15 12:04:43 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 15 Jan 2010 12:04:43 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>


On Jan 15, 2010, at 10:17 AM, Dave Messina wrote:

> According to the Bio::Species documentation, if I have a species  
> name as a string, like "Homo sapiens", I can get and set that using  
> the species method:
>
> my $my_species_obj = Bio::Species->new();
> $my_species_obj->species('Homo sapiens');


If that's really what the documentation says, it's wrong. It is the  
binomial() method that does this (as getter and setter).

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From David.Messina at sbc.su.se  Fri Jan 15 13:37:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 19:37:17 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
	<2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
Message-ID: <24798E45-CF24-47D9-AB39-E66C35A5FA8B@sbc.su.se>

Thanks guys.

Well, looks like I ignored the deprecation warnings at my own peril. :)

I'll reimplement my code using Bio::Taxon directly instead. I made a little test using the node_name() method as Chris suggested, and it seems to do the trick nicely.


> If that's really what the documentation says, it's wrong.

I'm afraid so. In the POD
>  Title   : species
>  Usage   : $self->species( $species );
>            $species = $self->species();
>  Function: Get or set the scientific species name.
>  Example : $self->species('Homo sapiens');
>  Returns : Scientific species name as string
>  Args    : Scientific species name as string

and the HOWTO 
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object
> # legible and long
> my $species_object = $seq_object->species;
> my $species_string = $species_object->species;
> 
> # Perlish
> my $species_string = $seq_object->species->species;
> # either way, $species_string is "Homo sapiens"


Unless there's objection, I'll fix both of those.


> It is the binomial() method that does this (as getter and setter).

Great, thanks for the clarification, Hilmar.
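
To make the "binomial" = "genus" . "species" semantics from this thread concrete, here is a small sketch in plain Perl. It assumes a lineage listed species-first (the order GenBank-derived classification arrays are assumed to use here), and binomial_from_lineage() is a hypothetical helper written for this example, not a BioPerl method.

```perl
use strict;
use warnings;

# Build "Genus species" from a lineage listed species-first.
# binomial_from_lineage() is a hypothetical helper, not part of BioPerl.
sub binomial_from_lineage {
    my @lineage = @_;
    my ($species, $genus) = @lineage[0, 1];
    return unless defined $species && length $species;
    # If the species element already holds the full name, keep it as is.
    return $species if !defined $genus || $species =~ /^\Q$genus\E\s/;
    return "$genus $species";
}

# Lineage as parsed from a GenBank record: species and genus are separate.
print binomial_from_lineage(qw(sapiens Homo Hominidae Primates)), "\n";

# Species element set by hand to the full name: no genus is prepended.
print binomial_from_lineage('Homo sapiens', 'Homo', 'Hominidae'), "\n";
```

Both calls print the same string with the space intact, which is the behaviour the two objects in the original question disagreed on.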


From bhakti.dwivedi at gmail.com  Sun Jan 17 11:02:47 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 11:02:47 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
Message-ID: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>

Hi

Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
&& hit1 -> query1)  from a blast table report?

Thanks

BD


From cjfields at illinois.edu  Sun Jan 17 12:45:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 11:45:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <4FC546A8-079F-4A17-AB96-D4A0060904D6@illinois.edu>

It's probably not best to use BioPerl directly for this.  Have you tried OrthoMCL, or InParanoid? 

chris
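
For small data sets where a full ortholog pipeline is more than is needed, the reciprocal-best-hit test from the question can be sketched in plain Perl. This assumes tab-delimited BLAST output with the 12 standard columns (query ID first, subject ID second, bit score last) and one report per search direction; the sub names and the toy rows are made up for illustration, and ties are resolved by whichever row comes first.

```perl
use strict;
use warnings;

# Return a hash of query => best-scoring subject from tabular BLAST lines
# (12 standard columns; column 2 is the subject, column 12 the bit score).
sub best_hits {
    my @lines = @_;
    my (%best, %score);
    for my $line (@lines) {
        next if $line =~ /^#/ || $line !~ /\S/;    # skip comments and blanks
        chomp $line;
        my @f = split /\t/, $line;
        my ($q, $s, $bits) = @f[0, 1, 11];
        next if $q eq $s;                          # ignore self hits
        if (!exists $score{$q} || $bits > $score{$q}) {
            ($best{$q}, $score{$q}) = ($s, $bits);
        }
    }
    return %best;
}

# A pair is reciprocal when each sequence is the other's best hit.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;    # array refs of report lines
    my %fwd = best_hits(@$a_vs_b);
    my %rev = best_hits(@$b_vs_a);
    return map  { [ $_, $fwd{$_} ] }
           grep { defined $rev{ $fwd{$_} } && $rev{ $fwd{$_} } eq $_ }
           sort keys %fwd;
}

# Toy reports (made-up rows): a1 and b1 pick each other, while b2 prefers
# a1, so a2/b2 is not reciprocal.
my @ab = map { join "\t", @$_ } (
    [qw(a1 b1 98.0 100 2 0 1 100 1 100 1e-50 200)],
    [qw(a1 b2 80.0 100 20 0 1 100 1 100 1e-30 150)],
    [qw(a2 b2 60.0 100 40 0 1 100 1 100 1e-10 90)],
);
my @ba = map { join "\t", @$_ } (
    [qw(b1 a1 98.0 100 2 0 1 100 1 100 1e-50 198)],
    [qw(b2 a1 80.0 100 20 0 1 100 1 100 1e-30 140)],
);

print join("\t", @$_), "\n" for reciprocal_best_hits(\@ab, \@ba);
```

This only finds simple one-to-one pairs; it does nothing about inparalogs or clustering, which is where OrthoMCL or InParanoid come in.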



From maj at fortinbras.us  Sun Jan 17 16:03:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 17 Jan 2010 16:03:24 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <B602C24552CF42C58F80F3883198121C@NewLife>

re Chris's answer, check out this archived post:
http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
cheers MAJ


From bhakti.dwivedi at gmail.com  Sun Jan 17 16:10:03 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 16:10:03 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <B602C24552CF42C58F80F3883198121C@NewLife>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<B602C24552CF42C58F80F3883198121C@NewLife>
Message-ID: <b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>

Thank you!




From cjfields at illinois.edu  Sun Jan 17 17:00:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 16:00:02 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<B602C24552CF42C58F80F3883198121C@NewLife>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
Message-ID: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>

OrthoMCL has updated to v2 and no longer uses BioPerl, just plain perl.  Database is available here:

http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi

Package (you'll need a few other things to get it working):

http://orthomcl.org/common/downloads/software/

chris



From tristan.lefebure at gmail.com  Sun Jan 17 18:12:56 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 18:12:56 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
	<392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
Message-ID: <201001171812.56238.tristan.lefebure@gmail.com>

The transition to orthoMCL v2 being a bit painful (you need 
a MySQL database), I recently switched directly to MCL and 
the accompanying mclblastline and companion programs. Modular, 
simple and very fast. Following some simulations, it gives 
better results with incomplete genomes than orthoMCL v1.x 
...

http://micans.org/mcl/

--Tristan



From jason at bioperl.org  Sun Jan 17 18:59:05 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 17 Jan 2010 15:59:05 -0800
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <201001171812.56238.tristan.lefebure@gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
	<392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
	<201001171812.56238.tristan.lefebure@gmail.com>
Message-ID: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>

yes - but mcl alone is something slightly different in that it doesn't  
correct for inparalogs, but for incomplete genomes this is probably  
okay.

orthomcl2 does fix the major memory hog problem and the parsing  
inefficiencies of the previous version by relying on the db for  
indexing and looking up the reciprocal hits.

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From tristan.lefebure at gmail.com  Sun Jan 17 20:36:38 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 20:36:38 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
Message-ID: <201001172036.39032.tristan.lefebure@gmail.com>

On Sunday 17 January 2010 18:59:05 Jason Stajich wrote:
> yes - but mcl alone is something slightly different in
>  that it doesn't   correct for inparalogs, but for
>  incomplete genomes this is probably okay.

interestingly, my experience with not too divergent 
bacterial genomes (same genus) does not support the 
normalization used in orthoMCL (which, as far as I 
understand, is a standardization of the -Log10(evalue) per 
taxon combination, including a taxon with itself). MCL, which 
does not do any normalization (just -Log10(evalue)), gives 
about the same number of false negatives (i.e. missed 
orthologs), but far fewer false positives (false orthologs). 
In other words, you get many fake singletons. I don't know 
exactly whether the problem lies in the normalization process 
or in the fact that orthoMCL v1.x uses a very old version of 
MCL. What I do know is that many false positives are made of 
short or incomplete proteins that are very common in draft 
genomes and automatic annotations... Things might be 
completely different with more divergent and globally longer 
proteins. Testing orthoMCL v2 on the same data set would 
probably give the answer.

--Tristan
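
As a rough illustration of the un-normalized weighting described above, the following plain Perl sketch turns e-values into -log10(evalue) edge weights and prints them as "label label weight" lines, the three-column form MCL reads with its --abc option. The cap used for e-values reported as 0, the sub name and the toy hits are assumptions made for this example; it is not how mclblastline itself is implemented.

```perl
use strict;
use warnings;

# Cap for e-values reported as 0; the value 200 is an arbitrary assumption.
my $MAX_WEIGHT = 200;

# Convert an e-value to an edge weight, -log10(evalue), with no
# per-taxon normalization.
sub evalue_to_weight {
    my ($evalue) = @_;
    return $MAX_WEIGHT if $evalue <= 0;
    my $w = -log($evalue) / log(10);
    $w = 0 if $w < 0;    # e-values above 1 carry no weight
    return $w > $MAX_WEIGHT ? $MAX_WEIGHT : $w;
}

# Toy hits (made up): query, subject, e-value.
my @hits = ([qw(geneA geneB 1e-50)], [qw(geneA geneC 0.001)], [qw(geneB geneD 0)]);

# Emit one "label label weight" line per hit.
for my $hit (@hits) {
    my ($q, $s, $e) = @$hit;
    printf "%s\t%s\t%.1f\n", $q, $s, evalue_to_weight($e);
}
```

The orthoMCL-style normalization would rescale these weights per taxon pair before clustering; that step is deliberately left out here.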


From robert.bradbury at gmail.com  Mon Jan 18 05:20:33 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 18 Jan 2010 05:20:33 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <201001172036.39032.tristan.lefebure@gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
Message-ID: <deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>

My comment might be that the problem with OrthoMCL is that it covers
primarily lower organisms.  The problem with Ensembl (and some other
databases) is that it covers primarily higher organisms (though they do
include Drosophila, C. elegans and Yeast).

The problem arises when one wants to cross those boundaries.  For
example the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
many of the mitochondrial (ETC) proteins, the ribosomal rRNAs &
tRNAs, and the fundamental biochemistry (EC) proteins are homologous
all the way from the most ancient bacteria through H. sapiens.  The
only way to play in the mixed arena of prokaryotes and eukaryotes
involving fundamental vectors in evolution is to either construct one's
own databases (which presumably means getting involved with MySQL, and
probably spending some $$$ on hardware) or to develop some BioPerl
modules that can do the  SpeciesX vs. SpeciesY comparisons on demand
using some part of the cloud.  This problem isn't going to get smaller;
it's only going to get larger, now that the cost of sequencing
(pseudo-resequencing) a vertebrate genome is starting to come in under
$10,000 and people are starting to seriously talk about 10,000
vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
people are going to undertake very soon.

Robert
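
To put rough numbers on the 10,000 x 10,000 x 20,000 estimate, a few lines of Perl suffice. The gene count per genome is the round figure quoted above, and the variable names are just for this sketch.

```perl
use strict;
use warnings;

my $genomes = 10_000;    # vertebrate genomes under discussion
my $genes   = 20_000;    # rough gene count per genome

# Ordered genome pairs (every genome searched against every genome),
# as in the 10,000 x 10,000 figure.
my $ordered_pairs = $genomes * $genomes;

# Distinct cross-genome pairs, if each pair is counted only once.
my $unordered_pairs = $genomes * ($genomes - 1) / 2;

# Gene-versus-genome searches: every gene of every genome against
# every genome.
my $searches = $ordered_pairs * $genes;

printf "ordered genome pairs:    %.0f\n", $ordered_pairs;
printf "unordered genome pairs:  %.0f\n", $unordered_pairs;
printf "gene-vs-genome searches: %.1e\n", $searches;
```

That is 10^8 ordered genome pairs and about 2 x 10^12 gene-versus-genome searches, before counting the individual gene-by-gene alignments inside each search.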




From ghhu at sibs.ac.cn  Sun Jan 17 21:34:23 2010
From: ghhu at sibs.ac.cn (Guohong Hu)
Date: Mon, 18 Jan 2010 10:34:23 +0800
Subject: [Bioperl-l] Bioperl 1.6
Message-ID: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>

Hi there,

 
I was trying to install BioPerl on Windows using ppm, by following the
instructions at
"http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
the repositories and searched for BioPerl packages. The latest version
available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
install it, a number of prerequisite modules were installed too,
including Bioperl 1.4. Then an error message showed up during installation:

 
"ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
BioPerl has already installed a file that package bioperl wants to install."

 
It looks to me as if BioPerl 1.6.1 had installed a file that bioperl 1.4
wanted to install again. I don't know why bioperl 1.4 was one of the
prerequisites for 1.6.1. If I install just 1.4, it installs without
errors. But I need a newer version, because some modules (like
Bio::Tools::HMM) are not included in 1.4.

I saw on the internet that somebody had the same problem when trying to
install BioPerl 1.5, but I didn't find a solution.

Does anybody have a clue about this? Thank you for your time.

 
GH

 
From cjfields at illinois.edu  Mon Jan 18 10:30:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 09:30:20 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
Message-ID: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>

Guohong, 

1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.  Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused).  Just curious but where is the v 1.4 PPM located?  If it is local to our PPM repo I can physically remove it to prevent this from happening.

chris



From cjfields at illinois.edu  Mon Jan 18 11:12:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 10:12:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
	<deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
Message-ID: <B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>

(my small rant on this)

On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote:

> My comment might be that the problem with OrthoMCL is that it is
> primarily lower organisms.  The problem with Ensembl (and some other
> databases) is that it is primarily higher organisms (though they do
> include Drosophila, C. elegans and Yeast).

OrthoMCL v2 handles both lower and higher organisms; I've used it for both, with decent success.  Most other ortholog tools do as well (if I'm not mistaken, Ensembl also uses MCL under the hood, unless that's changed).  I don't believe one should be completely bound to one toolset, particularly in this case (there are lots of nice ortholog clustering tools out there using various means of comparison), but I do think OrthoMCL is very good as an initial pass.  If anything, I would like a set of (possibly bioperl-based, definitely DB-based) modules that can deal with this information.

The more imperative issue in my opinion is that one is prisoner to the gene models for those specific organisms of interest, and this may vary widely depending on the source of those gene models (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc).  For instance, if gene models are poorly curated or rarely updated, the comparisons may be significantly flawed.  Some of these issues may also be (somewhat) alleviated once more transcriptome data is available that helps clear up gene model ambiguities, but that won't be true for all organisms, at least initially.

Note this isn't meant as a slam on any specific DBs or MODs in general; the problem is born of the fact that there isn't a single, centralized, trusted, consistently updated source for this data, specifically something that will handle moderated third-party annotation.  That's a very difficult problem to solve effectively.  Some of these very issues cropped up at the GMOD conference, and there appears to be consensus that a real attempt is needed to address this.  

I don't know, maybe it's just unicorns and rainbows.  Personally I do think the situation will improve, as there seems to be great demand for it, but it requires time, resources, manpower, money, cat herding, etc.

> The problem arises when one wants to cross those boundaries.  For
> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
> many of the mitochondrial (ETC) proteins, the ribosomal rRNA's &
> tRNAs, and the fundamental biochemistry (EC) proteins are homologous
> all the way from the most ancient bacteria through H. sapiens.  The
> only way to play in the mixed arena of prokaryotes and eukaryotes
> involving fundamental vectors in evolution is to either construct ones
> own databases (which presumably means getting involved with MySQL, and
> probably spending some $$$ on hardware) or to develop some BioPerl
> modules that can do the  SpeciesX vs. SpeciesY comparisons on demand
> using some part of the cloud.  This problem isn't going to get smaller
> its only going to get larger, now that the cost of sequencing
> (pseudo-resequencing) a vertebrate genome is starting to come in under
> $10,000 and people are starting to seriously talk about 10,000
> vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
> people are going to undertake very soon.
> 
> Robert

They're already undertaking it now using a broad range of organisms, in and out of the cloud.  In most cases one can amend a prior recip. comparative analysis with new data fairly easily, if one takes care to do so early on (i.e. set up the BLAST databases with a specified defined size for comparative stats between separate analyses).  OrthoMCL v2 describes a procedure to do this, and I believe others have similar methodology.  

I could also see possible ways one can further optimize this, for instance in cases where two very closely-related organisms are compared, where translated seqs are 100% identical, etc.  IIRC, the OrthoMCL DB site already has a way to upload custom sets of protein data for mapping to (already pre-run) clusters.  Just the fact that the tools are open source, semi-automated, and can be generically applied to data of personal interest is a great boon.  Not sure I see the downside of that, and I'm pretty confident the scalability issues will be addressed in some way.

chris


From maj at fortinbras.us  Mon Jan 18 11:33:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 11:33:12 -0500
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
Message-ID: <6093E45F17B543438AC02E6C626439E1@NewLife>

this issue's come up before, see this thread
http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Guohong Hu" <ghhu at sibs.ac.cn>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 10:30 AM
Subject: Re: [Bioperl-l] Bioperl 1.6


> Guohong,
>
> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed 
> first.  Make sure the repos are set according to the Windows installation 
> instructions on the BioPerl wiki:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> IIRC the actual order of the PPM repository can be critical (PPM pulls based 
> on highest version, first repo, but sometimes it gets confused).  Just curious 
> but where is the v 1.4 PPM located?  If it is local to our PPM repo I can 
> physically remove it to prevent this from happening.
>
> chris
>
> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
>
>> Hi there,
>>
>>
>>
>> I was trying to install BioPerl in windows using ppm, by following the
>> instruction in
>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>> the repositories, and did the search of Bioperl packages. The latest version
>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>> install it, a number of prerequisite modules were being installed too, which
>> include Bioperl 1.4. Then an error message showed up during installation:
>>
>>
>>
>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>> BioPerl has already installed a file that package bioperl wants to install."
>>
>>
>>
>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>> wanted to install again. I don't know why bioperl 1.4 was one of the
>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
>> errors. But I need a newer version, because some modules (like
>>
>> Bio::Tools::HMM) is not included in 1.4.
>>
>>
>>
>> I saw on internet that somebody had the same problem when he was trying to
>> install BioPerl 1.5, but I didn't find the solution.
>>
>>
>>
>> Anybody has a clue on that? Thank you for your time.
>>
>>
>>
>> GH


From cjfields at illinois.edu  Mon Jan 18 12:18:34 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 11:18:34 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <6093E45F17B543438AC02E6C626439E1@NewLife>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<6093E45F17B543438AC02E6C626439E1@NewLife>
Message-ID: <E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>

Mark,

Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this?  Regardless, it's problematic for me to test this out directly, at least for the next few days.  Maybe someone could try it?

Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this).

chris

On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote:

> this issue's come up before, see this thread
> http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html
> MAJ


From clarsen at vecna.com  Mon Jan 18 12:42:13 2010
From: clarsen at vecna.com (Chris Larsen)
Date: Mon, 18 Jan 2010 12:42:13 -0500
Subject: [Bioperl-l] Reciprocal best blast hits using BioPerl?
In-Reply-To: <B7BD7C2F-4A70-49B5-9074-7EBAF5094AE9@vecna.com>
References: <B0218AEF-3CEB-4E06-B8DF-7B302D024797@vecna.com>
	<B7BD7C2F-4A70-49B5-9074-7EBAF5094AE9@vecna.com>
Message-ID: <ED172CDA-A8C3-4488-9648-1FBA7036BAD6@vecna.com>

Bhakti, (and Chris, Mark)--

Yes, there is some Perl available to parse reciprocal best blast hits.

Mark's referenced / archived post was mine, we were looking to do what  
you wanted. Here we proceed with the thread.

We ended up implementing OrthoMCL 1.4, as Chris F pointed to, and then
made a simple Perl parser that takes the raw OrthoMCL output, does
splits, and spits out a delimited table of all the orthologs in a
group (for, say, the genus Mycobacterium), so you can stuff it into DBLoader.

The link to the script, SOP, and method is at:
http://www.biohealthbase.org/brcDocs/documents/BHB_ORTHOLOG_SOP.pdf

Giving e.g.:

Francisella 1 110321310
Francisella 1 110321361
Francisella 1 56707275
Francisella 1 56707366
Francisella 1 56707462

Five members of Ortholog Group 1, with just their gi number.  And you  
can see the results of that parsing, supported by a database, being  
used to load BioHealthbase with all the reciprocal best blast hits  
plus other OrthoMCL parsing, for mycobacterial PolA at:

http://www.biohealthbase.org/brc/details.do?locus=MAV_3155&decorator=mycobacterium
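
For readers without access to the SOP, the kind of parsing described above can be sketched as follows. This is a minimal sketch, not the script from the SOP: the sub name orthomcl_to_rows, the taxon tags such as "FtuA", and the assumed layout of the OrthoMCL group lines ("ORTHOMCL<n>(... genes,... taxa):" followed by whitespace-separated "id(taxon)" members) are assumptions made for illustration. The gi numbers are the ones from the example above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper (not the SOP script): turn OrthoMCL group lines into
# "label <TAB> group number <TAB> member id" rows.
# Assumed input layout, one group per line:
#   ORTHOMCL1(5 genes,2 taxa):	 110321310(FtuA) 56707275(FtuB) ...
sub orthomcl_to_rows {
    my ($label, @lines) = @_;
    my @rows;
    for my $line (@lines) {
        # Capture the group number and everything after the colon.
        next unless $line =~ /^ORTHOMCL(\d+)\([^)]*\):\s*(.*)$/;
        my ($group, $members) = ($1, $2);
        for my $member (split ' ', $members) {
            # Strip the trailing "(taxon)" tag, keeping only the identifier.
            (my $id = $member) =~ s/\([^()]*\)$//;
            push @rows, join("\t", $label, $group, $id);
        }
    }
    return @rows;
}

# Demo line: the taxon tags are made up, the ids come from the example above.
my @demo = (
    "ORTHOMCL1(5 genes,2 taxa):\t 110321310(FtuA) 110321361(FtuA) "
  . "56707275(FtuB) 56707366(FtuB) 56707462(FtuB)\n",
);
print "$_\n" for orthomcl_to_rows('Francisella', @demo);
```

The rows come out tab-delimited, one member per line, which should be straightforward to load into a database table.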

See? Pretty? We were just interested in making ortholog groups on the
basis of paralog-conscious reciprocal blast stuff. Like you. This
package and doc I've made does what you want I think, as long as you
stay in prokaryotes. But careful: garbage in, garbage out. We
started with clean Genuses. (. o O Genii?). You'll get more junky HUGE
and TINY ortholog groups if you put in different Orders of microbes.
It's taxon-sensitive. OrthoMCL author David Roos is great at it though
and designed it with higher unicellular euks in mind too; comb the docs
for that. Sorry, I was doing bacterial work at the time and can't guide
you if that's what you want. If you end up installing OrthoMCL 1.4, you
can pipe the output to this method and get out usable stuff.

Hope it works for you.

Cheers,

Chris L

-- 

Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525


From maj at fortinbras.us  Mon Jan 18 14:37:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 14:37:43 -0500
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<6093E45F17B543438AC02E6C626439E1@NewLife>
	<E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>
Message-ID: <61F331117B7C4E2282684FA240B9710F@NewLife>

I will play around with it-- in the meantime, Guohong, please look at the 
following
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
where there is a workaround for this issue, using the ppm-shell--
cheers,
Mark
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Guohong Hu" <ghhu at sibs.ac.cn>; <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 12:18 PM
Subject: Re: [Bioperl-l] Bioperl 1.6


Mark,

Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing 
this?  Regardless, it's problematic for me to test this out directly, at least 
for the next few days.  Maybe someone could try it?

Also, there is the Strawberry Perl alternative, which uses CPAN (I think 
ActiveState also supports this).

chris



From jason at bioperl.org  Mon Jan 18 15:24:33 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Jan 2010 12:24:33 -0800
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
	<deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
	<B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>
Message-ID: <68DF70A5-63A6-428D-A7F1-7B3D01528375@bioperl.org>


On Jan 18, 2010, at 8:12 AM, Chris Fields wrote:

> (my small rant on this)
>
> On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote:
>
>> My comment might be that the problem with OrthoMCL is that it is
>> primarily lower organisms.  The problem with Ensembl (and some other
>> databases) is that it is primarily higher organisms (though they do
>> include Drosophila, C. elegans and Yeast).
>
> OrthoMCL v2 handles both lower and higher organisms; I've used it for
> both, with decent success.  Most other ortholog tools do as well (if
> I'm not mistaken, Ensembl also uses MCL under the hood, unless
> that's changed).  I don't believe one should be completely bound to
> one toolset, particularly in this case (there are lots of nice
> ortholog clustering tools out there using various means of
> comparison), but I do think OrthoMCL is very good as an initial pass.
> If anything, I would like a set of (possibly bioperl-based, definitely
> DB-based) modules that can deal with this information.
>
> The more imperative issue in my opinion is that one is prisoner to  
> the gene models for those specific organisms of interest, and this  
> may vary widely depending on the source of those gene models  
> (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc).  For  
> instance, if gene models are poorly curated or rarely updated, the  
> comparisons may be significantly flawed.  Some of these issues may  
> also be (somewhat) alleviated once more transcriptome data is  
> available that helps clear up gene model ambiguities, but that won't  
> be true for all organisms, at least initially.
>
> Note this isn't meant as a slam on any specific DBs or on MODs in
> general; the problem is born of the fact that there isn't a
> single, centralized, trusted, consistently updated source for this
> data, specifically something that will handle moderated third-party
> annotation.  That's a very difficult problem to solve effectively.
> Some of these very issues cropped up at the GMOD conference, and there
> appears to be consensus that a real attempt is needed to address this.
>
> I don't know, maybe it's just unicorns and rainbows.  Personally I  
> do think the situation will improve, as there seems to be great  
> demand for it, but it requires time, resources, manpower, money, cat  
> herding, etc.
>
>> The problem arises when one wants to cross those boundaries.  For
>> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
>> many of the mitochondrial (ETC) proteins, the ribosomal rRNA's &
>> tRNAs, and the fundamental biochemistry (EC) proteins are homologous
>> all the way from the most ancient bacteria through H. sapiens.  The
>> only way to play in the mixed arena of prokaryotes and eukaryotes
>> involving fundamental vectors in evolution is to either construct  
>> ones
>> own databases (which presumably means getting involved with MySQL,  
>> and
>> probably spending some $$$ on hardware) or to develop some BioPerl
>> modules that can do the  SpeciesX vs. SpeciesY comparisons on demand
>> using some part of the cloud.  This problem isn't going to get  
>> smaller
>> its only going to get larger, now that the cost of sequencing
>> (pseudo-resequencing) a vertebrate genome is starting to come in  
>> under
>> $10,000 and people are starting to seriously talk about 10,000
>> vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
>> people are going to undertake very soon.
>>
>> Robert
>
> They're already undertaking it now using a broad range of organisms,  
> in and out of the cloud.  In most cases one can amend a prior recip.  
> comparative analysis with new data fairly easily, if one takes care  
> to do so early on (i.e. set up the BLAST databases with a specified  
> defined size for comparative stats between separate analyses).   
> OrthoMCL v2 describes a procedure to do this, and I believe others  
> have similar methodology.
>
> I could also see possible ways one can further optimize this, for  
> instance in cases where two very closely-related organisms are  
> compared, where translated seqs are 100% identical, etc.  IIRC, the  
> OrthoMCL DB site already has a way to upload custom sets of protein  
> data for mapping to (already pre-run) clusters.  Just the fact that  
> the tools are available as OS, they're semi-automated, and can be  
> generically applied to data of personal interest is a great boon.   
> Not sure I see the downside of that, and I'm pretty confident the  
> scalability issues will be addressed in some way.


I think that the approach that Paul Thomas's group at SRI
(http://www.ai.sri.com/esb/) is taking is really what you'd want to
focus on if you are only interested in a particular set of gene
families rather than de novo clustering. That or the PhyloFacts
approach (http://phylogenomics.berkeley.edu/phylofacts/). That is where
HMMs are more appropriate: you focus on your initial seed set of
protein families and build HMMs for those families, with some
automated clustering initially to get better resolution.  Once you
start throwing in multiple 10^6 proteins, the unsupervised clustering
approach may not be able to give as accurate or timely results, but it
can be a good initial filtering step depending on how much initial
knowledge you are starting with. Using HMMs won't be as
computationally expensive either, if you are compute limited.

TreeFam also provides curated phylogenies of gene families
(http://www.treefam.org/) that span the opisthokonts, in that a few
fungi are sprinkled in.

Also things like http://boinc.bio.wzw.tum.de/boincsimap/ provide ways
to use distributed computing to calculate the matrix of similarities
among proteins if you are interested in the exhaustive approach.

-jason

>
> chris

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jay at jays.net  Mon Jan 18 18:36:20 2010
From: jay at jays.net (Jay Hannah)
Date: Mon, 18 Jan 2010 17:36:20 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <9AA13F94-3336-4CC1-89C4-249D0EB7C857@jays.net>

On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote:
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?

If all the advice and resources in this thread have not dissuaded you from writing your own, you could glance at cross_blast() here as reference:

   https://clabsvn.ist.unomaha.edu/anonsvn/user/jhannah/UNO/seqlab/seqlab/tutorial.pod

About the (abandoned) project:

   http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29

I wrote that in 2006 for clustering a few hundred proteins based on custom criteria.
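
For the narrow question asked (query1 -> hit1 && hit1 -> query1 from a table report), a minimal pure-Perl sketch could look like the one below. It is not cross_blast() and not a BioPerl module. It assumes the 12-column tabular layout (blastall -m 8, or -outfmt 6 in BLAST+) with the bit score in the last column, one report per search direction, and "best" meaning highest bit score with the first line winning ties. The sub names best_hits, reciprocal_best_hits and demo_row are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: map each query to its best-scoring hit.
# Assumes tab-separated columns: 1 = query, 2 = hit, 12 = bit score.
sub best_hits {
    my @lines = @_;
    my (%best, %score);
    for my $line (@lines) {
        next if $line =~ /^\s*(?:#.*)?$/;          # skip blank and comment lines
        (my $row = $line) =~ s/\s+\z//;            # drop the trailing newline
        my ($query, $hit, $bits) = (split /\t/, $row)[0, 1, 11];
        next unless defined $bits;                 # not a 12-column line
        if (!exists $score{$query} || $bits > $score{$query}) {
            $score{$query} = $bits;                # ties keep the first line seen
            $best{$query}  = $hit;
        }
    }
    return \%best;
}

# Keep pairs where each sequence is the other's best hit.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my @pairs;
    for my $q (sort keys %$a_vs_b) {
        my $h = $a_vs_b->{$q};
        push @pairs, [$q, $h]
            if defined $b_vs_a->{$h} && $b_vs_a->{$h} eq $q;
    }
    return @pairs;
}

# Demo with made-up identifiers; columns 3-11 are filled with zeros.
sub demo_row {
    my ($q, $h, $bits) = @_;
    return join("\t", $q, $h, (0) x 9, $bits) . "\n";
}
my $a_vs_b = best_hits(demo_row('a1', 'b1', 200), demo_row('a1', 'b2', 80),
                       demo_row('a2', 'b2', 150));
my $b_vs_a = best_hits(demo_row('b1', 'a1', 190), demo_row('b2', 'a1', 120),
                       demo_row('b2', 'a2', 100));
print join("\t", @$_), "\n" for reciprocal_best_hits($a_vs_b, $b_vs_a);
```

In the demo only the a1/b1 pair is printed: a2's best hit is b2, but b2's best hit is a1. As the rest of the thread points out, plain reciprocal best hits ignore paralogs, so this is at most a first pass.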

Cheers,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jay at jays.net  Mon Jan 18 19:22:48 2010
From: jay at jays.net (Jay Hannah)
Date: Mon, 18 Jan 2010 18:22:48 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
Message-ID: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>

I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.

   http://github.com/jhannah/bio-broodcomb

It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. 

The first two functions I stuck in the framework:

Find subsequences (Bio::BroodComb::SubSeq):

   use Bio::BroodComb;
   my $bc = Bio::BroodComb->new();
   $bc->load_large_seq(file => "large_seq.fasta");
   $bc->load_small_seq(file => "small_seq.fasta");
   $bc->find_subseqs();
   print $bc->subseq_report1;

In-silico PCR (Bio::BroodComb::PCR):

  use Bio::BroodComb;
  my $bc = Bio::BroodComb->new();
  $bc->load_large_seq(file => "large_seq.fasta");
  $bc->add_primerset(
     description    => "U5/R",   # however you want it reported
     forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
     reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
  );
  $bc->find_pcr_hits();
  $bc->find_pcr_products();
  print $bc->pcr_report1;

I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.

Suggestions, contributions welcome.   :)

   http://github.com/jhannah/bio-broodcomb

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From ocornejo at gmail.com  Mon Jan 18 19:46:10 2010
From: ocornejo at gmail.com (Omar Cornejo)
Date: Mon, 18 Jan 2010 16:46:10 -0800 (PST)
Subject: [Bioperl-l] installing bioperl for mac
Message-ID: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>

Dear People,
  I have tried to install Bioperl on my new MacBook, which carries
the latest perl distribution (5.10.0), and for some reason I can't
(using fink) make it recognize this version of perl.
  I have tried:
fink install bioperl-pm510
fink install bioperl-pm5100

but neither one works.  Is it fine installing bioperl for perl v 5.9?

thank you,
Omar Cornejo


From jason at bioperl.org  Mon Jan 18 20:04:31 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Jan 2010 17:04:31 -0800
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <4B5502D9.2010706@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
	<4B5502D9.2010706@gmail.com>
Message-ID: <F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>

Alexandr -

Thanks for getting back to us - I am guessing the parser needs to
recognize negative coordinates at around line 370 in
Bio/AlignIO/Handler/GenericAlignHandler.pm, which assumes a split on
'-' will be sufficient.
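
To illustrate the idea (this is a sketch, not the actual handler code or a tested patch), a range with optional leading minus signs can be matched with a regex instead of a split. The sub name parse_signed_range is made up. For comparison, split /-/, '-1-344', 2 returns an empty start and '1-344' as the end, which would be consistent with the error text reported earlier in the thread, though I have not checked that this is what the handler does.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: parse "start-end" where either coordinate may be
# negative, e.g. the "-1-344" from the Pfam DR PDB line.
sub parse_signed_range {
    my ($range) = @_;
    # Optional minus, digits, the separating dash, optional minus, digits;
    # a trailing ';' as found in the Stockholm DR line is tolerated.
    if ($range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*;?\s*$/) {
        return ($1, $2);
    }
    return;    # empty list when the text is not a recognisable range
}

for my $text ('-1-344', '263-608', '-5--2') {
    my ($start, $end) = parse_signed_range($text);
    print "$text => start=$start end=$end\n";
}
```

Whether negative coordinates are then acceptable to the range object downstream is a separate question this sketch does not address.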

Can you post it as a bug to bugzilla, along with attaching a record and
script that replicates the problem, so a test can be written for this?
http://bugzilla.open-bio.org/

-jason
On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote:

> I have contacted Pfam, and I have been told that The PDB file actually
> does include a reference to residue "-1":
>
> DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>
> DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>
>
> Since negative numbers are allowed in PDB, the data should probably be
> considered valid.
>
> There are quite a few records like this, so this is not an isolated  
> issue.
>
> Alexandr
>
> On 1/14/2010 7:20 PM, Jason Stajich wrote:
>> Seems like improper data really -- "-1" is an improper coordinate  
>> as far
>> as the parser is concerned. You may want to tell Pfam that there is
>> possible error in the dumper since that was the only record that had
>> this problem?
>>
>> -jason
>> On Jan 13, 2010, at 5:57 PM, albezg wrote:
>>
>>> Hi all,
>>>
>>> I have a problem using AlignIO to read Pfam database:
>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>>> alignment OK until the alignment PF00331.13. There it crashes with  
>>> the
>>> following message:
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: '1-344' is not an integer.
>>>
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>>> STACK: Bio::Range::end
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>>> STACK: Bio::Annotation::Target::new
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>>
>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>>
>>> STACK: Bio::AlignIO::stockholm::next_aln
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>>> -----------------------------------------------------------
>>>
>>> It appears this is caused by this entry:
>>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>>
>>> I don't care about residues in PDB, so I have just removed minus  
>>> signs
>>> from the ranges. This seems to have fixed the crashing.
>>>
>>> Is it a known problem? Is there a solution for it?
>>>
>>> Thanks,
>>> Alexandr
>>>
>>>
>>> On 03/20/2009 05:09 PM, albezg wrote:
>>>>
>>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>>> alignment(SimpleAlign).
>>>>
>>>> There are no issues when I print it, however when I use AlignIO  
>>>> to write
>>>> the alignment to a FASTA file, it does not work. Is this behavior
>>>> intended?
>>>>
>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>>
>>>> The error:
>>>> ------------- EXCEPTION -------------
>>>> MSG: No sequence with name [1/1-11]
>>>> STACK Bio::SimpleAlign::displayname
>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>>> STACK Bio::AlignIO::fasta::write_aln
>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>>> STACK toplevel ./demo.pl:14
>>>> -------------------------------------
>>>>
>>>> Alexandr
>>>
>>
>> -- 
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From cjfields at illinois.edu  Mon Jan 18 21:19:30 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 20:19:30 -0600
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
	<4B5502D9.2010706@gmail.com>
	<F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>
Message-ID: <46FD172A-69C0-436C-A005-AC38668C3347@illinois.edu>

Alexandr,

Posting the bug report would be great, should be an easy enough fix.

chris

On Jan 18, 2010, at 7:04 PM, Jason Stajich wrote:

> Alexandr -
> 
> Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient.
> 
> Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/
> 
> -jason
> On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote:
> 
>> I have contacted Pfam, and I have been told that The PDB file actually
>> does include a reference to residue "-1":
>> 
>> DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>> 
>> DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>> 
>> 
>> Since negative numbers are allowed in PDB, the data should probably be
>> considered valid.
>> 
>> There are quite a few records like this, so this is not an isolated issue.
>> 
>> Alexandr
>> 
>> On 1/14/2010 7:20 PM, Jason Stajich wrote:
>>> Seems like improper data really -- "-1" is an improper coordinate as far
>>> as the parser is concerned. You may want to tell Pfam that there is
>>> a possible error in the dumper, since that was the only record that had
>>> this problem?
>>> 
>>> -jason
>>> On Jan 13, 2010, at 5:57 PM, albezg wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I have a problem using AlignIO to read Pfam database:
>>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>>>> alignment OK until the alignment PF00331.13. There it crashes with the
>>>> following message:
>>>> 
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: '1-344' is not an integer.
>>>> 
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>>>> STACK: Bio::Range::end
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>>>> STACK: Bio::Annotation::Target::new
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>>> 
>>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>>> 
>>>> STACK: Bio::AlignIO::stockholm::next_aln
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>>>> -----------------------------------------------------------
>>>> 
>>>> It appears this is caused by this entry:
>>>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>>> 
>>>> I don't care about residues in PDB, so I have just removed minus signs
>>>> from the ranges. This seems to have fixed the crashing.
>>>> 
>>>> Is it a known problem? Is there a solution for it?
>>>> 
>>>> Thanks,
>>>> Alexandr
>>>> 
>>>> 
>>>> On 03/20/2009 05:09 PM, albezg wrote:
>>>>> 
>>>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>>>> alignment(SimpleAlign).
>>>>> 
>>>>> There are no issues when I print it, however when I use AlignIO to write
>>>>> the alignment to a FASTA file, it does not work. Is this behavior
>>>>> intended?
>>>>> 
>>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>>> 
>>>>> The error:
>>>>> ------------- EXCEPTION -------------
>>>>> MSG: No sequence with name [1/1-11]
>>>>> STACK Bio::SimpleAlign::displayname
>>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>>>> STACK Bio::AlignIO::fasta::write_aln
>>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>>>> STACK toplevel ./demo.pl:14
>>>>> -------------------------------------
>>>>> 
>>>>> Alexandr
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> -- 
>>> Jason Stajich
>>> jason.stajich at gmail.com
>>> jason at bioperl.org
>>> http://fungalgenomes.org/
>>> 
>> 
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
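
A note on the failure mode discussed in this thread: Jason's guess is that
the handler splits the range on '-', which cannot work once a coordinate is
itself negative. The sketch below is not the BioPerl code and not a patch for
GenericAlignHandler.pm; it is a small standalone Python illustration, and the
names naive_split, parse_range and RANGE_RE are made up for the example. It
assumes a range is two integers, each optionally preceded by a minus sign,
joined by a single dash.

```python
import re

# Assumed range syntax: signed integer, dash separator, signed integer.
RANGE_RE = re.compile(r"^\s*(-?\d+)\s*-\s*(-?\d+)\s*$")


def naive_split(text):
    """Split at the first dash, as a simple dash-based parser might."""
    start, end = text.split("-", 1)
    return start, end


def parse_range(text):
    """Return (start, end) as integers, allowing negative coordinates."""
    match = RANGE_RE.match(text)
    if match is None:
        raise ValueError(f"not a coordinate range: {text!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    # The PDB range from the failing #=GS line in PF00331.13.
    print(naive_split("-1-344"))
    print(parse_range("-1-344"))
    # An ordinary positive range still parses the same way.
    print(parse_range("263-608"))
```

Splitting "-1-344" at the first dash leaves an empty start and "1-344" as the
end, which matches the string in the exception message; whether the handler
does exactly this is an assumption. Matching the two signed integers
explicitly avoids the ambiguity.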


From cjfields at illinois.edu  Mon Jan 18 21:20:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 20:20:31 -0600
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
Message-ID: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>

On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:

> Dear People,
>  I have tried to install Bioperl in my new Mac Book, which carries
> the latest perl distribution (5.10.0) and for some reason I can't
> (using fink) make it recognize this version of perl.
>  I have tried:
> fink install bioperl-pm510
> fink install bioperl-pm5100
> 
> but neither one works.  Is it fine installing bioperl for perl v 5.9?
> 
> thank you,
> Omar Cornejo

fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

chris


From dan.kortschak at adelaide.edu.au  Mon Jan 18 21:47:47 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 19 Jan 2010 13:17:47 +1030
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie
 now available BETA
Message-ID: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>

Hi All,

A wrapper and output parser for bowtie 'ultrafast, memory-efficient
short read aligner' are now available in the bioperl-live and
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
bioperl-run/trunk at 16726). Bowtie details are available here:

http://bowtie-bio.sourceforge.net/index.shtml

The modules can return a Bio::Assembly::Scaffold object (operating via
MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
uses large amounts of memory - the test suite works for me with >=2GB
but not with 1GB due to this. (Is there a disk-based tool
for this for large projects?)

Bowtie (>0.12.0) can align in colour space, but this is not currently
supported by the wrapper, though it should not be difficult to add. If
someone can point me to a small set of colour space reads and a
reference sequence I will be able to use these for testing.

Thanks to the core devs for helping me with many of my problems in
putting this together.

Dan


From maj at fortinbras.us  Mon Jan 18 22:31:36 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 22:31:36 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
	Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <D26A5B3DAFDA4068863C7735BAF7894B@NewLife>

Excellent Dan! Thanks for all this work-- MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 9:47 PM
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now 
available BETA


> Hi All,
>
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
>
> http://bowtie-bio.sourceforge.net/index.shtml
>
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
>
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
>
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
>
> Dan
>
> 


From cjfields at illinois.edu  Mon Jan 18 22:36:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 21:36:12 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
	Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <CD36CE88-DC05-4A17-86A7-17A85C14F67A@illinois.edu>

On Jan 18, 2010, at 8:47 PM, Dan Kortschak wrote:

> Hi All,
> 
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
> 
> http://bowtie-bio.sourceforge.net/index.shtml
> 
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
> 
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
> 
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
> 
> Dan

And (on behalf of the core devs) thank you for putting this together!

chris


From scott at scottcain.net  Mon Jan 18 22:41:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Mon, 18 Jan 2010 22:41:43 -0500
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
Message-ID: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>

But make sure you have the developers tools installed before the first
time you run the cpan shell; it will make your life easier.

Scott


On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
>
>> Dear People,
>>  I have tried to install Bioperl in my new Mac Book, which carries
>> the latest perl distribution (5.10.0) and for some reason I can't
>> (using fink) make it recognize this version of perl.
>>  I have tried:
>> fink install bioperl-pm510
>> fink install bioperl-pm5100
>>
>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
>>
>> thank you,
>> Omar Cornejo
>
> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>
> chris
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Jan 18 23:04:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 22:04:57 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <009801c8b957$2af4f8d0$80deea70$@ac.cn>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<009801c8b957$2af4f8d0$80deea70$@ac.cn>
Message-ID: <79D53148-1FDA-4025-99A6-77A7F124E6BD@illinois.edu>

Hmm, the trouchelle repo is the only one that had a working DB_File for perl 5.10 (not sure but I think 5.8.9 was fine).  Probably worth contacting them about this to see if they can drop the (way out-of-date) 1.4 distribution.

chris

On May 18, 2008, at 9:22 PM, Guohong Hu wrote:

> Thank you all. The problem is solved. The bioperl 1.4 version is from
> the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I
> added all the repos according to the bioperl wiki instructions, somehow 1.4
> became a prerequisite for 1.6. Chris's question reminded me of this, so I
> removed the Trouchelle repo, and the installation proceeded without errors. I
> suggest we put a note on the wiki, since this looks like an odd issue that
> may not affect just me.
> 
> Best,
> Guohong
> 
> 
> 
> _________________________________________
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: 18 January 2010 23:30
> To: Guohong Hu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bioperl 1.6
> 
> Guohong, 
> 
> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed
> first.  Make sure the repos are set according to the Windows installation
> instructions on the BioPerl wiki:
> 
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> 
> IIRC the actual order of the PPM repository can be critical (PPM pulls based
> on highest version, first repo, but sometimes it gets confused).  Just
> curious but where is the v 1.4 PPM located?  If it is local to our PPM repo
> I can physically remove it to prevent this from happening.
> 
> chris
> 
> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
> 
>> Hi there,
>> 
>> 
>> 
>> I was trying to install BioPerl on Windows using ppm, by following the
>> instructions in
>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>> the repositories, and did the search of Bioperl packages. The latest version
>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>> install it, a number of prerequisite modules were being installed too,
>> including Bioperl 1.4. Then an error message showed up during installation:
>> 
>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>> BioPerl has already installed a file that package bioperl wants to install."
>> 
>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>> wanted to install again. I don't know why bioperl 1.4 was one of the
>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
>> errors. But I need a newer version, because some modules (like
>> Bio::Tools::HMM) are not included in 1.4.
>> 
>> I saw on the internet that somebody had the same problem when he was trying to
>> install BioPerl 1.5, but I didn't find the solution.
>> 
>> Does anybody have a clue about that? Thank you for your time.
>> 
>> 
>> 
>> GH
>> 
>> 
>> 
> 
> 


From ocornejo at gmail.com  Mon Jan 18 23:18:00 2010
From: ocornejo at gmail.com (Omar Eduardo Cornejo Ordaz)
Date: Mon, 18 Jan 2010 23:18:00 -0500
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
	<4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
	<5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>
Message-ID: <ddd346a41001182018o5952415fx7930d85a9430453@mail.gmail.com>

I see.
  thank you Scott and Chris.
  I had already installed the latest version of the Xcode Developer Tools.
  I will go the cpan way then.

have a nice one,
Omar

On Mon, Jan 18, 2010 at 10:58 PM, Chris Fields <cjfields at illinois.edu> wrote:

> Yes, definitely!
>
> -c
>
> On Jan 18, 2010, at 9:41 PM, Scott Cain wrote:
>
> > But make sure you have the developers tools installed before the first
> > time you run the cpan shell; it will make your life easier.
> >
> > Scott
> >
> >
> > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
> >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
> >>
> >>> Dear People,
> >>>  I have tried to install Bioperl in my new Mac Book, which carries
> >>> the latest perl distribution (5.10.0) and for some reason I can't
> >>> (using fink) make it recognize this version of perl.
> >>>  I have tried:
> >>> fink install bioperl-pm510
> >>> fink install bioperl-pm5100
> >>>
> >>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
> >>>
> >>> thank you,
> >>> Omar Cornejo
> >>
> >> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
> >>
> >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
> >>
> >> chris
> >>
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   scott at scottcain dot net
> > GMOD Coordinator (http://gmod.org/)                     216-392-3087
> > Ontario Institute for Cancer Research
> >
>
>


From cjfields at illinois.edu  Mon Jan 18 22:58:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 21:58:36 -0600
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
	<4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
Message-ID: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>

Yes, definitely!

-c

On Jan 18, 2010, at 9:41 PM, Scott Cain wrote:

> But make sure you have the developers tools installed before the first
> time you run the cpan shell; it will make your life easier.
> 
> Scott
> 
> 
> On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
>> 
>>> Dear People,
>>>  I have tried to install Bioperl in my new Mac Book, which carries
>>> the latest perl distribution (5.10.0) and for some reason I can't
>>> (using fink) make it recognize this version of perl.
>>>  I have tried:
>>> fink install bioperl-pm510
>>> fink install bioperl-pm5100
>>> 
>>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
>>> 
>>> thank you,
>>> Omar Cornejo
>> 
>> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
>> 
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>> 
>> chris
>> 
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
> 


From albezg at gmail.com  Mon Jan 18 19:54:49 2010
From: albezg at gmail.com (Alexandr Bezginov)
Date: Mon, 18 Jan 2010 19:54:49 -0500
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
 with negative PDB ranges
In-Reply-To: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
Message-ID: <4B5502D9.2010706@gmail.com>

I have contacted Pfam, and I have been told that the PDB file actually
does include a reference to residue "-1":

DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611

DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611


Since negative numbers are allowed in PDB, the data should probably be
considered valid.

There are quite a few records like this, so this is not an isolated issue.

Alexandr

On 1/14/2010 7:20 PM, Jason Stajich wrote:
> Seems like improper data really -- "-1" is an improper coordinate as far
> as the parser is concerned. You may want to tell Pfam that there is
> a possible error in the dumper, since that was the only record that had
> this problem?
> 
> -jason
> On Jan 13, 2010, at 5:57 PM, albezg wrote:
> 
>> Hi all,
>>
>> I have a problem using AlignIO to read Pfam database:
>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>> alignment OK until the alignment PF00331.13. There it crashes with the
>> following message:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: '1-344' is not an integer.
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>> STACK: Bio::Range::end
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>> STACK: Bio::Annotation::Target::new
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>
>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>
>> STACK: Bio::AlignIO::stockholm::next_aln
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>> -----------------------------------------------------------
>>
>> It appears this is caused by this entry:
>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>
>> I don't care about residues in PDB, so I have just removed minus signs
>> from the ranges. This seems to have fixed the crashing.
>>
>> Is it a known problem? Is there a solution for it?
>>
>> Thanks,
>> Alexandr
>>
>>
>> On 03/20/2009 05:09 PM, albezg wrote:
>>>
>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>> alignment(SimpleAlign).
>>>
>>> There are no issues when I print it, however when I use AlignIO to write
>>> the alignment to a FASTA file, it does not work. Is this behavior
>>> intended?
>>>
>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>
>>> The error:
>>> ------------- EXCEPTION -------------
>>> MSG: No sequence with name [1/1-11]
>>> STACK Bio::SimpleAlign::displayname
>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>> STACK Bio::AlignIO::fasta::write_aln
>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>> STACK toplevel ./demo.pl:14
>>> -------------------------------------
>>>
>>> Alexandr
>>
> 
> -- 
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 


From ghhu at sibs.ac.cn  Mon Jan 18 21:22:19 2010
From: ghhu at sibs.ac.cn (Guohong Hu)
Date: Tue, 19 Jan 2010 02:22:19 -0000
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
Message-ID: <009801c8b957$2af4f8d0$80deea70$@ac.cn>

Thank you all. The problem is solved. The bioperl 1.4 version is from
the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I
added all the repos according to the bioperl wiki instructions, somehow 1.4
became a prerequisite for 1.6. Chris's question reminded me of this, so I
removed the Trouchelle repo, and the installation proceeded without errors. I
suggest we put a note on the wiki, since this looks like an odd issue that
may not affect just me.

Best,
Guohong


_________________________________________
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: 18 January 2010 23:30
To: Guohong Hu
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bioperl 1.6

Guohong, 

1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed
first.  Make sure the repos are set according to the Windows installation
instructions on the BioPerl wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

IIRC the actual order of the PPM repository can be critical (PPM pulls based
on highest version, first repo, but sometimes it gets confused).  Just
curious but where is the v 1.4 PPM located?  If it is local to our PPM repo
I can physically remove it to prevent this from happening.

chris

On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:

> Hi there,
> 
> 
> 
> I was trying to install BioPerl on Windows using ppm, by following the
> instructions in
> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
> the repositories, and did the search of Bioperl packages. The latest version
> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
> install it, a number of prerequisite modules were being installed too,
> including Bioperl 1.4. Then an error message showed up during installation:
> 
> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
> BioPerl has already installed a file that package bioperl wants to install."
> 
> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
> wanted to install again. I don't know why bioperl 1.4 was one of the
> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
> errors. But I need a newer version, because some modules (like
> Bio::Tools::HMM) are not included in 1.4.
> 
> I saw on the internet that somebody had the same problem when he was trying to
> install BioPerl 1.5, but I didn't find the solution.
> 
> Does anybody have a clue about that? Thank you for your time.
> 
> 
> 
> GH
> 
> 
> 


From jw12 at sanger.ac.uk  Tue Jan 19 05:41:12 2010
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 19 Jan 2010 10:41:12 +0000
Subject: [Bioperl-l] DAS Workshop Registrations now Open (workshop date 7-9
	April 2010)
Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk>

If you don't know about DAS and wish to know how to distribute your  
latest biological annotation to the world then the upcoming DAS  
workshop may be for you.
If you know about DAS and are maybe a DAS client developer then the  
upcoming DAS workshop is for you (as you will need to know about the  
upcoming DAS 1.6 Specification and how it may affect your software).

For information on the workshop and registration please go to:

http://www.ebi.ac.uk/training/handson/DAS_070410.html


Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From SMarkel at accelrys.com  Tue Jan 19 13:00:22 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 19 Jan 2010 10:00:22 -0800
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
 Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>

Dan,

Life Tech has sample data for E. coli at

http://solidsoftwaretools.com/gf/project/ecoli2x50/

and

http://solidsoftwaretools.com/gf/project/dh10bfrag/.

Reference sequences are included.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
Sent: Monday, 18 January 2010 6:48 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA

Hi All,

A wrapper and output parser for bowtie 'ultrafast, memory-efficient
short read aligner' are now available in the bioperl-live and
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
bioperl-run/trunk at 16726). Bowtie details are available here:

http://bowtie-bio.sourceforge.net/index.shtml

The modules can return a Bio::Assembly::Scaffold object (operating via
the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
uses large amounts of memory - the test suite works for me with >=2GB
but not with 1GB due to this. (Is there a disk file system based tool
for this for large projects?)

Bowtie (>0.12.0) can align in colour space, but this is not currently
supported by the wrapper though it should not be difficult to add. If
someone can point me to a small set of colour space reads and a
reference sequence I will be able to use these for testing.

Thanks to the core devs for helping me with many of my problems in
putting this together.

Dan

_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dan.kortschak at adelaide.edu.au  Tue Jan 19 16:18:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 07:48:20 +1030
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
 Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
	<5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>
Message-ID: <1263935900.4813.0.camel@epistle>

Great.

Thanks, Scott.

Dan

On Tue, 2010-01-19 at 10:00 -0800, Scott Markel wrote:
> Dan,
> 
> Life Tech has sample data for E. coli at
> 
> http://solidsoftwaretools.com/gf/project/ecoli2x50/
> 
> and
> 
> http://solidsoftwaretools.com/gf/project/dh10bfrag/.
> 
> Reference sequences are included.
> 
> Scott
> 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
> 
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>     International Society for Computational Biology
> Chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
> Sent: Monday, 18 January 2010 6:48 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA
> 
> Hi All,
> 
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
> 
> http://bowtie-bio.sourceforge.net/index.shtml
> 
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
> 
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
> 
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
> 
> Dan


From dan.kortschak at adelaide.edu.au  Wed Jan 20 00:32:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 16:02:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
Message-ID: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris (or others),

I've been looking at ways to do large assemblies (really rnaseq/readseq
comparisons for coverage) with maq/bowtie output, and it's clear that for
the size of project I'm working on the space complexity of Bio::DB::Sam
is too nasty. So I thought Bio::DB::GFF might be the way to go.

I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF

This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
read through the docs, and it's not entirely clear (I'm hoping I've
interpreted it the right way): does this return features such that
overlapping features come back as a single feature, while
non-overlapping features come back separately? If this is the
case, it would satisfy my requirements perfectly.

thanks for your time
Dan
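
For reference, the aggregation described above (overlapping reads
collapsed into one region, non-overlapping reads kept apart) can be
sketched in plain Perl. This only illustrates the desired result; it is
not a statement of what features(-merge=>1) actually does. The name
merge_intervals and the read coordinates are made up for the example, and
it assumes all reads lie on one reference sequence with 1-based inclusive
coordinates.

```perl
use strict;
use warnings;

# Merge overlapping [start, end] intervals on a single reference sequence.
# Non-overlapping intervals come back unchanged.
# (Hypothetical helper for illustration, not part of BioPerl.)
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $iv (@sorted) {
        if ( @merged && $iv->[0] <= $merged[-1][1] ) {
            # Overlaps the previous region: extend it if this one ends later.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            push @merged, [@$iv];    # copy, so the input is not modified
        }
    }
    return @merged;
}

# Made-up read alignments as [start, end] pairs.
my @reads   = ( [ 100, 135 ], [ 120, 155 ], [ 150, 185 ], [ 400, 435 ] );
my @regions = merge_intervals(@reads);
print join( "\t", @$_ ), "\n" for @regions;
```

The first three reads chain together into one region and the fourth
stays separate. Sorting first keeps the merge a single pass, at the cost
of holding every interval in memory.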


From jason at bioperl.org  Wed Jan 20 01:35:24 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 19 Jan 2010 22:35:24 -0800
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>

Are you looking at the bowtie features file or the SAM?
-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From dan.kortschak at adelaide.edu.au  Wed Jan 20 02:19:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 17:49:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
	<C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>
Message-ID: <1263971945.4582.2.camel@epistle>

It doesn't really matter, they are largely inter-convertible. The
problem is not really the upstream processing, but the aggregation of
reads into read-assigned regions (unless I've misunderstood your
question).

Dan

On Tue, 2010-01-19 at 22:35 -0800, Jason Stajich wrote:
> Are you looking at the bowtie features file or the SAM?
> -jason

-- 
Dan Kortschak <dan.kortschak at adelaide.edu.au>


From ajmackey at gmail.com  Wed Jan 20 07:59:38 2010
From: ajmackey at gmail.com (Aaron Mackey)
Date: Wed, 20 Jan 2010 07:59:38 -0500
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>

I would advise using BEDtools or the R IRanges package for this kind of
aggregation/merging work, rather than trying to reinvent this particular
wheel.

-Aaron


From dan.kortschak at adelaide.edu.au  Wed Jan 20 16:16:39 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 21 Jan 2010 07:46:39 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
	<24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>
Message-ID: <1264022199.4688.29.camel@epistle>

Thanks for that, I'll look into those. BEDtools looks like what I want.

cheers
Dan

On Wed, 2010-01-20 at 07:59 -0500, Aaron Mackey wrote:
> I would advise using BEDtools or the R IRanges package for this kind
> of aggregation/merging work, rather than trying to reinvent this
> particular wheel.
> 
> -Aaron


From biopython at maubp.freeserve.co.uk  Thu Jan 21 07:33:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 21 Jan 2010 12:33:53 +0000
Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in
	BioSQL
Message-ID: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>

Hi all,

This is cross posted to try and ensure relevant people see it.
I suggest we continue the discussion on the BioSQL list
(for how to serialise structured annotation to BioSQL), and/or
the OpenBio list (for things like file format naming conventions).

I am hoping we (Bio*) can be consistent in how we parse and load
into BioSQL the SwissProt DE lines (known as "swiss" format in
both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
equivalent UniProt XML tags (which we are tentatively going to
call the "uniprot" format in Biopython's SeqIO - comments?).

Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
files and load them into BioSQL. Biopython currently treats the DE
comment lines as a long string, as BioPerl used to:

http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html

I understand that BioPerl now turns the SwissProt DE lines into a
TagTree, and for storing this in BioSQL this gets serialised as XML.
I would like Biopython to handle this the same way (although rather
than a Perl TagTree, we'd use a Python structure of course), and
would appreciate clarification of what exactly was implemented
(e.g. which bit of the BioPerl source code should we look at,
and could you show a worked example?).

Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or
Open-Bio lists yet) has started work on parsing UniProt XML
files for Biopython. Here the DE comment lines are already
provided broken up with XML markup. Hopefully their nested
structure matches what BioPerl was doing with the SwissProt
DE lines.

Regards,

Peter


From cjfields at illinois.edu  Thu Jan 21 08:34:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 07:34:12 -0600
Subject: [Bioperl-l] [Open-bio-l] SwissProt DE lines and UniProt XML /
	TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <A6F5F623-2750-4BB0-91F7-5A87BABE367B@illinois.edu>

Peter,

The relevant code is in Bio::Annotation::TagTree in bioperl-live, which is a decorator for Data::Stag:

http://search.cpan.org/~cmungall/Data-Stag-0.11/Data/Stag.pm

This is where the text output is derived from.  It's a bit of a heavyweight solution to the problem, but it's capable of round-tripping the DE data and parses out the data in a way that's approachable.  We could probably abstract out the serialization backend there and allow a pure bioperl solution (or the current solution) as a fallback. 

If the plain-text DE info is represented in a hierarchy already in UniProt XML, we should probably conform as closely as possible to that (using a standard format like XML, JSON, etc.).  

chris
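
To make the idea of a hierarchy concrete, here is a minimal sketch in
plain Perl that turns structured DE lines into a nested structure and
writes it out as XML. It does not use Bio::Annotation::TagTree or
Data::Stag and makes no claim to match their output. The input lines are
invented, the XML element names are assumptions, and only RecName,
AltName and SubName blocks are handled (no Includes, Contains or Flags).

```perl
use strict;
use warnings;

# Invented DE block in the structured style (not taken from a real entry).
my @de_lines = (
    'DE   RecName: Full=Example kinase 1;',
    'DE            Short=EK1;',
    'DE   AltName: Full=Hypothetical protein X;',
);

# Build a list of { category => ..., fields => [ [key, value], ... ] }.
my @names;
for my $line (@de_lines) {
    ( my $text = $line ) =~ s/^DE\s+//;
    if ( $text =~ s/^(RecName|AltName|SubName):\s*// ) {
        push @names, { category => $1, fields => [] };
    }
    # Continuation lines attach to the most recent category.
    if ( @names && $text =~ /^(\w+)=(.*?);?\s*$/ ) {
        push @{ $names[-1]{fields} }, [ $1, $2 ];
    }
}

# Serialise as simple nested XML. Values are not escaped here, so real
# data containing &, < or > would need escaping first.
my $xml = "<DE>\n";
for my $name (@names) {
    my $cat = $name->{category};
    $xml .= "  <$cat>\n";
    for my $field ( @{ $name->{fields} } ) {
        my ( $key, $value ) = @$field;
        $xml .= "    <$key>$value</$key>\n";
    }
    $xml .= "  </$cat>\n";
}
$xml .= "</DE>\n";
print $xml;
```

Keeping the fields as ordered pairs rather than a hash preserves the
order and any repeated keys, which matters if the data is to round-trip.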


From sharmashalu.bio at gmail.com  Thu Jan 21 09:25:44 2010
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Thu, 21 Jan 2010 09:25:44 -0500
Subject: [Bioperl-l] sequence orientation
Message-ID: <465b5a661001210625j3d84a165u69d8c8d21d2fe7ac@mail.gmail.com>

Hi All,
This is not a Perl/BioPerl query, but I thought this would be the best
place to ask.
I have some pyro reads (from CAMERA) and I want to find out their 5' and 3'
ends. Is there any way I can do this?

I would really appreciate it if anyone could help me out.

Thanks
Shalu


From rtbio.2009 at gmail.com  Thu Jan 21 13:28:43 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Thu, 21 Jan 2010 19:28:43 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <196889DF87964224ACDB948681BA7F86@NewLife>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<4C2E8133F916495B876628EF3E8FCBB2@NewLife>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
Message-ID: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>

Hello Mark,

This is Roopa again. I have another small problem. I am working on remote
BLAST. The program accesses the server and gets the output correctly, but
when I try to collect the result sequences in an array, the first sequence
among the results is always missing. The code is:

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;
   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;
              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the count of the array was 0 and no sequence was printed. Likewise, when
the output contained 3 sequences, the count of the array was 2 and only
two sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.
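
A likely cause, judging only from the code above: the debug statement
"print BLASTDEBUGFILE $result->next_hit();" takes the first hit off the
result before the "while ( my $hit = $result->next_hit )" loop starts, so
the loop only ever sees the remaining hits. That would match 1 hit giving
0 sequences and 3 hits giving 2. The sketch below reproduces the effect
in plain Perl; make_result is a hypothetical stand-in for the BLAST
result object, not BioPerl code.

```perl
use strict;
use warnings;

# Stand-in for a BLAST result: each call hands out the next hit once,
# in order, and returns undef when none are left.
# (Hypothetical helper for illustration, not the BioPerl implementation.)
sub make_result {
    my @hits = @_;
    return sub { return shift @hits };
}

# Debug output that calls the iterator uses up the first hit ...
my $next_hit = make_result(qw(hit1 hit2 hit3));
print "debug: ", $next_hit->(), "\n";
my @lost_first;
while ( defined( my $hit = $next_hit->() ) ) { push @lost_first, $hit }
print scalar(@lost_first), " hits collected\n";

# ... whereas collecting first and printing from the array keeps them all.
$next_hit = make_result(qw(hit1 hit2 hit3));
my @all;
while ( defined( my $hit = $next_hit->() ) ) { push @all, $hit }
print "debug: $all[0]\n" if @all;
print scalar(@all), " hits collected\n";
```

If that is indeed the cause, removing the debug call (or storing the hit
in a variable and adding its sequence to @seqs as well) should restore
the first sequence. The Bio::Search::Result documentation is worth
checking for a way to reset the hit iterator or to fetch all hits at
once.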


On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen <maj at fortinbras.us> wrote:

>  Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
> *To:* Mark A. Jensen <maj at fortinbras.us>
> *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>>  There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from looking for a member of an array called @organ. This
>> would have shown up if 'use strict;' had been in place. Still don't know
>> whether this would work precisely; can you send me the query sequence so
>> I can reproduce your output?
>> thanks MAJ
>>
>>  ----- Original Message -----
>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes but still i got the same o/p with sequences
>> from different species.
>>
>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813    0.0
>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622    0.0
>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773    0.0
>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749    0.0
>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542    2e-151
>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196    3e-47
>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190    1e-45
>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181    7e-43
>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179    2e-42
>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178    8e-42
>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170    1e-39
>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   169    4e-39
>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167    1e-38
>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150    1e-33
>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145    5e-32
>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143    2e-31
>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143    2e-31
>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138    7e-30
>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138    7e-30
>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136    2e-29
>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136    2e-29
>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134    9e-29
>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132    3e-28
>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132    3e-28
>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132    3e-28
>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131    1e-27
>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131    1e-27
>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129    4e-27
>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129    4e-27
>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129    4e-27
>> ref|NM_001003470.1|  Danio rerio protein kinase, cAMP-dependen...   129    4e-27
>> ref|XM_001141503.1|  PREDICTED: Pan troglodytes verus protein ...   127    1e-26
>> ref|XM_001145269.1|  PREDICTED: Pan troglodytes protein kinase...   127    1e-26
>> ref|XM_512434.2|  PREDICTED: Pan troglodytes cAMP-dependent pr...   127    1e-26
>> ref|XM_001171457.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127    1e-26
>> ref|XM_001171437.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127    1e-26
>> ref|XM_847420.1|  PREDICTED: Canis familiaris similar to Serin...   127    1e-26
>> ref|NM_207518.1|  Homo sapiens protein kinase, cAMP-dependent,...   127    1e-26
>> ref|NM_002730.3|  Homo sapiens protein kinase, cAMP-dependent,...   127    1e-26
>>
>>
>> Thanks in advance.
>>
>> Roopa.
>>
>> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>>  I understand you. Put in the double quotes and see what happens.
>>>
>>>  ----- Original Message -----
>>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>> *To:* Mark A. Jensen <maj at fortinbras.us>
>>>   *Sent:* Saturday, January 09, 2010 1:40 PM
>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>
>>> Hi Mark,
>>>
>>> Thanks for your reply. It was working when I specifically used the name
>>> of the organism, Trypanosoma brucei, in the code, but my idea is to
>>> introduce a variable $organ which takes the organism given by the user,
>>> whatever it may be: Pseudomonas, Drosophila, Trypanosoma, Leishmania
>>> etc. I should get the sequences related to only that organism.
>>>
>>> That is, if the user enters Pseudomonas, the $organ parameter of the code
>>> takes Pseudomonas, does BLAST and returns only those sequences from
>>> Pseudomonas that produce significant alignments. But this is not
>>> happening.
>>>
>>> Please help me in this regard.
>>>
>>> Thanks in advance
>>> Roopa
>>>
>>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>
>>>>  Hi Roopa-- You may get what you want if you make the change.
>>>> With single quotes, ENTREZ_QUERY is set to the literal string
>>>>
>>>>  $organ[ORGN]
>>>>
>>>> while, with double quotes, the variable value will be substituted,
>>>> and the parameter should be set to
>>>>
>>>>  Trypanosoma brucei[ORGN]
>>>>
>>>> I'm guessing that it worked because the database ignored the strange
>>>> parameter and returned all the matches. Try this, and if it doesn't work
>>>> I'll look harder.
>>>> cheers,
>>>> Mark
>>>>
>>>>  ----- Original Message -----
>>>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>>>> *Sent:* Saturday, January 09, 2010 1:24 PM
>>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>>
>>>> hello Mark,
>>>>
>>>> Thanks for your reply. It was working without enclosing $organ[ORGN] in
>>>> double quotes, but I would like to have only those sequences which are
>>>> specific to my organism, i.e., I need sequences only from the organism
>>>> that I entered.
>>>>
>>>> When the organism is Trypanosoma brucei, I also get Leishmania and
>>>> other species among the similar sequences. But I want to get only
>>>> Trypanosoma brucei sequences.
>>>>
>>>> Could you please help me out in this regard?
>>>>
>>>> Roopa.
>>>>
>>>> My output
>>>>
>>>> I/P organism: Trypanosoma brucei
>>>>
>>>> O/P:-
>>>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813    0.0
>>>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622    0.0
>>>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773    0.0
>>>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749    0.0
>>>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>>>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>>>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542    2e-151
>>>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>>>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>>>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196    3e-47
>>>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190    1e-45
>>>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181    7e-43
>>>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179    2e-42
>>>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178    8e-42
>>>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170    1e-39
>>>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   168    4e-39
>>>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167    1e-38
>>>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150    1e-33
>>>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145    5e-32
>>>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143    2e-31
>>>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143    2e-31
>>>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138    7e-30
>>>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138    7e-30
>>>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136    2e-29
>>>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136    2e-29
>>>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134    9e-29
>>>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132    3e-28
>>>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132    3e-28
>>>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132    3e-28
>>>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131    1e-27
>>>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131    1e-27
>>>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129    4e-27
>>>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129    4e-27
>>>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129    4e-27
>>>>
>>>> Roopa.
>>>>
>>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>>
>>>>> I see it immediately (from making same bug many times) :
>>>>>
>>>>>
>>>>>   my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>> -     -ENTREZ_QUERY => '$organ[ORGN]');
>>>>> +     -ENTREZ_QUERY => "$organ[ORGN]");
>>>>>
>>>>>
>>>>> MAJ
>>>>>
>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>> rtbio.2009 at gmail.com>
>>>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>>>> Cc: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Saturday, January 09, 2010 11:57 AM
>>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>>
>>>>>> Thanks a lot for your reply, Mark. It was working with Trypanosoma
>>>>>> brucei as the organism parameter, but when I tried to take the
>>>>>> organism parameter from the user, it was not working, i.e., I was
>>>>>> unable to get the target sequences.
>>>>>> Please help me in this regard. My code is
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>>
>>>>>> #path for extra camel module
>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>> use Roopablast;
>>>>>>
>>>>>>
>>>>>> use Bio::SearchIO;
>>>>>> use Bio::Search::Result::BlastResult;
>>>>>> use Bio::Perl;
>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::DB::GenBank;
>>>>>>
>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>> my $outstring ="";
>>>>>>
>>>>>> &parse_form;
>>>>>>
>>>>>> print "Content-type: text/html\n\n";
>>>>>> print "<HTML>\n";
>>>>>> print "<head><title>RNAi Result</title>";
>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>> print "</head>\n";
>>>>>> print "<body>\n";
>>>>>> print " Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>> print "</BODY>\n";
>>>>>> print "</HTML>\n";
>>>>>>
>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>> exit if $pid;
>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>
>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>
>>>>>> print OUTFILE "<HTML>\n
>>>>>> <head><title>RNAi Result</title>
>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>> </head>\n
>>>>>> <body>\n
>>>>>>  Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>> wait......<br>
>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>> </BODY>\n
>>>>>> </HTML>\n";
>>>>>>
>>>>>> close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
>>>>>>
>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>
>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>> $in{'Threshold'});
>>>>>>
>>>>>>
>>>>>> sub blastcode
>>>>>> {
>>>>>>
>>>>>> $inpu1= $_[0];
>>>>>>
>>>>>> $organ= $_[1];
>>>>>>
>>>>>> open(NUC,'>',$nuc);
>>>>>> print NUC $inpu1,"\n";
>>>>>> close(NUC);
>>>>>>
>>>>>> my $prog = 'blastn';
>>>>>> my $db   = 'refseq_rna';
>>>>>> my $e_val= '1e-10';
>>>>>> my $organism= $organ;
>>>>>>
>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>
>>>>>> my @params = ( '-prog' => $prog,
>>>>>>        '-data' => $db,
>>>>>>        '-expect' => $e_val,
>>>>>>        '-readmethod' => 'SearchIO',
>>>>>>       '-Organism'   => $organism );
>>>>>>
>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>            print OUTFILE $inpu1;
>>>>>>             close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>>> =>
>>>>>> '$organ[ORGN]');
>>>>>>
>>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>
>>>>>>  #change a paramter
>>>>>>
>>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>>>>> Brucei[ORGN]';
>>>>>>
>>>>>> #change a paramter
>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>> '$input2[ORGN]';
>>>>>>
>>>>>>  my $v = 1;
>>>>>>  #$v is just to turn on and off the messages
>>>>>>
>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>> '-organism' => $organ );
>>>>>>
>>>>>>
>>>>>> while (my $input = $str->next_seq())
>>>>>> {
>>>>>>  #Blast a sequence against a database:
>>>>>>   #Alternatively, you could  pass in a file with many
>>>>>>   #sequences rather than loop through sequence one at a time
>>>>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>   #and swap the two lines below for an example of that.
>>>>>>
>>>>>>            #open(OUTFILE,'>',$debugfile);
>>>>>>             # print OUTFILE $input;
>>>>>>             #close(OUTFILE);
>>>>>>
>>>>>>
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>
>>>>>>               open(OUTFILE,'>',$debugfile);
>>>>>>            #   print OUTFILE $r;
>>>>>>               close(OUTFILE);
>>>>>>
>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>  #   open(OUTFILE,'>',$debugfile);
>>>>>>   #           print OUTFILE "while entered";
>>>>>>    #         close(OUTFILE);
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>
>>>>>>     #         open(OUTFILE,'>',$debugfile);
>>>>>>      #        print OUTFILE "foreach entered";
>>>>>>       #      close(OUTFILE);
>>>>>>
>>>>>>       my $rc = $factory->retrieve_blast($rid);
>>>>>>
>>>>>>       if( !ref($rc) )
>>>>>>       {
>>>>>>       if( $rc < 0 )
>>>>>>       {
>>>>>>       $factory->remove_rid($rid);
>>>>>>       }
>>>>>>        open(OUTFILE,'>',$debugfile);
>>>>>>        #      print OUTFILE "if entered";
>>>>>>             close(OUTFILE);
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>       }
>>>>>>      else {
>>>>>>         #    open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "else entered";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>         my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>         print BLASTDEBUGFILE $result->next_hit();
>>>>>>         close(BLASTDEBUGFILE);
>>>>>>
>>>>>>       my $filename =
>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>
>>>>>>        # open(DEBUGFILE,'>',$debugfile);
>>>>>>        # open(new,'>',$filename);
>>>>>>        # @arra=<new>;
>>>>>>        # print DEBUGFILE @arra;
>>>>>>        # close(DEBUGFILE);
>>>>>>        # close(new);
>>>>>>
>>>>>>        $factory->save_output($filename);
>>>>>>  # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>      # close(BLASTDEBUGFILE);
>>>>>>
>>>>>>      $factory->remove_rid($rid);
>>>>>>
>>>>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>      print BLASTDEBUGFILE  $organism;
>>>>>>       close(BLASTDEBUGFILE);
>>>>>>
>>>>>>   # open(OUTFILE,'>',$outfile);
>>>>>>   # print OUTFILE "Test2 $result->database_name()";
>>>>>>   # close(OUTFILE);
>>>>>>
>>>>>> #$hit = $result->next_hit;
>>>>>> #open(new,'>',$debugfile);
>>>>>> #print $hit;
>>>>>> #close(new);
>>>>>>
>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>
>>>>>>           next unless ( $v > 0);
>>>>>>
>>>>>>         #     open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "$hit in while hits";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>                 push(@seqs,$dna);
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>  }
>>>>>>
>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>  #print OUTFILE $seqs[0];
>>>>>>  #close(OUTFILE);
>>>>>>
>>>>>> return(@seqs);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Roopa.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Roopa--
>>>>>>>
>>>>>>> I got your code to work with the following changes:
>>>>>>>
>>>>>>> +# the input should be a valid FASTA file...
>>>>>>> ...
>>>>>>> open(NUC,'>',$nuc);
>>>>>>> +print NUC ">seq (need a name line for valid fasta)\n";
>>>>>>> print NUC $inpu1, "\n";
>>>>>>> close(NUC);
>>>>>>> ...
>>>>>>>
>>>>>>> +# you can set these header parms in the call itself...
>>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>>>> -ENTREZ_QUERY =>
>>>>>>> 'Trypanosoma Brucei[ORGN]');
>>>>>>>
>>>>>>>  #change a paramter
>>>>>>> +# commented this out...
>>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>> 'Trypanosoma
>>>>>>> Brucei[ORGN]';
>>>>>>>
>>>>>>> MAJ
>>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>>>> rtbio.2009 at gmail.com
>>>>>>> >
>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>> Sent: Friday, January 08, 2010 10:00 AM
>>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>>>>>>
>>>>>>>
>>>>>>>  Hello all,
>>>>>>>
>>>>>>>>
>>>>>>>> I was trying remote BLAST using BioPerl. My input data is a
>>>>>>>> Trypanosoma brucei sequence in FASTA format. When I tried to submit
>>>>>>>> it to BLAST using the step
>>>>>>>> $r=$factory->submit_blast($input)
>>>>>>>> it did not return anything, which I checked by debugging the code.
>>>>>>>> It is not blasting my input sequence even though I specified all the
>>>>>>>> parameters. I have pasted the code below.
>>>>>>>>
>>>>>>>> Please help me in solving this problem. It is very urgent.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Roopa.
>>>>>>>>
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #path for extra camel module
>>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>>>> use Roopablast;
>>>>>>>>
>>>>>>>>
>>>>>>>> use Bio::SearchIO;
>>>>>>>> use Bio::Search::Result::BlastResult;
>>>>>>>> use Bio::Perl;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use Bio::Seq;
>>>>>>>> use Bio::SeqIO;
>>>>>>>> use Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>> my $outstring ="";
>>>>>>>>
>>>>>>>> &parse_form;
>>>>>>>>
>>>>>>>> print "Content-type: text/html\n\n";
>>>>>>>> print "<HTML>\n";
>>>>>>>> print "<head><title>RNAi Result</title>";
>>>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>>>> print "</head>\n";
>>>>>>>> print "<body>\n";
>>>>>>>> print " Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>>>> print "</BODY>\n";
>>>>>>>> print "</HTML>\n";
>>>>>>>>
>>>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>>>> exit if $pid;
>>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>>>> </head>\n
>>>>>>>> <body>\n
>>>>>>>>  Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>>>> wait......<br>
>>>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>>>> </BODY>\n
>>>>>>>> </HTML>\n";
>>>>>>>>
>>>>>>>> close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> @compseqs = blastcode($in{'Inputseq'});
>>>>>>>>
>>>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>>>
>>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>>>> $in{'Threshold'});
>>>>>>>>
>>>>>>>>
>>>>>>>> sub blastcode
>>>>>>>> {
>>>>>>>>
>>>>>>>> $inpu1= $_[0];
>>>>>>>>
>>>>>>>> #$organ= $_[1];
>>>>>>>>
>>>>>>>> open(NUC,'>',$nuc);
>>>>>>>> print NUC $inpu1;
>>>>>>>> close(NUC);
>>>>>>>>
>>>>>>>> my $prog = 'blastn';
>>>>>>>> my $db   = 'refseq_rna';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> my $organism= 'Trypanosoma Brucei';
>>>>>>>>
>>>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>       '-data' => $db,
>>>>>>>>       '-expect' => $e_val,
>>>>>>>>       '-readmethod' => 'SearchIO',
>>>>>>>>       '-Organism'   => $organism );
>>>>>>>>
>>>>>>>>          # open(OUTFILE,'>',$debugfile);
>>>>>>>>           #  print OUTFILE @params;
>>>>>>>>           # close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>
>>>>>>>>  #change a paramter
>>>>>>>>
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>>>>>>> Brucei[ORGN]';
>>>>>>>>
>>>>>>>> #change a paramter
>>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>>> '$input2[ORGN]';
>>>>>>>>
>>>>>>>>  my $v = 1;
>>>>>>>>  #$v is just to turn on and off the messages
>>>>>>>>
>>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>>>> '-organism' => 'Trypanosoma Brucei' );
>>>>>>>>
>>>>>>>>
>>>>>>>> while (my $input = $str->next_seq())
>>>>>>>> {
>>>>>>>>  #Blast a sequence against a database:
>>>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>>>  #and swap the two lines below for an example of that.
>>>>>>>>
>>>>>>>>           open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE $input;
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>>>  # The program stops here: it does not return any value and it does
>>>>>>>>  # not enter the while loop. Please help me in this regard.
>>>>>>>>              open(OUTFILE,'>',$debugfile);
>>>>>>>>              print OUTFILE $r;
>>>>>>>>              close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>>>
>>>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>>>    open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "while entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>
>>>>>>>>             open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "foreach entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>>>
>>>>>>>>      if( !ref($rc) )
>>>>>>>>      {
>>>>>>>>      if( $rc < 0 )
>>>>>>>>      {
>>>>>>>>      $factory->remove_rid($rid);
>>>>>>>>      }
>>>>>>>>       open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "if entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>       sleep 5;
>>>>>>>>      }
>>>>>>>>     else {
>>>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "else entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>        my $result = $rc->next_result();
>>>>>>>>       #save the output
>>>>>>>>      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>        print BLASTDEBUGFILE $result->next_hit();
>>>>>>>>        close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>      my $filename =
>>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>>>
>>>>>>>>       # open(DEBUGFILE,'>',$debugfile);
>>>>>>>>       # open(new,'>',$filename);
>>>>>>>>       # @arra=<new>;
>>>>>>>>       # print DEBUGFILE @arra;
>>>>>>>>       # close(DEBUGFILE);
>>>>>>>>       # close(new);
>>>>>>>>
>>>>>>>>       $factory->save_output($filename);
>>>>>>>>
>>>>>>>>     # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>>>     # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>>>     # close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>     $factory->remove_rid($rid);
>>>>>>>>
>>>>>>>>     open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>     print BLASTDEBUGFILE  $organism;
>>>>>>>>      close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>  # open(OUTFILE,'>',$outfile);
>>>>>>>>  # print OUTFILE "Test2 $result->database_name()";
>>>>>>>>  # close(OUTFILE);
>>>>>>>>
>>>>>>>> #$hit = $result->next_hit;
>>>>>>>> #open(new,'>',$debugfile);
>>>>>>>> #print $hit;
>>>>>>>> #close(new);
>>>>>>>>
>>>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>>>
>>>>>>>>          next unless ( $v > 0);
>>>>>>>>
>>>>>>>>        #     open(OUTFILE,'>',$debugfile);
>>>>>>>>         #    print OUTFILE "$hit in while hits";
>>>>>>>>          #  close(OUTFILE);
>>>>>>>>
>>>>>>>>     my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>>>         my $dna = $sequ->seq();        # get the sequence as a
>>>>>>>> string
>>>>>>>>                push(@seqs,$dna);
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>>>  #print OUTFILE $seqs[0];
>>>>>>>>  #close(OUTFILE);
>>>>>>>>
>>>>>>>> return(@seqs);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile) || die ;
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>>>>>>> <body>\n
>>>>>>>> <p><font face=\"Courier, monospace font set\">
>>>>>>>> Inputsequence: <br>";
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      print OUTFILE " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      print OUTFILE "<br>\n";
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> print OUTFILE "</font> <p>";
>>>>>>>>
>>>>>>>> $z=@compseqs;
>>>>>>>>
>>>>>>>> for($k=1;$k<$z;$k++) {
>>>>>>>>  print OUTFILE "<font face=\"Courier, monospace font
>>>>>>>> set\"><p>Compare
>>>>>>>> Sequence: <br>";
>>>>>>>>
>>>>>>>>  for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>>>>>>
>>>>>>>>      print OUTFILE substr ($compseqs[$k], $i, 1);
>>>>>>>>
>>>>>>>>      if ( ($i+1)%10==0){
>>>>>>>>          print OUTFILE " ";
>>>>>>>>      }
>>>>>>>>      if ( ($i+1)%60==0){
>>>>>>>>          print OUTFILE "<br>\n";
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  print OUTFILE "<p></font>";
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<p>
>>>>>>>> Window: <br>$in{'Windowsize'}
>>>>>>>> <p>
>>>>>>>> <p>
>>>>>>>> Threshold: <br>$in{'Threshold'}
>>>>>>>> <p>";
>>>>>>>> my $j=0;
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>>>>>>      if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>>>>>>          $j=$in{'Windowsize'};
>>>>>>>>      }
>>>>>>>>      $height=$out[$i]->{similar}*5;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ($j>0) {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>      $j--;
>>>>>>>>  }
>>>>>>>>  else {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      $outstring .= " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      $outstring .= "<br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%800==0){
>>>>>>>>      print OUTFILE "<br><br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>>>>>>> set\">$outstring</font>";
>>>>>>>>
>>>>>>>> #foreach (@out) {
>>>>>>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar}
>>>>>>>> matchs<p>";
>>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>>>>>>
>>>>>>>> #    }
>>>>>>>> #}
>>>>>>>>
>>>>>>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>>>>>>
>>>>>>>> close OUTFILE;
>>>>>>>>
>>>>>>>> #nameprint();
>>>>>>>>
>>>>>>>> sub parse_form {
>>>>>>>>  local ($buffer, @pairs, $pair, $name, $value);
>>>>>>>>  # Read in text
>>>>>>>>  $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>>>>>>  if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>>>>>>  {
>>>>>>>>      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>>>>>>  }
>>>>>>>>  else
>>>>>>>>  {
>>>>>>>>      $buffer = $ENV{'QUERY_STRING'};
>>>>>>>>  }
>>>>>>>>  @pairs = split(/&/, $buffer);
>>>>>>>>  foreach $pair (@pairs)
>>>>>>>>  {
>>>>>>>>      ($name, $value) = split(/=/, $pair);
>>>>>>>>      $value =~ tr/+/ /;
>>>>>>>>      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>>>>>>      $in{$name} = $value;
>>>>>>>>  }
>>>>>>>> }
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>


From bernd.web at gmail.com  Thu Jan 21 13:37:18 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 21 Jan 2010 19:37:18 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
	<c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
Message-ID: <716af09c1001211037p59b19a29l1967f1e514469e79@mail.gmail.com>

Hi,

Regarding RemoteBlast, may I add a question?
It seems that Bio::Tools::Run::RemoteBlast sends each sequence
separately to the NCBI (at least in BioPerl 1.5.2).
This means that an RID has to be checked for each sequence. Is this
indeed the case?
The BLAST URL-API or batch interface supports sending multiple
sequences at once.

Regards,
Bernd

On Thu, Jan 21, 2010 at 7:28 PM, Roopa Raghuveer <rtbio.2009 at gmail.com> wrote:
> Hello Mark,
>
> This is Roopa again. I have a small problem again. I am working on Remote
> blast. The program works well. But the problem is this. The program
> accesses the server and gets the output correctly. I am trying to send the
> result sequences into an array and I found that always the first sequence
> among the Result sequences is missing. The code is
>
> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => "$organ\[ORGN]");


From cjfields at illinois.edu  Thu Jan 21 23:31:25 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 22:31:25 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
Message-ID: <BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>

Jay,

Did you want to release it to CPAN?  I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

chris

On Jan 18, 2010, at 6:22 PM, Jay Hannah wrote:

> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.
> 
>   http://github.com/jhannah/bio-broodcomb
> 
> It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. 
> 
> The first two functions I stuck in the framework:
> 
> Find subsequences (Bio::BroodComb::SubSeq):
> 
>   use Bio::BroodComb;
>   my $bc = Bio::BroodComb->new();
>   $bc->load_large_seq(file => "large_seq.fasta");
>   $bc->load_small_seq(file => "small_seq.fasta");
>   $bc->find_subseqs();
>   print $bc->subseq_report1;
> 
> In-silico PCR (Bio::BroodComb::PCR):
> 
>  use Bio::BroodComb;
>  my $bc = Bio::BroodComb->new();
>  $bc->load_large_seq(file => "large_seq.fasta");
>  $bc->add_primerset(
>     description    => "U5/R",   # however you want it reported
>     forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
>     reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
>  );
>  $bc->find_pcr_hits();
>  $bc->find_pcr_products();
>  print $bc->pcr_report1;
> 
> I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.
> 
> Suggestions, contributions welcome.   :)
> 
>   http://github.com/jhannah/bio-broodcomb
> 
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
> 
> 


From jason at bioperl.org  Fri Jan 22 01:17:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 21 Jan 2010 22:17:14 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
Message-ID: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>

I'm considering adding an initialization parameter (and get/set
accessor) to Bio::AlignIO that would allow setting the alphabet.  This
is then passed to Bio::LocatableSeq creation so that _guess_alphabet
isn't called. This will allow removal of warnings about empty
sequences because _guess_alphabet won't be called on a sequence if we
have explicitly set the alphabet.

This worked great on my local install and tests pass.  Any objections  
or concerns?

Basically it means that when you make an AlignIO you can specify the
alphabet, i.e.

my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
                           -file => 'genome.fasaln');

I have some alignments with empty sequences and I think turning off
the warnings is appropriate where I force the alphabet choice. It
should also give a very modest speedup.
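
To make the idea concrete, here is a rough sketch of the "skip the guess
when the alphabet is forced" pattern. ToySeq, $GUESS_CALLS and the warning
text are made up for illustration; this is not the Bio::LocatableSeq code,
just a toy with the same shape.

```perl
use strict;
use warnings;

# Hypothetical toy class; it is NOT Bio::LocatableSeq and only mirrors the idea.
package ToySeq;

our $GUESS_CALLS = 0;    # counts how often the guesser runs

sub new {
    my ($class, %args) = @_;
    my $seq  = defined $args{-seq} ? $args{-seq} : '';
    my $self = bless { seq => $seq }, $class;

    # Only guess when the caller did not force an alphabet.
    $self->{alphabet} = defined $args{-alphabet}
        ? $args{-alphabet}
        : $self->_guess_alphabet;
    return $self;
}

sub alphabet { return $_[0]{alphabet} }

# Crude guesser: warns on input with no residues, then gives up.
sub _guess_alphabet {
    my ($self) = @_;
    $GUESS_CALLS++;
    my $residues = $self->{seq};
    $residues =~ tr/-.?//d;    # ignore gap characters
    if ( $residues eq '' ) {
        warn "sequence has no letters, cannot guess alphabet\n";
        return '';
    }
    return $residues =~ /^[ACGTNacgtn]+$/ ? 'dna' : 'protein';
}

package main;

# Alphabet is guessed from the residues (one call to the guesser).
my $guessed = ToySeq->new( -seq => 'ACGT--AC' );

# All-gap sequence, but the alphabet is forced: no guess, no warning.
my $forced = ToySeq->new( -seq => '--------', -alphabet => 'dna' );

print $guessed->alphabet, "\n";
print $forced->alphabet,  "\n";
print "guess calls: $ToySeq::GUESS_CALLS\n";
```

The all-gap sequence never reaches the guesser, so there is nothing to
warn about and one function call is saved per sequence.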

-jason
--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From rtbio.2009 at gmail.com  Fri Jan 22 04:54:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 22 Jan 2010 10:54:32 +0100
Subject: [Bioperl-l] Fwd:  Regarding blast in Bioperl
In-Reply-To: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
	<c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
Message-ID: <c7cac1601001220154r4f92651ejb79663898e0b8fc2@mail.gmail.com>

---------- Forwarded message ----------
From: Roopa Raghuveer <rtbio.2009 at gmail.com>
Date: Thu, Jan 21, 2010 at 7:28 PM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: bioperl-l at lists.open-bio.org


Hello Mark,

This is Roopa again. I have another small problem. I am working on remote
BLAST. The program works well: it accesses the server and gets the output
correctly. But when I collect the result sequences into an array, I find
that the first sequence among the result sequences is always missing. The
code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was debugging by printing the count of the array and
the sequences themselves. When the BLAST output contained 1 sequence, the
count of the array was 0, and when I tried to print the output sequence I
got nothing. It was the same when the number of output sequences was 3: the
count of the array was 2 and only two sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.
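
A possible explanation, offered as a sketch rather than a confirmed
diagnosis: next_hit is an iterator, so the debugging line
"print BLASTDEBUGFILE $result->next_hit();" would use up the first hit
before the while loop starts, and the loop then sees one hit fewer (0
instead of 1, 2 instead of 3). The toy class below is hypothetical
(ToyResult is not a BioPerl class); it only mimics an object with next_hit
and rewind to show the effect.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST result object; NOT a BioPerl class.
package ToyResult;

sub new {
    my ($class, @hits) = @_;
    return bless { hits => [@hits], pos => 0 }, $class;
}

# Return the next hit and advance the internal cursor (undef when exhausted).
sub next_hit {
    my ($self) = @_;
    return if $self->{pos} >= @{ $self->{hits} };
    return $self->{hits}[ $self->{pos}++ ];
}

# Reset the cursor so iteration starts again from the first hit.
sub rewind {
    my ($self) = @_;
    $self->{pos} = 0;
    return;
}

package main;

# Collect every remaining hit from the iterator.
sub collect_hits {
    my ($result) = @_;
    my @seen;
    while ( defined( my $hit = $result->next_hit ) ) {
        push @seen, $hit;
    }
    return @seen;
}

my $result = ToyResult->new(qw(hit1 hit2 hit3));

# A "debug" call like this silently uses up the first hit ...
my $peek = $result->next_hit;

# ... so the loop only sees the remaining two.
my @after_peek = collect_hits($result);

# Rewinding before the loop restores all three.
$result->rewind;
my @after_rewind = collect_hits($result);

print "after peek:   @after_peek\n";
print "after rewind: @after_rewind\n";
```

If I recall the Bio::Search::Result API correctly, the real result object
also offers rewind and hits; otherwise simply dropping the debug next_hit
call should have the same effect.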


On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen <maj at fortinbras.us> wrote:

>  Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
> *To:* Mark A. Jensen <maj at fortinbras.us>
>  *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>> There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from looking for a member of an array called @organ. This
>> would have shown up if 'use strict;' had been in place. I still don't know
>> whether this will work precisely; can you send me the query sequence so I
>> can reproduce your output?
>> thanks MAJ
>>
>>  ----- Original Message -----
>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes, but I still got the same output, with
>> sequences from different species.
>>
>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813    0.0
>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622    0.0
>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773    0.0
>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749    0.0
>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542    2e-151
>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196    3e-47
>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190    1e-45
>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181    7e-43
>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179    2e-42
>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178    8e-42
>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170    1e-39
>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   169    4e-39
>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167    1e-38
>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150    1e-33
>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145    5e-32
>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143    2e-31
>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143    2e-31
>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138    7e-30
>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138    7e-30
>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136    2e-29
>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136    2e-29
>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134    9e-29
>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132    3e-28
>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132    3e-28
>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132    3e-28
>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131    1e-27
>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131    1e-27
>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129    4e-27
>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129    4e-27
>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129    4e-27
>> ref|NM_001003470.1|  Danio rerio protein kinase, cAMP-dependen...   129    4e-27
>> ref|XM_001141503.1|  PREDICTED: Pan troglodytes verus protein ...   127    1e-26
>> ref|XM_001145269.1|  PREDICTED: Pan troglodytes protein kinase...   127    1e-26
>> ref|XM_512434.2|  PREDICTED: Pan troglodytes cAMP-dependent pr...   127    1e-26
>> ref|XM_001171457.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127    1e-26
>> ref|XM_001171437.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127    1e-26
>> ref|XM_847420.1|  PREDICTED: Canis familiaris similar to Serin...   127    1e-26
>> ref|NM_207518.1|  Homo sapiens protein kinase, cAMP-dependent,...   127    1e-26
>> ref|NM_002730.3|  Homo sapiens protein kinase, cAMP-dependent,...   127    1e-26
>>
>>
>> Thanks in advance.
>>
>> Roopa.
>>
>> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>>  I understand you. Put in the double quotes and see what happens.
>>>
>>>  ----- Original Message -----
>>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>> *To:* Mark A. Jensen <maj at fortinbras.us>
>>>   *Sent:* Saturday, January 09, 2010 1:40 PM
>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>
>>> Hi Mark,
>>>
>>> Thanks for your reply. It works when I hard-code the organism name as
>>> Trypanosoma brucei, but my idea is to introduce a variable $organ that holds
>>> the organism given by the user, whatever it is: Pseudomonas, Drosophila,
>>> Trypanosoma, Leishmania, etc. I should then get sequences from that
>>> organism only.
>>>
>>> That is, if the user enters Pseudomonas, $organ takes the value Pseudomonas,
>>> the script runs BLAST, and it returns only those sequences from Pseudomonas
>>> that produce significant alignments. But this is not what happens.
>>>
>>> Please help me in this regard.
>>>
>>> Thanks in advance
>>> Roopa
>>>
>>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>
>>>>  Hi Roopa-- You may get what you want if you make the change.
>>>> With single quotes, ENTREZ_QUERY is set to the literal string
>>>>
>>>>  $organ[ORGN]
>>>>
>>>> while, with double quotes, the variable value will be substituted,
>>>> and the parameter should be set to
>>>>
>>>>  Trypanosoma brucei[ORGN]
>>>>
>>>> I'm guessing that it worked because the database ignored the strange
>>>> parameter and returned all the matches. Try this, and if it doesn't work
>>>> I'll look harder.
>>>> cheers,
>>>> Mark
>>>>
>>>>  ----- Original Message -----
>>>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>>>> *Sent:* Saturday, January 09, 2010 1:24 PM
>>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>>
>>>> hello Mark,
>>>>
>>>> Thanks for your reply. It was working without enclosing $organ[ORGN] in
>>>> double quotes, but I would like to get only the sequences that belong to
>>>> my organism, i.e., only sequences from the organism that I entered.
>>>>
>>>> When the organism is Trypanosoma brucei, I also get Leishmania and other
>>>> species among the similar sequences. But I want to get only Trypanosoma
>>>> brucei sequences.
>>>>
>>>> Could you please help me out in this regard?
>>>>
>>>> Roopa.
>>>>
>>>> My output
>>>>
>>>> I/P organism: Trypanosoma brucei
>>>>
>>>> O/P:-
>>>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813    0.0
>>>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622    0.0
>>>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773    0.0
>>>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749    0.0
>>>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>>>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551    3e-154
>>>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542    2e-151
>>>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>>>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538    2e-150
>>>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196    3e-47
>>>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190    1e-45
>>>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181    7e-43
>>>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179    2e-42
>>>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178    8e-42
>>>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170    1e-39
>>>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   168    4e-39
>>>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167    1e-38
>>>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150    1e-33
>>>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145    5e-32
>>>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143    2e-31
>>>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143    2e-31
>>>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138    7e-30
>>>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138    7e-30
>>>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136    2e-29
>>>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136    2e-29
>>>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134    9e-29
>>>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132    3e-28
>>>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132    3e-28
>>>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132    3e-28
>>>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131    1e-27
>>>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131    1e-27
>>>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129    4e-27
>>>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129    4e-27
>>>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129    4e-27
>>>>
>>>> Roopa.
>>>>
>>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>>
>>>>> I see it immediately (from making same bug many times) :
>>>>>
>>>>>
>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>> =>
>>>>> - '$organ[ORGN]');
>>>>> +"$organ[ORGN]");
>>>>>
>>>>>
>>>>> MAJ
>>>>>
>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>> rtbio.2009 at gmail.com>
>>>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>>>> Cc: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Saturday, January 09, 2010 11:57 AM
>>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>>
>>>>>> Thanks a lot for your reply, Mark. It was working with Trypanosoma brucei
>>>>>> as the organism parameter, but when I tried to take the organism
>>>>>> parameter from the user it was not working, i.e., I was unable to get
>>>>>> the target sequences. Please help me in this regard. My code is
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>>
>>>>>> #path for extra camel module
>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>> use Roopablast;
>>>>>>
>>>>>>
>>>>>> use Bio::SearchIO;
>>>>>> use Bio::Search::Result::BlastResult;
>>>>>> use Bio::Perl;
>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::DB::GenBank;
>>>>>>
>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>> my $outstring ="";
>>>>>>
>>>>>> &parse_form;
>>>>>>
>>>>>> print "Content-type: text/html\n\n";
>>>>>> print "<HTML>\n";
>>>>>> print "<head><title>RNAi Result</title>";
>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>> print "</head>\n";
>>>>>> print "<body>\n";
>>>>>> print " Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>> print "</BODY>\n";
>>>>>> print "</HTML>\n";
>>>>>>
>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>> exit if $pid;
>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>
>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>
>>>>>> print OUTFILE "<HTML>\n
>>>>>> <head><title>RNAi Result</title>
>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>> </head>\n
>>>>>> <body>\n
>>>>>>  Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>> wait......<br>
>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>> </BODY>\n
>>>>>> </HTML>\n";
>>>>>>
>>>>>> close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
>>>>>>
>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>
>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>> $in{'Threshold'});
>>>>>>
>>>>>>
>>>>>> sub blastcode
>>>>>> {
>>>>>>
>>>>>> $inpu1= $_[0];
>>>>>>
>>>>>> $organ= $_[1];
>>>>>>
>>>>>> open(NUC,'>',$nuc);
>>>>>> print NUC $inpu1,"\n";
>>>>>> close(NUC);
>>>>>>
>>>>>> my $prog = 'blastn';
>>>>>> my $db   = 'refseq_rna';
>>>>>> my $e_val= '1e-10';
>>>>>> my $organism= $organ;
>>>>>>
>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>
>>>>>> my @params = ( '-prog' => $prog,
>>>>>>        '-data' => $db,
>>>>>>        '-expect' => $e_val,
>>>>>>        '-readmethod' => 'SearchIO',
>>>>>>       '-Organism'   => $organism );
>>>>>>
>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>            print OUTFILE $inpu1;
>>>>>>             close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>>> =>
>>>>>> '$organ[ORGN]');
>>>>>>
>>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>
>>>>>>  #change a paramter
>>>>>>
>>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>>>>> Brucei[ORGN]';
>>>>>>
>>>>>> #change a paramter
>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>> '$input2[ORGN]';
>>>>>>
>>>>>>  my $v = 1;
>>>>>>  #$v is just to turn on and off the messages
>>>>>>
>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>> '-organism' => $organ );
>>>>>>
>>>>>>
>>>>>> while (my $input = $str->next_seq())
>>>>>> {
>>>>>>  #Blast a sequence against a database:
>>>>>>   #Alternatively, you could  pass in a file with many
>>>>>>   #sequences rather than loop through sequence one at a time
>>>>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>   #and swap the two lines below for an example of that.
>>>>>>
>>>>>>            #open(OUTFILE,'>',$debugfile);
>>>>>>             # print OUTFILE $input;
>>>>>>             #close(OUTFILE);
>>>>>>
>>>>>>
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>
>>>>>>               open(OUTFILE,'>',$debugfile);
>>>>>>            #   print OUTFILE $r;
>>>>>>               close(OUTFILE);
>>>>>>
>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>  #   open(OUTFILE,'>',$debugfile);
>>>>>>   #           print OUTFILE "while entered";
>>>>>>    #         close(OUTFILE);
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>
>>>>>>     #         open(OUTFILE,'>',$debugfile);
>>>>>>      #        print OUTFILE "foreach entered";
>>>>>>       #      close(OUTFILE);
>>>>>>
>>>>>>       my $rc = $factory->retrieve_blast($rid);
>>>>>>
>>>>>>       if( !ref($rc) )
>>>>>>       {
>>>>>>       if( $rc < 0 )
>>>>>>       {
>>>>>>       $factory->remove_rid($rid);
>>>>>>       }
>>>>>>        open(OUTFILE,'>',$debugfile);
>>>>>>        #      print OUTFILE "if entered";
>>>>>>             close(OUTFILE);
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>       }
>>>>>>      else {
>>>>>>         #    open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "else entered";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>         my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>         print BLASTDEBUGFILE $result->next_hit();
>>>>>>         close(BLASTDEBUGFILE);
>>>>>>
>>>>>>       my $filename =
>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>
>>>>>>        # open(DEBUGFILE,'>',$debugfile);
>>>>>>        # open(new,'>',$filename);
>>>>>>        # @arra=<new>;
>>>>>>        # print DEBUGFILE @arra;
>>>>>>        # close(DEBUGFILE);
>>>>>>        # close(new);
>>>>>>
>>>>>>        $factory->save_output($filename);
>>>>>>  # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>      # close(BLASTDEBUGFILE);
>>>>>>
>>>>>>      $factory->remove_rid($rid);
>>>>>>
>>>>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>      print BLASTDEBUGFILE  $organism;
>>>>>>       close(BLASTDEBUGFILE);
>>>>>>
>>>>>>   # open(OUTFILE,'>',$outfile);
>>>>>>   # print OUTFILE "Test2 $result->database_name()";
>>>>>>   # close(OUTFILE);
>>>>>>
>>>>>> #$hit = $result->next_hit;
>>>>>> #open(new,'>',$debugfile);
>>>>>> #print $hit;
>>>>>> #close(new);
>>>>>>
>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>
>>>>>>           next unless ( $v > 0);
>>>>>>
>>>>>>         #     open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "$hit in while hits";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>                 push(@seqs,$dna);
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>  }
>>>>>>
>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>  #print OUTFILE $seqs[0];
>>>>>>  #close(OUTFILE);
>>>>>>
>>>>>> return(@seqs);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Roopa.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Roopa--
>>>>>>>
>>>>>>> I got your code to work with the following changes:
>>>>>>>
>>>>>>> +# the input should be a valid FASTA file...
>>>>>>> ...
>>>>>>> open(NUC,'>',$nuc);
>>>>>>> +print NUC ">seq (need a name line for valid fasta)\n";
>>>>>>> print NUC $inpu1, "\n";
>>>>>>> close(NUC);
>>>>>>> ...
>>>>>>>
>>>>>>> +# you can set these header parms in the call itself...
>>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>>>> +     -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');
>>>>>>>
>>>>>>>  #change a paramter
>>>>>>> +# commented this out...
>>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>> +#     'Trypanosoma Brucei[ORGN]';
>>>>>>>
>>>>>>> MAJ
>>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>>>> rtbio.2009 at gmail.com
>>>>>>> >
>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>> Sent: Friday, January 08, 2010 10:00 AM
>>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>>>>>>
>>>>>>>
>>>>>>>  Hello all,
>>>>>>>
>>>>>>>>
>>>>>>>> I was trying remote BLAST using BioPerl. My input data is a Trypanosoma
>>>>>>>> brucei sequence in FASTA format. When I try to submit it to BLAST with
>>>>>>>> the step
>>>>>>>> $r=$factory->submit_blast($input)
>>>>>>>> it does not return anything, which I checked by debugging the code. It
>>>>>>>> is not blasting my input sequence even though I set all the
>>>>>>>> parameters. I paste the code below.
>>>>>>>>
>>>>>>>> Please help me in solving this problem. It is very urgent.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Roopa.
>>>>>>>>
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #path for extra camel module
>>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>>>> use Roopablast;
>>>>>>>>
>>>>>>>>
>>>>>>>> use Bio::SearchIO;
>>>>>>>> use Bio::Search::Result::BlastResult;
>>>>>>>> use Bio::Perl;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use Bio::Seq;
>>>>>>>> use Bio::SeqIO;
>>>>>>>> use Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>> my $outstring ="";
>>>>>>>>
>>>>>>>> &parse_form;
>>>>>>>>
>>>>>>>> print "Content-type: text/html\n\n";
>>>>>>>> print "<HTML>\n";
>>>>>>>> print "<head><title>RNAi Result</title>";
>>>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>>>> print "</head>\n";
>>>>>>>> print "<body>\n";
>>>>>>>> print " Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>>>> print "</BODY>\n";
>>>>>>>> print "</HTML>\n";
>>>>>>>>
>>>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>>>> exit if $pid;
>>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>>>> </head>\n
>>>>>>>> <body>\n
>>>>>>>>  Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>>>> wait......<br>
>>>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>>>> </BODY>\n
>>>>>>>> </HTML>\n";
>>>>>>>>
>>>>>>>> close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> @compseqs = blastcode($in{'Inputseq'});
>>>>>>>>
>>>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>>>
>>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>>>> $in{'Threshold'});
>>>>>>>>
>>>>>>>>
>>>>>>>> sub blastcode
>>>>>>>> {
>>>>>>>>
>>>>>>>> $inpu1= $_[0];
>>>>>>>>
>>>>>>>> #$organ= $_[1];
>>>>>>>>
>>>>>>>> open(NUC,'>',$nuc);
>>>>>>>> print NUC $inpu1;
>>>>>>>> close(NUC);
>>>>>>>>
>>>>>>>> my $prog = 'blastn';
>>>>>>>> my $db   = 'refseq_rna';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> my $organism= 'Trypanosoma Brucei';
>>>>>>>>
>>>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>       '-data' => $db,
>>>>>>>>       '-expect' => $e_val,
>>>>>>>>       '-readmethod' => 'SearchIO',
>>>>>>>>       '-Organism'   => $organism );
>>>>>>>>
>>>>>>>>          # open(OUTFILE,'>',$debugfile);
>>>>>>>>           #  print OUTFILE @params;
>>>>>>>>           # close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>
>>>>>>>>  #change a paramter
>>>>>>>>
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>>>>>>> Brucei[ORGN]';
>>>>>>>>
>>>>>>>> #change a paramter
>>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>>> '$input2[ORGN]';
>>>>>>>>
>>>>>>>>  my $v = 1;
>>>>>>>>  #$v is just to turn on and off the messages
>>>>>>>>
>>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>>>> '-organism' => 'Trypanosoma Brucei' );
>>>>>>>>
>>>>>>>>
>>>>>>>> while (my $input = $str->next_seq())
>>>>>>>> {
>>>>>>>>  #Blast a sequence against a database:
>>>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>>>  #and swap the two lines below for an example of that.
>>>>>>>>
>>>>>>>>           open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE $input;
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  # The program stops here: it does not return any value and it does
>>>>>>>>  # not enter the while loop. Please help me in this regard.
>>>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>>>              open(OUTFILE,'>',$debugfile);
>>>>>>>>              print OUTFILE $r;
>>>>>>>>              close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>>>
>>>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>>>    open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "while entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>
>>>>>>>>             open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "foreach entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>>>
>>>>>>>>      if( !ref($rc) )
>>>>>>>>      {
>>>>>>>>      if( $rc < 0 )
>>>>>>>>      {
>>>>>>>>      $factory->remove_rid($rid);
>>>>>>>>      }
>>>>>>>>       open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "if entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>       sleep 5;
>>>>>>>>      }
>>>>>>>>     else {
>>>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "else entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>        my $result = $rc->next_result();
>>>>>>>>       #save the output
>>>>>>>>      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>        print BLASTDEBUGFILE $result->next_hit();
>>>>>>>>        close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>      my $filename =
>>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>>>
>>>>>>>>       # open(DEBUGFILE,'>',$debugfile);
>>>>>>>>       # open(new,'>',$filename);
>>>>>>>>       # @arra=<new>;
>>>>>>>>       # print DEBUGFILE @arra;
>>>>>>>>       # close(DEBUGFILE);
>>>>>>>>       # close(new);
>>>>>>>>
>>>>>>>>       $factory->save_output($filename);
>>>>>>>>
>>>>>>>>     # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>>>     # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>>>     # close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>     $factory->remove_rid($rid);
>>>>>>>>
>>>>>>>>     open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>     print BLASTDEBUGFILE  $organism;
>>>>>>>>      close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>  # open(OUTFILE,'>',$outfile);
>>>>>>>>  # print OUTFILE "Test2 $result->database_name()";
>>>>>>>>  # close(OUTFILE);
>>>>>>>>
>>>>>>>> #$hit = $result->next_hit;
>>>>>>>> #open(new,'>',$debugfile);
>>>>>>>> #print $hit;
>>>>>>>> #close(new);
>>>>>>>>
>>>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>>>
>>>>>>>>          next unless ( $v > 0);
>>>>>>>>
>>>>>>>>        #     open(OUTFILE,'>',$debugfile);
>>>>>>>>         #    print OUTFILE "$hit in while hits";
>>>>>>>>          #  close(OUTFILE);
>>>>>>>>
>>>>>>>>     my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>>>         my $dna = $sequ->seq();        # get the sequence as a
>>>>>>>> string
>>>>>>>>                push(@seqs,$dna);
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>>>  #print OUTFILE $seqs[0];
>>>>>>>>  #close(OUTFILE);
>>>>>>>>
>>>>>>>> return(@seqs);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile) || die ;
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>>>>>>> <body>\n
>>>>>>>> <p><font face=\"Courier, monospace font set\">
>>>>>>>> Inputsequence: <br>";
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      print OUTFILE " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      print OUTFILE "<br>\n";
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> print OUTFILE "</font> <p>";
>>>>>>>>
>>>>>>>> $z=@compseqs;
>>>>>>>>
>>>>>>>> for($k=1;$k<$z;$k++) {
>>>>>>>>  print OUTFILE "<font face=\"Courier, monospace font
>>>>>>>> set\"><p>Compare
>>>>>>>> Sequence: <br>";
>>>>>>>>
>>>>>>>>  for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>>>>>>
>>>>>>>>      print OUTFILE substr ($compseqs[$k], $i, 1);
>>>>>>>>
>>>>>>>>      if ( ($i+1)%10==0){
>>>>>>>>          print OUTFILE " ";
>>>>>>>>      }
>>>>>>>>      if ( ($i+1)%60==0){
>>>>>>>>          print OUTFILE "<br>\n";
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  print OUTFILE "<p></font>";
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<p>
>>>>>>>> Window: <br>$in{'Windowsize'}
>>>>>>>> <p>
>>>>>>>> <p>
>>>>>>>> Threshold: <br>$in{'Threshold'}
>>>>>>>> <p>";
>>>>>>>> my $j=0;
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>>>>>>      if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>>>>>>          $j=$in{'Windowsize'};
>>>>>>>>      }
>>>>>>>>      $height=$out[$i]->{similar}*5;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ($j>0) {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>      $j--;
>>>>>>>>  }
>>>>>>>>  else {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      $outstring .= " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      $outstring .= "<br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%800==0){
>>>>>>>>      print OUTFILE "<br><br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>>>>>>> set\">$outstring</font>";
>>>>>>>>
>>>>>>>> #foreach (@out) {
>>>>>>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar}
>>>>>>>> matchs<p>";
>>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>>>>>>
>>>>>>>> #    }
>>>>>>>> #}
>>>>>>>>
>>>>>>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>>>>>>
>>>>>>>> close OUTFILE;
>>>>>>>>
>>>>>>>> #nameprint();
>>>>>>>>
>>>>>>>> sub parse_form {
>>>>>>>>  local ($buffer, @pairs, $pair, $name, $value);
>>>>>>>>  # Read in text
>>>>>>>>  $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>>>>>>  if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>>>>>>  {
>>>>>>>>      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>>>>>>  }
>>>>>>>>  else
>>>>>>>>  {
>>>>>>>>      $buffer = $ENV{'QUERY_STRING'};
>>>>>>>>  }
>>>>>>>>  @pairs = split(/&/, $buffer);
>>>>>>>>  foreach $pair (@pairs)
>>>>>>>>  {
>>>>>>>>      ($name, $value) = split(/=/, $pair);
>>>>>>>>      $value =~ tr/+/ /;
>>>>>>>>      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>>>>>>      $in{$name} = $value;
>>>>>>>>  }
>>>>>>>> }
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>


From maj at fortinbras.us  Fri Jan 22 07:34:59 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 07:34:59 -0500
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
Message-ID: <BB6A0E3FAC154E8FB690E5749375A1BC@NewLife>

I'm down with that.

----- Original Message ----- 
From: "Jason Stajich" <jason at bioperl.org>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 1:17 AM
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO


> I'm considering putting in an allowable initialization parameter (and
> get/set) for Bio::AlignIO that would allow setting of the alphabet.  This
> is then passed to Bio::LocatableSeq creation so that _guess_alphabet
> isn't called. This will allow removal of warnings about empty
> sequences because _guess_alphabet won't be called on a sequence if we
> have explicitly set the alphabet.
> 
> This worked great on my local install and tests pass.  Any objections  
> or concerns?
> 
> basically it means when you make an AlignIO you can specify the  
> alphabet i.e.
> 
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
>                            -file => 'genome.fasaln');
> 
> I have some alignments with empty sequences and I think turning off  
> the warnings is appropriate where I force the alphabet choice. It  
> should also have a very modest speedup benefit too.
> 
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From avilella at gmail.com  Fri Jan 22 08:07:26 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 13:07:26 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
Message-ID: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>

Hi,

I would like to write a script that merges fragments in a Bio::SimpleAlign
object on the basis of
some $seq->display_name rule.

I basically want to start with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.234     QWERTYU-------------------
seq2.345     ----------ASDFGH----------
seq2.456     -------------------ZXCVBNM

And end with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM

Can people suggest any Bio::SimpleAlign methods that would help here?

Cheers,

Albert.


From maj at fortinbras.us  Fri Jan 22 08:31:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 08:31:54 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
Message-ID: <EF1FEC1B43C146B6BBF827EA56171777@NewLife>

Here's one of my favorite tricks for this: XOR mask on gap symbol.
MAJ

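use Bio::SeqIO;
use Bio::Seq;
use strict;

my $seqio = Bio::SeqIO->new( -fh => \*DATA );

# XOR against a full-length gap mask: Perl pads the shorter string operand
# of ^ with zero bytes, so a bare '-' would only mask the first column
# (which happens to cancel out only for an odd number of fragments).
my $first = $seqio->next_seq->seq;
my $mask  = '-' x length $first;
my $acc   = $first ^ $mask;
while ( my $seq = $seqio->next_seq ) {
    $acc ^= ( $seq->seq ^ $mask );
}
my $mrg = Bio::Seq->new( -id => 'merged',
    -seq => $acc ^ $mask );
1;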


__END__
>seq2.234     
QWERTYU-------------------
>seq2.345     
----------ASDFGH----------
>seq2.456     
-------------------ZXCVBNM
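
For anyone wondering why this works, here is a minimal sketch in plain Perl
(no BioPerl needed). XOR-ing a row with a gap mask turns every gap column
into a zero byte, so XOR-ing the masked rows together keeps whichever residue
is present in each column, and a final XOR with the mask turns the remaining
zero bytes back into '-'. The helper name merge_fragments, the grouping rule
(id text before the first '.') and the '.mrg' suffix are assumptions made up
for illustration, not BioPerl API. It also assumes the fragments have equal
length and never have residues in the same column; overlapping residues would
come out as garbage, and identical overlapping residues would cancel to a gap.

```perl
use strict;
use warnings;

# Merge equal-length gapped fragments by XOR-ing each one against a
# full-length gap mask (hypothetical helper, not part of BioPerl).
sub merge_fragments {
    my @frags = @_;
    my $mask = '-'  x length $frags[0];    # one '-' per alignment column
    my $acc  = "\0" x length $frags[0];    # all-zero accumulator
    for my $frag (@frags) {
        die "fragments differ in length\n"
            unless length($frag) == length($mask);
        $acc ^= ( $frag ^ $mask );         # gap columns contribute zero bytes
    }
    return $acc ^ $mask;                   # zero bytes become '-' again
}

# Rows of the example alignment as [ id, aligned string ] pairs.
my @aln = (
    [ 'seq1.123', 'QWERTYUIOPASDFGHJKLZXCVBNM' ],
    [ 'seq2.234', 'QWERTYU-------------------' ],
    [ 'seq2.345', '----------ASDFGH----------' ],
    [ 'seq2.456', '-------------------ZXCVBNM' ],
);

# Assumed grouping rule: rows whose ids share the text before the first '.'
my ( %group, @order );
for my $row (@aln) {
    ( my $prefix = $row->[0] ) =~ s/\..*//;
    push @order, $prefix unless $group{$prefix};
    push @{ $group{$prefix} }, $row;
}

# Single rows pass through unchanged; groups are merged into "<prefix>.mrg".
my @merged;
for my $prefix (@order) {
    my @rows = @{ $group{$prefix} };
    if ( @rows == 1 ) {
        push @merged, $rows[0];
        next;
    }
    push @merged, [ "$prefix.mrg", merge_fragments( map { $_->[1] } @rows ) ];
}

printf "%-12s %s\n", @$_ for @merged;
```

With a real Bio::SimpleAlign the id and string of each row would presumably
come from display_id and seq on the objects returned by each_seq.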

----- Original Message ----- 
From: "Albert Vilella" <avilella at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 8:07 AM
Subject: [Bioperl-l] Merging fragments in a simplealign


> Hi,
> 
> I would like to write a script that merges fragments in a Bio::SimpleAlign
> object on the basis of
> some $seq->display_name rule.
> 
> I basically want to start with something like this:
> 
> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.234     QWERTYU-------------------
> seq2.345     ----------ASDFGH----------
> seq2.456     -------------------ZXCVBNM
> 
> And end with something like this:
> 
> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
> 
> Can people suggest any Bio::SimpleAlign methods that would help here?
> 
> Cheers,
> 
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From cjfields at illinois.edu  Fri Jan 22 08:34:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:34:07 -0600
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
Message-ID: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>

Sounds good to me.  The warnings are a bit too tight on this module anyway.

I still think we have plans towards refactoring some of this, not sure how far along they are:

http://www.bioperl.org/wiki/Align_Refactor

chris

On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:

> I'm considering putting in an allowable initialization parameter (and get/set) for Bio::AlignIO that would allow setting of the alphabet.  This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet.
> 
> This worked great on my local install and tests pass.  Any objections or concerns?
> 
> basically it means when you make an AlignIO you can specify the alphabet i.e.
> 
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln');
> 
> I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also have a very modest speedup benefit too.
> 
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Fri Jan 22 08:40:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:40:57 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <EF1FEC1B43C146B6BBF827EA56171777@NewLife>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
Message-ID: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>

May be something for the cook/scrapbook?

chris

On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:

> Here's one of my favorite tricks for this: XOR mask on gap symbol.
> MAJ
> 
> use Bio::SeqIO;
> use Bio::Seq;
> use strict; 
> my $seqio = Bio::SeqIO->new( -fh => \*DATA );
> 
> my $acc = $seqio->next_seq->seq ^ '-';
> while ($_ = $seqio->next_seq ) {
>   $acc ^= ($_->seq ^ '-');
> }
> my $mrg = Bio::Seq->new( -id => 'merged',
>   -seq => $acc ^ '-' );
> 1;
> 
> 
> __END__
>> seq2.234     
> QWERTYU-------------------
>> seq2.345     
> ----------ASDFGH----------
>> seq2.456     
> -------------------ZXCVBNM
> 
> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 22, 2010 8:07 AM
> Subject: [Bioperl-l] Merging fragments in a simplealign
> 
> 
>> Hi,
>> I would like to write a script that merges fragments in a Bio::SimpleAlign
>> object on the basis of
>> some $seq->display_name rule.
>> I basically want to start with something like this:
>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> seq2.234     QWERTYU-------------------
>> seq2.345     ----------ASDFGH----------
>> seq2.456     -------------------ZXCVBNM
>> And end with something like this:
>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>> Can people suggest any Bio::SimpleAlign methods that would help here?
>> Cheers,
>> Albert.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From holland at eaglegenomics.com  Fri Jan 22 05:51:52 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 22 Jan 2010 10:51:52 +0000
Subject: [Bioperl-l] [BioSQL-l] SwissProt DE lines and UniProt XML /
	TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <8FECCBDE-2DE1-40EE-B5A4-73BDAC893E2D@eaglegenomics.com>

Nice idea. Currently, BioJava just stores the complete section as a string without parsing it, but it provides a parser module for converting it into useful tag/value format within a user's program (but not to be stored in BioSQL).

On 21 Jan 2010, at 12:33, Peter wrote:

> Hi all,
> 
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> 
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> 
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
> 
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
> 
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> 
> Regards,
> 
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andrea at biocomp.unibo.it  Fri Jan 22 07:18:32 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 22 Jan 2010 13:18:32 +0100 (CET)
Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML
	in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <2b6e30c4628585042366646a7b46386e.squirrel@lipid.biocomp.unibo.it>

I think that the point here can be a little broader, since the SwissProt
DE lines are not the only ones that carry complex, structured data.
Defining a common, language-independent way to store structured data in
the comment and *_qualifier_value tables of the current BioSQL schema could
be very useful.
XML looks like a good candidate to me, and the UniProt XML format can be
used as a reference or as a template to start from.
Each Bio* project would then parse this structured data and expose it in its
own programming language's data structures.
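
As a rough illustration of the idea, here is a minimal sketch in plain Perl
that serialises a nested structure into an XML string of the kind that could
be stored in a single text column. The helper names xml_escape and to_xml are
made up for this sketch, the tag names are only loosely modelled on UniProt
XML, and the protein names are placeholders; the layout that BioPerl or BioSQL
actually uses is not specified here.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Serialise a node of the form [ tag => value ], where value is either a
# plain string or an array reference holding child nodes of the same form.
sub to_xml {
    my ($node) = @_;
    my ( $tag, $value ) = @$node;
    my $inner = ref($value) eq 'ARRAY'
        ? join( '', map { to_xml($_) } @$value )
        : xml_escape($value);
    return "<$tag>$inner</$tag>";
}

# Placeholder description block (names are invented for the example).
my $de = [ protein => [
    [ recommendedName => [
        [ fullName  => 'Example kinase' ],
        [ shortName => 'EK' ],
    ] ],
    [ alternativeName => [
        [ fullName => 'Protein A & B' ],
    ] ],
] ];

print to_xml($de), "\n";
```

Each Bio* project could then read the same string back into whatever native
structure suits it, which is the language-independence being argued for.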

Andrea


> Hi all,
>
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
>
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
>
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
>
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
>
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
>
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
>
> Regards,
>
> Peter
>


From avilella at gmail.com  Fri Jan 22 11:04:13 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 16:04:13 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
Message-ID: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>

Is there, or should there be, a 'have_pairwise_overlap' method similar to this?

# $seq1 and $seq3 have matching ids
my $seq1 = $aln->each_seq_by_id($seq1->display_id);
my $seq3 = $aln->each_seq_by_id($seq3->display_id);

my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
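
To make the intended semantics concrete, here is a minimal sketch in plain
Perl of what such a check could do on two aligned strings: report whether any
column holds a residue in both rows. Both function names are hypothetical
(have_pairwise_overlap is the proposed name, not an existing Bio::SimpleAlign
method), and '-' is assumed to be the only gap symbol.

```perl
use strict;
use warnings;

# Turn an aligned string into a per-column occupancy mask:
# "\x01" where a residue is present, "\x00" where there is a gap.
sub occupancy_mask {
    my ($aligned) = @_;
    ( my $mask = $aligned ) =~ tr/-/\x01/c;    # every non-gap column -> \x01
    $mask =~ tr/-/\x00/;                       # every gap column     -> \x00
    return $mask;
}

# Hypothetical helper: returns 1 if the two rows share at least one
# non-gap column, 0 otherwise.
sub have_pairwise_overlap {
    my ( $row1, $row2 ) = @_;
    # String-wise AND keeps \x01 only where both rows have a residue.
    my $both = occupancy_mask($row1) & occupancy_mask($row2);
    return ( $both =~ tr/\x01// ) ? 1 : 0;
}

for my $pair (
    [ 'QWERTYU-------------------', '----------ASDFGH----------' ],
    [ 'QWERTYU---',                 '------UIOP' ],
) {
    printf "%s\n%s\n  overlap: %d\n", @$pair, have_pairwise_overlap(@$pair);
}
```

With a Bio::SimpleAlign, the two strings would presumably come from calling
seq on the row objects. Fragments that pass this check with 0 are the ones
that can be merged safely with the XOR trick.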

On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu> wrote:

> May be something for the cook/scrapbook?
>
> chris
>
> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>
> > Here's one of my favorite tricks for this: XOR mask on gap symbol.
> > MAJ
> >
> > use Bio::SeqIO;
> > use Bio::Seq;
> > use strict;
> > my $seqio = Bio::SeqIO->new( -fh => \*DATA );
> >
> > my $acc = $seqio->next_seq->seq ^ '-';
> > while ($_ = $seqio->next_seq ) {
> >   $acc ^= ($_->seq ^ '-');
> > }
> > my $mrg = Bio::Seq->new( -id => 'merged',
> >   -seq => $acc ^ '-' );
> > 1;
> >
> >
> > __END__
> >> seq2.234
> > QWERTYU-------------------
> >> seq2.345
> > ----------ASDFGH----------
> >> seq2.456
> > -------------------ZXCVBNM
> >
> > ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com>
> > To: <bioperl-l at lists.open-bio.org>
> > Sent: Friday, January 22, 2010 8:07 AM
> > Subject: [Bioperl-l] Merging fragments in a simplealign
> >
> >
> >> Hi,
> >> I would like to write a script that merges fragments in a
> Bio::SimpleAlign
> >> object on the basis of
> >> some $seq->display_name rule.
> >> I basically want to start with something like this:
> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> >> seq2.234     QWERTYU-------------------
> >> seq2.345     ----------ASDFGH----------
> >> seq2.456     -------------------ZXCVBNM
> >> And end with something like this:
> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> >> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
> >> Can people suggest any Bio::SimpleAlign methods that would help here?
> >> Cheers,
> >> Albert.
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From maj at fortinbras.us  Fri Jan 22 11:02:55 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 11:02:55 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
Message-ID: <BE7957A2791345DAB092D997A4656AA8@NewLife>

http://www.bioperl.org/wiki/Merge_gapped_sequences_across_a_common_region
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Albert Vilella" <avilella at gmail.com>; <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 8:40 AM
Subject: Re: [Bioperl-l] Merging fragments in a simplealign


> May be something for the cook/scrapbook?
> 
> chris
> 
> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
> 
>> Here's one of my favorite tricks for this: XOR mask on gap symbol.
>> MAJ
>> 
>> use Bio::SeqIO;
>> use Bio::Seq;
>> use strict; 
>> my $seqio = Bio::SeqIO->new( -fh => \*DATA );
>> 
>> my $acc = $seqio->next_seq->seq ^ '-';
>> while ($_ = $seqio->next_seq ) {
>>   $acc ^= ($_->seq ^ '-');
>> }
>> my $mrg = Bio::Seq->new( -id => 'merged',
>>   -seq => $acc ^ '-' );
>> 1;
>> 
>> 
>> __END__
>>> seq2.234     
>> QWERTYU-------------------
>>> seq2.345     
>> ----------ASDFGH----------
>>> seq2.456     
>> -------------------ZXCVBNM
>> 
>> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 22, 2010 8:07 AM
>> Subject: [Bioperl-l] Merging fragments in a simplealign
>> 
>> 
>>> Hi,
>>> I would like to write a script that merges fragments in a Bio::SimpleAlign
>>> object on the basis of
>>> some $seq->display_name rule.
>>> I basically want to start with something like this:
>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>> seq2.234     QWERTYU-------------------
>>> seq2.345     ----------ASDFGH----------
>>> seq2.456     -------------------ZXCVBNM
>>> And end with something like this:
>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>>> Can people suggest any Bio::SimpleAlign methods that would help here?
>>> Cheers,
>>> Albert.
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
>


From avilella at gmail.com  Fri Jan 22 12:50:57 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 17:50:57 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
	<358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
Message-ID: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>

Or, to rephrase my question: what is the closest existing equivalent of the
code below?

On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:

> Is there/should be a 'have_pairwise_overlap' method similar to this?
>
> # $seq1 and $seq3 have matching ids
> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>
> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>
>
> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu>wrote:
>
>> May be something for the cook/scrapbook?
>>
>> chris
>>
>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>>
>> > Here's one of my favorite tricks for this: XOR mask on gap symbol.
>> > MAJ
>> >
>> > use Bio::SeqIO;
>> > use Bio::Seq;
>> > use strict;
>> > my $seqio = Bio::SeqIO->new( -fh => \*DATA );
>> >
>> > my $acc = $seqio->next_seq->seq ^ '-';
>> > while ($_ = $seqio->next_seq ) {
>> >   $acc ^= ($_->seq ^ '-');
>> > }
>> > my $mrg = Bio::Seq->new( -id => 'merged',
>> >   -seq => $acc ^ '-' );
>> > 1;
>> >
>> >
>> > __END__
>> >> seq2.234
>> > QWERTYU-------------------
>> >> seq2.345
>> > ----------ASDFGH----------
>> >> seq2.456
>> > -------------------ZXCVBNM
>> >
>> > ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com
>> >
>> > To: <bioperl-l at lists.open-bio.org>
>> > Sent: Friday, January 22, 2010 8:07 AM
>> > Subject: [Bioperl-l] Merging fragments in a simplealign
>> >
>> >
>> >> Hi,
>> >> I would like to write a script that merges fragments in a
>> Bio::SimpleAlign
>> >> object on the basis of
>> >> some $seq->display_name rule.
>> >> I basically want to start with something like this:
>> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> >> seq2.234     QWERTYU-------------------
>> >> seq2.345     ----------ASDFGH----------
>> >> seq2.456     -------------------ZXCVBNM
>> >> And end with something like this:
>> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> >> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>> >> Can people suggest any Bio::SimpleAlign methods that would help here?
>> >> Cheers,
>> >> Albert.
>> >> _______________________________________________
>> >> Bioperl-l mailing list
>> >> Bioperl-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>
>> > _______________________________________________
>> > Bioperl-l mailing list
>> > Bioperl-l at lists.open-bio.org
>> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>


From jay at jays.net  Fri Jan 22 13:30:57 2010
From: jay at jays.net (Jay Hannah)
Date: Fri, 22 Jan 2010 12:30:57 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
	<BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>
Message-ID: <EAD0FFCE-6DDF-4723-8D08-70ECF157FAAA@jays.net>

On Jan 21, 2010, at 10:31 PM, Chris Fields wrote:
> Did you want to release it to CPAN?  I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

Yes, I was thinking I would. No one has (yet) told me it's the worst idea ever, so I'm feeling encouraged.  :)

Given smallish inputs / databases (up to a few million rows) where some lightweight schema + SQLite + BioPerl can get the job done, it's nice to have a little easy-to-run toolbox. New tables and Roles bolt on easily, so I'll be adding them as they surface at $work[1]. 

Thanks for your interest.   :)

Jay Hannah
http://github.com/jhannah/bio-broodcomb
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From dalalhina at gmail.com  Fri Jan 22 12:31:09 2010
From: dalalhina at gmail.com (hina dalal)
Date: Fri, 22 Jan 2010 17:31:09 +0000
Subject: [Bioperl-l] Bioperl installation failed
Message-ID: <425f75df1001220931t49f5c768j97d91d2dd1757f19@mail.gmail.com>

Hi


I have installed Perl from ActiveState and am now trying to install BioPerl, but
cannot do it, neither from PPM (it shows the error "Ppm install failed:
404 not found") nor from CPAN / manual installation. It also will not let me
download nmake, reporting that "the version of this file is not compatible
with the version of windows you are running. Check your computer system
information to see whether you need 32 bit or 64 bit of this program." I am
using Windows Vista.


Please help.


Regards


Hina


From H.Dalal at sms.ed.ac.uk  Fri Jan 22 12:34:55 2010
From: H.Dalal at sms.ed.ac.uk (Hina Dalal)
Date: Fri, 22 Jan 2010 17:34:55 +0000
Subject: [Bioperl-l] BioPerl installation failed: please help
Message-ID: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>

Hi

I have installed Perl from ActiveState and am now trying to install
BioPerl, but cannot do it, neither from PPM (it shows the error "Ppm
install failed: 404 not found") nor from CPAN manual installation. It
also will not let me download nmake, reporting that "the version of
this file is not compatible with the version of windows you are
running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program."

Please help.

Regards

Hina


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From jason at bioperl.org  Fri Jan 22 14:18:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 22 Jan 2010 11:18:30 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
	<55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
Message-ID: <59EC9331-FB2F-4338-AD58-2D501A528A18@bioperl.org>

Done, as of r16739. Look forward to the refactor work too.

-jason
On Jan 22, 2010, at 5:34 AM, Chris Fields wrote:

> Sounds good to me.  The warnings are a bit too tight on this module  
> anyway.
>
> I still think we have plans towards refactoring some of this, not  
> sure how far along they are:
>
> http://www.bioperl.org/wiki/Align_Refactor
>
> chris
>
> On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:
>
>> I'm considering putting in an allowable initialization parameter (and
>> get/set) for Bio::AlignIO that would allow setting of the
>> alphabet.  This is then passed to Bio::LocatableSeq creation so
>> that _guess_alphabet isn't called. This will allow removal of
>> warnings about empty sequences because _guess_alphabet won't be
>> called on a sequence if we have explicitly set the alphabet.
>>
>> This worked great on my local install and tests pass.  Any  
>> objections or concerns?
>>
>> basically it means when you make an AlignIO you can specify the  
>> alphabet i.e.
>>
>> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
>>                            -file => 'genome.fasaln');
>>
>> I have some alignments with empty sequences and I think turning off  
>> the warnings is appropriate where I force the alphabet choice. It  
>> should also have a very modest speedup benefit too.
>>
>> -jason
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>> http://twitter.com/hyphaltip
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From cjfields at illinois.edu  Fri Jan 22 14:22:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 13:22:43 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
	<358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
	<358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
Message-ID: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>

This could exist, but should go into a general Utilities module.  Part of the Align refactoring was to pull a good number of the methods into a general utilities module, so this would fit into that category.

chris

On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote:

> Or, to rephrase my question: what is the closest existing equivalent of the
> code below?
> 
> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:
> 
>> Is there/should be a 'have_pairwise_overlap' method similar to this?
>> 
>> # $seq1 and $seq3 have matching ids
>> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
>> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>> 
>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>> 
>> 
>> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu>wrote:
>> 
>>> May be something for the cook/scrapbook?
>>> 
>>> chris
>>> 
>>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>>> 
>>>> Here's one of my favorite tricks for this: XOR mask on gap symbol.
>>>> MAJ
>>>> 
>>>> use Bio::SeqIO;
>>>> use Bio::Seq;
>>>> use strict;
>>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA );
>>>> 
>>>> my $acc = $seqio->next_seq->seq ^ '-';
>>>> while ($_ = $seqio->next_seq ) {
>>>>  $acc ^= ($_->seq ^ '-');
>>>> }
>>>> my $mrg = Bio::Seq->new( -id => 'merged',
>>>>  -seq => $acc ^ '-' );
>>>> 1;
>>>> 
>>>> 
>>>> __END__
>>>>> seq2.234
>>>> QWERTYU-------------------
>>>>> seq2.345
>>>> ----------ASDFGH----------
>>>>> seq2.456
>>>> -------------------ZXCVBNM
>>>> 
>>>> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com
>>>> 
>>>> To: <bioperl-l at lists.open-bio.org>
>>>> Sent: Friday, January 22, 2010 8:07 AM
>>>> Subject: [Bioperl-l] Merging fragments in a simplealign
>>>> 
>>>> 
>>>>> Hi,
>>>>> I would like to write a script that merges fragments in a
>>> Bio::SimpleAlign
>>>>> object on the basis of
>>>>> some $seq->display_name rule.
>>>>> I basically want to start with something like this:
>>>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>>>> seq2.234     QWERTYU-------------------
>>>>> seq2.345     ----------ASDFGH----------
>>>>> seq2.456     -------------------ZXCVBNM
>>>>> And end with something like this:
>>>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>>>> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>>>>> Can people suggest any Bio::SimpleAlign methods that would help here?
>>>>> Cheers,
>>>>> Albert.
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Fri Jan 22 14:29:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 14:29:07 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com><EF1FEC1B43C146B6BBF827EA56171777@NewLife><058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu><358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com><358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
	<14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>
Message-ID: <0F7B7E5FE70D4C5CB34B27045561823C@NewLife>

I'd recommend making an enhancement request via Bugzilla, so we don't forget-
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Albert Vilella" <avilella at gmail.com>
Cc: "bioperl-l" <Bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 2:22 PM
Subject: Re: [Bioperl-l] Merging fragments in a simplealign


> This could exist, but should go into a general Utilities module.  Part of the 
> Align refactoring was to pull a good number of the methods into a general 
> utilities module, so this would fit into that category.
>
> chris
>
> On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote:
>
>> Or, to rephrase my question: what is the closest existing equivalent of the
>> code below?
>>
>> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:
>>
>>> Is there, or should there be, a 'have_pairwise_overlap' method similar to this?
>>>
>>> # $seq1 and $seq3 have matching ids
>>> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
>>> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>>>
>>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>>>
>>>
>>> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu>wrote:
>>>
>>>> May be something for the cook/scrapbook?
>>>>
>>>> chris
>>>>
>>>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>>>>
>>>>> Here's one of my favorite tricks for this: XOR mask on gap symbol.
>>>>> MAJ
>>>>>
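>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::SeqIO;
>>>>> use Bio::Seq;
>>>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA, -format => 'fasta' );
>>>>>
>>>>> # Build a gap mask as long as the alignment row: XORing a string with a
>>>>> # single '-' would only mask its first character.
>>>>> my $first = $seqio->next_seq->seq;
>>>>> my $gaps  = '-' x length($first);
>>>>> # After masking, gap columns are NUL bytes and residues are residue^'-'.
>>>>> my $acc = $first ^ $gaps;
>>>>> while ( my $s = $seqio->next_seq ) {
>>>>>  $acc ^= ( $s->seq ^ $gaps );   # assumes the fragments never overlap
>>>>> }
>>>>> # XOR the mask back in to recover residues and gaps.
>>>>> my $mrg = Bio::Seq->new( -id => 'merged',
>>>>>  -seq => $acc ^ $gaps );
>>>>> 1;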
>>>>>
>>>>>
>>>>> __END__
>>>>>> seq2.234
>>>>> QWERTYU-------------------
>>>>>> seq2.345
>>>>> ----------ASDFGH----------
>>>>>> seq2.456
>>>>> -------------------ZXCVBNM
>>>>>
>>>>> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com
>>>>>
>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Friday, January 22, 2010 8:07 AM
>>>>> Subject: [Bioperl-l] Merging fragments in a simplealign
>>>>>
>>>>>
>>>>>> Hi,
>>>>>> I would like to write a script that merges fragments in a
>>>> Bio::SimpleAlign
>>>>>> object on the basis of
>>>>>> some $seq->display_name rule.
>>>>>> I basically want to start with something like this:
>>>>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>>>>> seq2.234     QWERTYU-------------------
>>>>>> seq2.345     ----------ASDFGH----------
>>>>>> seq2.456     -------------------ZXCVBNM
>>>>>> And end with something like this:
>>>>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>>>>> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>>>>>> Can people suggest any Bio::SimpleAlign methods that would help here?
>>>>>> Cheers,
>>>>>> Albert.
>>>>>>
>>>>
>>>>
>>>
>
>
>
> 
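
For reference, the 'have_pairwise_overlap' check asked about in this thread,
and a column-by-column alternative to the XOR trick quoted above, could be
sketched in plain Perl roughly as follows. This is only a sketch: the names
overlap_columns, have_pairwise_overlap and merge_fragments are hypothetical
helpers on gapped strings, not existing Bio::SimpleAlign methods, and '-' is
assumed to be the only gap symbol.

```perl
use strict;
use warnings;

# Hypothetical helpers working on plain aligned strings (for example the
# values returned by $seq->seq); they are not Bio::SimpleAlign methods.

# Number of columns in which both rows hold a residue rather than a gap.
sub overlap_columns {
    my ($row1, $row2) = @_;
    die "rows differ in length\n" unless length($row1) == length($row2);
    my $shared = 0;
    for my $i (0 .. length($row1) - 1) {
        $shared++
            if substr($row1, $i, 1) ne '-' && substr($row2, $i, 1) ne '-';
    }
    return $shared;
}

sub have_pairwise_overlap {
    my ($row1, $row2) = @_;
    return overlap_columns($row1, $row2) > 0 ? 1 : 0;
}

# Merge fragments column by column; identical residues in the same column
# are accepted, conflicting ones are treated as an error.
sub merge_fragments {
    my @rows   = @_;
    my $len    = length($rows[0]);
    my $merged = '-' x $len;
    for my $row (@rows) {
        die "rows differ in length\n" unless length($row) == $len;
        for my $i (0 .. $len - 1) {
            my $new = substr($row, $i, 1);
            next if $new eq '-';
            my $old = substr($merged, $i, 1);
            die "conflicting residues in column " . ($i + 1) . "\n"
                if $old ne '-' && $old ne $new;
            substr($merged, $i, 1) = $new;
        }
    }
    return $merged;
}

# The three seq2 fragments from this thread.
my @fragments = (
    'QWERTYU' . ('-' x 19),
    ('-' x 10) . 'ASDFGH' . ('-' x 10),
    ('-' x 19) . 'ZXCVBNM',
);
print merge_fragments(@fragments), "\n";
```

Unlike the XOR approach, this version does not depend on the fragments being
strictly non-overlapping; it only refuses to merge when two fragments disagree
in the same column.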


From maj at fortinbras.us  Fri Jan 22 14:33:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 14:33:41 -0500
Subject: [Bioperl-l] BioPerl installation failed: please help
In-Reply-To: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>
References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>
Message-ID: <2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife>

Hina-- 
See the protocol at 
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
for ActiveState installation. If it doesn't work, please let us know at which 
step the failure happened.
cheers, MAJ
----- Original Message ----- 
From: "Hina Dalal" <H.Dalal at sms.ed.ac.uk>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 12:34 PM
Subject: [Bioperl-l] BioPerl installation failed: please help


Hi

I have installed Perl from ActiveState and am now trying to install
BioPerl, but cannot get it to work, either through PPM (which shows the
error "Ppm install failed: 404 not found") or through manual CPAN
installation. I am not able to download nmake; it reports that "the
version of this file is not compatible with the version of windows you
are running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program."

Please help.

Regards

Hina


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




From maj at fortinbras.us  Fri Jan 22 15:13:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 15:13:15 -0500
Subject: [Bioperl-l] BioPerl installation failed: please help
In-Reply-To: <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk><2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife>
	<20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
Message-ID: <9E5DE384E2C8416B8373E390ABDB7DFE@NewLife>

Ok Hina,
I'm not seeing any issues with the presence or availability of 
http://bioperl.org/DIST
from my machine. Can you access that url in a browser? If not, the king of the
King's Buildings may not be allowing access. Also, can you do the following:

C:> ppm-shell
ppm> repo list

Note the number of the repo that corresponds to bioperl (if any) and do

ppm> repo describe n

where 'n' is that number, and send the output along.

cheers, MAJ

----- Original Message ----- 
From: "Hina Dalal" <H.Dalal at sms.ed.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Sent: Friday, January 22, 2010 3:01 PM
Subject: Re: [Bioperl-l] BioPerl installation failed: please help


Hi Mark

warm regards

I was following exactly that protocol. The problem is that when I try
it through PPM and reach the step that installs BioPerl, it ends with
the error "Ppm install failed: 404 not found". When I tried the
CPAN/manual installation, I couldn't download nmake; it shows that "the
version of this file is not compatible with the version of windows you
are running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program and then contact the software
publisher."


What should I do? Please help.

Regards

Hina


Quoting "Mark A. Jensen" <maj at fortinbras.us>:

> Hina-- See the protocol at
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
> for ActiveState installation. If it doesn't work, please let us know at
> which step the failure happened.
> cheers, MAJ
> ----- Original Message ----- From: "Hina Dalal" <H.Dalal at sms.ed.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 22, 2010 12:34 PM
> Subject: [Bioperl-l] BioPerl installation failed: please help
>
>
> Hi
>
> I have installed Perl from ActiveState and am now trying to install
> BioPerl, but cannot get it to work, either through PPM (which shows the
> error "Ppm install failed: 404 not found") or through manual CPAN
> installation. I am not able to download nmake; it reports that "the
> version of this file is not compatible with the version of windows you
> are running. Check your computer system information to see whether you
> need 32 bit or 64 bit of this program."
>
> Please help.
>
> Regards
>
> Hina
>
>
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From pengyu.ut at gmail.com  Sun Jan 24 20:29:59 2010
From: pengyu.ut at gmail.com (Peng Yu)
Date: Sun, 24 Jan 2010 19:29:59 -0600
Subject: [Bioperl-l] Transcribe in bioperl
Message-ID: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>

I found the function 'translate' in bioperl. But I don't find
'transcribe'. Is there such a function?


From jason at bioperl.org  Sun Jan 24 21:06:48 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 24 Jan 2010 18:06:48 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
Message-ID: <BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>

What exactly do you want to do?
spliced_seq for a feature would be the closest thing...

-jason
On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:

> I found the function 'translate' in bioperl. But I don't find
> 'transcribe'. Is there such a function?

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From pengyu.ut at gmail.com  Sun Jan 24 21:22:12 2010
From: pengyu.ut at gmail.com (Peng Yu)
Date: Sun, 24 Jan 2010 20:22:12 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
	<BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
Message-ID: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>

To convert from T to U. I could use perl's builtin function. But it is
semantically far away from 'transcribe'. If there is a function with
name 'transcribe', it will be better.

On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
> What exactly do you want to do?
> spliced_seq for a feature would be the closest thing...
>
> -jason
> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>
>> I found the function 'translate' in bioperl. But I don't find
>> 'transcribe'. Is there such a function?
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
>
>


From maj at fortinbras.us  Sun Jan 24 21:48:33 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 24 Jan 2010 21:48:33 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
Message-ID: <FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>

Not a bad idea, a semantics-preserving/checking thing. 
transcribe() could return an object with alphabet == 'rna'
and the T's flipped to U's, or bork if called against an object with
alphabet != 'dna'.
I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to 
be stashed), if desired.
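
The behaviour described above could be sketched, outside BioPerl, along these
lines. This is only an illustration under stated assumptions: a sequence is
modelled as a plain hash ref with 'seq' and 'alphabet' keys rather than a real
Bio::PrimarySeqI object, and the reverse helper's name is made up for the
example.

```perl
use strict;
use warnings;

# Hypothetical stand-alone helpers; a "sequence" here is a hash ref with
# 'seq' and 'alphabet' keys, standing in for a real sequence object.

# DNA -> RNA: refuse anything that is not DNA, then flip T to U.
sub transcribe {
    my ($s) = @_;
    die "transcribe() needs a sequence with alphabet 'dna'\n"
        unless $s->{alphabet} eq 'dna';
    (my $rna = $s->{seq}) =~ tr/Tt/Uu/;
    return { seq => $rna, alphabet => 'rna' };
}

# RNA -> DNA: the mirror image, constrained to RNA input.
sub rev_transcribe {
    my ($s) = @_;
    die "rev_transcribe() needs a sequence with alphabet 'rna'\n"
        unless $s->{alphabet} eq 'rna';
    (my $dna = $s->{seq}) =~ tr/Uu/Tt/;
    return { seq => $dna, alphabet => 'dna' };
}

my $dna = { seq => 'ATGCTTAA', alphabet => 'dna' };
my $rna = transcribe($dna);
print "$rna->{alphabet}: $rna->{seq}\n";
```

Returning a new record instead of editing the input in place keeps the original
DNA usable, and the alphabet check is what separates this from a bare tr/T/U/.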

----- Original Message ----- 
From: "Peng Yu" <pengyu.ut at gmail.com>
To: "Jason Stajich" <jason at bioperl.org>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 24, 2010 9:22 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


> To convert from T to U. I could use perl's builtin function. But it is
> semantically far away from 'transcribe'. If there is a function with
> name 'transcribe', it will be better.
> 
> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
>> What exactly do you want to do?
>> spliced_seq for a feature would be the closest thing...
>>
>> -jason
>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>
>>> I found the function 'translate' in bioperl. But I don't find
>>> 'transcribe'. Is there such a function?
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>> http://twitter.com/hyphaltip
>>
>>
> 
>


From cjfields at illinois.edu  Sun Jan 24 23:39:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 22:39:43 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
	<FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
Message-ID: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>

I think the main reason there hasn't been a transcribe() is that very few users ask for it.  Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() and/or translate() (i.e. they don't care about the intermediate mRNA).  I don't have a problem with adding a transcribe method to PrimarySeq, but (and Mark has already picked up on this) it should be constrained to DNA only and return RNA.  And there might be a case for adding the analogous reverse_translate().  

Also worth adding this to the proper interface class (PrimarySeqI, I think) so all Seq/PrimarySeq will have it (or have to implement their own).

chris

On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote:

> Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna'
> and the T's flipped to U's, or bork if called against an object with alphabet != 'dna'.
> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired.
> 
> ----- Original Message ----- From: "Peng Yu" <pengyu.ut at gmail.com>
> To: "Jason Stajich" <jason at bioperl.org>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Sunday, January 24, 2010 9:22 PM
> Subject: Re: [Bioperl-l] Transcribe in bioperl
> 
> 
>> To convert from T to U. I could use perl's builtin function. But it is
>> semantically far away from 'transcribe'. If there is a function with
>> name 'transcribe', it will be better.
>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
>>> What exactly do you want to do?
>>> spliced_seq for a feature would be the closest thing...
>>> 
>>> -jason
>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>> 
>>>> I found the function 'translate' in bioperl. But I don't find
>>>> 'transcribe'. Is there such a function?
>>> 
>>> --
>>> Jason Stajich
>>> jason.stajich at gmail.com
>>> jason at bioperl.org
>>> http://fungalgenomes.org/
>>> http://twitter.com/hyphaltip
>>> 
>>> 
>> 


From cjfields at illinois.edu  Sun Jan 24 23:43:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 22:43:07 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
	<FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
	<B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
Message-ID: <489E0B85-0BC3-45DB-8660-494CF69F35FF@illinois.edu>


On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:

> ...And there might be a case for adding the analogous reverse_translate().  

Bah.  Meant reverse_transcribe().  Ah well.

chris


From dan.kortschak at adelaide.edu.au  Mon Jan 25 00:33:28 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 25 Jan 2010 16:03:28 +1030
Subject: [Bioperl-l] BEDTools module
Message-ID: <1264397608.4898.9.camel@epistle>

Hi All,

A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
and Ira Hall is now available in the bioperl-run subversion repository
(bioperl-run/trunk r16754).

Using BEDTools you can, among other things:

      * Intersect two BED files in search of overlapping features.
      * Merge overlapping features.
      * Screen for paired-end (PE) overlaps between PE sequences and
        existing genomic features.
      * Calculate the depth and breadth of sequence coverage across
        defined "windows" in a genome.

(see <http://code.google.com/p/bedtools/> for manuals and downloads).

BEDTools is a suite of 17 command-line executables. The module attempts to
cover these and their options comprehensively, and can return Bio::SeqIO or
Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
where specific handling has not been implemented; please give feedback
on desired features for this).

cheers
Dan


From cjfields at illinois.edu  Mon Jan 25 00:35:06 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 23:35:06 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
Message-ID: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>

Just a quick question for those using DNAStatistics.  I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:

>seq1
GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq2
GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq3
GGTACCAGCAGGTGGTCCGCCTA------------------------------
>seq4
--------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC

Since seq3 and seq4 don't overlap, the distance can't be calculated.  In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.  Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere?

chris
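
To make the failure mode concrete: a distance that divides by the number of
compared columns has nothing to divide by when two rows never share a column.
A minimal sketch of guarding against that is below. It assumes an uncorrected
(p-) distance and a hypothetical p_distance helper with a caller-chosen
placeholder; it is not the Bio::Align::DNAStatistics code.

```perl
use strict;
use warnings;

# Hypothetical helper, not the Bio::Align::DNAStatistics implementation:
# uncorrected (p-) distance over the columns where both rows have a residue.
sub p_distance {
    my ($row1, $row2, $placeholder) = @_;
    $placeholder = 'NA' unless defined $placeholder;
    die "rows differ in length\n" unless length($row1) == length($row2);

    my ($compared, $diffs) = (0, 0);
    for my $i (0 .. length($row1) - 1) {
        my $x = substr($row1, $i, 1);
        my $y = substr($row2, $i, 1);
        next if $x eq '-' || $y eq '-';      # skip columns with a gap
        $compared++;
        $diffs++ if uc($x) ne uc($y);
    }
    # No shared columns: return the placeholder instead of dividing by zero.
    return $compared ? $diffs / $compared : $placeholder;
}

# seq3 and seq4 from the data above do not overlap.
my $seq3 = 'GGTACCAGCAGGTGGTCCGCCTA' . ('-' x 30);
my $seq4 = ('-' x 26) . 'CGCACGCGCGTGTTTGCGGGCAGCCGC';
print p_distance($seq3, $seq4), "\n";
```

Taking the placeholder as an argument leaves the choice of sentinel to the
caller, which matters if downstream code expects every matrix cell to be
numeric.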


From jason at bioperl.org  Mon Jan 25 00:58:03 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 24 Jan 2010 21:58:03 -0800
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
In-Reply-To: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
Message-ID: <B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>

It could also return -1, which is used as a placeholder for NA in other  
programs that generate distance matrices.
-jason
On Jan 24, 2010, at 9:35 PM, Chris Fields wrote:

> Just a quick question for those using DNAStatistics.  I just fixed a  
> bug in Bio::Align::DNAStatistics that failed with a div by zero  
> error (bug 2901) on this data:
>
>> seq1
> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>> seq2
> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>> seq3
> GGTACCAGCAGGTGGTCCGCCTA------------------------------
>> seq4
> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC
>
> Since seq3 and seq4 don't overlap, the distance can't be  
> calculated.  In our case, I replace the score with 'NA' as a  
> placeholder, but I'm worried about downstream app breakage.  Anyone  
> have an objection to using 'NA' here, or know of ways this may lead  
> to problems elsewhere?
>
> chris

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From maj at fortinbras.us  Mon Jan 25 08:17:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 08:17:54 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org><366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com><FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
	<B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
Message-ID: <ED0F320909EF4DB99FF0C91423F83209@NewLife>

transcribe() and rev_transcribe added to Bio::PrimarySeqI, plus tests in 
t/Seq.t, @ r16757
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Peng Yu" <pengyu.ut at gmail.com>
Sent: Sunday, January 24, 2010 11:39 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


>I think the main reason there hasn't been a transcribe() is that very few users 
>ask for it.  Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() 
>and/or translate() (i.e. they don't care about the intermediate mRNA).  I don't 
>have a problem with adding a transcribe method to PrimarySeq, but (and Mark has 
>already picked up on this) it should be constrained to DNA only and return RNA. 
>And there might be a case for adding the analogous reverse_translate().
>
> Also worth adding this to the proper interface class (PrimarySeqI, I think) so 
> all Seq/PrimarySeq will have it (or have to implement their own).
>
> chris
>
> On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote:
>
>> Not a bad idea, a semantics-preserving/checking thing. transcribe() could 
>> return an object with alphabet == 'rna'
>> and the T's flipped to U's, or bork if called against an object with
>> alphabet != 'dna'.
>> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to 
>> be stashed), if desired.
>>
>> ----- Original Message ----- From: "Peng Yu" <pengyu.ut at gmail.com>
>> To: "Jason Stajich" <jason at bioperl.org>
>> Cc: <bioperl-l at lists.open-bio.org>
>> Sent: Sunday, January 24, 2010 9:22 PM
>> Subject: Re: [Bioperl-l] Transcribe in bioperl
>>
>>
>>> To convert from T to U. I could use perl's builtin function. But it is
>>> semantically far away from 'transcribe'. If there is a function with
>>> name 'transcribe', it will be better.
>>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
>>>> What exactly do you want to do?
>>>> spliced_seq for a feature would be the closest thing...
>>>>
>>>> -jason
>>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>>>
>>>>> I found the function 'translate' in bioperl. But I don't find
>>>>> 'transcribe'. Is there such a function?
>>>>
>>>> --
>>>> Jason Stajich
>>>> jason.stajich at gmail.com
>>>> jason at bioperl.org
>>>> http://fungalgenomes.org/
>>>> http://twitter.com/hyphaltip
>>>>
>>>>
>>>
>
>
>
> 


From cjfields at illinois.edu  Mon Jan 25 08:23:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 07:23:12 -0600
Subject: [Bioperl-l] BEDTools module
In-Reply-To: <1264397608.4898.9.camel@epistle>
References: <1264397608.4898.9.camel@epistle>
Message-ID: <0F5CE93E-0E6C-4317-806B-A463A9B0917E@illinois.edu>

Great work Dan!  

chris

On Jan 24, 2010, at 11:33 PM, Dan Kortschak wrote:

> Hi All,
> 
> A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
> and Ira Hall is now available in the bioperl-run subversion repository
> (bioperl-run/trunk r16754).
> 
> Using BEDTools you can, among other things:
> 
>      * Intersect two BED files in search of overlapping features.
>      * Merge overlapping features.
>      * Screen for paired-end (PE) overlaps between PE sequences and
>        existing genomic features.
>      * Calculate the depth and breadth of sequence coverage across
>        defined "windows" in a genome.
> 
> (see <http://code.google.com/p/bedtools/> for manuals and downloads).
> 
> BEDTools is a suite of 17 command-line executables. The module attempts to
> cover these and their options comprehensively, and can return Bio::SeqIO or
> Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
> where specific handling has not been implemented; please give feedback
> on desired features for this).
> 
> cheers
> Dan
> 


From cjfields at illinois.edu  Mon Jan 25 08:27:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 07:27:26 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
In-Reply-To: <B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>
References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
	<B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>
Message-ID: <D46CA8B2-780B-4AA5-B9E3-07EADC0D79C1@illinois.edu>

That works for me, just want to ensure we're DTRT.  I'll change it over.

chris

On Jan 24, 2010, at 11:58 PM, Jason Stajich wrote:

> It could also return -1, which is used as a placeholder for NA in other programs that generate distance matrices.
> -jason
> On Jan 24, 2010, at 9:35 PM, Chris Fields wrote:
> 
>> Just a quick question for those using DNAStatistics.  I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:
>> 
>>> seq1
>> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>>> seq2
>> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>>> seq3
>> GGTACCAGCAGGTGGTCCGCCTA------------------------------
>>> seq4
>> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC
>> 
>> Since seq3 and seq4 don't overlap, the distance can't be calculated.  In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.  Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere?
>> 
>> chris
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> 


From maj at fortinbras.us  Mon Jan 25 08:41:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 08:41:38 -0500
Subject: [Bioperl-l] BEDTools module
In-Reply-To: <1264397608.4898.9.camel@epistle>
References: <1264397608.4898.9.camel@epistle>
Message-ID: <8D494783F87E4C32BD797008E260C3C2@NewLife>

Rock 'n' roll, Dan!
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 12:33 AM
Subject: [Bioperl-l] BEDTools module


> Hi All,
> 
> A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
> and Ira Hall is now available in the bioperl-run subversion repository
> (bioperl-run/trunk r16754).
> 
> Using BEDTools you can, among other things:
> 
>      * Intersect two BED files in search of overlapping features.
>      * Merge overlapping features.
>      * Screen for paired-end (PE) overlaps between PE sequences and
>        existing genomic features.
>      * Calculate the depth and breadth of sequence coverage across
>        defined "windows" in a genome.
> 
> (see <http://code.google.com/p/bedtools/> for manuals and downloads).
> 
> BEDTools is a suite of 17 command-line executables. The module attempts to
> cover these and their options comprehensively, and can return Bio::SeqIO or
> Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
> where specific handling has not been implemented; please give feedback
> on desired features for this).
> 
> cheers
> Dan
> 
> 
>


From rtbio.2009 at gmail.com  Mon Jan 25 08:43:19 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 14:43:19 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>

Hello Mark, Chris and all,

This is Roopa again, with another small problem. I am working on remote
BLAST. The program runs, accesses the server and retrieves the output
correctly. However, when I push the result sequences into an array, the
first sequence among the results is always missing. The code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".
time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the count of the array was 0 and nothing was printed. It was the same when
the number of output sequences was 3: the count of the array was 2 and only
two sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.


From rtbio.2009 at gmail.com  Mon Jan 25 08:44:57 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 14:44:57 +0100
Subject: [Bioperl-l] remote blast bioperl
Message-ID: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>

Hello all,

I have a small problem again. I am working with RemoteBlast. The program
runs, contacts the server and retrieves the output correctly. However, when I
collect the result sequences into an array, the first sequence among the
results is always missing. The code is:

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".
time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the count of the array was 0 and nothing was printed. It was the same when
the number of output sequences was 3: the count of the array was 2 and only
two sequences were printed.
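
One likely cause, judging only from the code as posted: the debugging line
"print BLASTDEBUGFILE $result->next_hit();" already takes the first hit off
the result's iterator, so the later "while ( my $hit = $result->next_hit )"
loop starts at the second hit. The sketch below reproduces that effect
without BioPerl. make_result and collect_hits are hypothetical stand-ins
written for this illustration, not BioPerl functions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST result object: each call to the
# next_hit closure hands out one hit, then undef once the list is used up.
sub make_result {
    my @hits = @_;
    my $pos  = 0;
    return { next_hit => sub { return $pos < @hits ? $hits[ $pos++ ] : undef } };
}

# Collect hits the way the posted loop does. When $peek is true, make one
# extra next_hit call first, as the debugging print in the post does.
sub collect_hits {
    my ( $result, $peek ) = @_;
    $result->{next_hit}->() if $peek;    # this hit is used once and then lost
    my @seqs;
    while ( my $hit = $result->{next_hit}->() ) {
        push @seqs, $hit;
    }
    return @seqs;
}

my @with_peek    = collect_hits( make_result(qw(hitA hitB hitC)), 1 );
my @without_peek = collect_hits( make_result(qw(hitA hitB hitC)), 0 );

print scalar(@with_peek),    " hits collected after the debug call\n";
print scalar(@without_peek), " hits collected without it\n";
```

If this is indeed the cause, removing the debugging call, or storing its
return value in a variable and pushing that onto @seqs as well, should bring
the missing first hit back.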

Please help me in sorting out this problem.

Regards,
Roopa.


From cjfields at illinois.edu  Mon Jan 25 09:05:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 08:05:44 -0600
Subject: [Bioperl-l] remote blast bioperl
In-Reply-To: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>
References: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>
Message-ID: <7E402CC5-9C66-4315-B437-7C4EC2317371@illinois.edu>

Roopa,

We have received all 4+ of your posts.  There is absolutely no need for you to keep repeatedly posting the same thing to the list.  Be patient, we'll try to get to you as soon as we can!

chris

On Jan 25, 2010, at 7:44 AM, Roopa Raghuveer wrote:

> Hello all,
> 
> I have a small problem again. I am working on Remote blast. The program works well. But the problem is this.  The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is
> 
>  my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]");
> 
> 
> while (my $input = $str->next_seq())
> {
>    #Blast a sequence against a database:
>     #Alternatively, you could  pass in a file with many
>     #sequences rather than loop through sequence one at a time
>     #Remove the loop starting 'while (my $input = $str->next_seq())'
>     #and swap the two lines below for an example of that.
> 
>              open(OUTFILE,'>',$debugfile);
>                print OUTFILE $input;
>               close(OUTFILE);
> 
> 
>    my $r = $factory->submit_blast($input);
> 
>                 open(OUTFILE,'>',$debugfile);
>              #   print OUTFILE $r;
>                 close(OUTFILE);
> 
> 
>    print STDERR "waiting...." if($v>0);
> 
>   while ( my @rids = $factory->each_rid ) {
>       open(OUTFILE,'>',$debugfile);
>             #   print OUTFILE "while entered";
>               close(OUTFILE);
>      foreach my $rid ( @rids ) {
> 
>                open(OUTFILE,'>',$debugfile);
>  #  print OUTFILE "foreach entered";
>               close(OUTFILE);
> 
>         my $rc = $factory->retrieve_blast($rid);
> 
>         if( !ref($rc) )
>         {
>         if( $rc < 0 )
>         {
>         $factory->remove_rid($rid);
>         }
>          open(OUTFILE,'>',$debugfile);
>               # print OUTFILE "if entered";
>               close(OUTFILE);
>          print STDERR "." if ( $v > 0 );
>          sleep 5;
>         }
>        else {
>               open(OUTFILE,'>',$debugfile);
>               # print OUTFILE "else entered";
>               close(OUTFILE);
> 
>           my $result = $rc->next_result();
>          #save the output
>         $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
>           open(BLASTDEBUGFILE,'>',$blastdebugfile);
>           print BLASTDEBUGFILE $result->next_hit();
>           close(BLASTDEBUGFILE);
> 
>         my $filename = $serverpath."/blastdata_".
> time()."\.out";
> 
> 
>          # open(DEBUGFILE,'>',$debugfile);
>          # open(new,'>',$filename);
>          # @arra=<new>;
>          # print DEBUGFILE @arra;
>          # close(DEBUGFILE);
>          # close(new);
> 
>          $factory->save_output($filename);
> 
>        # open(BLASTDEBUGFILE,'>',$debugfile);
>        # print BLASTDEBUGFILE  "Hello $rid";
>        # close(BLASTDEBUGFILE);
> 
>        $factory->remove_rid($rid);
> 
>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>        print BLASTDEBUGFILE  $organism;
>         close(BLASTDEBUGFILE);
> 
>     # open(OUTFILE,'>',$outfile);
>     # print OUTFILE "Test2 $result->database_name()";
>     # close(OUTFILE);
> 
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
> $dummy=0;
> 
>    while ( my $hit = $result->next_hit ) {
> 
>             next unless ( $v >= 0);
> 
>           #     open(OUTFILE,'>',$debugfile);
>            #    print OUTFILE "$hit in while hits";
>             #  close(OUTFILE);
>  my $sequ = $gb->get_Seq_by_version($hit->name);
>            my $dna = $sequ->seq(); # get the sequence as a string
>         $dummy++;
>              open(OUTFILE,'>',$debugfile);
>           #     print OUTFILE $dummy;
>               close(OUTFILE);
>           push(@seqs,$dna);
>          }
>         }
>       }
>     }
>   }
> 
> $warum=@seqs;
>  open(OUTFILE,'>',$debugfile);
>              #  print OUTFILE $warum;
>                print OUTFILE @seqs;
> 
>               close(OUTFILE);
> return(@seqs);
> }
> 
> open(OUTFILE, '>',$outfile) || die ;
> 
> print OUTFILE "<HTML>\n
> <head><title>RNAi Result</title>
> <meta http-equiv=\"expires\" content=\"0\"></head>\n
> <body>\n
> <p><font face=\"Courier, monospace font set\">
> Inputsequence: <br>";
> 
> 
> Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was  3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences.
> 
> Please help me in sorting out this problem.
> 
> Regards,
> Roopa.


From jiann-jy at hotmail.com  Sun Jan 24 21:03:55 2010
From: jiann-jy at hotmail.com (JY)
Date: Sun, 24 Jan 2010 18:03:55 -0800 (PST)
Subject: [Bioperl-l] how to retrieve accession number by taxon id??
Message-ID: <4cef88b5-fa53-4e63-9167-30075c10a058@k19g2000yqc.googlegroups.com>

I need to retrieve accession numbers and sequences to complete one part of
my project. How can I retrieve the accession numbers for a given taxon
ID?


From lpaulet at ual.es  Mon Jan 25 15:25:55 2010
From: lpaulet at ual.es (Lorenzo Carretero-Paulet)
Date: Mon, 25 Jan 2010 21:25:55 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <4B5DFE53.2000201@ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and 
returns the corresponding reports in txt, xml and html format. I'm
experiencing problems with the latter, as the program returns the 
following error message:

"Can't call method "next_result" without a package or object reference 
at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e 
$E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e 
$E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                 -file   => ">$outputfilenameH");
while( my $result = _$blast_report_->next_result ) { # get a result from 
Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


From lpaulet at ual.es  Mon Jan 25 15:31:08 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 21:31:08 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and  
returns the corresponding reports in txt, xml and html format. I'm
experiencing problems with the latter, as the program returns the  
following error message:

"Can't call method "next_result" without a package or object reference at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my $blast_report =qx/$blast -p blastp -d $database -i $query -e  
$E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e  
$E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                  -file   => ">$outputfilenameH");
while( my $result = $blast_report->next_result ) { # get a result from  
Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


From dan.kortschak at adelaide.edu.au  Mon Jan 25 16:00:37 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 07:30:37 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
Message-ID: <1264453237.4552.3.camel@epistle>

A reverse_translate to IUPAC degenerate codes is not a bad idea,
particularly for PCR primer design.

Dan

On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org
wrote:
> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:
> 
> > ...And there might be a case for adding the analogous
> reverse_translate().  
> 
> Bah.  Meant reverse_transcribe().  Ah well.
> 
> chris


From maj at fortinbras.us  Mon Jan 25 16:07:49 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 16:07:49 -0500
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
References: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
Message-ID: <F5772AAC495D475DBEEEF2311B16F941@NewLife>

Lorenzo--
your $blast_report is set to be (some of) the text returned
by a system call of a blast program; this isn't going to be
an object of any kind, and so no functions can be
called from it (as at "$blast_report->next_result"). You need
to parse the text generated by the blast call using Bio::SearchIO
to get a Bio::Search::Result::BlastResult object.
you could do

my @blast_lines = qx/ ...your blast call... /;
open my $bf, '>', 'my.blast' or die "Cannot write my.blast: $!";
print $bf @blast_lines;    # no comma after the filehandle
close $bf;
my $blast_result = Bio::SearchIO->new(-file   => 'my.blast',
                                      -format => 'blast');

and carry on from there. But why not look at
Bio::Tools::Run::StandAloneBlast or
Bio::Tools::Run::StandAloneBlastPlus
to run your blasts within perl? These wrap the blast
programs and deliver BioPerl objects, rather than
plain text output.
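
To illustrate the point without BioPerl, here is a minimal sketch. It
assumes that, because the blast call in the post writes its report to a file
with -o, qx// hands back an empty string; the variable below simply stands
in for that return value.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in for what qx// returns when the tool writes its report to a
# file and prints nothing to STDOUT (assumed here): an empty string.
my $blast_report = '';

# A plain string is not a blessed reference, so it carries no methods.
my $is_object = defined blessed($blast_report) ? 1 : 0;

# Trap the fatal error instead of letting the script die.
my $ok    = eval { $blast_report->next_result; 1 };
my $error = $ok ? '' : $@;

print "is an object: $is_object\n";
print "method call failed with: $error";
```

The same check with blessed() is a cheap guard before calling next_result
on anything that came back from a system call.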
cheers MAJ
----- Original Message ----- 
From: <lpaulet at ual.es>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 3:31 PM
Subject: [Bioperl-l] HTMLResultWriter


Hi all,

I'm trying to generate a subroutine that performs a BLAST search and
returns the corresponding reports in txt, xml and html format. I'm
experiencing problems with the latter, as the program returns the
following error message:

"Can't call method "next_result" without a package or object reference at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my $blast_report =qx/$blast -p blastp -d $database -i $query -e
$E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e
$E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                  -file   => ">$outputfilenameH");
while( my $result = $blast_report->next_result ) { # get a result from
Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From David.Messina at sbc.su.se  Mon Jan 25 16:09:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 25 Jan 2010 22:09:24 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <4B5DFE53.2000201@ual.es>
References: <4B5DFE53.2000201@ual.es>
Message-ID: <FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>

> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;

> while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory


_$blast_report_ is not a valid variable name, as far as I know. Plus there's a space between report and the final '_' in the first of the above two lines.

Does this code compile?

Dave


From Russell.Smithies at agresearch.co.nz  Mon Jan 25 16:14:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 26 Jan 2010 10:14:15 +1300
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
References: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>

That's a fair mix of incomplete code you've supplied!!
Did you read the documentation for RemoteBlast? The example there will do 99% of what you want.
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm

I'm not entirely sure what you're trying to do (as you've left out a bit of your code) but I assume you're trying to retrieve and print the sequence for each hit.

Here's something that works, not sure exactly what/why you want to print but it should get you a bit further.

--Russell


================================
#!perl -w

use Bio::Tools::Run::RemoteBlast;
use Bio::SeqIO;
use Bio::DB::GenBank;

use CGI ':standard';

use strict;

my $q = new CGI;

my @params = (
               -prog         => 'blastn',
               -data         => 'nr',
               -expect       => '1e-30',
               -entrez_query => 'Homo sapiens [ORGN]',
               -readmethod   => 'SearchIO'
);

my $gb = Bio::DB::GenBank->new;

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#$v is just to turn on and off the messages
my $v = 1;

my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" );

while ( my $input = $str->next_seq() ) {

  my $r = $factory->submit_blast($input);

  print STDERR "waiting..." if ( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid (@rids) {
      my @seqs = ();
      my $rc   = $factory->retrieve_blast($rid);
      if ( !ref($rc) ) {
        if ( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      }
      else {
        my $result = $rc->next_result();

        #save the blast output
        my $filename = $result->query_accession . '.out';
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {

          # store the hit sequences
          push @seqs, $gb->get_Seq_by_version( $hit->name );

          next unless ( $v > 0 );
          print "\thit name is ", $hit->name, "\n";
          while ( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }

        ## print the seqs you've retrieved??
        open( OUTFILE, '>', $result->query_accession . '.htm' )
          or die "Cannot open HTML output file: $!";
        print OUTFILE $q->start_html('RNAi Result'),
          $q->h1('RNAi Result'),
          $q->h2('Input'),
          $q->pre( toString($input) ),
          $q->h2('Output');

        foreach (@seqs) {

          #there's probably a better way of printing the seq
          print OUTFILE $q->pre( toString($_) );
        }
        print OUTFILE $q->end_html;
        close OUTFILE;
      }
    }
  }
}

sub toString {
  my $s = shift;
  return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq;
}




From biopython at maubp.freeserve.co.uk  Mon Jan 25 16:24:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 25 Jan 2010 21:24:33 +0000
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
Message-ID: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>

On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak
<dan.kortschak at adelaide.edu.au> wrote:
> A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.

I would say it could be a bad idea. For any protein string there are
multiple possible back translations, and this cannot be captured
fully as a nucleotide string even using the IUPAC ambiguity chars.

We debated this back and forth for Biopython, and decided to leave it
out. It wasn't possible for a simple back translate to a simple string to
handle the use cases we considered, and other options like returning
a regular expression covering all possible back translations were too
complex (for a core sequence method/function).
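
A small sketch of the objection, written for illustration and not taken from
the Biopython discussion: serine has six codons in the standard genetic code
(TCN plus AGY). Collapsing them position by position into a single IUPAC
triplet gives a pattern that also matches codons for other amino acids and a
stop codon.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes, keyed by the sorted set of bases they stand for.
my %iupac = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AG  => 'R', CT  => 'Y', CG  => 'S', AT  => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);
my %expand = reverse %iupac;    # code => bases, e.g. W => 'AT'

# The six serine codons of the standard genetic code.
my @ser = qw(TCT TCC TCA TCG AGT AGC);

# Collapse the codons position by position into one IUPAC triplet.
my $consensus = '';
for my $pos ( 0 .. 2 ) {
    my %seen;
    $seen{ substr( $_, $pos, 1 ) } = 1 for @ser;
    $consensus .= $iupac{ join '', sort keys %seen };
}

# Expand the triplet back into every concrete codon it matches.
my @matches = ('');
for my $code ( split //, $consensus ) {
    my @next;
    for my $prefix (@matches) {
        push @next, $prefix . $_ for split //, $expand{$code};
    }
    @matches = @next;
}

# Keep only the matches that are not serine codons.
my %is_ser;
@is_ser{@ser} = ();
my @extra = grep { !exists $is_ser{$_} } @matches;

print "consensus: $consensus\n";
print scalar(@matches), " codons match, ",
      scalar(@extra), " of them are not serine\n";
```

Among the extra matches are TGA (a stop codon) and the threonine codons ACN,
which is the kind of information loss a plain IUPAC string cannot avoid.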

Peter


From jason at bioperl.org  Mon Jan 25 16:26:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 25 Jan 2010 13:26:55 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <98995830-DC7F-4404-A216-874EF5799DB6@bioperl.org>

It was already implemented several years ago -- reverse_translate  
Bio::Tools::CodonTable -> revtranslate


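   use Bio::PrimarySeq; use Bio::Tools::CodonTable;
   my $myCodonTable = Bio::Tools::CodonTable->new();   # standard table by default
   my $seqobj    = Bio::PrimarySeq->new(-seq => 'FHGERHEL');
   my $iupac_str = $myCodonTable->reverse_translate_all($seqobj);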


Chris had meant to say reverse_transcribe of RNA -> DNA FWIW.
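
For comparison, a minimal sketch of what a transcribe / reverse_transcribe
pair could look like on plain strings. The two subs are hypothetical helpers
written for this illustration, not existing BioPerl methods, and they assume
the simplest model: a letter substitution on the sense strand, with no
complementing.

```perl
use strict;
use warnings;

# Hypothetical helpers (not BioPerl methods): model transcription and its
# inverse as plain letter substitutions on the sense strand.
sub transcribe {
    my ($dna) = @_;
    ( my $rna = $dna ) =~ tr/Tt/Uu/;    # DNA -> RNA: T becomes U
    return $rna;
}

sub reverse_transcribe {
    my ($rna) = @_;
    ( my $dna = $rna ) =~ tr/Uu/Tt/;    # RNA -> DNA: U becomes T
    return $dna;
}

my $rna = transcribe('ATGGCCTAA');
my $dna = reverse_transcribe($rna);
print "$rna -> $dna\n";
```

Unlike back translation, this round trip loses nothing, since each letter
maps to exactly one letter in either direction.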

-jason
On Jan 25, 2010, at 1:24 PM, Peter wrote:

> On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak
> <dan.kortschak at adelaide.edu.au> wrote:
>> A reverse_translate to IUPAC degenerate codes is not a bad idea,
>> particularly for PCR primer design.
>
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.
>
> We debated this back and forth for Biopython, and decided to leave it
> out. It wasn't possible for a simple back translate to a simple  
> string to
> handle the use cases we considered, and other options like returning
> a regular expression covering all possible back translations were too
> complex (for a core sequence method/function).
>
> Peter

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From maj at fortinbras.us  Mon Jan 25 16:19:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 16:19:24 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
Message-ID: <72B106F0D5FF4F1E858CC9BD1EF33142@NewLife>

I think we have that functionality in Bio::Tools::SeqPattern, 
courtesy of Bruno V---
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 4:00 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


>A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.
> 
> Dan
> 
> On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org
> wrote:
>> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:
>> 
>> > ...And there might be a case for adding the analogous
>> reverse_translate().  
>> 
>> Bah.  Meant reverse_transcribe().  Ah well.
>> 
>> chris


From dan.kortschak at adelaide.edu.au  Mon Jan 25 16:38:44 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 08:08:44 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <1264455524.4552.23.camel@epistle>

Good to see that these ideas have been considered.

I'd be interested to see this discussion, or at least the point dealing
with the problems that might arise. I'm at a loss as to how ambiguity
codes can't completely describe all possible coding sequences for any
given codon table (via Bio::Tools::CodonTable - in fact this already has
the revtranslate that could be fitted into a Bio::PrimarySeq method - to
answer Mark and Jason's comments, I think that /if/ a reverse_translate
method exists, it makes logical sense to have it tied to a sequence
object, calling the B:T:CT method on the seq object itself rather than
only in Bio::Tools, 2¢). Pete, can you provide an example of the
problems?

thanks
Dan

On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.


From lpaulet at ual.es  Mon Jan 25 16:53:07 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 22:53:07 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>
References: <4B5DFE53.2000201@ual.es>
	<FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>
Message-ID: <20100125225307.2zl2cn2hkcsgccso@webmail.ual.es>

Thanks Dave and Mark.

Quoting Dave Messina <David.Messina at sbc.su.se>:

>> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e   
>> $E_value -b 20000 -o $outputfilenameB/;
>
>> while( my $result = _$blast_report_->next_result ) { # get a result  
>>  from Bio::SearchIO parsing or build it up in memory
>
>
> _$blast_report_ is not a valid variable name, as far as I know. Plus  
>  there's a space between report and the final '_' in the first of  
> the  above two lines.
>
> Does this code compile?
>
> Dave


From rtbio.2009 at gmail.com  Mon Jan 25 17:35:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 23:35:32 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>
References: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>
Message-ID: <c7cac1601001251435k7b75ffbbj64cfa36faf8d89bb@mail.gmail.com>

Hello Russell,

Thank you very much for your reply. My problem is that the remote BLAST
runs fine with my code and I get the .out file listing the sequences
producing significant alignments. But when I try to retrieve the sequences
into the array @seqs, I get every sequence except the first hit. If the
.out file shows 3 hits, I retrieve only 2 sequences. If there is only one
significant hit, its name and description appear in the .out file, but the
array count is 0 and the array contains no sequence.

I hope this makes the problem clearer.

Here is my code:

use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds.";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";


open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes <br>
 This page will automatically reload in 30 seconds  <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1,"\n";
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

            # open(OUTFILE,'>',$debugfile);
             #  print OUTFILE @params;
             # close(OUTFILE);


my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
"$organ\[ORGN]");

 #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter

 #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);
 my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {


        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
       print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output

      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);


       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dna;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=scalar(@seqs);
              open(OUTFILE,'>',$debugfile);
               print OUTFILE $warum;
             #  print OUTFILE @seqs;
              close(OUTFILE);
      return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

        print OUTFILE substr ($in{'Inputseq'}, $i, 1);

        if ( ($i+1)%10==0){
                print OUTFILE " ";
        }
        if ( ($i+1)%60==0){
                print OUTFILE "<br>\n";
        }
}


print OUTFILE "</font> <p>";

$z=@compseqs;

for($k=0;$k<$z;$k++) {
        print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
Sequence: <br>";

        for ($i=0; $i<length ($compseqs[$k]); $i++) {

                print OUTFILE substr ($compseqs[$k], $i, 1);

                if ( ($i+1)%10==0){
                        print OUTFILE " ";
                }
                if ( ($i+1)%60==0){
                        print OUTFILE "<br>\n";
                }
        }
        print OUTFILE "<p></font>";
}

print OUTFILE "<p>
Window: <br>$in{'Windowsize'}
<p>
<p>
Threshold: <br>$in{'Threshold'}
<p>";
my $j=0;

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

        if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
                if ($out[$i]->{similar}<=$in{'Threshold'}){
                        $j=$in{'Windowsize'};
                }
                $height=$out[$i]->{similar}*5;
        }

        if ($j>0) {
                print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
height=\"5\">";
                $outstring .= "<font color=\"green\">".substr
($in{'Inputseq'}, $i, 1)."</font>";
                $j--;
        }
        else {
                print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
height=\"5\">";
                $outstring .= "<font color=\"red\">".substr
($in{'Inputseq'}, $i, 1)."</font>";
        }

        if ( ($i+1)%10==0){
                $outstring .= " ";
        }
        if ( ($i+1)%60==0){
                $outstring .= "<br>\n";

        }
        if ( ($i+1)%800==0){
                print OUTFILE "<br><br>\n";

        }
}

print OUTFILE "<br><br><font face=\"Courier, monospace font
set\">$outstring</font>";

#foreach (@out) {
#print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
#if ($_->{similar}<=$in{'Threshold'}){

#       }
#}

print OUTFILE "</BODY>\n</HTML>\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}

Regards,
Roopa.
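
A possible explanation for the missing first hit, judging only from the code
above: the debug line `print BLASTDEBUGFILE $result->next_hit();` itself calls
next_hit, and next_hit is an iterator, so the first hit is consumed before the
`while ( my $hit = $result->next_hit )` loop starts. The sketch below
reproduces that pattern without BioPerl. MockResult and collect_hits are
hypothetical stand-ins invented for illustration, not BioPerl classes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST result: next_hit() hands out each hit once.
package MockResult;
sub new      { my ($class, @hits) = @_; return bless { hits => [@hits], pos => 0 }, $class }
sub next_hit { my $self = shift; return $self->{hits}[ $self->{pos}++ ] }

package main;

# Collect all remaining hits; optionally "peek" first, like the debug print does.
sub collect_hits {
    my ($result, $peek_first) = @_;
    if ($peek_first) {
        my $debug = $result->next_hit;    # consumes hit number 1
    }
    my @hits;
    while ( my $hit = $result->next_hit ) {
        push @hits, $hit;
    }
    return @hits;
}

for my $n (3, 1) {
    my @with    = collect_hits( MockResult->new(1 .. $n), 1 );
    my @without = collect_hits( MockResult->new(1 .. $n), 0 );
    printf "%d hit(s): %d collected after a debug peek, %d without\n",
        $n, scalar @with, scalar @without;
}
# prints:
# 3 hit(s): 2 collected after a debug peek, 3 without
# 1 hit(s): 0 collected after a debug peek, 1 without
```

If this is indeed the cause, moving the debug print inside the while loop and
printing the `$hit` already fetched there would avoid the extra next_hit call.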


On Mon, Jan 25, 2010 at 10:14 PM, Smithies, Russell <
Russell.Smithies at agresearch.co.nz> wrote:

> That's a fair mix of incomplete code you've supplied!!
> Did you read the documentation for RemoteBlast? The example there will do
> 99% of what you want.
> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm
>
> I'm not entirely sure what you're trying to do (as you've left out a bit of
> your code) but I assume you're trying to retrieve and print the sequence for
> each hit.
>
> Here's something that works, not sure exactly what/why you want to print
> but it should get you a bit further.
>
> --Russell
>
>
> ================================
> #!perl -w
>
> use Bio::Tools::Run::RemoteBlast;
> use Bio::DB::GenBank;
> use Bio::SeqIO;
>
> use CGI ':standard';
>
> use strict;
>
> my $q = new CGI;
>
> my @params = (
>               -prog         => 'blastn',
>               -data         => 'nr',
>               -expect       => '1e-30',
>               -entrez_query => 'Homo sapiens [ORGN]',
>               -readmethod   => 'SearchIO'
> );
>
> my $gb = Bio::DB::GenBank->new;
>
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
> #$v is just to turn on and off the messages
> my $v = 1;
>
> my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" );
>
> while ( my $input = $str->next_seq() ) {
>
>   my $r = $factory->submit_blast($input);
>
>   print STDERR "waiting..." if ( $v > 0 );
>  while ( my @rids = $factory->each_rid ) {
>     foreach my $rid (@rids) {
>      my @seqs = ();
>       my $rc   = $factory->retrieve_blast($rid);
>      if ( !ref($rc) ) {
>        if ( $rc < 0 ) {
>          $factory->remove_rid($rid);
>        }
>         print STDERR "." if ( $v > 0 );
>        sleep 5;
>      }
>      else {
>         my $result = $rc->next_result();
>
>         #save the blast output
>        my $filename = $result->query_accession . '.out';
>        $factory->save_output($filename);
>        $factory->remove_rid($rid);
>        print "\nQuery Name: ", $result->query_name(), "\n";
>         while ( my $hit = $result->next_hit ) {
>
>           # store the hit sequences
>          push @seqs, $gb->get_Seq_by_version( $hit->name );
>
>          next unless ( $v > 0 );
>          print "\thit name is ", $hit->name, "\n";
>          while ( my $hsp = $hit->next_hsp ) {
>            print "\t\tscore is ", $hsp->score, "\n";
>          }
>        }
>
>        ## print the seqs you've retrieved??
>        open( OUTFILE, '>', $result->query_accession . '.htm' );
>        print OUTFILE $q->start_html('RNAi Result'),
>          $q->h1('RNAi Result'),
>          $q->h2('Input'),
>          $q->pre( toString($input) ),
>          $q->h2('Output');
>
>        foreach (@seqs) {
>
>          #there's probably a better way of printing the seq
>          print OUTFILE $q->pre( toString($_) );
>        }
>        print OUTFILE $q->end_html;
>        close OUTFILE;
>      }
>    }
>  }
> }
>
> sub toString {
>  my $s = shift;
>  return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq;
> }


From ajmackey at gmail.com  Tue Jan 26 08:24:43 2010
From: ajmackey at gmail.com (Aaron Mackey)
Date: Tue, 26 Jan 2010 08:24:43 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264455524.4552.23.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org> 
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> 
	<1264455524.4552.23.camel@epistle>
Message-ID: <24c96eca1001260524s3d46e850hfdcc461e22210972@mail.gmail.com>

There's also Bio::Tools::IUPAC; given a sequence with IUPAC ambiguity codes,
it provides a SeqIO stream that enumerates all the possible unambiguous
realizations.  Not the right solution for every situation, but quite useful
when you need it.

-Aaron
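
To make the idea of "enumerating unambiguous realizations" concrete, here is
a minimal pure-Perl sketch. It is not the implementation of Bio::Tools::IUPAC;
expand_iupac is a hypothetical helper written only for illustration. It also
shows the limitation Peter raises below: the six standard-table leucine codons
(TTA, TTG, CTT, CTC, CTA, CTG) share the consensus YTN, but YTN expands to
eight codons, and two of them (TTT, TTC) encode phenylalanine.

```perl
use strict;
use warnings;

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
my %IUPAC = (
    A => 'A',   C => 'C',   G => 'G',   T => 'T',
    R => 'AG',  Y => 'CT',  S => 'CG',  W => 'AT',  K => 'GT',  M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Hypothetical helper: return every unambiguous sequence that matches
# an ambiguous one, built up one position at a time.
sub expand_iupac {
    my ($seq) = @_;
    my @expanded = ('');
    for my $code ( split //, uc $seq ) {
        my $bases = $IUPAC{$code} or die "unknown IUPAC code '$code'\n";
        my @next;
        for my $prefix (@expanded) {
            push @next, $prefix . $_ for split //, $bases;
        }
        @expanded = @next;
    }
    return @expanded;
}

my @codons = expand_iupac('YTN');
print "@codons\n";
# prints: CTA CTC CTG CTT TTA TTC TTG TTT
```

The number of realizations is the product of the per-position choices, so it
grows quickly for long ambiguous sequences; a streaming interface such as the
one Bio::Tools::IUPAC offers avoids holding them all in memory at once.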


On Mon, Jan 25, 2010 at 4:38 PM, Dan Kortschak <
dan.kortschak at adelaide.edu.au> wrote:

> Good to see that these ideas have been considered.
>
> I'd be interested to see this discussion, or at least the point dealing
> with the problems that might arise. I'm at a loss as to how ambiguity
> codes can't completely describe all possible coding sequences for any
> given codon table (via Bio::Tools::CodonTable - in fact this already has
> the revtranslate that could be fitted into a Bio::PrimarySeq method - to
> answer Mark and Jason's comments, I think that /if/ a reverse_translate
> method exists, it makes logical sense to have it tied to a sequence
> object, calling the B:T:CT method on the seq object itself rather than
> only in Bio::Tools, 2?). Pete, can you provide an example of the
> problems?
>
> thanks
> Dan
>
> On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> > I would say it could be a bad idea. For any protein string there are
> > multiple possible back translations, and this cannot be captured
> > fully as a nucleotide string even using the IUPAC ambiguity chars.
>


From nml5566 at gmail.com  Tue Jan 26 16:10:54 2010
From: nml5566 at gmail.com (Nathan Liles)
Date: Tue, 26 Jan 2010 15:10:54 -0600
Subject: [Bioperl-l] SVN access
Message-ID: <4B5F5A5E.2070406@gmail.com>

Does anyone know who I need to talk to for getting developer access for 
the Bioperl SVN? I want to submit a patch to the genbank2gff3 converter.

Thanks,
Nathan


From Russell.Smithies at agresearch.co.nz  Tue Jan 26 20:40:40 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:40:40 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>

Grrrrrr, I hate eutils!!!!

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------


Nice error message though :-)


--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Monday, 11 January 2010 10:05 a.m.
> To: 'Chris Fields'
> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> been finding that with large queries, chunks of the resulting data are
> missing.
> For example, before Xmas I was creating species-specific databases by
> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> the fasta sequences in chunks of 500.
> Very regularly, in the middle of the fasta there would be a message about
> resource unavailable eg.
>   >test_sequence_1
>   TACGATCATCGCTResource UnavailableTACGACTCTGCT
>   >test_sequence_2
>   TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> 
> Often this wasn't detected until formatdb complained about invalid
> characters.
> Inquiries to NCBI as to why this was happening and what to do about it
> returned stupid answers ("do each sequence manually thru the web
> interface", or "use eUtils").
> As we have a nice fast network connection, I now prefer to download very
> large gzip files (i.e. all of refseq) and extract what I need.
> 
> I can't help but think that NCBI could solve a lot of problems if they
> gzipped the output from eUtils queries - it's something I've requested
> regularly for the last 5 years or so!!
> 
> --Russell
> 
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Monday, 11 January 2010 9:50 a.m.
> > To: Smithies, Russell
> > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> >
> > One could also use Bio::DB::Taxonomy, which indexes the same files or
> > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > details).
> >
> > chris
> >
> > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >
> > > An alternate non-BioPerly way (that may be faster given NCBI's
> flakiness
> > lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> > files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash
> and
> > do lookups.
> > > In that same dir, taxdump.tar.gz contains a file called names.dmp
> which
> > lists taxids and descriptions (and synonyms)
> > >
> > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > could do this:
> > >
> > >   my $taxid  = $gi_taxid_nucl{$accession};
> > >   my $org_name = $names{$taxid};
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >> Bhakti,
> > >> The following example (using EUtilities) may serve your purpose:
> > >>
> > >> use Bio::DB::EUtilities;
> > >>
> > >> my (%taxa, @taxa);
> > >> my (%names, %idmap);
> > >>
> > >> # these are protein ids; nuc ids will work by changing -dbfrom =>
> > >> 'nucleotide',
> > >> # (probably)
> > >>
> > >> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>
> > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>                                       -db => 'taxonomy',
> > >>                                       -dbfrom => 'protein',
> > >>                                       -correspondence => 1,
> > >>                                       -id => \@ids);
> > >>
> > >> # iterate through the LinkSet objects
> > >> while (my $ds = $factory->next_LinkSet) {
> > >>    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >> }
> > >>
> > >> @taxa = @taxa{@ids};
> > >>
> > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>        -db    => 'taxonomy',
> > >>        -id    => \@taxa );
> > >>
> > >> while (local $_ = $factory->next_DocSum) {
> > >>    $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >> ($_->get_contents_by_name('ScientificName'))[0];
> > >> }
> > >>
> > >> foreach (@ids) {
> > >>    $idmap{$_} = $names{$taxa{$_}};
> > >> }
> > >>
> > >> # %idmap is
> > >> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >> #    68536103 => 'Corynebacterium jeikeium K411'
> > >> #    730439 => 'Bacillus caldolyticus'
> > >> #    89318838 => undef    (this record has been removed from the db)
> > >>
> > >> 1;
> > >>
> > >> You probably will need to break up your 30000 into chunks
> > >> (say, 1000-3000 each), and do the above on each chunk with a
> > >>
> > >> sleep 3;
> > >>
> > >> or so separating the queries.
> > >> MAJ
> > >> ----- Original Message -----
> > >> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > >> To: <bioperl-l at lists.open-bio.org>
> > >> Sent: Friday, December 25, 2009 9:46 PM
> > >> Subject: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > >>
> > >>
> > >>> Hi,
> > >>>
> > >>> Does anyone know how to retrieve the "Source" or the "Species name"
> > >> given
> > >>> the accession number using Bioperl.   I have these 30,000 accession
> > >> numbers
> > >>> for which I need to get the source organisms.  Any kind of help will
> > be
> > >>> appreciated.
> > >>>
> > >>> Thanks
> > >>>
> > >>> BD
> > >>> _______________________________________________
> > >>> Bioperl-l mailing list
> > >>> Bioperl-l at lists.open-bio.org
> > >>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>
> > >>>
> > >>
> > >> _______________________________________________
> > >> Bioperl-l mailing list
> > >> Bioperl-l at lists.open-bio.org
> > >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >


From cjfields at illinois.edu  Tue Jan 26 20:46:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 19:46:26 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
Message-ID: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>

It's unfortunate but I have heard this problem popping up quite a bit more frequently lately.  Not to push too many buttons but NCBI isn't very forthcoming with help these days; they have become quite insular.  Not sure if they're short-staffed due to budget or if there are other issues.

chris



From Russell.Smithies at agresearch.co.nz  Tue Jan 26 20:59:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:59:15 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>

I've had a wide selection of errors lately:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------

And I never get a good explanation from NCBI or suggestions on how to avoid it.
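
Both failures look transient (Error 11 and Error 111), so one workaround that may help is to wrap the failing call in a retry loop with a growing delay. A minimal sketch, assuming the call can be wrapped in a code ref; with_retries and its parameters are hypothetical names, not part of BioPerl:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): run $code, retrying when it dies.
# $tries is the maximum number of attempts; $delay is the first pause in
# seconds and doubles after every failed attempt.
sub with_retries {
    my ($code, $tries, $delay) = @_;
    my $last_error;
    for my $attempt (1 .. $tries) {
        my @result = eval { $code->() };
        unless ($@) {
            return wantarray ? @result : $result[0];
        }
        $last_error = $@;
        warn "attempt $attempt failed: $last_error";
        if ($attempt < $tries) {
            sleep $delay;
            $delay *= 2;
        }
    }
    die "giving up after $tries attempts: $last_error";
}
```

Something like `my @ids = with_retries(sub { $factory->get_ids }, 5, 3);` would then try up to five times, pausing 3, 6, 12 and 24 seconds between attempts. Whether the same factory object re-sends its request after a failure is not verified here, so building a fresh factory inside the code ref may be the safer choice.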


--Russell
	

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 2:46 p.m.
> To: Smithies, Russell
> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> It's unfortunate but I have heard this problem popping up quite a bit more
> frequently lately.  Not to push too many buttons but NCBI isn't very
> forthcoming with help these days; they have become quite insular.  Not
> sure if they're short-staffed due to budget or if there are other issues.
> 
> chris
> 
> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> 
> > Grrrrrr, I hate eutils!!!!
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > STACK: get_desc.pl:32
> > -----------------------------------------------------------
> >
> >
> > Nice error message though :-)
> >
> >
> > --Russell
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> >> Sent: Monday, 11 January 2010 10:05 a.m.
> >> To: 'Chris Fields'
> >> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >> number?
> >>
> >> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> >> been finding that with large queries, chunks of the resulting data are
> >> missing.
> >> For example, before Xmas I was creating species-specific databases by
> >> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> >> the fasta sequences in chunks of 500.
> >> Very regularly, in the middle of the fasta there would be a message about
> >> resource unavailable, e.g.
> >>  >test_sequence_1
> >>  TACGATCATCGCTResource UnavailableTACGACTCTGCT
> >>  >test_sequence_2
> >>  TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> >>
> >> Often this wasn't detected until formatdb complained about invalid
> >> characters.
> >> Inquiries to NCBI as to why this was happening and what to do about it
> >> returned stupid answers ("do each sequence manually thru the web
> >> interface", or "use eUtils").
> >> As we have a nice fast network connection, I now prefer to download very
> >> large gzip files (i.e. all of refseq) and extract what I need.
> >>
> >> I can't help but think that NCBI could solve a lot of problems if they
> >> gzipped the output from eUtils queries - it's something I've requested
> >> regularly for the last 5 years or so!!
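
The corruption described above could be caught before formatdb sees it by checking every sequence line for characters outside the nucleotide alphabet. A rough sketch, assuming nucleotide FASTA with IUPAC codes; find_bad_fasta_lines is a made-up name, not a BioPerl function:

```perl
use strict;
use warnings;

# Hypothetical helper: return [line_number, text] pairs for sequence lines
# containing anything other than IUPAC nucleotide codes, gaps, '*' or spaces.
sub find_bad_fasta_lines {
    my ($fh) = @_;
    my @bad;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r\z//;                          # tolerate DOS line endings
        next if $line =~ /^>/ or $line =~ /^\s*$/;  # skip headers and blanks
        if ($line =~ /[^ACGTUNRYSWKMBDHVacgtunryswkmbdhv*\-\s]/) {
            push @bad, [$., $line];
        }
    }
    return @bad;
}
```

A chunk that yields any hits could then be fetched again rather than written into the database file.
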
> >>
> >> --Russell
> >>
> >>
> >>> -----Original Message-----
> >>> From: Chris Fields [mailto:cjfields at illinois.edu]
> >>> Sent: Monday, 11 January 2010 9:50 a.m.
> >>> To: Smithies, Russell
> >>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> >>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >>> number?
> >>>
> >>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> >>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> >>> details).
> >>>
> >>> chris
> >>>
> >>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >>>
> >>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> >>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> >>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> >>>> do lookups.
> >>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
> >>>> lists taxids and descriptions (and synonyms).
> >>>>
> >>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> >>>> could do this:
> >>>>
> >>>>  my $taxid  = $gi_taxid_nucl{$accession};
> >>>>  my $org_name = $names{$taxid};
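
Building those two hashes might look roughly like this. It is a sketch that assumes the unpacked gi_taxid file holds tab-separated "GI, taxid" lines and that names.dmp separates its fields with tab, pipe, tab and carries a name-class column. Note that the gi_taxid files are keyed by GI number rather than by accession, so accessions would have to be mapped to GIs first. Both sub names are hypothetical:

```perl
use strict;
use warnings;

# Read "gi<TAB>taxid" lines into a hash. The full nucleotide file is very
# large, so an optional hash ref of wanted GIs keeps memory use down.
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        next if $wanted and not exists $wanted->{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return %taxid_for;
}

# Read names.dmp, keeping only the "scientific name" row for each taxid.
sub load_scientific_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|\z//;    # drop the trailing tab and pipe
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class and $class eq 'scientific name';
        $name_for{$taxid} = $name;
    }
    return %name_for;
}
```

Filtering on the name class matters because names.dmp lists synonyms and common names under the same taxid, and a plain last-one-wins load would pick one of them arbitrarily.
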
> >>>>
> >>>> --Russell
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> >>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> >>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?
> >>>>>
> >>>>> Bhakti,
> >>>>> The following example (using EUtilities) may serve your purpose:
> >>>>>
> >>>>> use Bio::DB::EUtilities;
> >>>>>
> >>>>> my (%taxa, @taxa);
> >>>>> my (%names, %idmap);
> >>>>>
> >>>>> # these are protein ids; nuc ids will (probably) work by changing
> >>>>> # -dbfrom => 'nucleotide'
> >>>>>
> >>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> >>>>>
> >>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> >>>>>                                      -db => 'taxonomy',
> >>>>>                                      -dbfrom => 'protein',
> >>>>>                                      -correspondence => 1,
> >>>>>                                      -id => \@ids);
> >>>>>
> >>>>> # iterate through the LinkSet objects
> >>>>> while (my $ds = $factory->next_LinkSet) {
> >>>>>   $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> >>>>> }
> >>>>>
> >>>>> @taxa = @taxa{@ids};
> >>>>>
> >>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> >>>>>       -db    => 'taxonomy',
> >>>>>       -id    => \@taxa );
> >>>>>
> >>>>> while (local $_ = $factory->next_DocSum) {
> >>>>>   $names{($_->get_contents_by_name('TaxId'))[0]} =
> >>>>>     ($_->get_contents_by_name('ScientificName'))[0];
> >>>>> }
> >>>>>
> >>>>> foreach (@ids) {
> >>>>>   $idmap{$_} = $names{$taxa{$_}};
> >>>>> }
> >>>>>
> >>>>> # %idmap is
> >>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> >>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> >>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> >>>>> #    730439 => 'Bacillus caldolyticus'
> >>>>> #    89318838 => undef    (this record has been removed from the db)
> >>>>>
> >>>>> 1;
> >>>>>
> >>>>> You probably will need to break up your 30000 into chunks
> >>>>> (say, 1000-3000 each), and do the above on each chunk with a
> >>>>>
> >>>>> sleep 3;
> >>>>>
> >>>>> or so separating the queries.
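
The chunking suggested above might be arranged as follows. This is a sketch only: process_in_chunks is a hypothetical helper, and the callback stands in for the elink and esummary steps of the script:

```perl
use strict;
use warnings;

# Hypothetical helper: hand @$ids to $callback in slices of at most $size,
# pausing $pause seconds between slices (but not after the last one).
# Returns the number of slices processed.
sub process_in_chunks {
    my ($ids, $size, $pause, $callback) = @_;
    my @queue  = @$ids;    # copy, so the caller's array is left untouched
    my $chunks = 0;
    while (my @chunk = splice @queue, 0, $size) {
        $callback->(\@chunk);
        $chunks++;
        sleep $pause if @queue;
    }
    return $chunks;
}
```

With 30000 ids, `process_in_chunks(\@all_ids, 1000, 3, sub { my ($chunk) = @_; ... })` would make 30 rounds, with the callback passing `-id => $chunk` to each factory it builds.
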
> >>>>> MAJ
> >>>>> ----- Original Message -----
> >>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> >>>>> To: <bioperl-l at lists.open-bio.org>
> >>>>> Sent: Friday, December 25, 2009 9:46 PM
> >>>>> Subject: [Bioperl-l] how to retrieve organism name from accession number?
> >>>>>
> >>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
> >>>>>> the accession number using Bioperl.   I have these 30,000 accession numbers
> >>>>>> for which I need to get the source organisms.  Any kind of help will be
> >>>>>> appreciated.
> >>>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>> BD


From cjfields at illinois.edu  Tue Jan 26 21:42:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 20:42:22 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
Message-ID: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>

Makes me wonder if they're pushing more users towards the SOAP-based services and away from eutils.

chris



From Russell.Smithies at agresearch.co.nz  Tue Jan 26 21:45:58 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 15:45:58 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>

Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).

--Russell



From maj at fortinbras.us  Wed Jan 27 10:14:22 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 27 Jan 2010 10:14:22 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com><1CE23DE1068C4FA2BD543D167A1AA901@NewLife><18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz><F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz><4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <C1C922A99DF24679955608955B2A73B1@NewLife>

Precisely the MO behind SoapEU...get the jump on 'em.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
Cc: <bioperl-l at lists.open-bio.org>; "'Mark A. Jensen'" <maj at fortinbras.us>
Sent: Tuesday, January 26, 2010 9:42 PM
Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?


> Makes me wonder if they're pushing more users towards the SOAP-based services 
> and away from eutils.
>
> chris
>
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
>
>> I've had a wide selection of errors lately:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>> STACK: get_desc.pl:32
>> -----------------------------------------------------------
>>
>> And I never get a good explanation from NCBI or suggestions on how to
>> avoid it.
>>
>>
>> --Russell
>>
>>
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Wednesday, 27 January 2010 2:46 p.m.
>>> To: Smithies, Russell
>>> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>> number?
>>>
>>> It's unfortunate but I have heard this problem popping up quite a bit more
>>> frequently lately.  Not to push too many buttons but NCBI isn't very
>>> forthcoming with help these days; they have become quite insular.  Not
>>> sure if they're short-staffed due to budget or if there are other issues.
>>>
>>> chris
>>>
>>> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
>>>
>>>> Grrrrrr, I hate eutils!!!!
>>>>
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>>>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>>>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>>>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>>>> STACK: get_desc.pl:32
>>>> -----------------------------------------------------------
>>>>
>>>>
>>>> Nice error message though :-)
>>>>
>>>>
>>>> --Russell
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
>>>>> Sent: Monday, 11 January 2010 10:05 a.m.
>>>>> To: 'Chris Fields'
>>>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>> number?
>>>>>
>>>>> I've started to go off eUtils recently (not BioPerl's fault) as I've
>>>>> often been finding that with large queries, chunks of the resulting
>>>>> data are missing.
>>>>> For example, before Xmas I was creating species-specific databases by
>>>>> using eUtils to get a list of GI numbers back for a taxid, then
>>>>> retrieving the fasta sequences in chunks of 500.
>>>>> Very regularly, in the middle of the fasta there would be a message
>>>>> about resource unavailable, e.g.
>>>>> >test_sequence_1
>>>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
>>>>> >test_sequence_2
>>>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>>>>>
>>>>> Often this wasn't detected until formatdb complained about invalid
>>>>> characters.
>>>>> Inquiries to NCBI as to why this was happening and what to do about it
>>>>> returned stupid answers ("do each sequence manually thru the web
>>>>> interface", or "use eUtils").
>>>>> As we have a nice fast network connection, I now prefer to download
>>>>> very large gzip files (i.e. all of refseq) and extract what I need.
>>>>>
>>>>> I can't help but think that NCBI could solve a lot of problems if they
>>>>> gzipped the output from eUtils queries - it's something I've requested
>>>>> regularly for the last 5 years or so!!
>>>>>
>>>>> --Russell
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>>>>> Sent: Monday, 11 January 2010 9:50 a.m.
>>>>>> To: Smithies, Russell
>>>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>>> number?
>>>>>>
>>>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
>>>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
>>>>>> the details).
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
>>>>>>
>>>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
>>>>>>> flakiness lately) would be to download the gi_taxid_nucl.zip or
>>>>>>> gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/,
>>>>>>> load them into a hash and do lookups.
>>>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp,
>>>>>>> which lists taxids and descriptions (and synonyms).
>>>>>>>
>>>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
>>>>>>> could do this:
>>>>>>>
>>>>>>> # note: gi_taxid_nucl is keyed by GI number, not by accession
>>>>>>> my $taxid  = $gi_taxid_nucl{$gi};
>>>>>>> my $org_name = $names{$taxid};
>>>>>>>
>>>>>>> --Russell
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>>>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
>>>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>>>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
>>> accession
>>>>>>>> number?
>>>>>>>>
>>>>>>>> Bhakti,
>>>>>>>> The following example (using EUtilities) may serve your purpose:
>>>>>>>>
>>>>>>>> use Bio::DB::EUtilities;
>>>>>>>>
>>>>>>>> my (%taxa, @taxa);
>>>>>>>> my (%names, %idmap);
>>>>>>>>
>>>>>>>> # these are protein ids; nuc ids will (probably) work by changing
>>>>>>>> # -dbfrom => 'nucleotide'
>>>>>>>>
>>>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>>>>>>>>
>>>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>>>>>>>                                     -db => 'taxonomy',
>>>>>>>>                                     -dbfrom => 'protein',
>>>>>>>>                                     -correspondence => 1,
>>>>>>>>                                     -id => \@ids);
>>>>>>>>
>>>>>>>> # iterate through the LinkSet objects
>>>>>>>> while (my $ds = $factory->next_LinkSet) {
>>>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>>>>>>>> }
>>>>>>>>
>>>>>>>> @taxa = @taxa{@ids};
>>>>>>>>
>>>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>>>>>>>      -db    => 'taxonomy',
>>>>>>>>      -id    => \@taxa );
>>>>>>>>
>>>>>>>> while (local $_ = $factory->next_DocSum) {
>>>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
>>>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
>>>>>>>> }
>>>>>>>>
>>>>>>>> foreach (@ids) {
>>>>>>>>  $idmap{$_} = $names{$taxa{$_}};
>>>>>>>> }
>>>>>>>>
>>>>>>>> # %idmap is
>>>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>>>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>>>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
>>>>>>>> #    730439 => 'Bacillus caldolyticus'
>>>>>>>> #    89318838 => undef    (this record has been removed from the db)
>>>>>>>>
>>>>>>>> 1;
>>>>>>>>
>>>>>>>> You probably will need to break up your 30000 into chunks
>>>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
>>>>>>>>
>>>>>>>> sleep 3;
>>>>>>>>
>>>>>>>> or so separating the queries.
>>>>>>>> MAJ
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>>> Sent: Friday, December 25, 2009 9:46 PM
>>>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
>>>>>> number?
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name"
>>>>>>>>> given the accession number using Bioperl?  I have 30,000 accession
>>>>>>>>> numbers for which I need to get the source organisms.  Any kind of
>>>>>>>>> help will be appreciated.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> BD


From bhakti.dwivedi at gmail.com  Wed Jan 27 14:42:06 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Wed, 27 Jan 2010 14:42:06 -0500
Subject: [Bioperl-l] Designing primers from multiple sequence alignment of
	amino acid sequences
Message-ID: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>

Hi,

I have to design primers from the multiple sequence alignments of amino acid
sequences.  The sequences I am working with are quite diverged and often the
available primer design programs (such as CODEHOP/iCODEHOP) fail to find any
primer sets.  But when I look at the alignment manually, I can see
regions that I could use to make primers.

So I designed the degenerate primers the old-fashioned way: selecting the
conserved regions (6-10 aa long) from the alignment, translating the
selected regions to DNA using the appropriate codon usage table, and
finally checking the primer sets (potential forward and reverse primers)
using tools like OLIGOANALYZER.  In the end, I did find a few good primer
sets, but whether they work in reality is something I will have to wait
and see.

While doing this process manually, I really felt the need to automate it (it
was not just one alignment; I worked with several).  I was wondering if
there is any way bioperl can help me here, or whether writing a perl script
from scratch is the only way to go.

I would appreciate your suggestions/comments.  Thanks!  (Apologies for the
long email.)


Regards
Bhakti


From Kevin.M.Brown at asu.edu  Wed Jan 27 15:23:57 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 27 Jan 2010 13:23:57 -0700
Subject: [Bioperl-l] Designing primers from multiple sequence alignment
	of amino acid sequences
In-Reply-To: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>
References: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4068498DB@EX02.asurite.ad.asu.edu>

Bioperl is just a collection of tools, not a full-blown application.
Most of what you want can be done with the objects available from within
the toolkit, but the application (perl script) would still need to be
written to put the objects to use. You could use clustalw from within
perl to align the sequences (Bio::Tools::Run::Alignment::Clustalw), find
the conserved regions (Bio::SimpleAlign), reverse translate them
(Bio::Tools::CodonTable), then come up with an algorithm for primer
analysis and selection (or even use other apps like primer3
(Bio::Tools::Run::Primer3) from within perl).
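
As a rough illustration of the middle two steps (finding conserved blocks and
reverse translating them), here is a minimal sketch in plain Perl. It
deliberately does not use the BioPerl classes named above, so that it runs
without any installation. The helper names (conserved_blocks,
degenerate_primer, degeneracy), the toy alignment, and the hand-written codon
table are all assumptions made for this example. The table uses one IUPAC
pattern per amino acid, which over-covers Leu, Arg and Ser; a real script
would more likely take codons from Bio::Tools::CodonTable and a codon usage
table.

```perl
use strict;
use warnings;

# One IUPAC-degenerate codon per amino acid (standard genetic code).
# Assumption for this sketch: Leu (YTN), Arg (MGN) and Ser (WSN) use a single
# over-broad pattern instead of their two separate codon families.
my %CODON = (
    A => 'GCN', C => 'TGY', D => 'GAY', E => 'GAR', F => 'TTY',
    G => 'GGN', H => 'CAY', I => 'ATH', K => 'AAR', L => 'YTN',
    M => 'ATG', N => 'AAY', P => 'CCN', Q => 'CAR', R => 'MGN',
    S => 'WSN', T => 'ACN', V => 'GTN', W => 'TGG', Y => 'TAY',
);

# Number of bases each degenerate symbol stands for.
my %FOLD = (N => 4, H => 3, Y => 2, R => 2, M => 2, W => 2, S => 2);

# Return [start (0-based), peptide] for every run of at least $min_len
# gap-free columns in which all aligned sequences carry the same residue.
sub conserved_blocks {
    my ($seqs, $min_len) = @_;
    my $len = length $seqs->[0];
    die "sequences differ in length\n" if grep { length($_) != $len } @$seqs;
    my (@blocks, $run);
    $run = '';
    for my $col (0 .. $len) {    # the extra final step flushes the last run
        my $aa = $col < $len ? substr($seqs->[0], $col, 1) : '-';
        my $same = $aa ne '-' && !grep { substr($_, $col, 1) ne $aa } @$seqs;
        if ($same) { $run .= $aa; next }
        push @blocks, [ $col - length($run), $run ] if length($run) >= $min_len;
        $run = '';
    }
    return @blocks;
}

# Reverse translate a peptide into a degenerate DNA string.
sub degenerate_primer {
    my ($pep) = @_;
    return join '', map {
        exists $CODON{$_} ? $CODON{$_} : die "no codon for '$_'\n"
    } split //, $pep;
}

# Number of distinct sequences a degenerate primer expands to.
sub degeneracy {
    my ($dna) = @_;
    my $n = 1;
    $n *= ($FOLD{$_} || 1) for split //, $dna;
    return $n;
}

# Toy alignment (made up for this example).
my @aln = ('MKWDEHMCA-LT', 'MKWDEHMCAGLT', '-RWDEHMCAPLS');
for my $block (conserved_blocks(\@aln, 6)) {
    my ($start, $pep) = @$block;
    my $dna = degenerate_primer($pep);
    printf "col %d  %s  %s  %d-fold\n", $start, $pep, $dna, degeneracy($dna);
}
```

For the toy alignment this reports a single block, WDEHMCA starting at column
2, with a 64-fold degenerate primer. The candidates would still need to be
checked (Tm, hairpins, dimers) with something like primer3 or OLIGOANALYZER.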

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  



From mike.stubbington at bbsrc.ac.uk  Thu Jan 28 10:41:49 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 15:41:49 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
Message-ID: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>

Dear all,

I am attempting to blast some primers against the mouse genome. I have created a local mouse genome blast database and I can search against it using 'blastn' at the command line. 

I have perl code that creates an array of bioperl sequence objects called @primers.

I then create a StandAloneBlastPlus factory using the following code:

	my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
		-db_dir => '/Users/stubbing/localBlast/',
		-db_name => 'MouseGenome'
	);

and then attempt to blast my primers using this:

	my @shortPrimers;
	my $count=1;
	foreach my $currentSeq (@primers) {
		print "Checking primer $count/$primerNumber ";
		if ($currentSeq->length < 40) {
			# set aside primers shorter than 40 nt instead of BLASTing them
			push(@shortPrimers,$currentSeq);
			print "Too short!\n";
		}
		else {
			print "BLASTing...";
			my $blastResult = $blastFactory->blastn(-query => $currentSeq);
		}
		$count++;
	}

This fails with the following error:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA> line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike


From maj at fortinbras.us  Thu Jan 28 10:56:14 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 10:56:14 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
Message-ID: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>

Mike - please try updating your bioperl-live (the core) to the latest code 
(revision 16761 or so).
CommandExts is a work in progress; from the stack errors it looks like you've 
got an older version.
Try it then ping us back, if you would--
Thanks
Mark


From mike.stubbington at bbsrc.ac.uk  Thu Jan 28 11:18:12 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 16:18:12 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
In-Reply-To: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
	<56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
Message-ID: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>

Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------
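
Since the failure is the same before and after the update, one way to narrow
it down is to take the wrapper out of the picture and run blastn from perl
directly. "Illegal seek" is the text of errno ESPIPE, and perl only sets $!
reliably when system() returns -1; after a child exits non-zero, $! can be
left over from an unrelated earlier call. Whether that is what happens here is
a guess, not something established in this thread. The sketch below reports
the child's exit status from $? instead. run_and_report is a hypothetical
helper name, and the query and output file names in the commented-out call
are placeholders.

```perl
use strict;
use warnings;

# Run an external command (list form, so no shell quoting is involved) and
# describe the outcome from $? rather than from $!.
sub run_and_report {
    my @cmd = @_;
    my $rc = system(@cmd);
    return "failed to start: $!" if $rc == -1;    # only here is $! meaningful
    return sprintf('killed by signal %d', $? & 127) if $? & 127;
    return sprintf('exit status %d', $? >> 8);
}

# Self-contained demonstration: a child perl that exits with status 3.
print run_and_report($^X, '-e', 'exit 3'), "\n";

# Hypothetical call mirroring the factory settings above; adjust the paths.
# print run_and_report('/usr/local/ncbi/blast/bin/blastn',
#     '-db',    '/Users/stubbing/localBlast/MouseGenome',
#     '-query', 'primer.fa',
#     '-out',   'primer.blastn'), "\n";
```

If blastn itself exits non-zero here, its own message on STDERR is likely to
be more informative than the "Illegal seek" text in the exception.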

M

On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:

> Mike - please try updating your bioperl-live (the core) to the latest code 
> (revision 16761 or so).
> CommandExts is a work in progress; from the stack errors it looks like you've 
> got an older version.
> Try it then ping us back, if you would--
> Thanks
> Mark


From maj at fortinbras.us  Thu Jan 28 11:28:52 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 11:28:52 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
	<56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID: <C7FF329BCA044F19B3D690FE67319192@NewLife>

Thanks Mike-- will have a look asap- cheers MAJ


From cjfields at illinois.edu  Thu Jan 28 13:26:27 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 12:26:27 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
Message-ID: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>

Russell,

Just curious, but have you tried setting the return email parameter
(-email)?  NCBI recently stated that all queries would eventually
require a return email of some sort (not sure if it's validated or not).
I think that was set for around late spring.  I'm changing the code in
svn to require it for that very purpose.

chris  


On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).
> 
> --Russell
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > 
> > Makes me wonder if they're pushing more users towards the SOAP-based
> > services and away from eutils.
> > 
> > chris
> > 
> > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > 
> > > I've had a wide selection of errors lately:
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource
> > temporarily unavailable)
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > STACK: Bio::Tools::EUtilities::parse_data
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > STACK: Bio::Tools::EUtilities::get_ids
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > STACK: Bio::DB::EUtilities::get_ids
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > STACK: get_desc.pl:32
> > > -----------------------------------------------------------
> > >
> > > And I never get a good explanation from NCBI or suggestions on how to
> > avoid it.
> > >
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > >> To: Smithies, Russell
> > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >> It's unfortunate but I have heard this problem popping up quite a bit
> > more
> > >> frequently lately.  Not to push too many buttons but NCBI isn't very
> > >> forthcoming with help these days; they have become quite insular.  Not
> > >> sure if they're short-staffed due to budget or if there are other
> > issues.
> > >>
> > >> chris
> > >>
> > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > >>
> > >>> Grrrrrr, I hate eutils!!!!
> > >>>
> > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111
> > >> (Connection refused)
> > >>> STACK: Error::throw
> > >>> STACK: Bio::Root::Root::throw
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > >>> STACK: Bio::Tools::EUtilities::parse_data
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > >>> STACK: Bio::Tools::EUtilities::get_ids
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > >>> STACK: Bio::DB::EUtilities::get_ids
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > >>> STACK: get_desc.pl:32
> > >>> -----------------------------------------------------------
> > >>>
> > >>>
> > >>> Nice error message though :-)
> > >>>
> > >>>
> > >>> --Russell
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > >>>> To: 'Chris Fields'
> > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-
> > bio.org'
> > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >>>> number?
> > >>>>
> > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've
> > >> often
> > >>>> been finding that with large queries, chunks of the resulting data is
> > >>>> missing.
> > >>>> For example, before Xmas I was creating species-specific databases by
> > >>>> using eUtils to get a list of GI numbers back for a taxid, then
> > >> retrieving
> > >>>> the fasta sequences in chunks of 500.
> > >>>> Very regularly, in the middle of the fasta there would be a message
> > >> about
> > >>>> resource unavailable eg.
> > >>>>> test_sequence_1
> > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > >>>>> test_sequence_2
> > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > >>>>
> > >>>> Often this wasn't detected until formatdb complained about invalid
> > >>>> characters.
> > >>>> Inquiries to NCBI as to why this was happening and what to do about
> > it
> > >>>> returned stupid answers ("do each sequence manually thru the web
> > >>>> interface", or "use eUtils").
> > >>>> As we have a nice fast network connection, I now prefer to download
> > >> very
> > >>>> large gzip files (i.e. all of refseq) and extract what I need.
> > >>>>
> > >>>> I can't help but think that NCBI could solve a lot of problems if
> > they
> > >>>> gzipped the output from eUtils queries - it's something I've
> > requested
> > >>>> regularly for the last 5 years or so!!
> > >>>>
> > >>>> --Russell
> > >>>>
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > >>>>> To: Smithies, Russell
> > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-
> > bio.org'
> > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > accession
> > >>>>> number?
> > >>>>>
> > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files
> > or
> > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
> > >> the
> > >>>>> details).
> > >>>>>
> > >>>>> chris
> > >>>>>
> > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > >>>>>
> > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
> > >>>> flakiness
> > >>>>> lately) would be to download the gi_taxid_nucl.zip or
> > >> gi_taxid_prot.zip
> > >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a
> > hash
> > >>>> and
> > >>>>> do lookups.
> > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
> > >>>> which
> > >>>>> lists taxids and descriptions (and synonyms)
> > >>>>>>
> > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so
> > I
> > >>>>> could do this:
> > >>>>>>
> > >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> > >>>>>> my $org_name = $names{$taxid};
> > >>>>>>
> > >>>>>> --Russell
> > >>>>>>
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > >> accession
> > >>>>>>> number?
> > >>>>>>>
> > >>>>>>> Bhakti,
> > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > >>>>>>>
> > >>>>>>> use Bio::DB::EUtilities;
> > >>>>>>>
> > >>>>>>> my (%taxa, @taxa);
> > >>>>>>> my (%names, %idmap);
> > >>>>>>>
> > >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom =>
> > >>>>>>> 'nucleotide',
> > >>>>>>> # (probably)
> > >>>>>>>
> > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>>>>>>
> > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>>>>>>                                     -db => 'taxonomy',
> > >>>>>>>                                     -dbfrom => 'protein',
> > >>>>>>>                                     -correspondence => 1,
> > >>>>>>>                                     -id => \@ids);
> > >>>>>>>
> > >>>>>>> # iterate through the LinkSet objects
> > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> @taxa = @taxa{@ids};
> > >>>>>>>
> > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>>>>>>      -db    => 'taxonomy',
> > >>>>>>>      -id    => \@taxa );
> > >>>>>>>
> > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > >>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> foreach (@ids) {
> > >>>>>>>  $idmap{$_} = $names{$taxa{$_}};
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> # %idmap is
> > >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> > >>>>>>> #    730439 => 'Bacillus caldolyticus'
> > >>>>>>> #    89318838 => undef    (this record has been removed from the
> > db)
> > >>>>>>>
> > >>>>>>> 1;
> > >>>>>>>
> > >>>>>>> You probably will need to break up your 30000 into chunks
> > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > >>>>>>>
> > >>>>>>> sleep 3;
> > >>>>>>>
> > >>>>>>> or so separating the queries.
> > >>>>>>> MAJ
> > >>>>>>> ----- Original Message -----
> > >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > >>>>>>> To: <bioperl-l at lists.open-bio.org>
> > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
> > >>>>> number?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species
> > name"
> > >>>>>>> given
> > >>>>>>>> the accession number using Bioperl.   I have these 30,000
> > accession
> > >>>>>>> numbers
> > >>>>>>>> for which I need to get the source organisms.  Any kind of help
> > >> will
> > >>>>> be
> > >>>>>>>> appreciated.
> > >>>>>>>>
> > >>>>>>>> Thanks
> > >>>>>>>>
> > >>>>>>>> BD
> > >>>>>>>> _______________________________________________
> > >>>>>>>> Bioperl-l mailing list
> > >>>>>>>> Bioperl-l at lists.open-bio.org
> > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>


From maj at fortinbras.us  Thu Jan 28 13:47:04 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 13:47:04 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID: <FD6E9A89F6034CCB856E22553ED893D7@NewLife>

Hi Mike,
I believe I found the real bug causing the problem: the code was not
accounting for the db_dir parameter. Crashes should now also throw much more
helpful errors. Please try the code at r16774 and shout back.
thanks --
MAJ
----- Original Message ----- 
From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 28, 2010 11:18 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
error running blastn


Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
/usr/local/ncbi/blast/bin/blastn : Illegal seek at 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> 
line 532.

STACK Bio::Tools::Run::WrapperBase::_run 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:

> Mike - please try updating your bioperl-live (the core) to the latest code
> (revision 16761 or so).
> CommandExts is a work in progress; from the stack errors it looks like you've
> got an older version.
> Try it then ping us back, if you would--
> Thanks
> Mark
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 10:41 AM
> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error
> running blastn
>
>
> Dear all,
>
> I am attempting to blast some primers against the mouse genome. I have created
> a local mouse genome blast database and I can search against it using 'blastn'
> at the command line.
>
> I have perl code that creates an array of bioperl sequence objects called
> @primers
>
> I then create a StandAloneBlastPlus factory using the following code:
>
> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
>     -db_dir  => '/Users/stubbing/localBlast/',
>     -db_name => 'MouseGenome'
> );
>
> and then attempt to blast my primers using this:
>
> my @shortPrimers;
> my $count = 1;
> foreach (@primers) {
>     my $currentSeq = $_;
>     print "Checking primer $count/$primerNumber ";
>     if ($_->length < 40) {
>         push(@shortPrimers, $_);
>         print "Too short!\n";
>     }
>     else {
>         print "BLASTing...";
>         my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>     }
>     $count++;
> }
>
> This fails with the following error:
>
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running
> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA>
> line 532.
>
> STACK Bio::Tools::Run::WrapperBase::_run
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
>
> Line 63 in my code is (as you might expect) the one that calls blastn on my
> factory object.
>
> I'd appreciate any help you might be able to provide to shed light on this.
>
> Thanks in advance,
>
> Mike


From cjfields at illinois.edu  Thu Jan 28 14:00:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:00:26 -0600
Subject: [Bioperl-l] EUtilities policy change
Message-ID: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>

All,

Per NCBI's recent change in eutils user policy (effective June 1):

http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html

Both the tool and email parameters ('-tool', '-email') are now required
when making requests.  Note this will significantly break all modules
requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
and Taxonomy stuff as well, IIRC).  This also applies to web services
(SOAP-based access).  Mark, not sure how this affects your SOAP-based
modules.

I have reconfigured Bio::DB::EUtilities to follow this policy; the
default tool setting has been 'bioperl' and will remain that way.
However, there has never been a default email, so setting one is now
required for future requests unless we (the bioperl devs) decide there
is a safe default email to use.  My gut tells me, however, that
falling back to a default email opens up a can of worms for the devs and
is very likely a 'BAD IDEA'(TM).

Regardless, be aware that, after June 1, NCBI will very likely exclude
requests with no email and will notify users who are considered to be
violating their policies.

I will likely make further changes to Bio::DB::EUtilities in the
meantime to ensure that using the tools by default will not violate
NCBI's policy (i.e. override this at your own risk).
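
For anyone wondering what passing the two parameters looks like, here is a
rough sketch rather than a final interface. The email address, the tool name
and the taxon ID are placeholders I made up for illustration; the esummary
call and the DocSum accessors are the ones already used in examples on this
list. Remote failures come back as exceptions, so the request is wrapped in
an eval.

```perl
use strict;
use warnings;
use Bio::DB::EUtilities;

# Placeholder identity (assumption): substitute your own address and script name.
my $email = 'your.name@example.org';
my $tool  = 'my_taxonomy_lookup';

my $factory = Bio::DB::EUtilities->new(
    -eutil => 'esummary',
    -db    => 'taxonomy',
    -id    => [83332],      # example taxon ID, chosen arbitrarily
    -email => $email,
    -tool  => $tool,
);

# Remote failures are thrown as exceptions, so trap them rather than
# letting the whole script die part-way through a batch.
eval {
    while (my $docsum = $factory->next_DocSum) {
        my ($name) = $docsum->get_contents_by_name('ScientificName');
        print "$name\n" if defined $name;
    }
    1;
} or warn "eutils request failed: $@";
```

Keeping the address in one variable at the top makes it easy to change per
user or per site without touching the rest of the script.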

chris


From maj at fortinbras.us  Thu Jan 28 14:05:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:05:43 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
Message-ID: <8F49B5ED151143FA86E977B4D4F44265@NewLife>

Thanks Chris-- 
The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
I agree that a default email is a bad idea (tm) (unless maybe it's 
hilmar's...?). I'd say a warning on unset email parameters is a responsible
"there be dragons" sort of treatment.
MAJ


From cjfields at illinois.edu  Thu Jan 28 14:18:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:18:22 -0600
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <8F49B5ED151143FA86E977B4D4F44265@NewLife>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
	<8F49B5ED151143FA86E977B4D4F44265@NewLife>
Message-ID: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>

I think warning is fine for now.  I've reimplemented that so it occurs
lazily (warns only when a request is actually made).

Will also change the tool to 'BioPerl' (currently 'bioperl', all lowercase).
We'll obviously have to address this in the test suite as well in some
way, maybe ask for an email if network tests are requested.
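
To illustrate what "lazily" means here, a toy sketch in plain Perl. This is
not the code in svn: My::RemoteClient and its request method are made-up
names, and nothing is actually sent anywhere. The point is only that the
constructor stays silent and the warning fires once, at the first request.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a remote-access class; not BioPerl code.
package My::RemoteClient;

sub new {
    my ($class, %args) = @_;
    # No warning here: building the object makes no request.
    my $self = { email => $args{-email}, warned => 0 };
    return bless $self, $class;
}

sub request {
    my ($self, $query) = @_;
    # Warn only when a request is actually made, and only once per object.
    if (!defined $self->{email} && !$self->{warned}) {
        warn "No -email set; NCBI may reject this request\n";
        $self->{warned} = 1;
    }
    # A real client would send $query here; this sketch only echoes it.
    return "would send: $query";
}

package main;

my $client = My::RemoteClient->new();       # silent
my $reply  = $client->request('term=x');    # warns on the first request
$client->request('term=y');                 # no second warning
print "$reply\n";
```

Scripts that build an object but never go to the network (offline tests, for
instance) see no warning at all with this arrangement.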

chris 



From Russell.Smithies at agresearch.co.nz  Thu Jan 28 14:25:38 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 29 Jan 2010 08:25:38 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
	<1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>

Yes, I usually set the 'tool' and 'email' parameters.
I went to NCBI back in 2006 and did their "PowerScripting" course, where they pointed out a lot of the requirements for using eUtils. I think I requested gzipped results back then as well...

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Friday, 29 January 2010 7:26 a.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Russell,
> 
> Just curious, but have you tried setting the return email parameter
> (-email)?  NCBI recently stated that all queries would eventually
> require a return email of some sort (not sure if it's validated or not).
> I think that was set for around late spring.  I'm changing the code in
> svn to require it for that very purpose.
> 
> chris
> 
> 
> On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
> still works if you don't mind a bit of manual button clicking. It's
> handling chunks of 100,000 records OK (today).
> >
> > --Russell
> >
> > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at illinois.edu]
> > > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > > To: Smithies, Russell
> > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > number?
> > >
> > > Makes me wonder if they're pushing more users towards the SOAP-based
> > > services and away from eutils.
> > >
> > > chris
> > >
> > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > >
> > > > I've had a wide selection of errors lately:
> > > >
> > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11
> (Resource
> > > temporarily unavailable)
> > > > STACK: Error::throw
> > > > STACK: Bio::Root::Root::throw
> > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > STACK: Bio::Tools::EUtilities::parse_data
> > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > STACK: Bio::Tools::EUtilities::get_ids
> > > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > STACK: Bio::DB::EUtilities::get_ids
> > > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > STACK: get_desc.pl:32
> > > > -----------------------------------------------------------
> > > >
> > > > And I never get a good explanation from NCBI or suggestions on how
> to
> > > avoid it.
> > > >
> > > >
> > > > --Russell
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > > >> To: Smithies, Russell
> > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from
> accession
> > > >> number?
> > > >>
> > > >> It's unfortunate but I have heard this problem popping up quite a
> bit
> > > more
> > > >> frequently lately.  Not to push too many buttons but NCBI isn't
> very
> > > >> forthcoming with help these days; they have become quite insular.
> Not
> > > >> sure if they're short-staffed due to budget or if there are other
> > > issues.
> > > >>
> > > >> chris
> > > >>
> > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > > >>
> > > >>> Grrrrrr, I hate eutils!!!!
> > > >>>
> > > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111
> > > >> (Connection refused)
> > > >>> STACK: Error::throw
> > > >>> STACK: Bio::Root::Root::throw
> > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > >>> STACK: Bio::Tools::EUtilities::parse_data
> > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > >>> STACK: Bio::Tools::EUtilities::get_ids
> > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > >>> STACK: Bio::DB::EUtilities::get_ids
> > > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > >>> STACK: get_desc.pl:32
> > > >>> -----------------------------------------------------------
> > > >>>
> > > >>>
> > > >>> Nice error message though :-)
> > > >>>
> > > >>>
> > > >>> --Russell
> > > >>>
> > > >>>> -----Original Message-----
> > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > > >>>> To: 'Chris Fields'
> > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-
> > > bio.org'
> > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> accession
> > > >>>> number?
> > > >>>>
> > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as
> I've
> > > >> often
> > > >>>> been finding that with large queries, chunks of the resulting
> data is
> > > >>>> missing.
> > > >>>> For example, before Xmas I was creating species-specific
> databases by
> > > >>>> using eUtils to get a list of GI numbers back for a taxid, then
> > > >> retrieving
> > > >>>> the fasta sequences in chunks of 500.
> > > >>>> Very regularly, in the middle of the fasta there would be a
> message
> > > >> about
> > > >>>> resource unavailable eg.
> > > >>>>> test_sequence_1
> > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > > >>>>> test_sequence_2
> > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > > >>>>
> > > >>>> Often this wasn't detected until formatdb complained about
> invalid
> > > >>>> characters.
> > > >>>> Inquiries to NCBI as to why this was happening and what to do
> about
> > > it
> > > >>>> returned stupid answers ("do each sequence manually thru the web
> > > >>>> interface", or "use eUtils").
> > > >>>> As we have a nice fast network connection, I now prefer to
> download
> > > >> very
> > > >>>> large gzip files (i.e. all of refseq) and extract what I need.
> > > >>>>
> > > >>>> I can't help but think that NCBI could solve a lot of problems if
> > > they
> > > >>>> gzipped the output from eUtils queries - it's something I've
> > > requested
> > > >>>> regularly for the last 5 years or so!!
> > > >>>>
> > > >>>> --Russell
> > > >>>>
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > > >>>>> To: Smithies, Russell
> > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > >>>>> number?
> > > >>>>>
> > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
> > > >>>>> the details).
> > > >>>>>
> > > >>>>> chris
> > > >>>>>
> > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > > >>>>>
> > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
> > > >>>>>> flakiness lately) would be to download the gi_taxid_nucl.zip or
> > > >>>>>> gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/,
> > > >>>>>> load them into a hash and do lookups.
> > > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
> > > >>>>>> which lists taxids and descriptions (and synonyms)
> > > >>>>>>
> > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > > >>>>>> could do this:
> > > >>>>>>
> > > >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> > > >>>>>> my $org_name = $names{$taxid};
> > > >>>>>>
> > > >>>>>> --Russell
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> -----Original Message-----
> > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org
> > > >>>>>>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > >>>>>>> number?
> > > >>>>>>>
> > > >>>>>>> Bhakti,
> > > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > > >>>>>>>
> > > >>>>>>> use Bio::DB::EUtilities;
> > > >>>>>>>
> > > >>>>>>> my (%taxa, @taxa);
> > > >>>>>>> my (%names, %idmap);
> > > >>>>>>>
> > > >>>>>>> # these are protein ids; nuc ids will work by changing
> > > >>>>>>> # -dbfrom => 'nucleotide', (probably)
> > > >>>>>>>
> > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > > >>>>>>>
> > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > > >>>>>>>                                     -db => 'taxonomy',
> > > >>>>>>>                                     -dbfrom => 'protein',
> > > >>>>>>>                                     -correspondence => 1,
> > > >>>>>>>                                     -id => \@ids);
> > > >>>>>>>
> > > >>>>>>> # iterate through the LinkSet objects
> > > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > > >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> @taxa = @taxa{@ids};
> > > >>>>>>>
> > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > > >>>>>>>      -db    => 'taxonomy',
> > > >>>>>>>      -id    => \@taxa );
> > > >>>>>>>
> > > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > > >>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
> > > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> foreach (@ids) {
> > > >>>>>>>  $idmap{$_} = $names{$taxa{$_}};
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> # %idmap is
> > > >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > > >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > > >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> > > >>>>>>> #    730439 => 'Bacillus caldolyticus'
> > > >>>>>>> #    89318838 => undef    (this record has been removed from the db)
> > > >>>>>>>
> > > >>>>>>> 1;
> > > >>>>>>>
> > > >>>>>>> You probably will need to break up your 30000 into chunks
> > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > > >>>>>>>
> > > >>>>>>> sleep 3;
> > > >>>>>>>
> > > >>>>>>> or so separating the queries.
> > > >>>>>>> MAJ
> > > >>>>>>> ----- Original Message -----
> > > >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > > >>>>>>> To: <bioperl-l at lists.open-bio.org>
> > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
> > > >>>>>>> number?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name"
> > > >>>>>>>> given the accession number using Bioperl.   I have these 30,000
> > > >>>>>>>> accession numbers for which I need to get the source organisms.
> > > >>>>>>>> Any kind of help will be appreciated.
> > > >>>>>>>>
> > > >>>>>>>> Thanks
> > > >>>>>>>>
> > > >>>>>>>> BD
> > > >>>>>>>> _______________________________________________
> > > >>>>>>>> Bioperl-l mailing list
> > > >>>>>>>> Bioperl-l at lists.open-bio.org
> > > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>


From cjfields at illinois.edu  Thu Jan 28 14:30:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:30:12 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
	<1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
Message-ID: <1264707012.5473.51.camel@cjfields.igb.uiuc.edu>

Russell,

Okay, just wanted to make sure.  The email/tool requirements weren't
actually enforced up until now, which is forcing us to do a bit of
re-work on the various tools that don't have it set by default (at least
warn users unaware of it).  
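
For anyone unsure what the requirement amounts to in practice, the tool
and email values simply travel as two extra parameters on each eutils
request.  A minimal sketch in plain Perl, assuming nothing from BioPerl:
build_eutil_query is a hypothetical helper, and the address and search
term are placeholders.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): turn a parameter hash into
# a URL query string, adding a default 'tool' value when none is given.
sub build_eutil_query {
    my (%param) = @_;
    $param{tool} = 'bioperl' unless defined $param{tool};

    my @pairs;
    for my $key (sort keys %param) {
        next unless defined $param{$key};
        my $value = $param{$key};
        # Percent-encode everything except unreserved URL characters.
        $value =~ s/([^A-Za-z0-9_.~-])/sprintf('%%%02X', ord $1)/ge;
        push @pairs, "$key=$value";
    }
    return join '&', @pairs;
}

# Placeholder values; substitute your own address and query.
print build_eutil_query(
    db    => 'protein',
    term  => 'Mycobacterium tuberculosis[Organism]',
    email => 'you@example.org',
), "\n";
```

Sorting the keys is only there to make the output stable; the order of
parameters should not matter to the server.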

And I agree, gzipped archives would be nice!
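
Until then, for the problem Russell reported earlier in this thread
(error text such as "Resource Unavailable" landing in the middle of
FASTA output), a check along these lines might catch a bad chunk before
formatdb does.  This is only a sketch: check_fasta is a made-up helper,
not a BioPerl function, and it assumes nucleotide sequences written
with IUPAC codes plus '-' and '*'.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of problems found in FASTA text.
# Assumption: nucleotide data, IUPAC codes plus '-' and '*' only.
sub check_fasta {
    my ($fasta) = @_;
    my (@problems, $id);
    my $line_no = 0;

    for my $line (split /\r?\n/, $fasta) {
        $line_no++;
        next if $line =~ /^\s*$/;          # skip blank lines
        if ($line =~ /^>(\S*)/) {          # header line: remember the id
            $id = $1;
            next;
        }
        if (!defined $id) {
            push @problems, "line $line_no: sequence data before any header";
            next;
        }
        # Report the first character that is not a valid sequence symbol.
        if ($line =~ /([^ACGTUNRYKMSWBDHVacgtunrykmswbdhv*-])/) {
            push @problems, "line $line_no ($id): unexpected character '$1'";
        }
    }
    return @problems;
}

my $fasta = <<'END';
>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
END

print "$_\n" for check_fasta($fasta);
```

Running it on each chunk as it arrives means a bad chunk can be fetched
again on its own rather than rebuilding the whole set.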

chris

On Fri, 2010-01-29 at 08:25 +1300, Smithies, Russell wrote:
> Yes, I usually set the 'tool' and 'email' parameters.
> I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...
> 
> --Russell
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Friday, 29 January 2010 7:26 a.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > 
> > Russell,
> > 
> > Just curious, but have you tried setting the return email parameter
> > (-email)?  NCBI recently stated that all queries would eventually
> > require a return email of some sort (not sure if it's validated or not).
> > I think that was set for around late spring.  I'm changing the code in
> > svn to require it for that very purpose.
> > 
> > chris
> > 
> > 
> > On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> > > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
> > > still works if you don't mind a bit of manual button clicking. It's
> > > handling chunks of 100,000 records OK (today).
> > >
> > > --Russell
> > >
> > > > -----Original Message-----
> > > > From: Chris Fields [mailto:cjfields at illinois.edu]
> > > > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > > > To: Smithies, Russell
> > > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > > number?
> > > >
> > > > Makes me wonder if they're pushing more users towards the SOAP-based
> > > > services and away from eutils.
> > > >
> > > > chris
> > > >
> > > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > > >
> > > > > I've had a wide selection of errors lately:
> > > > >
> > > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > > > STACK: Error::throw
> > > > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > > STACK: get_desc.pl:32
> > > > > -----------------------------------------------------------
> > > > >
> > > > > And I never get a good explanation from NCBI or suggestions on how to
> > > > > avoid it.
> > > > >
> > > > >
> > > > > --Russell
> > > > >
> > > > >
> > > > >> -----Original Message-----
> > > > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > > > >> To: Smithies, Russell
> > > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > > >> number?
> > > > >>
> > > > >> It's unfortunate but I have heard this problem popping up quite a bit
> > > > >> more frequently lately.  Not to push too many buttons but NCBI isn't
> > > > >> very forthcoming with help these days; they have become quite insular.
> > > > >> Not sure if they're short-staffed due to budget or if there are other
> > > > >> issues.
> > > > >> chris
> > > > >>
> > > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > > > >>
> > > > >>> Grrrrrr, I hate eutils!!!!
> > > > >>>
> > > > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > > > >>> STACK: Error::throw
> > > > >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > >>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > >>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > >>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > >>> STACK: get_desc.pl:32
> > > > >>> -----------------------------------------------------------
> > > > >>>
> > > > >>>
> > > > >>> Nice error message though :-)
> > > > >>>
> > > > >>>
> > > > >>> --Russell
> > > > >>>


From maj at fortinbras.us  Thu Jan 28 14:55:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:55:31 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu><8F49B5ED151143FA86E977B4D4F44265@NewLife>
	<1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
Message-ID: <CD70565A9D3F44C4A0D7BA6462E021E0@NewLife>

Ok, SoapEU now warns on no email; passes email on to the fetch stage
during autofetch -- cheers MAJ
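
For the archive, the behaviour discussed in this thread (no warning at
construction, a single warning when the first request is made, and the
same email reused for the follow-up fetch) could be sketched roughly as
below.  This is not the SoapEU or Bio::DB::EUtilities code; the
My::EutilAgent class and its request_params method are invented for
illustration only.

```perl
use strict;
use warnings;

package My::EutilAgent;    # hypothetical class, not a BioPerl module

sub new {
    my ($class, %args) = @_;
    my $self = {
        email  => $args{-email},
        tool   => defined $args{-tool} ? $args{-tool} : 'BioPerl',
        warned => 0,
    };
    return bless $self, $class;    # constructing never warns
}

# Called just before a request would be sent: warns at most once when
# no email is set, then adds tool/email to the request parameters.
sub request_params {
    my ($self, %params) = @_;
    if (!defined $self->{email} && !$self->{warned}) {
        warn "No -email set; NCBI may reject or block this request\n";
        $self->{warned} = 1;
    }
    $params{tool}  = $self->{tool};
    $params{email} = $self->{email} if defined $self->{email};
    return \%params;
}

package main;

my $agent  = My::EutilAgent->new(-email => 'you@example.org');
my $search = $agent->request_params(db => 'protein', term => 'some query');
my $fetch  = $agent->request_params(db => 'protein', id => '730439');

# The fetch stage carries the same email as the search stage.
print join(', ', map { "$_=$fetch->{$_}" } sort keys %$fetch), "\n";
```

Keeping the check in one method means every stage that builds a request
goes through it, so the email cannot be dropped between search and fetch.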
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl-l" <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 28, 2010 2:18 PM
Subject: Re: [Bioperl-l] EUtilities policy change


> I think warning is fine for now.  I've reimplemented that so it occurs
> lazily (warns only when a request is actually made).
> 
> Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
> We'll obviously have to address this in the test suite as well in some
> way, maybe ask for an email if network tests are requested.
> 
> chris 
> 
> On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote:
>> Thanks Chris-- 
>> The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
>> I agree that a default email is a bad idea (tm) (unless maybe it's 
>> hilmar's...?). I'd say a warning on unset email parameters is a responsible
>> "there be dragons" sort of treatment.
>> MAJ
>> ----- Original Message ----- 
>> From: "Chris Fields" <cjfields at illinois.edu>
>> To: "BioPerl-l" <bioperl-l at lists.open-bio.org>
>> Cc: "Mark A. Jensen" <maj at fortinbras.us>
>> Sent: Thursday, January 28, 2010 2:00 PM
>> Subject: EUtilities policy change
>> 
>> 
>> > All,
>> > 
>> > Per NCBI's recent change in eutils user policy (effective June 1):
>> > 
>> > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html
>> > 
>> > Both the tool and email parameters ('-tool', '-email') are now required
>> > when making requests.  Note this will significantly break all modules
>> > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
>> > and Taxonomy stuff as well, IIRC).  This also applies to web services
>> > (SOAP-based access).  Mark, not sure how this affects your SOAP-based
>> > modules.
>> > 
>> > I have reconfigured Bio::DB::EUtilities to follow this policy; the
>> > default tool setting has been 'bioperl' and will remain that way.
>> > However, there has been no default email, therefore setting this is now
>> > required for future requests unless we (the bioperl devs) decide there
>> > is a safe default email to utilize.  My gut tells me, however, that
>> > falling back to a default email opens up a can of worms for the devs and
>> > is very likely a 'BAD IDEA'(TM).  
>> > 
>> > Regardless, be aware that, after June 1, NCBI will very likely exclude
>> > requests with no email and will notify users who are considered to be
>> > violating their policies.
>> > 
>> > I will likely make further changes to Bio::DB::EUtilities in the
>> > meantime to ensure that using the tools by default will not violate
>> > NCBI's policy (e.g. override this at your own risk).  
>> > 
>> > chris
>> > 
>> > 
>> >
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From chapmanb at 50mail.com  Thu Jan 28 15:35:05 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Jan 2010 15:35:05 -0500
Subject: [Bioperl-l] OpenBio solution challenge: Project updates at BOSC 2010
Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu>

Hello all;
The BOSC 2010 organizing committee is hard at work getting prepared for this
July's meeting in Boston:

http://www.open-bio.org/wiki/BOSC_2010

One of the items we've traditionally had at the conference is a project 
update from each of the OpenBio affiliated groups. This year, we're thinking
about organizing these talks around a central theme: the OpenBio solution
challenge. We start with a biological question of general interest, and each
of the project talks would focus around how you would solve that problem 
using your toolkit and programming language.

This is meant to provide a challenge for OpenBio contributors, a nice tutorial
style overview of various projects and approaches for other programmers, and a
fun opportunity to compete and learn from other projects. Conference attendees
will vote on their favorite solution, with the winner receiving fame and
fortune (warning: fortune not guaranteed).

For this to be successful, it of course requires interest and enthusiasm from
y'all fine folks involved with the projects. Specifically:

- Is there interest from your group in participating in the challenge? You'll
  want at least a few people to work on it, and someone to give a presentation 
  at BOSC.

- Do you have suggestions on a good theme or specific biological problem to
  tackle? We'll hope to pick something in a sweet spot that is challenging 
  enough to be of interest, yet reasonable for presentation and preparation.

Let's discuss ideas and get this together. Since the schedule for BOSC is
developing rapidly, please give us an idea if you're interested by
February 12th, and copy responses to the BOSC mailing list as a central 
place for discussion.

bosc at open-bio.org

Thanks,
Brad, Michael, and the BOSC organizing committee


From markw at illuminae.com  Thu Jan 28 16:17:44 2010
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 28 Jan 2010 13:17:44 -0800
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project
 updates at BOSC 2010
In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
Message-ID: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>


Brad, this sounds exciting!

One thing strikes me, though - by asking for the sub-projects to propose
the "grand challenge" themselves the one thing you can guarantee is that
the "grand challenge" is solvable (or more likely, already solved!)

Other "grand challenge" kinds of meetings have an independent third party
pose the problem that has to be solved, and then all groups work toward a
solution and compare their results.  This would, IMO, be more revealing of
the "state of the art" in each Open-Bio project, and point out where the
weaknesses are that we should be focusing on...  Someone (for example,
you!) could act as the moderator to ensure that the "grand challenge" was
at least a reasonable one, within the scope of what an Open-Bio project
*should* be able to solve...

Just my CAD $0.02

Mark




-- 
Mark D Wilkinson, PI Bioinformatics
Assistant Professor, Medical Genetics
The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research
Providence Heart + Lung Institute
University of British Columbia - St. Paul's Hospital
Vancouver, BC, Canada


From HWillis at scripps.edu  Thu Jan 28 20:03:10 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 Jan 2010 20:03:10 -0500
Subject: [Bioperl-l] [Biojava-dev] [MOBY-dev] OpenBio solution
 challenge: Project updates at BOSC 2010
In-Reply-To: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
	<op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu>

Brad

I agree with Mark that a particular problem may be biased towards a toolkit/language. Another approach would be to list a collection of problems and have each group pick one to present. This could be a little more interesting for the audience, who would be exposed to different problems and to the various strengths of each toolkit. It could also help guide future development in the other toolkits, as everyone would benefit from learning about each API and/or programming language. Each group would register the problem it is going to present. The problems not picked would then become the surprise challenge, where each group has 24 hours to put together either a presentation or an actual solution.

Scooter


On Jan 28, 2010, at 4:17 PM, Mark Wilkinson wrote:

> 
> Brad, this sounds exciting!
> 
> One thing strikes me, though - by asking for the sub-projects to propose
> the "grand challenge" themselves the one thing you can guarantee is that
> the "grand challenge" is solvable (or more likely, already solved!)
> 
> Other "grand challenge" kinds of meetings have an independent third party
> pose the problem that has to be solved, and then all groups work toward a
> solution and compare their results.  This would, IMO, be more revealing of
> the "state of the art" in each Open-Bio project, and point out where the
> weaknesses are that we should be focusing on...  Someone (for example,
> you!) could act as the moderator to ensure that the "grand challenge" was
> at least a reasonable one, within the scope of what an Open-Bio project
> *should* be able to solve...
> 
> Just my CAD $0.02
> 
> Mark
> 
> 
> 
> On Thu, 28 Jan 2010 12:35:05 -0800, Brad Chapman <chapmanb at 50mail.com>  
> wrote:
> 
>> Hello all;
>> The BOSC 2010 organizing committee is hard at work getting prepared for  
>> this
>> July's meeting in Boston:
>> 
>> http://www.open-bio.org/wiki/BOSC_2010
>> 
>> One of the items we've traditionally had at the conference is a project
>> update from each of the OpenBio affiliated groups. This year, we're  
>> thinking
>> about organizing these talks around a central theme: the OpenBio solution
>> challenge. We start with a biological question of general interest, and  
>> each
>> of the project talks would focus around how you would solve that problem
>> using your toolkit and programming language.
>> 
>> This is meant to provide a challenge for OpenBio contributors, a nice  
>> tutorial
>> style overview of various projects and approaches for other programmers,  
>> and a
>> fun opportunity to compete and learn from other projects. Conference  
>> attendees
>> will vote on their favorite solution, with the winner receiving fame and
>> fortune (warning: fortune not guaranteed).
>> 
>> For this to be successful, it of course requires interest and enthusiasm  
>> from
>> y'all fine folks involved with the projects. Specifically:
>> 
>> - Is there interest from your group in participating in the challenge?  
>> You'll
>>  want at least a few people to work on it, and someone to give a  
>> presentation
>>  at BOSC.
>> 
>> - Do you have suggestions on a good theme or specific biological problem  
>> to
>>  tackle? We'll hope to pick something in a sweet spot that is  
>> challenging
>>  enough to be of interest, yet reasonable for presentation and  
>> preparation.
>> 
>> Let's discuss ideas and get this together. Since the schedule for BOSC is
>> developing rapidly, please give us an idea if you're interested by
>> February 12th, and copy responses to the BOSC mailing list as a central
>> place for discussion.
>> 
>> bosc at open-bio.org
>> 
>> Thanks,
>> Brad, Michael, and the BOSC organizing committee
> 
> 
> -- 
> Mark D Wilkinson, PI Bioinformatics
> Assistant Professor, Medical Genetics
> The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research
> Providence Heart + Lung Institute
> University of British Columbia - St. Paul's Hospital
> Vancouver, BC, Canada
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev


From biopython at maubp.freeserve.co.uk  Fri Jan 29 05:36:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 29 Jan 2010 10:36:40 +0000
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project
	updates at BOSC 2010
In-Reply-To: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
	<op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com>

Hi all,

This is a great topic, but should we continue it on just one mailing list?
Is there a suitable BOSC list, or how about the general Open Bio list?

On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson <markw at illuminae.com> wrote:
>
> Brad, this sounds exciting!
>
> One thing strikes me, though - by asking for the sub-projects to propose
> the "grand challenge" themselves the one thing you can guarantee is that
> the "grand challenge" is solvable (or more likely, already solved!)
>
> Other "grand challenge" kinds of meetings have an independent third party
> pose the problem that has to be solved, and then all groups work toward a
> solution and compare their results.  This would, IMO, be more revealing of
> the "state of the art" in each Open-Bio project, and point out where the
> weaknesses are that we should be focusing on...  Someone (for example,
> you!) could act as the moderator to ensure that the "grand challenge" was
> at least a reasonable one, within the scope of what an Open-Bio project
> *should* be able to solve...
>
> Just my CAD $0.02
>
> Mark

One possible problem with having Brad act as moderator is his ties to
Biopython (plus it would be a shame if we'd be one man down for trying
to solve the challenges - grin). Having a project representative "sign off"
on the challenge might work - or simply the whole of the BOSC committee
which is quite balanced. Alternatively some kind of panel of challenges does
seem a good way to reduce individual project bias (as suggested by Scooter),
but there will still need to be a judging committee.

I'm curious what kind of challenges the BOSC committee had in mind -
would something like taking a newly sequenced bacterium and producing
an automated annotation as a GenBank, EMBL, or GFF file be too
ambitious for example? There are already several major projects
to do this e.g. RAST http://rast.nmpdr.org/

Peter
(@Biopython)


From mike.stubbington at bbsrc.ac.uk  Fri Jan 29 08:25:25 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Fri, 29 Jan 2010 13:25:25 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
In-Reply-To: <FD6E9A89F6034CCB856E22553ED893D7@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
	<FD6E9A89F6034CCB856E22553ED893D7@NewLife>
Message-ID: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>

Hi Mark,

Thanks for your continued help.

It now fails with this:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::]

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

If I change the factory creation to:
my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
	-db_name => '/Users/stubbing/localBlast/MouseGenome'
);

it fails with 

------------- EXCEPTION -------------
MSG: DB name not valid
STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516
STACK toplevel ./5CTest.pl:45
-------------------------------------

However I can run the following successfully from the command line:

blastn -db  /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta

Is there something wrong with how I'm referring to the blast database when I construct my factory?

Thanks again,

M


On 28 Jan 2010, at 18:47, Mark A. Jensen wrote:

> Hi Mike,
> Believe I found the real bug causing the problem (was not accounting for
> the db_dir parameter). Crashes should now also throw much more helpful
> errors. Please try the code at r16774, and shout back.
> thanks --
> MAJ
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 11:18 AM
> Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
> error running blastn
> 
> 
> Hi,
> 
> Thanks for the suggestion. Unfortunately it still fails - error as follows:
> 
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
> /usr/local/ncbi/blast/bin/blastn : Illegal seek at 
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> 
> line 532.
> 
> STACK Bio::Tools::Run::WrapperBase::_run 
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
> 
> M
> 
> On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:
> 
>> Mike - please try updating your bioperl-live (the core) to the latest code
>> (revision 16761 or so).
>> CommandExts is a work in progress; from the stack errors it looks like you've
>> got an older version.
>> Try it then ping us back, if you would--
>> Thanks
>> Mark
>> ----- Original Message ----- 
>> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Thursday, January 28, 2010 10:41 AM
>> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error
>> running blastn
>> 
>> 
>> Dear all,
>> 
>> I am attempting to blast some primers against the mouse genome. I have created 
>> a
>> local mouse genome blast database and I can search against it using 'blastn' 
>> at
>> the command line.
>> 
>> I have perl code that creates an array of bioperl sequence objects called
>> @primers
>> 
>> I then create a StandAloneBlastPlus factory using the following code:
>> 
>> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
>>     -db_dir  => '/Users/stubbing/localBlast/',
>>     -db_name => 'MouseGenome'
>> );
>> 
>> and then attempt to blast my primers using this:
>> 
>> my @shortPrimers;
>> my $count = 1;
>> foreach (@primers) {
>>     my $currentSeq = $_;
>>     print "Checking primer $count/$primerNumber ";
>>     if ($_->length < 40) {
>>         push(@shortPrimers, $_);
>>         print "Too short!\n";
>>     }
>>     else {
>>         print "BLASTing...";
>>         my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>>     }
>>     $count++;
>> }
>> 
>> This fails with the following error:
>> 
>> ------------- EXCEPTION -------------
>> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem 
>> running
>> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
>> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA>
>> line 532.
>> 
>> STACK Bio::Tools::Run::WrapperBase::_run
>> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
>> STACK Bio::Tools::Run::StandAloneBlastPlus::run
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
>> STACK toplevel ./5CTest.pl:63
>> -------------------------------------
>> 
>> Line 63 in my code is (as you might expect) the one that calls blastn on my
>> factory object.
>> 
>> I'd appreciate any help you might be able to provide to shed light on this.
>> 
>> Thanks in advance,
>> 
>> Mike
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
> 
> 
> 
> 


From maj at fortinbras.us  Fri Jan 29 08:36:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:36:54 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
	<FD6E9A89F6034CCB856E22553ED893D7@NewLife>
	<ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
Message-ID: <DF05D2C7E8CC4CF18E6AE56077EB738A@NewLife>

Hi Mike-
Well, at least we're getting more informative errors. I think it's
still my bad; will look again. Both of your calls should work.
(thanks for the positive control too)
Thanks for your patience and the help--
MAJ
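
For anyone hitting the same two errors, here is a minimal sketch of a
pre-flight check in plain Perl. It assumes a nucleotide database whose
files sit side by side as <name>.nin (index) or <name>.nal (alias), which
is what the "No alias or index file found" message refers to. The helper
names split_db_path and find_blast_nucl_files are made up for illustration
and are not part of BioPerl; only the -db_dir and -db_name parameters come
from the thread.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);

# Hypothetical helper: split a full database path into the directory and
# the bare database name, i.e. the two values passed as -db_dir and
# -db_name in the thread.
sub split_db_path {
    my ($db_path) = @_;
    return (dirname($db_path), basename($db_path));
}

# Hypothetical helper: return the index (.nin) and alias (.nal) files that
# actually exist for a nucleotide BLAST database at the given path.
# An empty list means blastn would not find the database there either.
sub find_blast_nucl_files {
    my ($db_path) = @_;
    return grep { -e $_ } map { "$db_path.$_" } qw(nin nal);
}

# Usage with the path from the thread (adjust to your own layout).
my $db_path = '/Users/stubbing/localBlast/MouseGenome';
my ($db_dir, $db_name) = split_db_path($db_path);
my @files = find_blast_nucl_files($db_path);
if (@files) {
    print "Database '$db_name' found in '$db_dir': @files\n";
}
else {
    print "No .nin or .nal file for '$db_name' in '$db_dir'\n";
}
```

If the check passes but the factory still fails, the path is being lost
somewhere between the factory and the blastn call rather than on disk,
which is consistent with the search path shown in the exception.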


From maj at fortinbras.us  Fri Jan 29 08:47:48 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:47:48 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife><05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk><FD6E9A89F6034CCB856E22553ED893D7@NewLife>
	<ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
Message-ID: <2B7BF6CD46AE441AB24203E169D9C503@NewLife>

Mike et al--
I've entered this as Bug #3003 on http://bugzilla.bioperl.org;
we'll do further ping-pongs on this issue via the comment facility
there--
cheers MAJ


From help at gmod.org  Fri Jan 29 17:03:48 2010
From: help at gmod.org (Dave Clements, GMOD Help Desk)
Date: Fri, 29 Jan 2010 14:03:48 -0800
Subject: [Bioperl-l] 2010 GMOD Summer School - Americas
In-Reply-To: <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
References: <71ee57c71001291351q47994b82w10dffb390dbf2837@mail.gmail.com>
	<71ee57c71001291354m68548823s3e3fbd2e49e9b332@mail.gmail.com>
	<71ee57c71001291356p5e7f1aadi2bf437c93014a393@mail.gmail.com>
	<71ee57c71001291357h67112e2fkcf835687e59f66ae@mail.gmail.com>
	<71ee57c71001291358k74781b08n232534d8895c5ec1@mail.gmail.com>
	<71ee57c71001291400y28e40eb6i112ea91df977dc67@mail.gmail.com>
	<71ee57c71001291400n6133982eh3a02293ff741900b@mail.gmail.com>
	<71ee57c71001291401y505b56baic61c11754d88a444@mail.gmail.com>
	<71ee57c71001291402s23e3f2e9w2562d6acf85bd4ae@mail.gmail.com>
	<71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
Message-ID: <71ee57c71001291403s19be18f3s3a1d5a314c74def@mail.gmail.com>

Hello all,

I am pleased to announce that we are now accepting applications for:

  2010 GMOD Summer School - Americas
    6-9 May 2010
    NESCent, Durham, NC, USA
    http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

This will be a hands-on multi-day course aimed at teaching new GMOD
users/administrators how to get GMOD Components up and running. The
course will introduce participants to the GMOD project and then focus
on installation, configuration and integration of popular GMOD
Components. The course will be held May 6-9, at NESCent in Durham, NC.

These components will be covered:
   * Apollo - genome annotation editor
   * Chado - a modular and extensible database schema
   * Galaxy - workflow system
   * GBrowse - the Generic Genome Browser
   * GBrowse_syn - A generic synteny browser
   * JBrowse - genome browser
   * MAKER - genome annotation pipeline
   * Tripal - web front end for Chado

The deadline for applying is the end of Friday, February 22. Admission
is competitive and is based on the strength of the application
(especially the statement of interest). In 2009 there were over 50
applications for the 25 slots. Any applications received after the
deadline will be placed on the waiting list.

See the course page for details and an application link:
 http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

Thanks,

Dave Clements
GMOD Help Desk

PS: We are also investigating holding a GMOD course in the
Asia/Pacific region, sometime this fall. Watch the GMOD mailing lists
and the GMOD News page/RSS feed for updates.
--
Please keep responses on the list!
http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas
http://gmod.org/wiki/GMOD_News
Was this helpful? http://gmod.org/wiki/Help_Desk_Feedback


From bhakti.dwivedi at gmail.com  Sat Jan 30 17:38:40 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sat, 30 Jan 2010 17:38:40 -0500
Subject: [Bioperl-l] how to map blast results on to the genome?
Message-ID: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>

Does anyone know how I can graphically map BLAST results (-m 8 tabular
format) to the genome using BioPerl?

Thanks

Bhakti


From jason at bioperl.org  Sat Jan 30 18:56:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 30 Jan 2010 15:56:14 -0800
Subject: [Bioperl-l] how to map blast results on to the genome?
In-Reply-To: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>
References: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>
Message-ID: <68937A7D-291F-419A-9ED7-7A87D9B4C78A@bioperl.org>

Did you try Bio::Graphics and read the HOWTO on it? http://bioperl.org/wiki/HOWTO:Graphics
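
A rough sketch of how that could look, not a tested recipe. It assumes the
genome is the subject (database) sequence of the search, that the input is
the standard 12-column tabular output, and that Bio::Graphics and GD are
installed. The sub names parse_m8_line and render_hits are invented for
this example; the Panel, add_track and add_feature calls follow the
pattern used in the HOWTO.

```perl
use strict;
use warnings;

# Parse one line of BLAST tabular (-m 8) output. The 12 columns are:
# query, subject, %identity, alignment length, mismatches, gap opens,
# q.start, q.end, s.start, s.end, e-value, bit score.
# Returns a hash ref, or nothing for comments, blank or short lines.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;
    my @f = split /\t/, $line;
    return unless @f >= 12;
    my ($start, $end) = @f[8, 9];
    # Minus-strand hits are reported with s.start > s.end, so swap them.
    my $strand = $start <= $end ? 1 : -1;
    ($start, $end) = ($end, $start) if $strand < 0;
    return {
        query  => $f[0],  subject => $f[1],  identity => $f[2],
        start  => $start, end     => $end,   strand   => $strand,
        evalue => $f[10], bits    => $f[11],
    };
}

# Draw the hits along a genome of the given length and return PNG data.
# Bio::Graphics is loaded here so the parser above works without it.
sub render_hits {
    my ($hits, $genome_length) = @_;
    require Bio::Graphics;
    require Bio::SeqFeature::Generic;

    my $panel = Bio::Graphics::Panel->new(
        -length => $genome_length,
        -width  => 800,
    );
    # A ruler for the whole genome, as in the HOWTO.
    my $ruler = Bio::SeqFeature::Generic->new(
        -start => 1,
        -end   => $genome_length,
    );
    $panel->add_track($ruler, -glyph => 'arrow', -tick => 2,
                      -fgcolor => 'black', -double => 1);

    # One track holding every hit, labelled with the query name.
    my $track = $panel->add_track(-glyph => 'generic', -label => 1,
                                  -bgcolor => 'blue');
    for my $hit (@$hits) {
        $track->add_feature(Bio::SeqFeature::Generic->new(
            -display_name => $hit->{query},
            -start        => $hit->{start},
            -end          => $hit->{end},
            -strand       => $hit->{strand},
            -score        => $hit->{bits},
        ));
    }
    return $panel->png;
}
```

To use it, read the m8 file line by line, keep the hits whose subject is
the sequence you want to draw, and write the return value of render_hits
to a file opened in binary mode.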
On Jan 30, 2010, at 2:38 PM, Bhakti Dwivedi wrote:

> Does anyone know how I can graphically map the blast results (m -8  
> format)
> to the genome using bio-perl?
>
> Thanks
>
> Bhakti
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From David.Messina at sbc.su.se  Sun Jan 31 12:43:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 31 Jan 2010 18:43:52 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
	<18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
Message-ID: <BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet.

I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave


From bluecurio at gmail.com  Sun Jan 31 22:22:37 2010
From: bluecurio at gmail.com (Daniel Renfro)
Date: Sun, 31 Jan 2010 21:22:37 -0600
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects
Message-ID: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>

Hello all,

A colleague and I have been working on a (Bio)Perl package to compare two
Seq objects. This is in response to a need we found in our lab -- we wanted
to see the changes to GenBank files through time, but wanted an automated
way to do this. This led to what I'm calling the SeqDiff.pm package. I
thought it would be a good idea to inform the community and get some
feedback.

The package takes two Seq objects as arguments, arbitrarily called "old" and
"new." It then matches the features from the old object with the new object.
This is done based on some criteria -- in our case we decided the features
must be of the same type (have the same primary_tag) and have at least one
matching database cross-reference (db_xref) in common.  The left-over
features (ones that did not have a match) are dropped into arrays called
"lost" and "gained." The matching is done in about NlogN time, as each
matching pair is removed from subsequent searches.
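
The matching rule above can be sketched as follows. This is only an
illustration, not the SeqDiff.pm code: features are modelled as plain
hashes with invented db_xref values, and the lookup uses a hash index
keyed on "type|xref" rather than whatever search SeqDiff actually does.
The name match_features is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for sequence features: a type plus db_xref values.
my @old = (
    { primary_tag => 'gene', db_xref => ['GeneID:1', 'EcoGene:EG1'] },
    { primary_tag => 'CDS',  db_xref => ['GI:100'] },
    { primary_tag => 'gene', db_xref => ['GeneID:2'] },
);
my @new = (
    { primary_tag => 'gene', db_xref => ['GeneID:1'] },
    { primary_tag => 'CDS',  db_xref => ['GI:100', 'GI:101'] },
    { primary_tag => 'tRNA', db_xref => ['GeneID:3'] },
);

sub match_features {
    my ($old, $new) = @_;

    # Index the new features under every "type|xref" key they carry.
    my %index;
    for my $i (0 .. $#$new) {
        for my $xref (@{ $new->[$i]{db_xref} }) {
            push @{ $index{"$new->[$i]{primary_tag}|$xref"} }, $i;
        }
    }

    my (@matched, @lost, %used);
    OLD: for my $feat (@$old) {
        for my $xref (@{ $feat->{db_xref} }) {
            for my $i (@{ $index{"$feat->{primary_tag}|$xref"} || [] }) {
                next if $used{$i};    # each new feature pairs at most once
                $used{$i} = 1;
                push @matched, [ $feat, $new->[$i] ];
                next OLD;
            }
        }
        push @lost, $feat;            # no partner in the new object
    }

    # Whatever was never paired exists only in the new object.
    my @gained = map { $new->[$_] } grep { !$used{$_} } 0 .. $#$new;
    return (\@matched, \@lost, \@gained);
}

my ($matched, $lost, $gained) = match_features(\@old, \@new);
printf "matched=%d lost=%d gained=%d\n",
    scalar @$matched, scalar @$lost, scalar @$gained;
# prints "matched=2 lost=1 gained=1"
```
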

The matched features are iterated through and the differences are
calculated. Each feature is examined recursively and any differences are
reported. Optionally you can give the new() method a flag so that everything
is returned (differences and similarities.) You can set callbacks for
different types of objects (like anything that isa('Bio::LocationI')) if you
want a custom comparison for specific BioPerl objects. This comparison step
is the computationally slow part, and currently everything is held in
memory. I think it'd be better to do this piece-meal, using the BioPerl-ish
next() and last() methods.

Maybe this was a little verbose, but that is the SeqDiff package in a
nutshell. I hope to soon release v1.0. If you have any questions or comments
I'd love to hear them.

-Daniel Renfro

Hu Lab Research Associate
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4055


From maj at fortinbras.us  Sun Jan 31 22:47:05 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 31 Jan 2010 22:47:05 -0500
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects
In-Reply-To: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>
References: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>
Message-ID: <5DC96D65B6A447C3802AF5D745FF4AA4@NewLife>

Daniel-- this sounds interesting and useful, I +1 it. Your intuition about
in-memory vs streaming sounds correct to me; features can be many, and
diffing many (MANY) sequences may bork. Maybe our feature-rich users
can chime in. (...however, I did just hear about a magic spell called
'File::Map'; you might check that out on CPAN.)
cheers- MAJ
----- Original Message ----- 
From: "Daniel Renfro" <bluecurio at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 31, 2010 10:22 PM
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects


> Hello all,
>
> A colleague and I have been working on a (Bio)Perl package to compare two
> Seq objects. This is in response to a need we found in our lab -- we wanted
> to see the changes to GenBank files through time, but wanted an automated
> way to do this. This led to what I'm calling the SeqDiff.pm package. I
> thought it would be a good idea to inform the community and get some
> feedback.
>
> The package takes two Seq objects as arguments, arbitrarily called "old" and
> "new." It then matches the features from the old object with the new object.
> This is done based on some criteria -- in our case we decided the features
> must be of the same type (have the same primary_tag) and have at least one
> matching database cross-reference (db_xref) in common.  The left-over
> features (ones that did not have a match) are dropped into arrays called
> "lost" and "gained." The matching is done in about NlogN time, as each
> matching pair is removed from subsequent searches.
>
> The matched features are iterated through and the differences are
> calculated. Each feature is examined recursively and any differences are
> reported. Optionally you can give the new() method a flag so that everything
> is returned (differences and similarities.) You can set callbacks for
> different types of objects (like anything that isa('Bio::LocationI')) if you
> want a custom comparison for specific BioPerl objects. This comparison step
> is the computationally slow part, and currently everything is held in
> memory. I think it'd be better to do this piece-meal, using the BioPerl-ish
> next() and last() methods.
>
> Maybe this was a little verbose, but that is the SeqDiff package in a
> nutshell. I hope to soon release v1.0. If you have any questions or comments
> I'd love to hear them.
>
> -Daniel Renfro
>
> Hu Lab Research Associate
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4055
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From rui.faria at upf.edu  Sun Jan 31 12:17:09 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 18:17:09 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
Message-ID: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>

Hi Dave,

Some time ago we reported the codeml bug about errors when the user supplies their own tree file. Have you had a chance to look at it?

We basically wanted to know your opinion on where the problem may be, since we are not the most experienced "perlers" on the planet :)

I'm asking because we have to deal with this right now. If someone could check where the problem is and whether it has an easy solution, that would be a great help.

Best,

Rui


-----Original Message-----
From: Dave Messina <David.Messina at sbc.su.se>
Sent: Thu 31/12/2009 11:55 AM
To: Rui Faria <rui.faria at upf.edu>
Cc: Jason Stajich <jason at bioperl.org>; sandraneto_ at hotmail.com; bioperl-l List <bioperl-l at lists.open-bio.org>
Subject: Re: question about a PAML module

Hi Rui and Sandra,

Could you file this as a bug report at 

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

?

Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:
- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this.

There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon.


Dave


From rui.faria at upf.edu  Sun Jan 31 13:56:56 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 19:56:56 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
	<18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
	<BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>
Message-ID: <11398434.1264964216856.JavaMail.oracle@rif1.s.upf.edu>

Many thanks!

We hope that one day, when we become experts, we can return the favor!

Rui

-----Original Message-----
From: Dave Messina <David.Messina at sbc.su.se>
Sent: Sun 31/01/2010 06:43 PM
To: Rui Faria <rui.faria at upf.edu>
Cc: Jason Stajich <jason at bioperl.org>; sandraneto_ at hotmail.com; bioperl-l List <bioperl-l at lists.open-bio.org>
Subject: Re: question about a PAML module

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet.

I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave


From avilella at gmail.com  Sat Jan  2 03:57:28 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sat, 2 Jan 2010 08:57:28 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
Message-ID: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>

Hi all and happy 2010 for those that follow the Gregorian calendar,

A question that is a bit in between bioperl and NCBI. I would like to use
bioperl to download sequences from dbEST. For that, my idea is to use
Bio::DB::GenBank and get the sequences by gi id.

Now, I want my script to download sequences for a given NCBI taxonomy clade.

For example, if I want to download all fish (clupeocephala) sequences in dbEST,
I can browse it around with the dbEST webpage using "clupeocephala[taxonomy]",
so I am thinking there should be a way to do it programmatically.

How can I query NCBI dbEST through bioperl to give me the list of GI ids I am
looking for given a taxon id?

Thanks in advance,

Albert.


From jason at bioperl.org  Sat Jan  2 11:35:22 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Jan 2010 08:35:22 -0800
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
Message-ID: <D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>

Did you try Bio::DB::Query::GenBank?
You'd want to use -db => 'nucest' and then just put in an Entrez
query as per the example. You can include dates in the query so you
can do updates to your locally retrieved data in a script that runs
periodically.

-jason
On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:

> Hi all and happy 2010 for those that follow the Gregorian calendar,
>
> A question that is a bit in between bioperl and NCBI. I would like  
> to use
> bioperl to download sequences fom dbEST. For that, my idea is to use
> Bio::DB::Genbank and get the sequences by gi id.
>
> Now, I want my script to download sequences for a given NCBI  
> taxonomy clade.
>
> For example, if I want to download all fish (clupeocephala)  
> sequences in dbEST,
> I can browse it around with the dbEST webpage using  
> "clupeocephala[taxonomy]",
> so I am thinking there should be a way to do it programmatically.
>
> How can I query NCBI dbEST through bioperl to give me the list of GI  
> ids I am
> looking for given a taxon id?
>
> Thanks in advance,
>
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From avilella at gmail.com  Sun Jan  3 04:08:33 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 3 Jan 2010 09:08:33 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
	<D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>
Message-ID: <358f4d651001030108p6a92fb27k5fa39be6bebb3a9c@mail.gmail.com>

Thanks Jason!
For the sake of completeness, here is the script I needed:

---------------------
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Getopt::Long;

my $keyword_type = 'EST';
my $outdir = '.';
my $taxon_name = undef;
my $db_type = 'nucest';

GetOptions('keyword_type:s' => \$keyword_type,
           't|taxon_name:s' => \$taxon_name,
           'db_type:s' => \$db_type,
           'outdir:s' => \$outdir);
die "usage: $0 -t TAXON_NAME [--keyword_type EST] [--db_type nucest] [--outdir DIR]\n"
  unless defined $taxon_name && length $taxon_name;

# Entrez query, e.g. "clupeocephala[Organism] AND EST[Keyword]"
my $query_string = $taxon_name ."[Organism] AND ". $keyword_type ."[Keyword]";
my $query = Bio::DB::Query::GenBank->new
  (-db => $db_type,
   -query => $query_string,
   -mindate => '2007',
   -maxdate => '2010');

# Output file: OUTDIR/taxon_name.DBTYPE.fasta (spaces become underscores)
(my $taxon_name_string = $taxon_name) =~ s/ /_/g;
my $outfile = $outdir . "/" . $taxon_name_string . ".". $db_type . ".fasta";
my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

# Report how many records matched, then stream them from GenBank
print $query->count,"\n";
my $gb = Bio::DB::GenBank->new();
my $stream = $gb->get_Stream_by_query($query);
while (my $seq = $stream->next_seq) {
  # Keep only reads longer than 800 bases
  next unless ($seq->length > 800);
  $out->write_seq($seq);
}
$out->close;
---------------------

On Sat, Jan 2, 2010 at 4:35 PM, Jason Stajich <jason at bioperl.org> wrote:
> Did you try Bio::DB::Query::GenBank?
> You'd want to use -db => 'nucest' and then just put in an Entrez query
> as per the example. You can include dates in the query so you can do
> updates to your locally retrieved data in a script that runs periodically.
>
> -jason
> On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:
>
>> Hi all and happy 2010 for those that follow the Gregorian calendar,
>>
>> A question that is a bit in between bioperl and NCBI. I would like to use
>> bioperl to download sequences fom dbEST. For that, my idea is to use
>> Bio::DB::Genbank and get the sequences by gi id.
>>
>> Now, I want my script to download sequences for a given NCBI taxonomy
>> clade.
>>
>> For example, if I want to download all fish (clupeocephala) sequences in
>> dbEST,
>> I can browse it around with the dbEST webpage using
>> "clupeocephala[taxonomy]",
>> so I am thinking there should be a way to do it programmatically.
>>
>> How can I query NCBI dbEST through bioperl to give me the list of GI ids I
>> am
>> looking for given a taxon id?
>>
>> Thanks in advance,
>>
>> Albert.
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>


From Jean-Marc.Frigerio at pierroton.inra.fr  Mon Jan  4 09:12:18 2010
From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA)
Date: Mon, 04 Jan 2010 15:12:18 +0100
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
Message-ID: <4B41F742.2030209@pierroton.inra.fr>

> Message: 1
> Date: Thu, 31 Dec 2009 11:26:45 +1800
> From: Peng Yu <pengyu.ut at gmail.com>
> Subject: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: bioperl-l at lists.open-bio.org
> Message-ID:
> 	<366c6f340912300926k5af5cc88nc3c3babda541fd1 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 30 Dec 2009 13:04:53 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: "bioperl-l at lists.open-bio.org" <bioperl-l at lists.open-bio.org>
> Message-ID:
> 	<264855a00912301004t396e0d4fwf9d291c5d82c3fb9 at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and return
>> me an object?
> 
> In perl, you can use an array to store the records.  You could also
> use a hash if you have reasonable keys for the entries.
> 
> Sean
> 
> 
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 30 Dec 2009 11:58:54 -0800
> From: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: BioPerl List <bioperl-l at lists.open-bio.org>
> Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B at bioperl.org>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> 
> or use a database object so you can retrieve sequences that have a  
> particular id. See Bio::DB::Fasta
> On Dec 30, 2009, at 10:04 AM, Sean Davis wrote:
> 
>> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>> With Bio::SeqIO, I can only read in the records in a fasta file one  
>>> by
>>> one. This is preferable if there are many records in a file.
>>>
>>> But I also want to read all the records in. I could use a while loop
>>> to read all records in. But could somebody let me know if there is a
>>> function in bioperl that can read in all the record at once and  
>>> return
>>> me an object?
>> In perl, you can use an array to store the records.  You could also
>> use a hash if you have reasonable keys for the entries.
>>
>> Sean
>>
>>
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Wed, 30 Dec 2009 16:20:31 -0500
> From: "Mark A. Jensen" <maj at fortinbras.us>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: "Peng Yu" <pengyu.ut at gmail.com>, <bioperl-l at lists.open-bio.org>
> Message-ID: <2646F627E6D14AADB412A6E6B51E24DA at NewLife>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
> 
> I think you might want Bio::AlignIO:
> 
> $alnio = Bio::AlignIO->new(-file => 'my.fas', -format => 'fasta');
> $aln = $alnio->next_aln;
> @seqs = $aln->each_seq;
> 
> MAJ
> ----- Original Message ----- 
> From: "Peng Yu" <pengyu.ut at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, December 30, 2009 12:26 PM
> Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
> 
> 
>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and return
>> me an object?
>>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


Hi,

I wrote and currently use a module I named Bio::SeqIO::multifasta, which 
is basically a copy of Bio::SeqIO::fasta plus a few methods:
get_by_id(), get_by_order(), first_seq() and previous_seq()

It would need review, validation etc. Do I submit it to Bugzilla ?

	-- jmf


From jason at bioperl.org  Mon Jan  4 11:03:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 4 Jan 2010 08:03:45 -0800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
	<4B41F742.2030209@pierroton.inra.fr>
Message-ID: <16D7C8C1-E4BE-406F-9D60-379876178CAB@bioperl.org>

We typically think of SeqIO as parsing a stream of data, not being  
reliant on it being a file which is what these methods would be  
implying I think. Sounds a lot like a database - does Bio::DB::Fasta  
not provide some of the functionality you need by these methods?  I  
realize there isn't a by_order() but the get_by_id() is implemented to  
allow random access.
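
A minimal sketch of the two options raised in this thread: loading every
record into memory with Bio::SeqIO, and random access through
Bio::DB::Fasta. The demo file and its two records are made up so that the
example runs on its own; point $fasta_file at a real file in practice.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Bio::SeqIO;
use Bio::DB::Fasta;

# Made-up demo input, written to a temporary directory.
my $dir        = tempdir(CLEANUP => 1);
my $fasta_file = "$dir/demo.fas";
open my $fh, '>', $fasta_file or die "Cannot write $fasta_file: $!";
print {$fh} ">seq1\nACGTACGT\n>seq2\nGGCC\n";
close $fh;

# Option 1: read every record into an array (file order) and a hash (by ID).
my $in = Bio::SeqIO->new(-file => $fasta_file, -format => 'fasta');
my (@seqs, %seq_by_id);
while (my $seq = $in->next_seq) {
    push @seqs, $seq;
    $seq_by_id{ $seq->display_id } = $seq;
}
printf "Loaded %d sequences\n", scalar @seqs;

# Option 2: index the file on disk and fetch records on demand.
# Bio::DB::Fasta writes its index file next to the FASTA file.
my $db = Bio::DB::Fasta->new($fasta_file);
for my $id (sort $db->get_all_ids) {
    my $seq = $db->get_Seq_by_id($id);
    printf "%s\t%d\n", $id, $seq->length;
}
```

The array keeps file order, which covers a get_by_order() style lookup
(for example $seqs[0]), at the cost of holding everything in memory.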

-jason

>
> Hi,
>
> I wrote and currently use a module I named Bio::SeqIO::multifasta,  
> which is basically a copy of Bio::SeqIO::fasta plus a few methods:
> get_by_id(), get_by_order(), first_seq() and previous_seq()
>
> It would need review, validation etc. Do I submit it to Bugzilla ?
>
> 	-- jmf
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From avilella at gmail.com  Mon Jan  4 15:00:24 2010
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 4 Jan 2010 20:00:24 +0000
Subject: [Bioperl-l] indexed fastq files
Message-ID: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>

Hi all,

What is the best way to index fastq files, so that once clustered, I
can provide a list of seq_ids and get
them back in fastq format from the indexed db?

Cheers,

Albert.


From cjfields at illinois.edu  Mon Jan  4 16:59:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Jan 2010 15:59:50 -0600
Subject: [Bioperl-l] indexed fastq files
In-Reply-To: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>
References: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>
Message-ID: <07EBA105-6A34-490C-B0B9-4772DF386CBA@illinois.edu>

Bio::Index::Fastq, maybe?  To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work.
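
A minimal sketch of that suggestion, following the usage shown in the
Bio::Index::Fastq documentation. As noted above, this is untested against
the refactored FASTQ parser. The two reads and the file names are made up
so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Bio::Index::Fastq;
use Bio::SeqIO;

# Made-up demo input, written to a temporary directory.
my $dir        = tempdir(CLEANUP => 1);
my $fastq_file = "$dir/reads.fastq";
my $index_file = "$dir/reads.idx";
open my $fh, '>', $fastq_file or die "Cannot write $fastq_file: $!";
print {$fh} join("\n", '@read1', 'ACGT', '+', 'IIII',
                       '@read2', 'GGCC', '+', 'HHHH', '');
close $fh;

# Build the index once; -write_flag => 1 opens it for writing.
my $writer = Bio::Index::Fastq->new(
    -filename   => $index_file,
    -write_flag => 1,
);
$writer->make_index($fastq_file);
undef $writer;    # close the index before reopening it read-only

# Later: look up a list of IDs and write them back out as FASTQ.
my $index = Bio::Index::Fastq->new(-filename => $index_file);
my $out   = Bio::SeqIO->new(-format => 'fastq', -fh => \*STDOUT);
for my $id (qw(read2 read1)) {
    my $seq = $index->fetch($id);    # a Bio::Seq::Quality object
    unless ($seq) {
        warn "ID $id not found in index\n";
        next;
    }
    $out->write_seq($seq);
}
```
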

chris

On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote:

> Hi all,
> 
> What is the best way to index fastq files, so that once clustered, I
> can provide a list of seq_ids and get
> them back in fastq format from the indexed db?
> 
> Cheers,
> 
> Albert.
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Jan  4 22:54:03 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Jan 2010 21:54:03 -0600
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
	<4B41F742.2030209@pierroton.inra.fr>
Message-ID: <1BAE5508-0DB7-41B4-92E3-49256582131F@illinois.edu>

Jean-Marc,

You can do that, yes.  Just curious, but have you looked at the various flat file indexing modules for FASTA?  Bio::DB::Fasta and Bio::Index::Fasta are commonly used and allow lookups by primary ID (and I think in some cases secondary IDs).

chris

On Jan 4, 2010, at 8:12 AM, Jean-Marc Frigerio INRA wrote:

> ...
> 
> Hi,
> 
> I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods:
> get_by_id(), get_by_order(), first_seq() and previous_seq()
> 
> It would need review, validation etc. Do I submit it to Bugzilla ?
> 
> 	-- jmf
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From fs5 at sanger.ac.uk  Wed Jan  6 17:16:13 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 06 Jan 2010 22:16:13 +0000
Subject: [Bioperl-l] Bio::DB::Sam strange behaviour for read pairs
Message-ID: <4B450BAD.3050807@sanger.ac.uk>

I'm trying to extract paired reads from a BAM file that span a given
region. I would then like to get the two read ends of the sequenced
clone that spans the region.
I use Bio::DB::Sam->get_features_by_location for this. It does return
the correct read pairs for the region, but it doesn't give me both
mates of the pair in all cases.

Here is the script:

#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::Sam;

my $usage = "usage: $0 BAMFILE CHROMOSOME STARTPOS ENDPOS\n" ;
my ($bam_file,$chrom,$start,$end) = @ARGV ;
die $usage unless $bam_file && $chrom && $start && $end;

my $bam = Bio::DB::Sam->new(-bam => $bam_file);

# Fetch read pairs (not individual alignments) overlapping the region
my @pairs = $bam->get_features_by_location(
    -type   => 'read_pair',
    -seq_id => $chrom,
    -start  => $start,
    -end    => $end);

print "region: $chrom:$start..$end\n" ;
foreach my $pair (@pairs) {
  print "  pair: id: ".$pair->id.", start".$pair->start.', end:'.$pair->end."\n";
  # The mates are the subfeatures of the pair
  my ($first_mate,$second_mate) = $pair->get_SeqFeatures;
  print "    first_mate: start:".$first_mate->start.', end:'.$first_mate->end."\n";
  if ($second_mate){
    print "    second_mate: start:".$second_mate->start.', end:'.$second_mate->end."\n";
  } else {
    print "    no second mate\n";
  }
}

And here are the matching pairs that it produces with one of my files 
for the region tal12:22479..29232:
region: tal12:22479..29232
  pair: id: tal-2446c08, start17496, end:29423
    first_mate: start:28540, end:29423
    no second mate
  pair: id: tal-2463d10, start23534, end:31363
    first_mate: start:23534, end:24448
    no second mate
  pair: id: tal-2371c09, start20860, end:28230
    first_mate: start:27604, end:28230
    no second mate
  pair: id: tal-2440b06, start19232, end:27099
    first_mate: start:26025, end:27099
    no second mate
  pair: id: tal-2327g09, start18909, end:26129
    first_mate: start:25354, end:26129
    no second mate
  pair: id: tal-2381b05, start25658, end:35054
    first_mate: start:25658, end:26295
    no second mate
  pair: id: tal-2377c11, start20898, end:28230
    first_mate: start:27473, end:28230
    no second mate
  pair: id: tal-2426e12, start21975, end:27562
    first_mate: start:21975, end:23008
    second_mate: start:26396, end:27562
  pair: id: tal-2365h10, start22843, end:31944
    first_mate: start:22843, end:23184
    no second mate
  pair: id: tal-2388h09, start19016, end:28238
    first_mate: start:27475, end:28238
    no second mate


So it finds a lot of pairs that span the region, and the start/end of
each pair is also correct, but it only gives me both individual mates in
one case:
  pair: id: tal-2426e12, start21975, end:27562
    first_mate: start:21975, end:23008
    second_mate: start:26396, end:27562

In this case, both mates are actually inside the query region (at least 
partially), whereas in the other cases one of the mates lies outside it, 
e.g. this one:

  pair: id: tal-2388h09, start19016, end:28238
    first_mate: start:27475, end:28238
    no second mate
  
This is the read pair as it appears in the BAM file:
$ samtools view clones.bam | grep tal-2388h09

tal-2388h09    99      tal12  19016   205     
36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M      =       
27475   9223    
CTTTGGATGAAATAGTTTTTAAATAATACTTATTAAATATTAAATATATAACACATAAATAAGTATTGATGCAAATTTTAAAGTATTATAGAAAACTAGGTTTGATTATATTGTTATACTGTACTTTAAGAGGAGAGAGATAAGATATCTTTGCTCTTTTAATATATAAATTTAGATAAATATTCGTTAAATTTTCTACATAGTTATTTTTTATCTTATATATTATACTGCTATAGTTATCAATGTATATACATTCAAATAATTTATTAAAAATTCTATATTATATTAATTCTATGATAAAATAATCCTGTTTGTGATTTAAAAAATGATGATTCAATAAAAACTAATAATATAATACGAGTTAATATGGAATAATAAAATGGCATTTAACATGAATTTAGTCTTTAACCTTTTCTTTGTTTGTCAAGTTTTTTAAAACATAAAACCACACATTTCAAAATGGATTTTTAGCAAATATATAAAAATTATACATTTATAATGTATTGTTATGCGTCTTTTCGATAGAATCAATATTTAATTATATGAAGTTTCCACAATAAAATAATATTTAATATTATTTATTAGTAGAGTATTTGATTATATATATAGGCATATAATAATAACTCTAGTTCTATCTACCATATTATTTATAATTATTATAACAAAATGTGATATGAAATTTTATTATATACTTATATTATTTTTTTAACTATTTTAAAATATATTTATTTATACCTCAAAACTATAAAATTGAAATTATTAATAATAATCTAATATATACCTTTATAAAAATAAACGTATAAACTAAT   
 ><:4/+1+*)+4>BEH=9-,,66IIIIIIIIEDA>>>>A at DDFFIHHHHHITIIIIIHIIHHHHHHIYYYYYTTTYDDDHDDDDDDDIIINNTNHHHHHIYYYYIIIIIINNNNTTIIIIIIIIIIITTNTTTTTYYYTTTTTTYYNNNNNNLLLLLLLLLLLNNNNNNTTTTTTTTTTTTTNNTNNNTTTYYTLLLLLLTTTTTTTTYTTTNNNNNNTTTTTTTNNTTNNTTTTTTTTTTYYTTTTTTTNNNNNNTTTTTTTYYTTTTTTYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTNNINTTYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYTTTTTOOOIFFFIFIIOICC>>II@>>>>>>C>>>>>>CIBECCCHIIOOOOOOOOTTTIIFDDEIQQA:55839AA>99>@IIIIII>>::;;I;>>CC>>>>>@III<::=>AAA<>>>>I>:>>99:>842225006824855;5>68844//.//00:>::338:99<:/-+*-./0)((((+00+..,++(((+-()(*((((()*)***))3)''')*..+*++((*1++--+*''''((+/)*42.((***)+,+('*'''*((''''((,'%%''''''''(     
AS:i:614        MS:i:50
tal-2388h09    147     tal12  27475   205     1H764M40H       =       
19016   -9223   
ATTAAATCGGTATCGCCAACACAATGAGTATAATCATTGTCAAATATGCGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTAATTTTTGTGAGTATAATCATTGGAACG  
(((0))*,-1-../2((())03---03266300271+*.-0-*''''+*,+/+))*-05330+)..4>7=77273911**((+20+03688633:93036<8;::5:<99379>>::>>>:57:<:7--)))1435::333228>::>II>::>A>>3/.958677AA=AA:>:==IIII8338<>A>>>>IIIIIIIIYYYYYKKYYYMIFFFFEIIIMI::4..8AIIC>9>=EIQQQMCAAAAAACIIIIAICIIIOOYTIIIMOQQMIIIIC>>AAABCCCCCEAI>C>>IQQIIIIIIIIIIKKYYYYYYYYYYYYYYYYYYYYYTIIIIIIYYYYTNINNNTYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYSSYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTYYYYYTTTTTTYYYTTNNNNTTYYYYYYYYYTTTTLLTTNNTTTTYTTTTTTYYYYYYYYYYYTTOOKKKLKOOTYYYYYYYYYYYYYYYYTNNNNNNNNNTTTNYNNNNTNNNNTTYYYYYYYYTTNNNNTTYNNNNNITTTTTYYYYYYYYYYTTNNIIIIIDIIIIHTNNNNTTYYYYTNNNIIIIIITTTINIIIINNNNTTTYYYYIHHHDDHHDDIHDDGDFFFTIIINTTYYYYTTTTHHHHCCIIIHIHHHHCAI9:++**1168>ACCIIDDDDDDI>>>>>?NNN  
AS:i:688        MS:i:50

So the read in the first line starts before the start of the query 
region and is not accessible via $pair->get_SeqFeatures although this is 
a valid pair.
Am I doing something wrong, is this the desired behaviour or is it a bug?

Thanks for your help!


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From hlapp at drycafe.net  Thu Jan  7 11:55:00 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 7 Jan 2010 11:55:00 -0500
Subject: [Bioperl-l] Data missing into Annotation object using
	Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr>
	<29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <240F198A-83FA-4304-ACA8-80A702A68D8C@drycafe.net>

I don't know to what extent this was followed up on, and I guess it's 
too long ago to be of much help, but in case it hasn't been mentioned 
before, I wanted to point out Bio::SeqFeature::AnnotationAdaptor, which 
integrates tag/value annotation and Bio::Annotation annotation into one 
AnnotationCollection, so it doesn't matter whether something is 
attached as a tag or as an annotation object.
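
A minimal sketch of what that looks like, assuming BioPerl 1.6 or later 
is installed. The feature, its locus_tag and product values, and the 
helper name values_for are made up for illustration; only 
Bio::SeqFeature::Generic and Bio::SeqFeature::AnnotationAdaptor come 
from BioPerl:

```perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Bio::SeqFeature::AnnotationAdaptor;

# Made-up CDS feature carrying plain tag/value pairs, similar to what
# Bio::SeqIO attaches to features by default in 1.6.
my $feat = Bio::SeqFeature::Generic->new(
    -primary_tag => 'CDS',
    -start       => 1,
    -end         => 300,
    -tag         => {
        locus_tag => 'XYZ_0001',
        product   => 'hypothetical protein',
    },
);

# Hypothetical helper: return the string values stored under $key.
# The adaptor presents tag values as Bio::Annotation::SimpleValue
# objects alongside anything stored in the feature's annotation
# collection, so the caller does not need to know which one was used.
sub values_for {
    my ($feature, $key) = @_;
    my $coll = Bio::SeqFeature::AnnotationAdaptor->new(-feature => $feature);
    return map { $_->value } $coll->get_Annotations($key);
}

for my $key (qw(locus_tag product)) {
    print "$key: $_\n" for values_for($feat, $key);
}
```

The helper assumes every annotation under the key is a simple value; 
richer annotation objects (references, dblinks) would need their own 
accessors instead of value().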

	-hilmar

On Dec 16, 2009, at 10:09 AM, Chris Fields wrote:

> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags
> as Bio::Annotation objects.  That implementation was considered
> unsatisfactory for various reasons, so we reverted to using simple
> tag-value pairs as the default.  You can get at the data this way
> (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>    print "primary tag: ", $feat_object->primary_tag, "\n";
>    for my $tag ($feat_object->get_all_tags) {
>        print "  tag: ", $tag, "\n";
>        for my $value ($feat_object->get_tag_values($tag)) {
>            print "    value: ", $value, "\n";
>        }
>    }
> }
>
> You can also convert all the tag-value data into a  
> Bio::Annotation::Collection using the  
> Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
> On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:
>
>> Hi,
>>
>> I wrote a small GenBank parser a few months ago, before BioPerl
>> release 1.6.0.
>> I tried to use my code again, but now the output of my parser is
>> empty.
>> It looks like the annotation of seqfeatures is not filled in anymore.
>>
>> Here is the code I used previously:
>>
>> while(my $seq = $streamer->next_seq()){
>>
>>   #We only want to retrieve CDS features...
>>   foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq->get_SeqFeatures()){
>>       print $ofh join("#",
>>                       $feat->annotation()->get_Annotations('locus_tag'),    # Acc num
>>                       $feat->annotation()->get_Annotations('gene')
>>                         ? $feat->annotation()->get_Annotations('gene')      # Gene name
>>                         : $feat->annotation()->get_Annotations('locus_tag'),
>>                       $feat->annotation()->get_Annotations('product'),      # Description
>>                      ),"\n";
>>   }
>> }
>>
>> $feat is a Bio::SeqFeature::Generic object
>>
>> If I print Dumper($feat->annotation()) here is the output :
>>
>> $VAR1 = bless( {
>>                '_typemap' => bless( {
>>                                       '_type' => {
>>                                                    'comment' => 'Bio::Annotation::Comment',
>>                                                    'reference' => 'Bio::Annotation::Reference',
>>                                                    'dblink' => 'Bio::Annotation::DBLink'
>>                                                  }
>>                                     }, 'Bio::Annotation::TypeManager' ),
>>                '_annotation' => {}
>>              }, 'Bio::Annotation::Collection' );
>>
>> Have some changes been made to the way the annotation object is
>> populated?
>>
>> Thanks for any clue, and sorry if my question looks stupid.
>>
>> Regards
>>
>> Emmanuel
>>
>> -- 
>> -------------------------
>> Emmanuel Quevillon
>> Biological Software and Databases Group
>> Institut Pasteur
>> +33 1 44 38 95 98
>> tuco at_ pasteur dot fr
>> -------------------------
>>

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From rtbio.2009 at gmail.com  Fri Jan  8 10:00:21 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 8 Jan 2010 16:00:21 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>

Hello all,

I was trying remote BLAST using BioPerl. My input data is a Trypanosoma
brucei sequence in FASTA format. When I submit it to BLAST with the step
$r=$factory->submit_blast($input)
it does not return anything, which I checked by debugging the code. My
input sequence is not being BLASTed even though I set all the parameters.
I have pasted the code below.

Please help me solve this problem. It is very urgent.

Regards
Roopa.

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;


use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds. Roopa";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";


open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
 This page will automatically reload in 30 seconds Roopa <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

#$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= 'Trypanosoma Brucei';

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

            # open(OUTFILE,'>',$debugfile);
             #  print OUTFILE @params;
             # close(OUTFILE);


 my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a paramter

 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a paramter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => 'Trypanosoma Brucei' );


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   # The program stops here: submit_blast does not return any value and
   # the while loop below is never entered. Please help me in this regard.
   my $r = $factory->submit_blast($input);
                open(OUTFILE,'>',$debugfile);
                print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
               print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
               print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
               print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
               print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);

}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

    print OUTFILE substr ($in{'Inputseq'}, $i, 1);

    if ( ($i+1)%10==0){
        print OUTFILE " ";
    }
    if ( ($i+1)%60==0){
        print OUTFILE "<br>\n";
    }
}


print OUTFILE "</font> <p>";

$z=@compseqs;

for($k=1;$k<$z;$k++) {
    print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
Sequence: <br>";

    for ($i=0; $i<length ($compseqs[$k]); $i++) {

        print OUTFILE substr ($compseqs[$k], $i, 1);

        if ( ($i+1)%10==0){
            print OUTFILE " ";
        }
        if ( ($i+1)%60==0){
            print OUTFILE "<br>\n";
        }
    }
    print OUTFILE "<p></font>";
}

print OUTFILE "<p>
Window: <br>$in{'Windowsize'}
<p>
<p>
Threshold: <br>$in{'Threshold'}
<p>";
my $j=0;

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

    if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
        if ($out[$i]->{similar}<=$in{'Threshold'}){
            $j=$in{'Windowsize'};
        }
        $height=$out[$i]->{similar}*5;
    }

    if ($j>0) {
        print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
height=\"5\">";
        $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
1)."</font>";
        $j--;
    }
    else {
        print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
height=\"5\">";
        $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
1)."</font>";
    }

    if ( ($i+1)%10==0){
        $outstring .= " ";
    }
    if ( ($i+1)%60==0){
        $outstring .= "<br>\n";

    }
    if ( ($i+1)%800==0){
        print OUTFILE "<br><br>\n";

    }
}

print OUTFILE "<br><br><font face=\"Courier, monospace font
set\">$outstring</font>";

#foreach (@out) {
#print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
#if ($_->{similar}<=$in{'Threshold'}){

#    }
#}

print OUTFILE "</BODY>\n</HTML>\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}


From maj at fortinbras.us  Fri Jan  8 10:36:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 8 Jan 2010 10:36:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
Message-ID: <F19004692A4A4350856B23DF25E09074@NewLife>

Hi Roopa--

I got your code to work with the following changes:

+# the input should be a valid FASTA file...
 ...
 open(NUC,'>',$nuc);
+print NUC ">seq (need a name line for valid fasta)\n";
 print NUC $inpu1, "\n";
 close(NUC);
...

+# you can set these header parms in the call itself...
- my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
+ my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
+     -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');

  #change a paramter
+# commented this out...
+# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
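
Since the sequence comes from a web form, it may or may not already 
carry a name line. A small sketch of a guard for that, in plain Perl 
with no BioPerl calls; the helper name ensure_fasta and the default 
name "query" are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return $text as a FASTA
# record, adding a ">name" header line only when the input does not
# already start with one.
sub ensure_fasta {
    my ($text, $name) = @_;
    $name = 'query' unless defined $name && length $name;
    $text =~ s/\r\n?/\n/g;    # normalise line endings from web forms
    $text =~ s/^\s+//;        # drop leading blank lines and spaces
    $text =~ s/\s+\z//;       # drop trailing whitespace
    $text = ">$name\n$text" unless $text =~ /^>/;
    return "$text\n";
}

# Bare sequence: a header is added.
print ensure_fasta("ACGTACGT\nTTGA");

# Already FASTA: returned unchanged apart from whitespace clean-up.
print ensure_fasta(">tb1 sample\nACGT\n");
```

The result can be written to the NUC file in place of the raw form 
value before handing the file to Bio::SeqIO.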

MAJ


From julian.onions at gmail.com  Fri Jan  8 11:53:50 2010
From: julian.onions at gmail.com (Julian Onions)
Date: Fri, 8 Jan 2010 16:53:50 +0000
Subject: [Bioperl-l] Cladogram construction
Message-ID: <cbeabfd41001080853m50c75779q4155cd02af17670a@mail.gmail.com>

Does anyone have any sample code for building cladograms from Pars (one
of the PHYLIP tools) format input, or any other format actually?
I've got something sort of working, but I get no weights on the tree:
everything appears as nan. I'd also like to set one of the species to be an
outgroup. This is the closest sample I've found so far.


#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::Tree::DistanceFactory;
use Bio::Align::ProteinStatistics;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;
my $alnfile = shift @ARGV || die "need a file to run";

my $input= Bio::AlignIO->new(-format => 'fasta',
    -file    => $alnfile);

if( my $aln = $input->next_aln ) {
 my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
 my $stats = Bio::Align::ProteinStatistics->new;
 my $distmat = $stats->distance(-align => $aln,
         -method => 'Kimura');
 my $treeout = Bio::TreeIO->new(-format => 'newick');
 my $tree = $dfactory->make_tree($distmat);
 $treeout->write_tree($tree);
  my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree    => $tree,
                                             -compact => 0);
  $obj1->print(-file => "tree.eps");
} else {
 die "could not find any alignments in the file $alnfile";
}


Pars input looks like
3 4
Robin   101
Blackbird 100
Sparrow 100


Thanks,
Julian.


From rtbio.2009 at gmail.com  Sat Jan  9 11:57:09 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sat, 9 Jan 2010 17:57:09 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <F19004692A4A4350856B23DF25E09074@NewLife>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<F19004692A4A4350856B23DF25E09074@NewLife>
Message-ID: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>

Hello all,

Thanks a lot for your reply, Mark. It works with Trypanosoma brucei as
the organism parameter, but when I tried to take the organism parameter from
the user it did not work, i.e. I was unable to get the target sequences.
Please help me in this regard. My code is

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;


use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds. Roopa";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";

open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
 This page will automatically reload in 30 seconds Roopa <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1,"\n";
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
        '-Organism'   => $organism );

             open(OUTFILE,'>',$debugfile);
             print OUTFILE $inpu1;
              close(OUTFILE);


my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
'$organ[ORGN]');

 #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter

 #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => $organ );


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             #open(OUTFILE,'>',$debugfile);
              # print OUTFILE $input;
              #close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);

   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
   #   open(OUTFILE,'>',$debugfile);
    #           print OUTFILE "while entered";
     #         close(OUTFILE);
     foreach my $rid ( @rids ) {

      #         open(OUTFILE,'>',$debugfile);
       #        print OUTFILE "foreach entered";
        #      close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
         #      print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          #    open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "else entered";
            #  close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);
  # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);

}

Regards,
Roopa.
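
Mark's earlier advice (quoted below) was that the file handed to
Bio::SeqIO must be valid FASTA, i.e. it needs a name line. A minimal sketch
of one way to guarantee that for user-supplied text; the helper name
ensure_fasta_header and the placeholder id "query" are made up for
illustration and are not part of BioPerl or of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the text as a FASTA record, adding a
# placeholder name line when the submitted text does not start with one.
sub ensure_fasta_header {
    my ($text, $name) = @_;
    $name = 'query' unless defined $name && length $name;
    $text =~ s/^\s+//;    # drop leading whitespace and blank lines
    return $text =~ /^>/ ? $text : ">$name\n$text";
}

# Bare sequence, as a user might paste it into the form.
my $raw   = "ATGCATGCATGC\n";
my $fasta = ensure_fasta_header($raw);
print $fasta;
```

Text that already begins with a ">" line is passed through unchanged, so
the same call is safe for both kinds of input.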


On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> Hi Roopa--
>
> I got your code to work with the following changes:
>
> +# the input should be a valid FASTA file...
> ...
> open(NUC,'>',$nuc);
> +print NUC ">seq (need a name line for valid fasta)\n";
> print NUC $inpu1, "\n";
> close(NUC);
> ...
>
> +# you can set these header parms in the call itself...
> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
> 'Trypanosoma Brucei[ORGN]');
>
>  #change a parameter
> +# commented this out...
> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>
> MAJ
> ----- Original Message ----- From: "Roopa Raghuveer" <rtbio.2009 at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 08, 2010 10:00 AM
> Subject: [Bioperl-l] Regarding blast in Bioperl
>
>
>  Hello all,
>>
>> I was trying remote BLAST using BioPerl. My input data is a Trypanosoma
>> brucei sequence in FASTA format. When I tried to submit it to BLAST using
>> the step
>> $r=$factory->submit_blast($input)
>> it did not return anything, which I checked by debugging the code. It is
>> not blasting my input sequence even though I set all the parameters. I
>> paste the code below.
>>
>> Please help me in solving this problem. It is very urgent.
>>
>> Regards
>> Roopa.
>>
>> #!/usr/bin/perl
>>
>> #path for extra camel module
>> use lib "/srv/www/htdocs/rain/RNAi/";
>> use Roopablast;
>>
>>
>> use Bio::SearchIO;
>> use Bio::Search::Result::BlastResult;
>> use Bio::Perl;
>> use Bio::Tools::Run::RemoteBlast;
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::DB::GenBank;
>>
>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>> $serverurl = "http://141.84.66.66/rain/RNAi";
>> $outfile = $serverpath."/rnairesult_".time().".html";
>> $nuc = $serverpath."/nuc".time().".txt";
>> $debugfile = $serverpath."/debug_".time().".txt";
>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>
>> my $outstring ="";
>>
>> &parse_form;
>>
>> print "Content-type: text/html\n\n";
>> print "<HTML>\n";
>> print "<head><title>RNAi Result</title>";
>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>> print "</head>\n";
>> print "<body>\n";
>> print " Your results will appear <a
>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>> print " Please be patient, runtime can be up to 5 minutes<br>";
>> print " This page will automatically reload in 30 seconds. Roopa";
>> print "</BODY>\n";
>> print "</HTML>\n";
>>
>> defined(my $pid = fork) or die "Can't fork: $!";
>> exit if $pid;
>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>
>>
>>
>> open(OUTFILE, '>',$outfile);
>>
>> print OUTFILE "<HTML>\n
>> <head><title>RNAi Result</title>
>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>> URL=$serverurl//rnairesult_".time().".html\"> \n
>> <meta http-equiv=\"expires\" content=\"0\">
>> </head>\n
>> <body>\n
>>  Your results will appear <a
>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>  Please be patient, runtime can be up to 5 minutes wait wait
>> wait......<br>
>> This page will automatically reload in 30 seconds Roopa <br>
>> </BODY>\n
>> </HTML>\n";
>>
>> close(OUTFILE);
>>
>>
>> @compseqs = blastcode($in{'Inputseq'});
>>
>> $in{'Inputseq'} =~ s/>.*$//m;
>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>
>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>> $in{'Threshold'});
>>
>>
>> sub blastcode
>> {
>>
>> $inpu1= $_[0];
>>
>> #$organ= $_[1];
>>
>> open(NUC,'>',$nuc);
>> print NUC $inpu1;
>> close(NUC);
>>
>> my $prog = 'blastn';
>> my $db   = 'refseq_rna';
>> my $e_val= '1e-10';
>> my $organism= 'Trypanosoma Brucei';
>>
>> $gb = new Bio::DB::GenBank;
>>
>> my @params = ( '-prog' => $prog,
>>        '-data' => $db,
>>        '-expect' => $e_val,
>>        '-readmethod' => 'SearchIO',
>>        '-Organism'   => $organism );
>>
>>           # open(OUTFILE,'>',$debugfile);
>>            #  print OUTFILE @params;
>>            # close(OUTFILE);
>>
>>
>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>
>>  #change a parameter
>>
>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>
>> #change a parameter
>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>
>>  my $v = 1;
>>  #$v is just to turn on and off the messages
>>
>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>> '-organism' => 'Trypanosoma Brucei' );
>>
>>
>> while (my $input = $str->next_seq())
>> {
>>  #Blast a sequence against a database:
>>   #Alternatively, you could  pass in a file with many
>>   #sequences rather than loop through sequence one at a time
>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>   #and swap the two lines below for an example of that.
>>
>>            open(OUTFILE,'>',$debugfile);
>>              print OUTFILE $input;
>>             close(OUTFILE);
>>
>>
>>  my $r = $factory->submit_blast($input);
>>  # The program stops here: it does not return any value and it does not
>>  # enter the while loop. Please help me in this regard.
>>               open(OUTFILE,'>',$debugfile);
>>               print OUTFILE $r;
>>               close(OUTFILE);
>>
>>
>>  print STDERR "waiting...." if($v>0);
>>
>>  while ( my @rids = $factory->each_rid ) {
>>     open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "while entered";
>>             close(OUTFILE);
>>    foreach my $rid ( @rids ) {
>>
>>              open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "foreach entered";
>>             close(OUTFILE);
>>
>>       my $rc = $factory->retrieve_blast($rid);
>>
>>       if( !ref($rc) )
>>       {
>>       if( $rc < 0 )
>>       {
>>       $factory->remove_rid($rid);
>>       }
>>        open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "if entered";
>>             close(OUTFILE);
>>        print STDERR "." if ( $v > 0 );
>>        sleep 5;
>>       }
>>      else {
>>             open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "else entered";
>>             close(OUTFILE);
>>
>>         my $result = $rc->next_result();
>>        #save the output
>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>
>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>         print BLASTDEBUGFILE $result->next_hit();
>>         close(BLASTDEBUGFILE);
>>
>>       my $filename =
>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>
>>        # open(DEBUGFILE,'>',$debugfile);
>>        # open(new,'>',$filename);
>>        # @arra=<new>;
>>        # print DEBUGFILE @arra;
>>        # close(DEBUGFILE);
>>        # close(new);
>>
>>        $factory->save_output($filename);
>>
>>      # open(BLASTDEBUGFILE,'>',$debugfile);
>>      # print BLASTDEBUGFILE  "Hello $rid";
>>      # close(BLASTDEBUGFILE);
>>
>>      $factory->remove_rid($rid);
>>
>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>      print BLASTDEBUGFILE  $organism;
>>       close(BLASTDEBUGFILE);
>>
>>   # open(OUTFILE,'>',$outfile);
>>   # print OUTFILE "Test2 $result->database_name()";
>>   # close(OUTFILE);
>>
>> #$hit = $result->next_hit;
>> #open(new,'>',$debugfile);
>> #print $hit;
>> #close(new);
>>
>>  while ( my $hit = $result->next_hit ) {
>>
>>           next unless ( $v > 0);
>>
>>         #     open(OUTFILE,'>',$debugfile);
>>          #    print OUTFILE "$hit in while hits";
>>           #  close(OUTFILE);
>>
>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>          my $dna = $sequ->seq();        # get the sequence as a string
>>                 push(@seqs,$dna);
>>         }
>>       }
>>     }
>>   }
>>  }
>>
>>  #open(OUTFILE,'>',$debugfile);
>>  #print OUTFILE $seqs[0];
>>  #close(OUTFILE);
>>
>> return(@seqs);
>>
>> }
>>
>> open(OUTFILE, '>',$outfile) || die ;
>>
>> print OUTFILE "<HTML>\n
>> <head><title>RNAi Result</title>
>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>> <body>\n
>> <p><font face=\"Courier, monospace font set\">
>> Inputsequence: <br>";
>>
>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>
>>   print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>
>>   if ( ($i+1)%10==0){
>>       print OUTFILE " ";
>>   }
>>   if ( ($i+1)%60==0){
>>       print OUTFILE "<br>\n";
>>   }
>> }
>>
>>
>>
>> print OUTFILE "</font> <p>";
>>
>> $z=@compseqs;
>>
>> for($k=1;$k<$z;$k++) {
>>   print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
>> Sequence: <br>";
>>
>>   for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>
>>       print OUTFILE substr ($compseqs[$k], $i, 1);
>>
>>       if ( ($i+1)%10==0){
>>           print OUTFILE " ";
>>       }
>>       if ( ($i+1)%60==0){
>>           print OUTFILE "<br>\n";
>>       }
>>   }
>>   print OUTFILE "<p></font>";
>> }
>>
>> print OUTFILE "<p>
>> Window: <br>$in{'Windowsize'}
>> <p>
>> <p>
>> Threshold: <br>$in{'Threshold'}
>> <p>";
>> my $j=0;
>>
>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>
>>   if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>       if ($out[$i]->{similar}<=$in{'Threshold'}){
>>           $j=$in{'Windowsize'};
>>       }
>>       $height=$out[$i]->{similar}*5;
>>   }
>>
>>   if ($j>0) {
>>       print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>> height=\"5\">";
>>       $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
>> 1)."</font>";
>>       $j--;
>>   }
>>   else {
>>       print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>> height=\"5\">";
>>       $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
>> 1)."</font>";
>>   }
>>
>>   if ( ($i+1)%10==0){
>>       $outstring .= " ";
>>   }
>>   if ( ($i+1)%60==0){
>>       $outstring .= "<br>\n";
>>
>>   }
>>   if ( ($i+1)%800==0){
>>       print OUTFILE "<br><br>\n";
>>
>>   }
>> }
>>
>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>> set\">$outstring</font>";
>>
>> #foreach (@out) {
>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
>> #if ($_->{similar}<=$in{'Threshold'}){
>>
>> #    }
>> #}
>>
>> print OUTFILE "</BODY>\n</HTML>\n";
>>
>> close OUTFILE;
>>
>> #nameprint();
>>
>> sub parse_form {
>>   local ($buffer, @pairs, $pair, $name, $value);
>>   # Read in text
>>   $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>   if ($ENV{'REQUEST_METHOD'} eq "POST")
>>   {
>>       read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>   }
>>   else
>>   {
>>       $buffer = $ENV{'QUERY_STRING'};
>>   }
>>   @pairs = split(/&/, $buffer);
>>   foreach $pair (@pairs)
>>   {
>>       ($name, $value) = split(/=/, $pair);
>>       $value =~ tr/+/ /;
>>       $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>       $in{$name} = $value;
>>   }
>> }


From maj at fortinbras.us  Sat Jan  9 13:05:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 9 Jan 2010 13:05:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com><F19004692A4A4350856B23DF25E09074@NewLife>
	<c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
Message-ID: <4C2E8133F916495B876628EF3E8FCBB2@NewLife>

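I see it immediately (from making the same bug many times): single quotes
don't interpolate, so BLAST was sent the literal text $organ[ORGN]. Note
that a plain "$organ[ORGN]" would be read as an element of the array @organ,
so put the variable name in braces:

 my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
- '$organ[ORGN]');
+ "${organ}[ORGN]");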

MAJ
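
A small self-contained sketch of the quoting behaviour behind this bug. The
variable names and the organism value are illustrative only; nothing here
calls BLAST.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $organ = 'Trypanosoma brucei';    # stand-in for the user-supplied organism

# Single quotes: no interpolation, so the variable name is kept literally.
my $literal = '$organ[ORGN]';

# Braces end the variable name, so [ORGN] stays literal text.
# (A plain "$organ[ORGN]" would instead be parsed as an element of @organ
# and does not even compile under "use strict".)
my $braced = "${organ}[ORGN]";

# Concatenation avoids the ambiguity altogether.
my $joined = $organ . '[ORGN]';

print "$_\n" for $literal, $braced, $joined;
# prints:
#   $organ[ORGN]
#   Trypanosoma brucei[ORGN]
#   Trypanosoma brucei[ORGN]
```

Either of the last two forms gives a usable value for -ENTREZ_QUERY; the
concatenated one is the harder to misread.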

----- Original Message ----- 
From: "Roopa Raghuveer" <rtbio.2009 at gmail.com>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Saturday, January 09, 2010 11:57 AM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl


> Hello all,
> 
> Thanks a lot for your reply, Mark. It worked with Trypanosoma brucei
> hard-coded as the organism parameter, but when I tried to take the Organism
> parameter from the user it did not work, i.e., I was unable to get the target
> sequences. Please help me in this regard. My code is:
> 
> #!/usr/bin/perl
> 
> #path for extra camel module
> use lib "/srv/www/htdocs/rain/RNAi/";
> use Roopablast;
> 
> 
> use Bio::SearchIO;
> use Bio::Search::Result::BlastResult;
> use Bio::Perl;
> use Bio::Tools::Run::RemoteBlast;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> 
> $serverpath = "/srv/www/htdocs/rain/RNAi";
> $serverurl = "http://141.84.66.66/rain/RNAi";
> $outfile = $serverpath."/rnairesult_".time().".html";
> $nuc = $serverpath."/nuc".time().".txt";
> $debugfile = $serverpath."/debug_".time().".txt";
> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
> my $outstring ="";
> 
> &parse_form;
> 
> print "Content-type: text/html\n\n";
> print "<HTML>\n";
> print "<head><title>RNAi Result</title>";
> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
> URL=$serverurl/rnairesult_".time().".html\"> \n";
> print "</head>\n";
> print "<body>\n";
> print " Your results will appear <a
> href=$serverurl/rnairesult_".time().".html>here</a><br>";
> print " Please be patient, runtime can be up to 5 minutes<br>";
> print " This page will automatically reload in 30 seconds. Roopa";
> print "</BODY>\n";
> print "</HTML>\n";
> 
> defined(my $pid = fork) or die "Can't fork: $!";
> exit if $pid;
> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
> 
> open(OUTFILE, '>',$outfile);
> 
> print OUTFILE "<HTML>\n
> <head><title>RNAi Result</title>
> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
> URL=$serverurl//rnairesult_".time().".html\"> \n
> <meta http-equiv=\"expires\" content=\"0\">
> </head>\n
> <body>\n
>  Your results will appear <a
> href=$serverurl/rnairesult_".time().".html>here</a><br>
>  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
> This page will automatically reload in 30 seconds Roopa <br>
> </BODY>\n
> </HTML>\n";
> 
> close(OUTFILE);
> 
> 
> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
> 
> $in{'Inputseq'} =~ s/>.*$//m;
> $in{'Inputseq'} =~ s/[^TAGC]//gim;
> $in{'Inputseq'} =~ tr/actg/ACTG/;
> 
> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
> $in{'Threshold'});
> 
> 
> sub blastcode
> {
> 
> $inpu1= $_[0];
> 
> $organ= $_[1];
> 
> open(NUC,'>',$nuc);
> print NUC $inpu1,"\n";
> close(NUC);
> 
> my $prog = 'blastn';
> my $db   = 'refseq_rna';
> my $e_val= '1e-10';
> my $organism= $organ;
> 
> $gb = new Bio::DB::GenBank;
> 
> my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO',
>        '-Organism'   => $organism );
> 
>             open(OUTFILE,'>',$debugfile);
>             print OUTFILE $inpu1;
>              close(OUTFILE);
> 
> 
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
> '$organ[ORGN]');
> 
> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> 
>  #change a parameter
> 
> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
> 
> #change a parameter
> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
> 
>  my $v = 1;
>  #$v is just to turn on and off the messages
> 
> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => $organ );
> 
> 
> while (my $input = $str->next_seq())
> {
>   #Blast a sequence against a database:
>    #Alternatively, you could  pass in a file with many
>    #sequences rather than loop through sequence one at a time
>    #Remove the loop starting 'while (my $input = $str->next_seq())'
>    #and swap the two lines below for an example of that.
> 
>             #open(OUTFILE,'>',$debugfile);
>              # print OUTFILE $input;
>              #close(OUTFILE);
> 
> 
>   my $r = $factory->submit_blast($input);
> 
>                open(OUTFILE,'>',$debugfile);
>             #   print OUTFILE $r;
>                close(OUTFILE);
> 
>   print STDERR "waiting...." if($v>0);
> 
>  while ( my @rids = $factory->each_rid ) {
>   #   open(OUTFILE,'>',$debugfile);
>    #           print OUTFILE "while entered";
>     #         close(OUTFILE);
>     foreach my $rid ( @rids ) {
> 
>      #         open(OUTFILE,'>',$debugfile);
>       #        print OUTFILE "foreach entered";
>        #      close(OUTFILE);
> 
>        my $rc = $factory->retrieve_blast($rid);
> 
>        if( !ref($rc) )
>        {
>        if( $rc < 0 )
>        {
>        $factory->remove_rid($rid);
>        }
>         open(OUTFILE,'>',$debugfile);
>         #      print OUTFILE "if entered";
>              close(OUTFILE);
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>        }
>       else {
>          #    open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "else entered";
>            #  close(OUTFILE);
> 
>          my $result = $rc->next_result();
>         #save the output
>        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
>          open(BLASTDEBUGFILE,'>',$blastdebugfile);
>          print BLASTDEBUGFILE $result->next_hit();
>          close(BLASTDEBUGFILE);
> 
>        my $filename =
> $serverpath."/blastdata_".time().$result->query_name()."\.out";
> 
>         # open(DEBUGFILE,'>',$debugfile);
>         # open(new,'>',$filename);
>         # @arra=<new>;
>         # print DEBUGFILE @arra;
>         # close(DEBUGFILE);
>         # close(new);
> 
>         $factory->save_output($filename);
>  # open(BLASTDEBUGFILE,'>',$debugfile);
>       # print BLASTDEBUGFILE  "Hello $rid";
>       # close(BLASTDEBUGFILE);
> 
>       $factory->remove_rid($rid);
> 
>       open(BLASTDEBUGFILE,'>',$blastdebugfile);
>       print BLASTDEBUGFILE  $organism;
>        close(BLASTDEBUGFILE);
> 
>    # open(OUTFILE,'>',$outfile);
>    # print OUTFILE "Test2 $result->database_name()";
>    # close(OUTFILE);
> 
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
> 
>   while ( my $hit = $result->next_hit ) {
> 
>            next unless ( $v > 0);
> 
>          #     open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "$hit in while hits";
>            #  close(OUTFILE);
> 
>       my $sequ = $gb->get_Seq_by_version($hit->name);
>           my $dna = $sequ->seq();        # get the sequence as a string
>                  push(@seqs,$dna);
>          }
>        }
>      }
>    }
>  }
> 
>  #open(OUTFILE,'>',$debugfile);
>  #print OUTFILE $seqs[0];
>  #close(OUTFILE);
> 
> return(@seqs);
> 
> }
> 
> Regards,
> Roopa.
> 
> 
> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> 
>> Hi Roopa--
>>
>> I got your code to work with the following changes:
>>
>> +# the input should be a valid FASTA file...
>> ...
>> open(NUC,'>',$nuc);
>> +print NUC ">seq (need a name line for valid fasta)\n";
>> print NUC $inpu1, "\n";
>> close(NUC);
>> ...
>>
>> +# you can set these header parms in the call itself...
>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
>> 'Trypanosoma Brucei[ORGN]');
>>
>>  #change a parameter
>> +# commented this out...
>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>
>> MAJ
>> ----- Original Message ----- From: "Roopa Raghuveer" <rtbio.2009 at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 08, 2010 10:00 AM
>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>
>>
>>  Hello all,
>>>
>>> I was trying remote BLAST using BioPerl. My input data is a Trypanosoma
>>> brucei sequence in FASTA format. When I tried to submit it to BLAST using
>>> the step
>>> $r=$factory->submit_blast($input)
>>> it did not return anything, which I checked by debugging the code. It is
>>> not blasting my input sequence even though I set all the parameters. I
>>> paste the code below.
>>>
>>> Please help me in solving this problem. It is very urgent.
>>>
>>> Regards
>>> Roopa.
>>>
>>> #!/usr/bin/perl
>>>
>>> #path for extra camel module
>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>> use Roopablast;
>>>
>>>
>>> use Bio::SearchIO;
>>> use Bio::Search::Result::BlastResult;
>>> use Bio::Perl;
>>> use Bio::Tools::Run::RemoteBlast;
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>> use Bio::DB::GenBank;
>>>
>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>> $nuc = $serverpath."/nuc".time().".txt";
>>> $debugfile = $serverpath."/debug_".time().".txt";
>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>
>>> my $outstring ="";
>>>
>>> &parse_form;
>>>
>>> print "Content-type: text/html\n\n";
>>> print "<HTML>\n";
>>> print "<head><title>RNAi Result</title>";
>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>> print "</head>\n";
>>> print "<body>\n";
>>> print " Your results will appear <a
>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>> print " This page will automatically reload in 30 seconds. Roopa";
>>> print "</BODY>\n";
>>> print "</HTML>\n";
>>>
>>> defined(my $pid = fork) or die "Can't fork: $!";
>>> exit if $pid;
>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>
>>>
>>>
>>> open(OUTFILE, '>',$outfile);
>>>
>>> print OUTFILE "<HTML>\n
>>> <head><title>RNAi Result</title>
>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>> <meta http-equiv=\"expires\" content=\"0\">
>>> </head>\n
>>> <body>\n
>>>  Your results will appear <a
>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>> wait......<br>
>>> This page will automatically reload in 30 seconds Roopa <br>
>>> </BODY>\n
>>> </HTML>\n";
>>>
>>> close(OUTFILE);
>>>
>>>
>>> @compseqs = blastcode($in{'Inputseq'});
>>>
>>> $in{'Inputseq'} =~ s/>.*$//m;
>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>
>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>> $in{'Threshold'});
>>>
>>>
>>> sub blastcode
>>> {
>>>
>>> $inpu1= $_[0];
>>>
>>> #$organ= $_[1];
>>>
>>> open(NUC,'>',$nuc);
>>> print NUC $inpu1;
>>> close(NUC);
>>>
>>> my $prog = 'blastn';
>>> my $db   = 'refseq_rna';
>>> my $e_val= '1e-10';
>>> my $organism= 'Trypanosoma Brucei';
>>>
>>> $gb = new Bio::DB::GenBank;
>>>
>>> my @params = ( '-prog' => $prog,
>>>        '-data' => $db,
>>>        '-expect' => $e_val,
>>>        '-readmethod' => 'SearchIO',
>>>        '-Organism'   => $organism );
>>>
>>>           # open(OUTFILE,'>',$debugfile);
>>>            #  print OUTFILE @params;
>>>            # close(OUTFILE);
>>>
>>>
>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>
>>>  #change a parameter
>>>
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>
>>> #change a parameter
>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>>
>>>  my $v = 1;
>>>  #$v is just to turn on and off the messages
>>>
>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>> '-organism' => 'Trypanosoma Brucei' );
>>>
>>>
>>> while (my $input = $str->next_seq())
>>> {
>>>  #Blast a sequence against a database:
>>>   #Alternatively, you could  pass in a file with many
>>>   #sequences rather than loop through sequence one at a time
>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>   #and swap the two lines below for an example of that.
>>>
>>>            open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE $input;
>>>             close(OUTFILE);
>>>
>>>
>>>  my $r = $factory->submit_blast($input);
>>>  # The program stops here: it does not return any value and it does not
>>>  # enter the while loop. Please help me in this regard.
>>>               open(OUTFILE,'>',$debugfile);
>>>               print OUTFILE $r;
>>>               close(OUTFILE);
>>>
>>>
>>>  print STDERR "waiting...." if($v>0);
>>>
>>>  while ( my @rids = $factory->each_rid ) {
>>>     open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE "while entered";
>>>             close(OUTFILE);
>>>    foreach my $rid ( @rids ) {
>>>
>>>              open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE "foreach entered";
>>>             close(OUTFILE);
>>>
>>>       my $rc = $factory->retrieve_blast($rid);
>>>
>>>       if( !ref($rc) )
>>>       {
>>>       if( $rc < 0 )
>>>       {
>>>       $factory->remove_rid($rid);
>>>       }
>>>        open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE "if entered";
>>>             close(OUTFILE);
>>>        print STDERR "." if ( $v > 0 );
>>>        sleep 5;
>>>       }
>>>      else {
>>>             open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE "else entered";
>>>             close(OUTFILE);
>>>
>>>         my $result = $rc->next_result();
>>>        #save the output
>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>
>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>         print BLASTDEBUGFILE $result->next_hit();
>>>         close(BLASTDEBUGFILE);
>>>
>>>       my $filename =
>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>
>>>        # open(DEBUGFILE,'>',$debugfile);
>>>        # open(new,'>',$filename);
>>>        # @arra=<new>;
>>>        # print DEBUGFILE @arra;
>>>        # close(DEBUGFILE);
>>>        # close(new);
>>>
>>>        $factory->save_output($filename);
>>>
>>>      # open(BLASTDEBUGFILE,'>',$debugfile);
>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>      # close(BLASTDEBUGFILE);
>>>
>>>      $factory->remove_rid($rid);
>>>
>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>      print BLASTDEBUGFILE  $organism;
>>>       close(BLASTDEBUGFILE);
>>>
>>>   # open(OUTFILE,'>',$outfile);
>>>   # print OUTFILE "Test2 $result->database_name()";
>>>   # close(OUTFILE);
>>>
>>> #$hit = $result->next_hit;
>>> #open(new,'>',$debugfile);
>>> #print $hit;
>>> #close(new);
>>>
>>>  while ( my $hit = $result->next_hit ) {
>>>
>>>           next unless ( $v > 0);
>>>
>>>         #     open(OUTFILE,'>',$debugfile);
>>>          #    print OUTFILE "$hit in while hits";
>>>           #  close(OUTFILE);
>>>
>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>                 push(@seqs,$dna);
>>>         }
>>>       }
>>>     }
>>>   }
>>>  }
>>>
>>>  #open(OUTFILE,'>',$debugfile);
>>>  #print OUTFILE $seqs[0];
>>>  #close(OUTFILE);
>>>
>>> return(@seqs);
>>>
>>> }
>>>
>>> open(OUTFILE, '>',$outfile) || die ;
>>>
>>> print OUTFILE "<HTML>\n
>>> <head><title>RNAi Result</title>
>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>> <body>\n
>>> <p><font face=\"Courier, monospace font set\">
>>> Inputsequence: <br>";
>>>
>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>
>>>   print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>
>>>   if ( ($i+1)%10==0){
>>>       print OUTFILE " ";
>>>   }
>>>   if ( ($i+1)%60==0){
>>>       print OUTFILE "<br>\n";
>>>   }
>>> }
>>>
>>>
>>>
>>> print OUTFILE "</font> <p>";
>>>
>>> $z=@compseqs;
>>>
>>> for($k=1;$k<$z;$k++) {
>>>   print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
>>> Sequence: <br>";
>>>
>>>   for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>
>>>       print OUTFILE substr ($compseqs[$k], $i, 1);
>>>
>>>       if ( ($i+1)%10==0){
>>>           print OUTFILE " ";
>>>       }
>>>       if ( ($i+1)%60==0){
>>>           print OUTFILE "<br>\n";
>>>       }
>>>   }
>>>   print OUTFILE "<p></font>";
>>> }
>>>
>>> print OUTFILE "<p>
>>> Window: <br>$in{'Windowsize'}
>>> <p>
>>> <p>
>>> Threshold: <br>$in{'Threshold'}
>>> <p>";
>>> my $j=0;
>>>
>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>
>>>   if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>       if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>           $j=$in{'Windowsize'};
>>>       }
>>>       $height=$out[$i]->{similar}*5;
>>>   }
>>>
>>>   if ($j>0) {
>>>       print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>> height=\"5\">";
>>>       $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
>>> 1)."</font>";
>>>       $j--;
>>>   }
>>>   else {
>>>       print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>> height=\"5\">";
>>>       $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
>>> 1)."</font>";
>>>   }
>>>
>>>   if ( ($i+1)%10==0){
>>>       $outstring .= " ";
>>>   }
>>>   if ( ($i+1)%60==0){
>>>       $outstring .= "<br>\n";
>>>
>>>   }
>>>   if ( ($i+1)%800==0){
>>>       print OUTFILE "<br><br>\n";
>>>
>>>   }
>>> }
>>>
>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>> set\">$outstring</font>";
>>>
>>> #foreach (@out) {
>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>
>>> #    }
>>> #}
>>>
>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>
>>> close OUTFILE;
>>>
>>> #nameprint();
>>>
>>> sub parse_form {
>>>   local ($buffer, @pairs, $pair, $name, $value);
>>>   # Read in text
>>>   $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>   if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>   {
>>>       read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>   }
>>>   else
>>>   {
>>>       $buffer = $ENV{'QUERY_STRING'};
>>>   }
>>>   @pairs = split(/&/, $buffer);
>>>   foreach $pair (@pairs)
>>>   {
>>>       ($name, $value) = split(/=/, $pair);
>>>       $value =~ tr/+/ /;
>>>       $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>       $in{$name} = $value;
>>>   }
>>> }


From robert.bradbury at gmail.com  Sat Jan  9 14:52:53 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 14:52:53 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<F19004692A4A4350856B23DF25E09074@NewLife>
	<c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
Message-ID: <deaa866a1001091152u4e85b1eboc99feb52a5b45b5@mail.gmail.com>

Roopa,

Mark is correct, you have to be very careful of single vs. double quotes in
perl. As I understand it, double quoted strings are interpolated (variables
and escapes such as \n are expanded) while single quoted strings are taken
literally.
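
A minimal sketch of the difference, in plain Perl with no BioPerl involved.
The variable name $db and its value are made up purely for illustration:

```perl
use strict;
use warnings;

my $db = 'nr';    # made-up value, only for illustration

# Double quotes: $db is interpolated and \n becomes a newline.
my $double = "database: $db\n";

# Single quotes: every character is literal, including the $ and the \n.
my $single = 'database: $db\n';

print $double;          # prints: database: nr
print $single, "\n";    # prints: database: $db\n
```

So a file name or message built with single quotes will contain the text
"$db" itself rather than the value of the variable.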

I tried to run your script (with fixes) but without the supporting files it
appears to be impossible.

What I am curious about is what it is trying to do.  I was particularly
intrigued by some apparent efforts to parse blast results into color
enhanced HTML, and without thinking about the code in detail it seems
easier to simply ask what you are trying to do.  I find "classical" blast
results particularly tedious, and I long for blast results that display
concise information as the NCBI HomoloGene cross-species comparisons do.
Unfortunately NCBI has deemed their methods (I have asked them) "too complex
to disclose" (for a person comfortable in dealing with assembly language, or
even gate level electronics, "too complex" is a very relative concept).

One has the option of using NCBI, with a limited number of species but good
display methodologies, or Ensembl, with many more species but less desirable
display methodologies (a phylogenetic tree derived from cross-species
comparisons).  For the WRN protein, which may play a key role in aging
(through the activity of its exonuclease domain mutating DNA sequences and
inducing microdeletions and microinsertions), this gets important because it
appears that the *C. elegans* genome is missing the exonuclease domain (so
it may be useless from the perspective of studying aging), and the other 4
nematode species which have been sequenced aren't in either the NCBI or the
Ensembl databases.  Needless to say, if we manage in the near future, given
the drop in sequencing costs, to sequence the nematodes which are
freeze/thaw tolerant (which induces DSBs that have to be repaired), those
genomes are unlikely to be in the NCBI/Ensembl databases either.  So
there is a requirement for the user to develop the ability to mix and match
public and obscure databases in creative ways to provide easy to interpret
information.

Robert Bradbury


From robert.bradbury at gmail.com  Sat Jan  9 15:27:54 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 15:27:54 -0500
Subject: [Bioperl-l] Ensembl problems
Message-ID: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>

I am trying to get the examples provided by EMBL/Ensembl to work and am
encountering problems.

For example, about 1/3 of the way through the Compara API tutorial [1] there
is what is supposed to be a completely functional script.  It does not
work.  This is in contrast to some of the earlier simple scripts (listing
the species in Ensembl etc.), which do work on my machine, so I have all the
libraries and so on installed correctly.

Very poor form to document scripts which do not function on a properly set up
system.

I have modified my invocation of the script slightly:
  Align.pl --set_of_species \
"Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus
scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"

which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on
an undefined value at ./Align.pl line 132." (Align.pl is my slightly
modified version of the Compara tutorial code.)
As these are slightly modified Perl scripts from the documentation, the line
numbers may differ from the original.
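
For reference, this Perl message means that the variable on the left of the
"->" is undef, not that the method is missing; whatever was supposed to
return the adaptor object returned nothing. A minimal sketch in plain Perl,
where get_adaptor_stub is a hypothetical stand-in for the failing lookup and
is not an Ensembl call:

```perl
use strict;
use warnings;

# Hypothetical stand-in for a lookup that fails; NOT part of the Ensembl API.
sub get_adaptor_stub { return undef }

my $adaptor = get_adaptor_stub();

# Calling a method on undef dies with the message quoted above.
my $ok    = eval { $adaptor->fetch_by_method_link_type_GenomeDBs(); 1 };
my $error = $ok ? '' : $@;
print "caught: $error";

# Checking the object before using it points at the real failure.
my $diagnosis = defined $adaptor
    ? 'adaptor looks fine'
    : 'adaptor is undef: check the lookup that was supposed to create it';
print "$diagnosis\n";
```

In other words, the place to look is the statement that assigns the adaptor,
a few lines before the one named in the error.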

The call fails in spite of the fact that just before it I dumped "genome_dbs"
and got back the expected list of genomes (some 25 hash tables).  I believe
this occurs whether one is comparing "human:mouse" or the more complex
species set I have outlined above.


Has anyone else attempted to run the code documented in the Ensembl API
Tutorial?
Any suggestions as to what direction to go in would be appreciated (when
one is trying to copy code out of a tutorial and it fails, it's kind of hard
to know where to go).

There do appear to be some problems in the specifications of a Compara
version/database and there don't appear to be a lot of resources informing
one of what resources are currently available.

Robert


1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html


From ak at ebi.ac.uk  Sat Jan  9 17:01:21 2010
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Sat, 9 Jan 2010 22:01:21 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
Message-ID: <20100109220121.GA9521@quux.windows.ebi.ac.uk>

On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
> I am trying to get the examples provided by EMBL/Ensembl to work and am
> encountering problems.

Hi Robert,

The ensembl-dev list is the appropriate forum for this type of question
as it has nothing to do with bioperl.

There is also the Ensembl helpdesk.  If you send your problem to
<helpdesk at ensembl.org> I'm sure that it will be picked up by the
appropriate people (I do not myself know enough about the Compara API to
be able to diagnose this problem straight away, I'm afraid).

Be sure to submit a minimal script that still exhibits the problem, and
information about what version of the APIs you're using (we will assume
that you're not mixing a newer version of the API with older databases or
vice versa).

We are generally very happy to have bugs in documentation or code
pointed out to us, and will correct errors as we are made aware of them.


Kind regards,
Andreas


-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom


From cjfields at illinois.edu  Sat Jan  9 17:01:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Jan 2010 16:01:19 -0600
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
Message-ID: <743C998D-BBB5-4832-BA25-24D7D7288F78@illinois.edu>

Robert,

Ensembl errors probably should be redirected to the ensembl mail list.  I can't speak to the problems with it (they appear specific to the Ensembl tool set).

chris



From robert.bradbury at gmail.com  Sun Jan 10 14:47:00 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sun, 10 Jan 2010 14:47:00 -0500
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <20100109220121.GA9521@quux.windows.ebi.ac.uk>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
	<20100109220121.GA9521@quux.windows.ebi.ac.uk>
Message-ID: <deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>

As it turns out, the example from the file I cited (the Compara API
tutorial) does work.  The code that I started with may have been from
a "MS-WORD" document distributed with the documentation (which could
quite well be out-of-date).

But even the corrected code does not work for various uncommon
comparisons between species (which they may not have archived in
Ensembl).  I also don't understand enough about the functions yet as
to whether they are comparing the same regions from the same
chromosomes that just happen to be identical or whether they are
comparing the same region with a homologous region on a different
chromosome (i.e. conserved genes).  I'm going to have to dig into this
some more to figure out what is going on.

Thanks for the pointers, I'll refer future questions to the Ensembl
list/help-desk.

However, if anyone knows Ensembl very well: the database already has in it
some of these interspecies comparisons.  They are accessed
when one does a phylogeny tree for specific genes (and generally for
highly conserved genes you will get a tree that includes nearly all 50
species in the database).  As I don't think they are computed
on-the-fly, the information must be precomputed and stored someplace
in the database.  I would very much like to know how to access this
information.

Thanks,
Robert




From Russell.Smithies at agresearch.co.nz  Sun Jan 10 15:34:39 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 09:34:39 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>

An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. 
In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms)

If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this:

   my $taxid  = $gi_taxid_nucl{$accession};
   my $org_name = $names{$taxid};
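
A rough sketch of that approach, assuming the unpacked dump files keep their
usual layout: two tab-separated columns (GI, then taxid) in
gi_taxid_nucl.dmp, and fields separated by tab-pipe-tab in names.dmp. The sub
names are made up here, the sample rows are toy data with invented GI
numbers, and note that the first hash is keyed on GI numbers rather than
accessions:

```perl
use strict;
use warnings;

# Build a GI -> taxid hash from a gi_taxid_*.dmp stream (GI <tab> taxid).
sub load_gi_taxid {
    my ($fh) = @_;
    my %gi_taxid;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        $gi_taxid{$gi} = $taxid;
    }
    return \%gi_taxid;
}

# Build a taxid -> scientific name hash from a names.dmp stream.
# Each row looks like: taxid | name | unique name | name class |
sub load_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;    # drop the trailing field terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;
    }
    return \%names;
}

# Toy rows standing in for the real files (the GI numbers are invented).
my $gi_rows   = "1001\t9913\n1002\t7955\n";
my $name_rows = "9913\t|\tBos taurus\t|\t\t|\tscientific name\t|\n"
              . "9913\t|\tcattle\t|\t\t|\tgenbank common name\t|\n"
              . "7955\t|\tDanio rerio\t|\t\t|\tscientific name\t|\n";

open my $gi_fh,   '<', \$gi_rows   or die "cannot open in-memory data: $!";
open my $name_fh, '<', \$name_rows or die "cannot open in-memory data: $!";

my $gi_taxid = load_gi_taxid($gi_fh);
my $names    = load_names($name_fh);

my $gi       = 1001;
my $taxid    = $gi_taxid->{$gi};
my $org_name = $names->{$taxid};
print "$gi -> $taxid -> $org_name\n";    # prints: 1001 -> 9913 -> Bos taurus
```

The real gi_taxid_nucl file is large, so if memory is tight it may be better
to keep only the GIs of interest while reading instead of loading every row.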

--Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Saturday, 26 December 2009 4:52 p.m.
> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Bhakti,
> The following example (using EUtilities) may serve your purpose:
> 
> use Bio::DB::EUtilities;
> 
> my (%taxa, @taxa);
> my (%names, %idmap);
> 
> # these are protein ids; nuc ids will (probably) work by changing
> # -dbfrom => 'nucleotide'
> 
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> 
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
> 
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
> 
> @taxa = @taxa{@ids};
> 
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>         -db    => 'taxonomy',
>         -id    => \@taxa );
> 
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> ($_->get_contents_by_name('ScientificName'))[0];
> }
> 
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
> 
> # %idmap is
> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> #    68536103 => 'Corynebacterium jeikeium K411'
> #    730439 => 'Bacillus caldolyticus'
> #    89318838 => undef    (this record has been removed from the db)
> 
> 1;
> 
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
> 
> sleep 3;
> 
> or so separating the queries.
> MAJ
> ----- Original Message -----
> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, December 25, 2009 9:46 PM
> Subject: [Bioperl-l] how to retrieve organism name from accession number?
> 
> 
> > Hi,
> >
> > Does anyone know how to retrieve the "Source" or the "Species name" given
> > the accession number using Bioperl.   I have these 30,000 accession numbers
> > for which I need to get the source organisms.  Any kind of help will be
> > appreciated.
> >
> > Thanks
> >
> > BD
=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Sun Jan 10 15:49:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 14:49:40 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
Message-ID: <F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>

One could also use Bio::DB::Taxonomy, which indexes the same files or (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the details).

chris



From Russell.Smithies at agresearch.co.nz  Sun Jan 10 16:05:06 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 10:05:06 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>

I've started to go off eUtils recently (not BioPerl's fault), as I've often found that with large queries, chunks of the resulting data are missing.
For example, before Xmas I was creating species-specific databases by using eUtils to get a list of GI numbers for a taxid, then retrieving the FASTA sequences in chunks of 500.
Very regularly, a "resource unavailable" message would turn up in the middle of the FASTA output, e.g.:
  >test_sequence_1
  TACGATCATCGCTResource UnavailableTACGACTCTGCT
  >test_sequence_2
  TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT

Often this wasn't detected until formatdb complained about invalid characters.
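
A check along these lines on each downloaded chunk would catch the problem before formatdb does. This is only a rough sketch: the sub name bad_fasta_records is made up for illustration (it is not part of BioPerl), and it assumes nucleotide FASTA in which sequence lines hold nothing but IUPAC codes, gaps or '*'.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): return the IDs of FASTA records
# whose sequence lines contain anything other than IUPAC nucleotide codes,
# gap characters or '*'.
sub bad_fasta_records {
    my ($fasta) = @_;
    my (@bad, $id, %seen);
    for my $line (split /\n/, $fasta) {
        $line =~ s/\s+$//;                       # drop trailing whitespace
        if ($line =~ /^>(\S*)/) { $id = $1; next; }
        next unless defined $id and length $line;
        if ($line =~ /[^ACGTUNRYSWKMBDHVacgtunryswkmbdhv*-]/ and !$seen{$id}++) {
            push @bad, $id;                      # report each record once
        }
    }
    return @bad;
}

my $chunk = <<'END';
>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
END

print "suspect record: $_\n" for bad_fasta_records($chunk);
```

Any chunk that reports a suspect record could then be fetched again rather than appended to the database file.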
Inquiries to NCBI as to why this was happening and what to do about it returned stupid answers ("do each sequence manually thru the web interface", or "use eUtils").
As we have a nice fast network connection, I now prefer to download very large gzip files (i.e. all of refseq) and extract what I need.
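
For what it's worth, the extraction step does not need the file unpacked on disk. A minimal sketch, assuming a gzipped FASTA file and a hash of wanted IDs; the sub name extract_records and the file name in the comment are placeholders of mine, and only the core module IO::Uncompress::Gunzip is used:

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: stream a gzipped FASTA file and return, as one string,
# only the records whose ID (first word after '>') is a key of %$wanted.
sub extract_records {
    my ($gz_file, $wanted) = @_;
    my $in = IO::Uncompress::Gunzip->new($gz_file, MultiStream => 1)
        or die "Cannot open $gz_file: $GunzipError\n";
    my ($keep, $out) = (0, '');
    while (my $line = <$in>) {
        # A header line decides whether the following lines are kept.
        $keep = exists $wanted->{$1} ? 1 : 0 if $line =~ /^>(\S+)/;
        $out .= $line if $keep;
    }
    $in->close;
    return $out;
}

# Example call (file name and ID are placeholders):
# print extract_records('some_download.fna.gz', { 'some_id' => 1 });
```

For very large result sets it would be better to print each kept line as it is read instead of building one big string.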

I can't help but think that NCBI could solve a lot of problems if they gzipped the output from eUtils queries - it's something I've requested regularly for the last 5 years or so!!

--Russell


> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Monday, 11 January 2010 9:50 a.m.
> To: Smithies, Russell
> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> One could also use Bio::DB::Taxonomy, which indexes the same files or
> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> details).
> 
> chris
> 
> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> 
> > An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> do lookups.
> > In that same dir, taxdump.tar.gz contains a file called names.dmp which
> lists taxids and descriptions (and synonyms)
> >
> > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> could do this:
> >
> >   my $taxid  = $gi_taxid_nucl{$accession};
> >   my $org_name = $names{$taxid};
> >
> > --Russell
> >
> >
> >> -----Original Message-----
> >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> >> Sent: Saturday, 26 December 2009 4:52 p.m.
> >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >> number?
> >>
> >> Bhakti,
> >> The following example (using EUtilities) may serve your purpose:
> >>
> >> use Bio::DB::EUtilities;
> >>
> >> my (%taxa, @taxa);
> >> my (%names, %idmap);
> >>
> >> # these are protein ids; nuc ids will work by changing -dbfrom =>
> >> 'nucleotide',
> >> # (probably)
> >>
> >> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> >>
> >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> >>                                       -db => 'taxonomy',
> >>                                       -dbfrom => 'protein',
> >>                                       -correspondence => 1,
> >>                                       -id => \@ids);
> >>
> >> # iterate through the LinkSet objects
> >> while (my $ds = $factory->next_LinkSet) {
> >>    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> >> }
> >>
> >> @taxa = @taxa{@ids};
> >>
> >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> >>        -db    => 'taxonomy',
> >>        -id    => \@taxa );
> >>
> >> while (local $_ = $factory->next_DocSum) {
> >>    $names{($_->get_contents_by_name('TaxId'))[0]} =
> >> ($_->get_contents_by_name('ScientificName'))[0];
> >> }
> >>
> >> foreach (@ids) {
> >>    $idmap{$_} = $names{$taxa{$_}};
> >> }
> >>
> >> # %idmap is
> >> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> >> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> >> #    68536103 => 'Corynebacterium jeikeium K411'
> >> #    730439 => 'Bacillus caldolyticus'
> >> #    89318838 => undef    (this record has been removed from the db)
> >>
> >> 1;
> >>
> >> You probably will need to break up your 30000 into chunks
> >> (say, 1000-3000 each), and do the above on each chunk with a
> >>
> >> sleep 3;
> >>
> >> or so separating the queries.
> >> MAJ
> >> ----- Original Message -----
> >> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> >> To: <bioperl-l at lists.open-bio.org>
> >> Sent: Friday, December 25, 2009 9:46 PM
> >> Subject: [Bioperl-l] how to retrieve organism name from accession
> number?
> >>
> >>
> >>> Hi,
> >>>
> >>> Does anyone know how to retrieve the "Source" or the "Species name"
> >> given
> >>> the accession number using Bioperl.   I have these 30,000 accession
> >> numbers
> >>> for which I need to get the source organisms.  Any kind of help will
> be
> >>> appreciated.
> >>>
> >>> Thanks
> >>>
> >>> BD


From avilella at gmail.com  Sun Jan 10 16:05:13 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 10 Jan 2010 21:05:13 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
	<20100109220121.GA9521@quux.windows.ebi.ac.uk>
	<deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>
Message-ID: <358f4d651001101305q1b75cfe3q558a245ab1ab1238@mail.gmail.com>

> However, if anyone knows Ensembl very well, the database has in it
> some of these interspecies comparisons already.  They are accessed
> when one does a phylogeny tree for specific genes (and generally for
> highly conserved gene you will get a tree that includes nearly all 50
> species in the database).  As I don't think they are computed
> on-the-fly, the information must be precomputed and stored someplace
> in the database.  I would very much like to know how to access this
> information.

Yes, they are. You can access the data programmatically by installing
the ensembl and ensembl-compara Perl APIs.
There are a few example scripts for the GeneTrees:

ensembl-compara/scripts/examples/homology*.pl

Cheers,

Albert.

> Thanks,
> Robert
>
>
>
>
> On 1/9/10, Andreas Kähäri <ak at ebi.ac.uk> wrote:
>> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
>>> I am trying to get the examples provided by EMBL/Ensembl to work and am
>>> encountering problems.
>>
>> Hi Robert,
>>
>> The ensembl-dev list is the appropriate forum for this type of questions
>> as it has nothing to do with bioperl.
>>
>> There is also the Ensembl helpdesk.  If you send your problem to
>> <helpdesk at ensembl.org> I'm sure that it will be picked up by the
>> appropriate people (I do myself not know enough about the Compara API to
>> be able to diagnose this problem straight away I'm afraid).
>>
>> Be sure to submit a minimal script that still exhibit the problem, and
>> information about what version of the APIs you're using (we will assume
>> that you're not mixing newer version of the API with older databases or
>> vice versa).
>>
>> We are generally very happy to have bugs in documentation or code
>> pointed out to us, and will correct errors as we are made aware of them.
>>
>>
>> Kind regards,
>> Andreas
>>
>>> For example, about 1/3 of the way through the Compara API tutorial [1]
>>> there
>>> is what is supposed to be a completely functional script.  It does not
>>> work.  This is in contrast to some of the earlier simple scripts (listing
>>> the species in Ensembl etc.), which do work on my machine, so I have all
>>> the
>>> libraries and so on installed correctly.
>>>
>>> Very poor form to document scripts which do not function on a properly
>>> setup
>>> system.
>>>
>>> I have modified my invocation of the script slightly:
>>>   Align.pl --set_of_species \
>>> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
>>> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
>>> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis
>>> familiaris:Sus
>>> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
>>> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
>>> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"
>>>
>>> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs"
>>> on
>>> an undefined value at ./Align.pl line 132." (Align.pl is my slightly
>>> modified example of the Compara Tutorial code.)
>>> As these are slightly modified Perl scripts from the documentation, the
>>> line
>>> numbers may be variable.
>>>
>>> I can print out the genome_dbs, and it gives me a list of genome names
>>> (hash
>>> tables) though it appears that is problematic in the Align.pl script.
>>> in spite of the fact that just previously to that call I dumped
>>> "genome_dbs"
>>> and got back some 25 hash tables (expected).  I believe this occurs
>>> whether
>>> one is comparing "human:mouse" or the more complex species set I have
>>> outlined above.
>>>
>>>
>>>
>>> Has anyone else attempted to run the code documented in the Ensembl API
>>> Tutorial?
>>> Any suggestions as to what direction to go in would be appreciated -- when
>>> one is trying to copy code out of a tutorial and it fails its kind of hard
>>> to know where to go.)
>>>
>>> There do appear to be some problems in the specifications of a Compara
>>> version/database and there don't appear to be a lot of resources informing
>>> one of what resources are currently available.
>>>
>>> Robert
>>>
>>>
>>> 1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html
>>>
>>
>> --
>> Andreas Kähäri, Ensembl Software Developer
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge CB10 1SD, United Kingdom
>>
>


From alessandra.bilardi at gmail.com  Sun Jan 10 18:21:12 2010
From: alessandra.bilardi at gmail.com (Alessandra)
Date: Mon, 11 Jan 2010 00:21:12 +0100
Subject: [Bioperl-l] GBrowse.org project
In-Reply-To: <e0996aca1001101515q8121c87o9b90310691fcd640@mail.gmail.com>
References: <e0996aca1001101515q8121c87o9b90310691fcd640@mail.gmail.com>
Message-ID: <e0996aca1001101521p30b46829p93ee75dd797829b1@mail.gmail.com>

 Hi all,

   I'm Alessandra and I run GBrowse.org.
GBrowse.org is a resource for using and setting up GBrowse genome
browsers. The site provides one location where biologists and
bioinformaticians can find:

  1. Genome browser web sites for any organism that has them. If a
species has a genome browser anywhere on the web, then we aim to link
to it.
  2. Links to sequence and annotation files that are available online.
  3. Links to genome browser configuration files, when available
  4. An FTP site containing genome annotation and configuration files
for each annotated genome that does not have its own web site.

GBrowse.org emphasizes the GBrowse genome browser in its organization,
but also links to sites that use other browser packages such as UCSC,
Ensembl, and JBrowse.

Also, we are currently conducting a survey seeking input on future
project direction. Please take a few minutes now to provide your
feedback.

   Survey link: http://gbrowse.org/survey/index.php?sid=64264&lang=en
   GBrowse.org introduction link:
http://gmod.org/wiki/August_2009_GMOD_Meeting#GBrowse.org

   Thank you for your help,

   Alessandra Bilardi.
   http://gbrowse.org/
   CRIBI Genomics, University of Padua
   http://genomics.cribi.unipd.it/


From cjfields at illinois.edu  Sun Jan 10 22:04:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 21:04:13 -0600
Subject: [Bioperl-l] GMOD BioPerl Meeting
Message-ID: <7D72ECC2-E856-4C09-B67A-62AFFB59B377@illinois.edu>

Just a quick reminder that we're having a BioPerl satellite meeting after the PAG Conference (just prior to the GMOD Meeting).  The meeting is this Wednesday, Jan. 13, starting at 11:30am, at the Best Western Seven Seas in San Diego.  I will update the relevant BioPerl and GMOD pages with more details as they become available.  At the moment, we will be meeting in the hotel lobby prior to starting the meeting and possible hackathon.  

http://www.bioperl.org/wiki/GMOD_2010_Meeting
http://gmod.org/wiki/January_2010_GMOD_Meeting#Satellite_Meetings

Thanks!

chris


From bernd.jagla at pasteur.fr  Mon Jan 11 05:11:16 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:11:16 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
Message-ID: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>

Hi,

First off, I am not sure whether this should go to the BioPerl or the
GBrowse mailing list, so apologies if this is the wrong list; please
let me know.

I am writing a program in Java that needs to access genome annotation data.
Since I am already using GBrowse, I was thinking that I could combine both
approaches, eventually making life easier for me. I am mainly interested in
getting a gene/feature name for a given position. The position is stored in
the feature table, and by linking typelist, locationlist, (maybe
sequence), and feature I can get all the information I need. Unfortunately,
it seems that the feature name is stored in the object blob of the feature
table.

That seems a bit suspicious to me, because I don't understand how searching
for a name in GBrowse can be so fast if the name is not indexed in MySQL.

So my question is: how do I parse the Bio::DB::SeqFeature object correctly
in Java to get the name of the feature, and possibly further information
as well?

Any suggestions are greatly appreciated. Maybe there is a better solution
than parsing Perl objects with Java?

Thanks a lot,

Bernd


From biopython at maubp.freeserve.co.uk  Mon Jan 11 05:48:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 10:48:52 +0000
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
Message-ID: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>

On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla <bernd.jagla at pasteur.fr> wrote:
> Hi,
>
> First off, I am not sure if this is supposed to be addressed to the Bioperl
> or Gbrowse mailing list, so apologies if this is the wrong list and please
> let me know.
>
> I am writing a program in Java that needs to access genome annotation data.
> Since I am using Gbrowse already I was thinking that I could combine both
> approaches making life eventually easier for me. I am mainly interested in
> getting a gene/feature name for a given position. The position is stored in
> the feature table and through linking typelist, locationlist, (maybe
> sequence), and feature I can get all the information I need. Unfortunately
> it seems that the feature name is stored in the object blog of the feature
> table.

How are you storing the data in Gbrowse? There are several back ends,
and this will make a big difference for accessing the raw data.

One option would be to use Gbrowse with BioSQL as the backend.
You can then use BioJava (or BioPerl, or BioPython, etc) to access the
database. The only downside is Gbrowse isn't working 100% on top
of BioSQL right now (I'd like to see this fixed, but I don't know Perl).
There is an open bug on this [ gmod-Bugs-2168597 ].

Peter


From bernd.jagla at pasteur.fr  Mon Jan 11 05:53:20 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:53:20 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
	<320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
Message-ID: <9056164A8A744A77B6CD1E8E4E20B104@zillumina>

I am using bp_seqfeature_load.pl to load my features. That uses
Bio::DB::SeqFeature::Store with MySQL as the backend... That's all I
understood...

B

> -----Original Message-----
> From: p.j.a.cock at googlemail.com [mailto:p.j.a.cock at googlemail.com] On
> Behalf Of Peter
> Sent: Monday, January 11, 2010 11:49 AM
> To: Bernd Jagla
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
> 
> On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla <bernd.jagla at pasteur.fr>
> wrote:
> > Hi,
> >
> > First off, I am not sure if this is supposed to be addressed to the
> Bioperl
> > or Gbrowse mailing list, so apologies if this is the wrong list and
> please
> > let me know.
> >
> > I am writing a program in Java that needs to access genome annotation
> data.
> > Since I am using Gbrowse already I was thinking that I could combine
> both
> > approaches making life eventually easier for me. I am mainly interested
> in
> > getting a gene/feature name for a given position. The position is stored
> in
> > the feature table and through linking typelist, locationlist, (maybe
> > sequence), and feature I can get all the information I need.
> Unfortunately
> > it seems that the feature name is stored in the object blog of the
> feature
> > table.
> 
> How are you storing the data in Gbrowse? There are several back ends,
> and this will make a big difference for accessing the raw data.
> 
> One option would be to use Gbrowse with BioSQL as the backend.
> You can then use BioJava (or BioPerl, or BioPython, etc) to access the
> database. The only downside is Gbrowse isn't working 100% on top
> of BioSQL right now (I'd like to see this fixed, but I don't know Perl).
> There is an open bug on this [ gmod-Bugs-2168597 ].
> 
> Peter


From awitney at sgul.ac.uk  Mon Jan 11 07:21:07 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 12:21:07 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
Message-ID: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>

Hi,

I am writing a script to automate the running of Phylip Pars. In the process I have to create a Bio::AlignIO object from a set of data that I have in a hash.

I could write the hash data into a phylip file and then load the Bio::AlignIO from that file, but I wondered if I could skip the writing and then reading of a temporary file?

thanks for any help

adam


From roy.chaudhuri at gmail.com  Mon Jan 11 08:54:25 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:54:25 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2A51.9040602@gmail.com>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com>
Message-ID: <4B4B2D91.70906@gmail.com>

Actually, I guess some sample code would be more helpful:

use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Each LocatableSeq holds one gapped row; -end counts residues, not gaps.
my $seq1 = Bio::LocatableSeq->new(-id => 'one',   -seq => 'AT-CG', -start => 1, -end => 4);
my $seq2 = Bio::LocatableSeq->new(-id => 'two',   -seq => 'A--CG', -start => 1, -end => 3);
my $seq3 = Bio::LocatableSeq->new(-id => 'three', -seq => 'ATTCG', -start => 1, -end => 5);
my $aln  = Bio::SimpleAlign->new(-seqs => [$seq1, $seq2, $seq3]);

# With no -file or -fh given, the alignment is written to STDOUT.
Bio::AlignIO->new(-format => 'phylip')->write_aln($aln);
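
Since your data is in a hash, the same thing can be done in a loop. A minimal sketch, assuming the hash maps sequence names to gapped, already-aligned strings (the %aligned layout is my guess at your data, not something from your script); it uses add_seq instead of the -seqs argument and writes the PHYLIP text into a string rather than a file:

```perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Assumed input layout: sequence name => gapped, already-aligned sequence.
my %aligned = (
    one   => 'AT-CG',
    two   => 'A--CG',
    three => 'ATTCG',
);

my $aln = Bio::SimpleAlign->new();
for my $id (sort keys %aligned) {
    my $str      = $aligned{$id};
    my $residues = ($str =~ tr/A-Za-z//);    # count residues, ignoring gaps
    $aln->add_seq(Bio::LocatableSeq->new(-id    => $id,
                                         -seq   => $str,
                                         -start => 1,
                                         -end   => $residues));
}

# Write PHYLIP output into a string instead of a temporary file.
my $phylip = '';
open my $fh, '>', \$phylip or die "Cannot open in-memory handle: $!";
my $out = Bio::AlignIO->new(-format => 'phylip', -fh => $fh);
$out->write_aln($aln);
$out->close;

print $phylip;
```

If the $aln object itself is all you need downstream, the Bio::AlignIO part can be dropped entirely.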

Cheers,
Roy.


On 11/01/2010 13:40, Roy Chaudhuri wrote:
> Hi Adam,
>
> I'm guessing you actually want to create a Bio::SimpleAlign object
> (representing an alignment), rather than a Bio::AlignIO object (which is
> just for reading/writing alignment files). Bio::SimpleAlign has a
> documented new method that allows you to construct an alignment from
> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but
> include gaps and start/end coordinates to describe their relationship to
> other sequences in the alignment.
>
> Roy.
>
> On 11/01/2010 12:21, Adam Witney wrote:
>> Hi,
>>
>> I am writing a script to automate the running of Phylip Pars. In the
>> process i have to create a Bio::AlignIO object from a set of data
>> that i have in a hash.
>>
>> I could write the hash data into a phylip file and then load the
>> Bio::AlignIO from that file, but i wondered if i could skip the
>> writing and then reading of a temporary file ?
>>
>> thanks for any help
>>
>> adam
>


From roy.chaudhuri at gmail.com  Mon Jan 11 08:40:33 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:40:33 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
Message-ID: <4B4B2A51.9040602@gmail.com>

Hi Adam,

I'm guessing you actually want to create a Bio::SimpleAlign object 
(representing an alignment), rather than a Bio::AlignIO object (which is 
just for reading/writing alignment files). Bio::SimpleAlign has a 
documented new method that allows you to construct an alignment from 
Bio::LocatableSeq objects, which are similar to Bio::Seq objects but 
include gaps and start/end coordinates to describe their relationship to 
other sequences in the alignment.

Roy.

On 11/01/2010 12:21, Adam Witney wrote:
> Hi,
>
> I am writing a script to automate the running of Phylip Pars. In the
> process i have to create a Bio::AlignIO object from a set of data
> that i have in a hash.
>
> I could write the hash data into a phylip file and then load the
> Bio::AlignIO from that file, but i wondered if i could skip the
> writing and then reading of a temporary file ?
>
> thanks for any help
>
> adam


From biopython at maubp.freeserve.co.uk  Mon Jan 11 09:16:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 14:16:45 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
Message-ID: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>

Hi,

I'm running bioperl-live from SVN, just updated to revision 16648.

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069

I am trying to get Bio::SeqIO to convert a multiple record EMBL
file into GenBank format, piping the data via stdin/stdout using
the following trivial Perl script:

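#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;

# Read EMBL records from STDIN and write each one to STDOUT as GenBank.
my $in  = Bio::SeqIO->new(-fh => \*STDIN,  -format => 'embl');
my $out = Bio::SeqIO->new(-fh => \*STDOUT, -format => 'genbank');
while (my $seq = $in->next_seq) { $out->write_seq($seq) }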

This only seems to find the first EMBL record in my example
files. For example, this simple file has just two contig records:
http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl

This is just the first two records taken from a much larger EMBL file
rel_con_hum_01_r102.dat downloaded and uncompressed from:
ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

Trying both these examples as input, BioPerl just gives a single
GenBank record as output (the first EMBL entry in the input).

Is this a BioPerl bug, or am I missing something?

Peter


From maj at fortinbras.us  Mon Jan 11 10:04:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 11 Jan 2010 10:04:00 -0500
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>

Hi Peter, 
I found the issue-- there are no SQ lines in the data, and 
having them is a key stop condition in the parser (line 438 embl.pm).
We evidently need to be more liberal in what we accept, even as we 
are strict in what we emit. Could you make a bug report?
thanks for the heads-up--
MAJ
----- Original Message ----- 
From: "Peter" <biopython at maubp.freeserve.co.uk>
To: "bioperl-l list" <bioperl-l at lists.open-bio.org>
Sent: Monday, January 11, 2010 9:16 AM
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records


> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz
> 
> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter


From biopython at maubp.freeserve.co.uk  Mon Jan 11 10:17:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 15:17:37 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
Message-ID: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>

On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
> them is a key stop condition in the parser (line 438 embl.pm).
> We evidently need to be more liberal in what we accept, even as we are
> strict in what we emit. Could you make a bug report?
> thanks for the heads-up--
> MAJ

Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982

These are EMBL contig records, so they don't have SQ lines,
but instead CO lines.
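
To see which records in a file are affected, something like the following could be used. It is only a rough sketch: the sub name summarize_embl and the two sample records are made up for illustration, it looks at line prefixes only, and it is not how the BioPerl EMBL parser works.

```perl
use strict;
use warnings;

# Hypothetical helper: split EMBL flat-file text into records (each ends with
# a line starting with "//") and count the SQ and CO lines in each record.
sub summarize_embl {
    my ($text) = @_;
    my (@summary, %rec);
    for my $line (split /\n/, $text) {
        if ($line =~ /^ID\s+(\S+?);?\s/) {
            %rec = (id => $1, SQ => 0, CO => 0);     # start of a new record
        }
        elsif ($line =~ /^(SQ|CO)\s/ and %rec) {
            $rec{$1}++;
        }
        elsif ($line =~ m{^//} and %rec) {
            push @summary, {%rec};                   # end of record
            %rec = ();
        }
    }
    return @summary;
}

# Made-up sample: one contig-style record and one with a sequence.
my $sample = <<'END';
ID   DEMO0001; SV 1; linear; genomic DNA; CON; HUM; 100 BP.
CO   join(DEMO0101.1:1..60,gap(10),DEMO0102.1:1..30)
//
ID   DEMO0002; SV 1; linear; genomic DNA; STD; HUM; 8 BP.
SQ   Sequence 8 BP; 2 A; 2 C; 2 G; 2 T; 0 other;
     acgtacgt         8
//
END

for my $r (summarize_embl($sample)) {
    printf "%s: %s\n", $r->{id},
        $r->{SQ} ? 'has SQ line' : $r->{CO} ? 'CO lines only' : 'no SQ or CO';
}
```

Records reported as "CO lines only" are the ones that lack the SQ line the current parser waits for.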

Peter


From cjfields at illinois.edu  Mon Jan 11 10:24:24 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 09:24:24 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
	<320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
Message-ID: <CDB3F40D-0298-410B-9814-3D9721380EBA@illinois.edu>


On Jan 11, 2010, at 9:17 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>> 
>> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
>> them is a key stop condition in the parser (line 438 embl.pm).
>> We evidently need to be more liberal in what we accept, even as we are
>> strict in what we emit. Could you make a bug report?
>> thanks for the heads-up--
>> MAJ
> 
> Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982
> 
> These are EMBL contig records, so they don't have SQ lines,
> but instead CO lines.
> 
> Peter

Peter, 

Just curious, but have you tried the experimental EMBL parser 'embldriver'?  I don't think it's bound to the same strictures, but I may be mistaken.

chris


From cjfields at illinois.edu  Mon Jan 11 10:23:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 09:23:00 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <0D0D9DB5-56FA-414E-8D1D-3FE18198F7EC@illinois.edu>

Just saw that mark responded, so if possible submit a bug.  We may be doing a mini-hackathon this Wednesday, so we can probably tackle it in the process (possibly along with a few other pressing issues).

chris

On Jan 11, 2010, at 8:16 AM, Peter wrote:

> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz
> 
> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter


From biopython at maubp.freeserve.co.uk  Mon Jan 11 10:55:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 15:55:26 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <C771056E.6204%hrh@fmi.ch>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
Message-ID: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>

On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>
> These entries form the CON data class, see:
> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
> and they don't contain any sequence information.

I know - GenBank files have a similar system with CONTIG
lines instead of sequences. I was expecting BioPerl to be
able to convert these EMBL files with CO lines into GenBank
files with CONTIG lines.

> If you take the 'expanded' entries from
> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
> your script will work.

That's a useful tip - thanks.

Peter
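
The CON distinction discussed above can be checked before attempting a
conversion. What follows is only a minimal sketch in plain Perl (no BioPerl
needed). It assumes the usual EMBL flat-file layout: each record ends with a
"//" line, contig information sits on "CO" lines, and records that carry
sequence have an "SQ" header line. The sub name classify_embl_records and the
two abbreviated demo records are made up for illustration and are not real
database entries.

```perl
use strict;
use warnings;

# Read EMBL records from a filehandle and report, for each one, whether it
# carries sequence (an SQ line) or only contig information (CO lines).
sub classify_embl_records {
    my ($fh) = @_;
    my @records;
    my ( $id, $has_co, $has_sq ) = ( undef, 0, 0 );
    while ( my $line = <$fh> ) {
        if    ( $line =~ /^ID\s+([^;\s]+)/ ) { $id     = $1 }
        elsif ( $line =~ /^CO\s/ )           { $has_co = 1 }
        elsif ( $line =~ /^SQ\s/ )           { $has_sq = 1 }
        elsif ( $line =~ m{^//} ) {
            # End of record: store the verdict and reset the state.
            my $type = $has_sq ? 'sequence'
                     : $has_co ? 'contig-only'
                     :           'unknown';
            push @records, { id => $id, type => $type };
            ( $id, $has_co, $has_sq ) = ( undef, 0, 0 );
        }
    }
    return @records;
}

# Hypothetical, heavily abbreviated two-record input for demonstration.
my $demo = <<'EMBL';
ID   X00001; SV 1; linear; genomic DNA; CON; HUM; 10 BP.
CO   join(Y00001.1:1..10)
//
ID   X00002; SV 1; linear; genomic DNA; STD; HUM; 4 BP.
SQ   Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
     acgt                                                                      4
//
EMBL

open my $fh, '<', \$demo or die "Cannot open in-memory file: $!";
printf "%s\t%s\n", $_->{id}, $_->{type} for classify_embl_records($fh);
```

Records reported as contig-only are the ones for which the 'expanded'
release files would be needed to obtain sequence.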


From hrh at fmi.ch  Mon Jan 11 10:42:22 2010
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Mon, 11 Jan 2010 16:42:22 +0100
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <C771056E.6204%hrh@fmi.ch>


On 1/11/10 3:16 PM, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

These entries form the CON data class, see:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
and they don't contain any sequence information.

If you take the 'expanded' entries from
ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
your script will work.


Hans


> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter


From awitney at sgul.ac.uk  Mon Jan 11 11:27:15 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 16:27:15 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2D91.70906@gmail.com>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
Message-ID: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>


Ah excellent, thanks Roy. I was indeed thinking about it the wrong way.

In the process of writing this I have created a

Bio::Tools::Run::Phylo::Phylip::Pars class

which is essentially just a modified copy of ProtPars. I have also fixed a few typos and possible bugs in

Bio/Tools/Run/Phylo/Phylip/Base.pm
Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm
Bio/AlignIO/phylip.pm
Bio/Tools/Run/Alignment/Clustalw.pm

I am of course happy to send these back to the project. How would I best do this?

Cheers

adam


On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote:

> Actually, I guess some sample code would be more helpful:
> 
> use Bio::LocatableSeq;
> use Bio::SimpleAlign;
> use Bio::AlignIO;
> my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4);
> my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3);
> my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5);
> my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);
> 
> Cheers,
> Roy.
> 
> 
> On 11/01/2010 13:40, Roy Chaudhuri wrote:
>> Hi Adam,
>> 
>> I'm guessing you actually want to create a Bio::SimpleAlign object
>> (representing an alignment), rather than a Bio::AlignIO object (which is
>> just for reading/writing alignment files). Bio::SimpleAlign has a
>> documented new method that allows you to construct an alignment from
>> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but
>> include gaps and start/end coordinates to describe their relationship to
>> other sequences in the alignment.
>> 
>> Roy.
>> 
>> On 11/01/2010 12:21, Adam Witney wrote:
>>> Hi,
>>> 
>>> I am writing a script to automate the running of Phylip Pars. In the
>>> process i have to create a Bio::AlignIO object from a set of data
>>> that i have in a hash.
>>> 
>>> I could write the hash data into a phylip file and then load the
>>> Bio::AlignIO from that file, but i wondered if i could skip the
>>> writing and then reading of a temporary file ?
>>> 
>>> thanks for any help
>>> 
>>> adam
>> 
> 
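
Roy's sample code quoted above builds the alignment from three literal
sequences. Below is a minimal sketch of the same idea starting from a hash.
It assumes the hash maps sequence ids to already-aligned (gapped) strings of
equal length, and that gaps are written as '-' or '.'. The names %data,
ungapped_length and hash_to_alignment are illustrative only and not part of
BioPerl. BioPerl is loaded lazily so the script degrades to a warning when
it is not installed.

```perl
use strict;
use warnings;

# Count residues in a gapped string, ignoring '-' and '.' gap characters.
# The result is used as the -end coordinate when -start is 1.
sub ungapped_length {
    my ($gapped) = @_;
    return ( $gapped =~ tr/-.//c );
}

# Build a Bio::SimpleAlign from a hash ref of id => gapped sequence.
sub hash_to_alignment {
    my ($seqs) = @_;
    require Bio::LocatableSeq;
    require Bio::SimpleAlign;
    my $aln = Bio::SimpleAlign->new();
    # Hash order is not stable, so sort the ids for reproducible output.
    for my $id ( sort keys %$seqs ) {
        my $gapped = $seqs->{$id};
        $aln->add_seq(
            Bio::LocatableSeq->new(
                -id    => $id,
                -seq   => $gapped,
                -start => 1,
                -end   => ungapped_length($gapped),
            )
        );
    }
    return $aln;
}

my %data = ( one => 'AT-CG', two => 'A--CG', three => 'ATTCG' );

if ( eval { require Bio::AlignIO; 1 } ) {
    # As in Roy's example, with no -file or -fh the output goes to STDOUT.
    Bio::AlignIO->new( -format => 'phylip' )
        ->write_aln( hash_to_alignment( \%data ) );
}
else {
    warn "BioPerl does not appear to be installed; skipping alignment output\n";
}
```

Deriving -end from the residue count avoids hand-maintaining coordinates
that must agree with each gapped string.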


From Russell.Smithies at agresearch.co.nz  Mon Jan 11 22:41:02 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 12 Jan 2010 16:41:02 +1300
Subject: [Bioperl-l] BioPerl version?
In-Reply-To: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>

Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ?

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Mon Jan 11 22:59:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 21:59:44 -0600
Subject: [Bioperl-l] BioPerl version?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
	<18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>
Message-ID: <795BD926-4AE9-4478-AAD5-E36558350745@illinois.edu>

Not dumb, but a frequently asked one: that's a FAQ question ;>

http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'

chris
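
The same check can be made from inside a script, which is handy when code
needs to report the version or cope with BioPerl being absent. This is just
a minimal sketch: it relies only on the $Bio::Root::Version::VERSION
variable used in the one-liner above, and the sub name bioperl_version is
made up for illustration.

```perl
use strict;
use warnings;

# Return the installed BioPerl version string, or undef if the
# Bio::Root::Version module cannot be loaded.
sub bioperl_version {
    return unless eval { require Bio::Root::Version; 1 };
    return $Bio::Root::Version::VERSION;
}

my $version = bioperl_version();
print defined($version)
    ? "BioPerl version $version\n"
    : "BioPerl does not appear to be installed\n";
```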

On Jan 11, 2010, at 9:41 PM, Smithies, Russell wrote:

> Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ?
> 
> --Russell
> 


From cjfields at illinois.edu  Tue Jan 12 11:02:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 10:02:02 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
	<320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
Message-ID: <ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>

On Jan 11, 2010, at 9:55 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>> 
>> These entries form the CON data class, see:
>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>> and they don't contain any sequence information.
> 
> I know - GenBank files have a similar system with CONTIG
> lines instead of sequences. I was expecting BioPerl to be
> able to convert these EMBL files with CO lines into GenBank
> files with CONTIG lines.

IIRC the contig information for GenBank is stored in annotation.  We can try to ensure the data is carried over to EMBL properly.

>> If you take the 'expanded' entries from
>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>> your script will work.
> 
> That's a useful tip - thanks.
> 
> Peter

NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence).

chris


From biopython at maubp.freeserve.co.uk  Tue Jan 12 11:19:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 16:19:32 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
	<320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
	<ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>
Message-ID: <320fb6e01001120819u50e73fa8k9bde8aa1abdf942d@mail.gmail.com>

On Tue, Jan 12, 2010 at 4:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Jan 11, 2010, at 9:55 AM, Peter wrote:
>
>> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>>>
>>> These entries form the CON data class, see:
>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>>> and they don't contain any sequence information.
>>
>> I know - GenBank files have a similar system with CONTIG
>> lines instead of sequences. I was expecting BioPerl to be
>> able to convert these EMBL files with CO lines into GenBank
>> files with CONTIG lines.
>
> IIRC the contig information for GenBank is stored in annotation.
> We can try to ensure the data is carried over to EMBL properly.

For contig records (where there is no sequence) I think we just
need to map the GenBank CONTIG lines to the EMBL CO lines,
and vice versa. At least, that's what Biopython now does (trunk
code, not yet released).

>>> If you take the 'expanded' entries from
>>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>>> your script will work.
>>
>> That's a useful tip - thanks.
>>
>> Peter
>
> NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence).

Indeed. This is a useful workaround for when a parser can't
cope with the contig version of a GenBank file for some reason, e.g.
http://bugzilla.open-bio.org/show_bug.cgi?id=2745

Peter


From maj at fortinbras.us  Tue Jan 12 12:33:30 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 12:33:30 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
Message-ID: <231A8D9473704E7697F7A486A0CDA86A@NewLife>

Hi All--

The beta of Bio::DB::SoapEUtilities is now available in the
bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
service. The system is fully WSDL based, and all eutils are
available. The best part (IMHO) is the result adaptors, which
convert SOAP results into BioPerl objects and let you iterate
over them. Take a look:

 use Bio::DB::EUtilities;
 my $fac = Bio::DB::EUtilities->new(); # step 1
 my $seqio = $fac->esearch(
       -db => 'nucleotide', 
       -term => 'HIV1 and CCR5 and Brazil'
    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
 # yes, it's already done the efetch under the hood...
 while ( my $seq = $seqio->next_seq ) { # step 4
  # do something with $seq, a Bio::Seq object...
 }

or this:

 my $links = $fac->elink( -db => 'protein', 
                          -dbfrom => 'nucleotide',
                          -id => \@nucids )->run( -auto_adapt => 1 );
 
 # maybe more than one associated id...
 my @prot_0 = $links->id_map( $nucids[0] );
   
 while ( my $ls = $links->next_linkset ) {
    @ids = $ls->ids;
    @submitted_ids = $ls->submitted_ids;
    # etc.
 }

and much, much more. See

http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service

and of course, the POD, for all the details, including 
download/installation. Tests in bioperl-run/t.

cheers, 
MAJ

-- No new dependencies were added or animals mistreated 
-- during the making of these modules.


From sheldon.mckay at gmail.com  Tue Jan 12 13:02:53 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 12 Jan 2010 10:02:53 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
Message-ID: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>

Hi all,

I keep timing out trying to do an svn checkout of bioperl-live from
code.open-bio.org.  Any suggestions?

Thanks,
Sheldon

----
Sheldon McKay, PhD
Lead, iPlant Tree of Life Engagement Team;
Research Investigator
Cold Spring Harbor Laboratory
http://mckay.cshl.edu
Google Voice:  (203) 701-9204


On Tue, Nov 3, 2009 at 9:09 AM, Aaron Mackey <amackey at virginia.edu> wrote:
> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in awhile; A clean, anonymous svn co
> failed with the same message:
>
> [...]
> A    bioperl-live/Bio/Structure/StructureI.pm
> A    bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command:
> svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
>


From biopython at maubp.freeserve.co.uk  Tue Jan 12 13:12:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 18:12:46 +0000
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
Message-ID: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>

On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
> Hi all,
>
> I keep timing out trying to do an svn checkout of bioperl-live from
> code.open-bio.org.  Any suggestions?
>
> Thanks,
> Sheldon

The OBF team know about this (it's being discussed on root-l),
hopefully they'll have it fixed before too long.

Peter


From cjfields at illinois.edu  Tue Jan 12 13:18:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 12:18:45 -0600
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
	<320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
Message-ID: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>

On Jan 12, 2010, at 12:12 PM, Peter wrote:

> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
>> Hi all,
>> 
>> I keep timing out trying to do an svn checkout of bioperl-live from
>> code.open-bio.org.  Any suggestions?
>> 
>> Thanks,
>> Sheldon
> 
> The OBF team know about this (it's being discussed on root-l),
> hopefully they'll have it fixed before too long.
> 
> Peter

We probably need to set up some automatic syncing of our read-only code.google.com repo as a backup.  Jason had originally set that up, hopefully he'll respond.

chris


From jason at bioperl.org  Tue Jan 12 13:27:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 12 Jan 2010 10:27:55 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
	<320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
	<8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
Message-ID: <C9DDBB08-DB88-4596-AED3-B3FD89893C55@bioperl.org>

Hi - I had set up the Google Code sync, but then came the unfortunate
realization that the revision numbers are shared between the wiki and
the code SVN (all one repo), so when I added a wiki page on the site I
screwed up the numbering and it wasn't possible to sync anymore (that
I could figure out) without resetting it, and I haven't gone back to
that. Sorry - I wasn't sure if we had figured out what we wanted to
do for repositories, so I sort of stopped worrying about it.


-jason
On Jan 12, 2010, at 10:18 AM, Chris Fields wrote:

> On Jan 12, 2010, at 12:12 PM, Peter wrote:
>
>> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
>>> Hi all,
>>>
>>> I keep timing out trying to do an svn checkout of bioperl-live from
>>> code.open-bio.org.  Any suggestions?
>>>
>>> Thanks,
>>> Sheldon
>>
>> The OBF team know about this (it's being discussed on root-l),
>> hopefully they'll have it fixed before too long.
>>
>> Peter
>
> We probably need to set up some automatic syncing of our read-only  
> code.google.com repo as a backup.  Jason had originally set that up,  
> hopefully he'll respond.
>
> chris

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From virajj at gmail.com  Wed Jan  6 13:20:39 2010
From: virajj at gmail.com (Vijayaraj Nagarajan)
Date: Wed, 6 Jan 2010 13:20:39 -0500
Subject: [Bioperl-l] targetp request
Message-ID: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>

Hi,

I am trying to use targetP in bioperl.
the documentation at the bioperl site is a bit confusing to me...

I would appreciate if you could give a very small example, as to how to use
"Bio::Tools::TargetP" to predict the localization of a protein sequence that
i have stored as a string.

Thanks,
Vijay


From cjfields at illinois.edu  Tue Jan 12 18:36:53 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 17:36:53 -0600
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
Message-ID: <D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>

Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API conflict with the current EUtilities tools.

chris

On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:

> Hi All--
> 
> The beta of Bio::DB::SoapEUtilities is now available in the
> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
> service. The system is fully WSDL based, and all eutils are
> available. The best thing (IMHO) are the result adaptors, which
> provide conversion and iteration of SOAP results into BioPerl
> objects. Schau, mal:
> 
> use Bio::DB::EUtilities;
> my $fac = Bio::DB::EUtilities->new(); # step 1
> my $seqio = $fac->esearch(
>       -db => 'nucleotide', 
>       -term => 'HIV1 and CCR5 and Brazil'
>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
> # yes, it's already done the efetch under the hood...
> while ( my $seq = $seqio->next_seq ) { # step 4
>  # do something with $seq, a Bio::Seq object...
> }
> 
> or this:
> 
> my $links = $fac->elink( -db => 'protein', 
>                          -dbfrom => 'nucleotide',
>                          -id => \@nucids )->run( -auto_adapt => 1 );
> 
> # maybe more than one associated id...
> my @prot_0 = $links->id_map( $nucids[0] );
> 
> while ( my $ls = $links->next_linkset ) {
>    @ids = $ls->ids;
>    @submitted_ids = $ls->submitted_ids;
>    # etc.
> }
> 
> and much, much more. See
> 
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
> 
> and of course, the POD, for all the details, including 
> download/installation. Tests in bioperl-run/t.
> 
> cheers, 
> MAJ
> 
> -- No new dependencies were added or animals mistreated 
> -- during the making of these modules.
> 


From cjfields at illinois.edu  Tue Jan 12 19:22:10 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 18:22:10 -0600
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
	<D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
	<5AD210CB0C444A57881BBDD34DE99149@NewLife>
Message-ID: <B536964F-8F2F-4E07-9FD3-B7D0A945253E@illinois.edu>

Okay, just making sure (I was getting a bit paranoid).  Great work on the SOAP interface, BTW!

chris

On Jan 12, 2010, at 6:08 PM, Mark A. Jensen wrote:

> Um, yeah.
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "BioPerl List" <bioperl-l at bioperl.org>
> Sent: Tuesday, January 12, 2010 6:36 PM
> Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service
> 
> 
> Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API conflict with the current EUtilities tools.
> 
> chris
> 
> On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:
> 
>> Hi All--
>> 
>> The beta of Bio::DB::SoapEUtilities is now available in the
>> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
>> service. The system is fully WSDL based, and all eutils are
>> available. The best thing (IMHO) are the result adaptors, which
>> provide conversion and iteration of SOAP results into BioPerl
>> objects. Schau, mal:
>> 
>> use Bio::DB::EUtilities;
>> my $fac = Bio::DB::EUtilities->new(); # step 1
>> my $seqio = $fac->esearch(
>>      -db => 'nucleotide',
>>      -term => 'HIV1 and CCR5 and Brazil'
>>   )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
>> # yes, it's already done the efetch under the hood...
>> while ( my $seq = $seqio->next_seq ) { # step 4
>> # do something with $seq, a Bio::Seq object...
>> }
>> 
>> or this:
>> 
>> my $links = $fac->elink( -db => 'protein',
>>                         -dbfrom => 'nucleotide',
>>                         -id => \@nucids )->run( -auto_adapt => 1 );
>> 
>> # maybe more than one associated id...
>> my @prot_0 = $links->id_map( $nucids[0] );
>> 
>> while ( my $ls = $links->next_linkset ) {
>>   @ids = $ls->ids;
>>   @submitted_ids = $ls->submitted_ids;
>>   # etc.
>> }
>> 
>> and much, much more. See
>> 
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>> 
>> and of course, the POD, for all the details, including
>> download/installation. Tests in bioperl-run/t.
>> 
>> cheers,
>> MAJ
>> 
>> -- No new dependencies were added or animals mistreated
>> -- during the making of these modules.
>> 
> 
> 


From maj at fortinbras.us  Tue Jan 12 19:08:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 19:08:12 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
	<D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
Message-ID: <5AD210CB0C444A57881BBDD34DE99149@NewLife>

Um, yeah.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Tuesday, January 12, 2010 6:36 PM
Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web 
service


Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's 
Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API 
conflict with the current EUtilities tools.

chris

On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:

> Hi All--
>
> The beta of Bio::DB::SoapEUtilities is now available in the
> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
> service. The system is fully WSDL based, and all eutils are
> available. The best thing (IMHO) are the result adaptors, which
> provide conversion and iteration of SOAP results into BioPerl
> objects. Schau, mal:
>
> use Bio::DB::EUtilities;
> my $fac = Bio::DB::EUtilities->new(); # step 1
> my $seqio = $fac->esearch(
>       -db => 'nucleotide',
>       -term => 'HIV1 and CCR5 and Brazil'
>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
> # yes, it's already done the efetch under the hood...
> while ( my $seq = $seqio->next_seq ) { # step 4
>  # do something with $seq, a Bio::Seq object...
> }
>
> or this:
>
> my $links = $fac->elink( -db => 'protein',
>                          -dbfrom => 'nucleotide',
>                          -id => \@nucids )->run( -auto_adapt => 1 );
>
> # maybe more than one associated id...
> my @prot_0 = $links->id_map( $nucids[0] );
>
> while ( my $ls = $links->next_linkset ) {
>    @ids = $ls->ids;
>    @submitted_ids = $ls->submitted_ids;
>    # etc.
> }
>
> and much, much more. See
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>
> and of course, the POD, for all the details, including
> download/installation. Tests in bioperl-run/t.
>
> cheers,
> MAJ
>
> -- No new dependencies were added or animals mistreated
> -- during the making of these modules.
>


From maj at fortinbras.us  Tue Jan 12 20:09:28 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 20:09:28 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP
	webservice
In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife><D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
	<5AD210CB0C444A57881BBDD34DE99149@NewLife>
Message-ID: <A5829F72FD6F469D9CBCC94FC69C068F@NewLife>

corrected:

use Bio::DB::SoapEUtilities;
my $fac = Bio::DB::SoapEUtilities->new(); # step 1
my $seqio = $fac->esearch(
    -db   => 'nucleotide',
    -term => 'HIV1 and CCR5 and Brazil'
)->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
# yes, it's already done the efetch under the hood...
while ( my $seq = $seqio->next_seq ) { # step 4
    # do something with $seq, a Bio::Seq object...
}

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Tuesday, January 12, 2010 7:08 PM
Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP 
webservice


> Um, yeah.
> ----- Original Message ----- 
> From: "Chris Fields" <cjfields at illinois.edu>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "BioPerl List" <bioperl-l at bioperl.org>
> Sent: Tuesday, January 12, 2010 6:36 PM
> Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web 
> service
>
>
> Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's 
> Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API 
> conflict with the current EUtilities tools.
>
> chris
>
> On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:
>
>> Hi All--
>>
>> The beta of Bio::DB::SoapEUtilities is now available in the
>> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
>> service. The system is fully WSDL based, and all eutils are
>> available. The best thing (IMHO) are the result adaptors, which
>> provide conversion and iteration of SOAP results into BioPerl
>> objects. Schau, mal:
>>
>> use Bio::DB::EUtilities;
>> my $fac = Bio::DB::EUtilities->new(); # step 1
>> my $seqio = $fac->esearch(
>>       -db => 'nucleotide',
>>       -term => 'HIV1 and CCR5 and Brazil'
>>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
>> # yes, it's already done the efetch under the hood...
>> while ( my $seq = $seqio->next_seq ) { # step 4
>>  # do something with $seq, a Bio::Seq object...
>> }
>>
>> or this:
>>
>> my $links = $fac->elink( -db => 'protein',
>>                          -dbfrom => 'nucleotide',
>>                          -id => \@nucids )->run( -auto_adapt => 1 );
>>
>> # maybe more than one associated id...
>> my @prot_0 = $links->id_map( $nucids[0] );
>>
>> while ( my $ls = $links->next_linkset ) {
>>    @ids = $ls->ids;
>>    @submitted_ids = $ls->submitted_ids;
>>    # etc.
>> }
>>
>> and much, much more. See
>>
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>>
>> and of course, the POD, for all the details, including
>> download/installation. Tests in bioperl-run/t.
>>
>> cheers,
>> MAJ
>>
>> -- No new dependencies were added or animals mistreated
>> -- during the making of these modules.
>>
> 


From tuco at pasteur.fr  Wed Jan 13 05:24:34 2010
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 13 Jan 2010 11:24:34 +0100
Subject: [Bioperl-l] targetp request
In-Reply-To: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
Message-ID: <4B4D9F62.5010306@pasteur.fr>

On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> Hi,
>
> I am trying to use targetP in bioperl.
> the documentation at the bioperl site is a bit confusing to me...
>
> I would appreciate if you could give a very small example, as to how to use
> "Bio::Tools::TargetP" to predict the localization of a protein sequence that
> i have stored as a string.
>
> Thanks,
> Vijay

Dear Vijay,

Bio::Tools::TargetP is not intended to run targetp on a sequence, but to
read and parse the results of a targetp run.

 From the Pod doc:

DESCRIPTION
        TargetP modules will provides parsed informations about protein
        localization.  It reads in a targetp output file.  It parses the
        results, and returns a Bio::SeqFeature::Generic object for each
        sequences found to have a subcellular localization


So to analyze your sequence, you'll first need to run targetp on your
sequence file to create a targetp result output file. Then use the
Bio::Tools::TargetP module to parse this result file and extract only
the information you want or need for display, as shown in the SYNOPSIS
of the module's Pod documentation.

HTH

Regards

Emmanuel
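
A minimal sketch of the parse step described above, for anyone who wants
something concrete. It assumes targetp has already been run and its
output saved to a file whose name is given on the command line. The
helper name summarize_predictions is made up for illustration.
next_prediction() is the iterator the module documents; the 'location'
tag name is an assumption, so the code checks has_tag() before reading
it and falls back to 'unknown'.

```perl
use strict;
use warnings;

# Hypothetical helper: walk any parser that offers next_prediction()
# and return one "name<TAB>location" line per predicted feature.
sub summarize_predictions {
    my ($parser) = @_;
    my @lines;
    while ( my $feat = $parser->next_prediction ) {
        # Each $feat is expected to be a Bio::SeqFeature::Generic.
        # The 'location' tag is an assumption; fall back if it is absent.
        my ($loc) = $feat->has_tag('location')
                  ? $feat->get_tag_values('location')
                  : ('unknown');
        my $name = $feat->display_name || $feat->seq_id || 'unnamed';
        push @lines, join( "\t", $name, $loc );
    }
    return @lines;
}

# Run as: perl parse_targetp.pl <saved targetp output file>
if (@ARGV) {
    require Bio::Tools::TargetP;
    my $parser = Bio::Tools::TargetP->new( -file => $ARGV[0] );
    print "$_\n" for summarize_predictions($parser);
}
```

The protein string itself never reaches BioPerl here: it has to be
written to a file and run through targetp first, and only the saved
report is parsed.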


From roy.chaudhuri at gmail.com  Wed Jan 13 07:52:58 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 13 Jan 2010 12:52:58 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
Message-ID: <4B4DC22A.8080701@gmail.com>

Upload them to Bugzilla as patches, and one of the devs will review your 
changes and incorporate them into bioperl-live:
http://www.bioperl.org/wiki/HOWTO:SubmitPatch

Roy.

On 11/01/2010 16:27, Adam Witney wrote:
>
> Ah excellent, thanks Roy. I was indeed thinking about it the wrong
> way.
>
> In the process of writing this i have created a
>
> Bio::Tools::Run::Phylo::Phylip::Pars class
>
> which is essentially just a modified copy of ProtPars. I have also
> fixed a few typos and possible bugs in
>
> Bio/Tools/Run/Phylo/Phylip/Base.pm
> Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm
> Bio/Tools/Run/Alignment/Clustalw.pm
>
> I am of course happy to send these back in to the project... how
> would i best do this?
>
> Cheers
>
> adam
>
>
> On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote:
>
>> Actually, I guess some sample code would be more helpful:
>>
>> use Bio::LocatableSeq;
>> use Bio::SimpleAlign;
>> use Bio::AlignIO;
>> my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4);
>> my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3);
>> my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5);
>> my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
>> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);
>>
>> Cheers, Roy.
>>
>>
>> On 11/01/2010 13:40, Roy Chaudhuri wrote:
>>> Hi Adam,
>>>
>>> I'm guessing you actually want to create a Bio::SimpleAlign
>>> object (representing an alignment), rather than a Bio::AlignIO
>>> object (which is just for reading/writing alignment files).
>>> Bio::SimpleAlign has a documented new method that allows you to
>>> construct an alignment from Bio::LocatableSeq objects, which are
>>> similar to Bio::Seq objects but include gaps and start/end
>>> coordinates to describe their relationship to other sequences in
>>> the alignment.
>>>
>>> Roy.
>>>
>>> On 11/01/2010 12:21, Adam Witney wrote:
>>>> Hi,
>>>>
>>>> I am writing a script to automate the running of Phylip Pars.
>>>> In the process i have to create a Bio::AlignIO object from a
>>>> set of data that i have in a hash.
>>>>
>>>> I could write the hash data into a phylip file and then load
>>>> the Bio::AlignIO from that file, but i wondered if i could skip
>>>> the writing and then reading of a temporary file ?
>>>>
>>>> thanks for any help
>>>>
>>>> adam
>>>
>>
>


From marcelo011982 at gmail.com  Wed Jan 13 13:12:04 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Wed, 13 Jan 2010 16:12:04 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
Message-ID: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>

Hi..
I have a simple Blast result, such as one from blastn.
Is there a script in Bioperl to transform such a result to Clustalw
format (.aln)?

Thanx for any help.


From Kevin.M.Brown at asu.edu  Wed Jan 13 13:01:42 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 13 Jan 2010 11:01:42 -0700
Subject: [Bioperl-l] targetp request
In-Reply-To: <4B4D9F62.5010306@pasteur.fr>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
	<4B4D9F62.5010306@pasteur.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B4067C133E@EX02.asurite.ad.asu.edu>

Sounds like this module might be in the wrong place then. Sounds more
like a SeqIO or AlignIO module, heheh. Also looks like the docs might
need to be cleaned up a bit for English readability (at least that
initial sentence).

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Emmanuel Quevillon
> Sent: Wednesday, January 13, 2010 3:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] targetp request
> 
> On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> > Hi,
> >
> > I am trying to use targetP in bioperl.
> > the documentation at the bioperl site is a bit confusing to me...
> >
> > I would appreciate if you could give a very small example, 
> as to how to use
> > "Bio::Tools::TargetP" to predict the localization of a 
> protein sequence that
> > i have stored as a string.
> >
> > Thanks,
> > Vijay
> 
> Dear Vijay,
> 
> Bio::Tools::TargetP is not intended to run targetp on a 
> sequence but to 
> read and parse results from targetp run.
> 
>  From the Pod doc :
> 
> DESCRIPTION
>         TargetP modules will provides parsed informations 
> about protein 
> localization.  It
>         reads in a targetp output file.  It parses the results, and 
> returns a
>         Bio::SeqFeature::Generic object for each sequences 
> found to have 
> a subcellular
>         localization
> 
> 
> So to analyze your sequence, you'll first need to run targetp on your 
> sequence file to create a targetp result output file. Then use 
> Bio::Tools::TargetP module to parse this result file and get only 
> informations you want/need from the result to be display as 
> shown in the 
> SYNOPSIS of the Pod documentation of the module.
> 
> HTH
> 
> Regards
> 
> Emmanuel


From maj at fortinbras.us  Wed Jan 13 13:44:36 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 13 Jan 2010 13:44:36 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
Message-ID: <C85EC8A05E884B328AFDAA055341E9E2@NewLife>

Marcelo-
Yes-- look at the code snip at
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
combined with the snip at 
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
(using -format => 'clustalw')
cheers MAJ
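
Putting those two snippets together might look roughly like this. A
sketch only: it assumes the report is plain-text BLAST output, writes
every HSP of every hit as its own alignment, and takes the input and
output file names from the command line. write_hsp_alignments is a
made-up helper name; the calls it makes (next_result, next_hit,
next_hsp, get_aln, write_aln) are the usual SearchIO/AlignIO ones.

```perl
use strict;
use warnings;

# Hypothetical helper: write the alignment of every HSP in a search
# report to an alignment writer, and return how many were written.
sub write_hsp_alignments {
    my ( $searchio, $alnio ) = @_;
    my $count = 0;
    while ( my $result = $searchio->next_result ) {
        while ( my $hit = $result->next_hit ) {
            while ( my $hsp = $hit->next_hsp ) {
                # get_aln returns the HSP as an alignment object
                $alnio->write_aln( $hsp->get_aln );
                $count++;
            }
        }
    }
    return $count;
}

# Run as: perl blast2aln.pl <blast report> <output .aln file>
if ( @ARGV == 2 ) {
    require Bio::SearchIO;
    require Bio::AlignIO;
    my ( $blast_file, $aln_file ) = @ARGV;
    my $in  = Bio::SearchIO->new( -format => 'blast',    -file => $blast_file );
    my $out = Bio::AlignIO->new(  -format => 'clustalw', -file => ">$aln_file" );
    my $n = write_hsp_alignments( $in, $out );
    print "wrote $n alignment(s) to $aln_file\n";
}
```

Each HSP is a pairwise alignment, so the output file holds a series of
two-sequence Clustalw blocks rather than one multiple alignment.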
----- Original Message ----- 
From: "Marcelo Iwata" <marcelo011982 at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, January 13, 2010 1:12 PM
Subject: [Bioperl-l] Blast to Clustalw Format


> Hi..
> I have a simple Blast result, such as one from blastn.
> Is there a script in Bioperl to transform such a result to Clustalw
> format (.aln)?
> 
> Thanx for any help.


From dan.kortschak at adelaide.edu.au  Wed Jan 13 23:26:46 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 14:56:46 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
Message-ID: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>

Hi All,

I'm having a stupid problem that for some reason I just can't figure
out. I'm putting together a B:A:IO:bowtie module to wrap around the
B:A:IO:sam module so bowtie output can be used as an assembly start
point.

For some reason that is escaping me I can't create tempfiles!

What should be the relevant code in the module:

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );


and the line (there are a couple of others that are likely to fail in the
same way, but I've not got that far)

my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
$self->tempdir(), -suffix => '.sam' );

Which dies with:
Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.

Relevant environment vars:
  DB<10> x @ISA 
0  'Bio::Root::Root'
1  'Bio::Root::IO'
2  'Bio::Assembly::IO'

DB<11> x $self
0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
   '_no_head' => undef
   '_no_sq' => undef
   '_root_verbose' => 0


Can someone suggest what I'm missing?

cheers
Dan


From maj at fortinbras.us  Thu Jan 14 00:11:01 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:11:01 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <84196F01FF584C64A79B89FECE2DD86F@NewLife>

Hey Dan-- what does your constructor look like? I wonder if something's getting 
lost in new() and _initialize() chaining spaghetti- MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, January 13, 2010 11:26 PM
Subject: [Bioperl-l] not able to use Bio::Root::IO method


> Hi All,
>
> I'm having a stupid problem that for some reason I just can't figure
> out. I'm putting together a B:A:IO:bowtie module to wrap around the
> B:A:IO:sam module so bowtie output can be used as an assembly start
> point.
>
> For some reason that is escaping me I can't create tempfiles!
>
> What should be the relevant code in the module:
>
> package Bio::Assembly::IO::bowtie;
> use strict;
> use warnings;
>
> # Object preamble - inherits from Bio::Root::Root
>
> use Bio::SeqIO;
> use Bio::Tools::Run::Samtools;
> use Bio::Assembly::IO;
> use Carp;
> use Bio::Root::Root;
> use Bio::Root::IO;
> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>
>
> and the line (there are a couple of others that are like to fail in the
> same way, but I've not got that far)
>
> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
> $self->tempdir(), -suffix => '.sam' );
>
> Which dies with:
> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.
>
> Relevant environment vars:
>  DB<10> x @ISA
> 0  'Bio::Root::Root'
> 1  'Bio::Root::IO'
> 2  'Bio::Assembly::IO'
>
> DB<11> x $self
> 0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
>   '_no_head' => undef
>   '_no_sq' => undef
>   '_root_verbose' => 0
>
>
>
> Can someone suggest what I'm missing?
>
> cheers
> Dan
>


From dan.kortschak at adelaide.edu.au  Thu Jan 14 00:35:35 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:35 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <84196F01FF584C64A79B89FECE2DD86F@NewLife>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
Message-ID: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>

Thanks Mark, I'm not sure about that, since @ISA still includes
Bio::Root::IO at the point of the call, but it might be.

cheers
Dan

Here is the entirety of the code (it's reasonably short):

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );

our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
our $PG = "\@PG\tID=Bowtie\n";

our $HAVE_IO_UNCOMPRESS;
BEGIN {
# check requirements
    unless ( eval "require Bio::Tools::Run::Bowtie;") {
	Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
    }
    unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
	Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
    }
}

sub new {
	my $class = shift;
	my @args = @_;
	my $self = $class->SUPER::new(@args);
	my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
	$file =~ s/^<//;
	$self->{'_no_head'} = $no_head;
	$self->{'_no_sq'} = $no_sq;
	# get the sequence so samtools can work with it
	my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
	my $refdb = $inspector->run($index);
	my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));
	my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );
	return $sam;
}

sub _bowtie_to_sam {
	my ($self, $file, $refdb) = @_;

	$self->throw("'$file' does not exist or is not readable.")
		unless ( -e $file && -r $file );
	my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
	$self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;

	my %SQ;
	my $mapq = 255;
	my $in_pair;
	my @mate_line;
	my $mlen;

	if ($file =~ m/\.gz[^.]*$/) {
		unless ($HAVE_IO_UNCOMPRESS) {
			croak( "IO::Uncompress::Gunzip not available, can't expand '$_'" );
		}
		my ($tfh, $tf) = $self->io->tempfile;
		my $z = IO::Uncompress::Gunzip->new($_);
		while (<$z>) { print $tfh $_ }
		close $tfh;
		$file = $tf;
	}

        open(my $fh, $file) or
		$self->throw("Can not open '$file' for reading: $!");
            
	# create temp file for working
	my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );
	
	while ($fh) {
		chomp;
		my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
		$SQ{$rname} = 1;
		
		my $paired_f =  ($qname =~ m#/[12]#) ? 0x03 : 0;
		my $strand_f = ($strand eq '-') ? 0x10 : 0;
		my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
		my $first_f =  ($qname =~ m#/1#) ? 0x40 : 0;
		my $second_f = ($qname =~ m#/2#) ? 0x80 : 0;
		my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;

		$pos++;
		my $len = length $seq;
		die unless $len == length $qual;
		my $cigar = $len.'M';
		my @detail = split(',',$details);
		my $dist = 'NM:i:'.scalar @detail;
		
		my @mismatch;
		my $last_pos = 0;
		for (@detail) {
			m/(\d+):(\w)>\w/;
			my $err = ($1-$last_pos);
			$last_pos = $1+1;
			push @mismatch,($err,$2);
		}
		push @mismatch, $len-$last_pos;
		@mismatch = reverse @mismatch if $strand eq '-';
		my $mismatch = join('',('MD:Z:',@mismatch));

		if ($paired_f) {
			my $mrnm = '=';
			if ($in_pair) {
				my $mpos = $mate_line[3];
				$mate_line[7] = $pos;
				my $isize = $mpos-$pos-$len;
				$mate_line[8] = -$isize;
				print $sam_tmp_h join("\t",@mate_line),"\n";
				print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
				$in_pair = 0;
			} else {
				$mlen = $len;
				@mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
				$in_pair = 1;
			}
		} else {
			my $mrnm = '*';
			my $mpos = 0;
			my $isize = 0;
			print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
		}
	}

	close($fh);
	$sam_tmp_h->close;
	
	return $sam_tmp_f if $self->{'_no_head'};

	my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

	# print header
	print $samh $HD;
	
	# print sequence dictionary
	unless ($self->{'_no_sq'}) {
		my $db  = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
		while ( my $seq = $db->next_seq() ) {
			$SQ{$seq->id} = $seq->length if $SQ{$seq->id};
		}
	
		map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
	}
	
	# print program
	print $samh $PG;
	
	open($sam_tmp_h, $sam_tmp_f) or
		$self->throw("Can not open '$sam_tmp_f' for reading: $!");

	print $samh $_ while ($sam_tmp_h);
	
	close($sam_tmp_h);
	$samh->close;
	
	return $samf;
}

sub _make_bam {
	my ($self, $file) = @_;
	
	$self->throw("'$file' does not exist or is not readable")
		unless ( -e $file && -r $file );

	# make a sorted bam file from a sam file input
	my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
	my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
	$_->close for ($bamh, $srth);
	
	my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
						   -sam_input => 1,
						   -bam_output => 1 );

	$samt->run( -bam => $file, -out => $bamf );

	$samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );

	$samt->run( -bam => $bamf, -pfx => $srtf);

	return $srtf.'.bam'
}

1;
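
Separate from the io() question: two loops in the listing above,
"while ($fh)" and "print $samh $_ while ($sam_tmp_h)", test the
filehandle itself rather than reading from it, so $_ is never set and
the loop never ends. A sketch of the reading idiom, using an in-memory
file and made-up read names so that it runs on its own:

```perl
use strict;
use warnings;

# Made-up bowtie-like records: name, strand, reference (tab separated).
my $data = "read1\t+\tchr1\nread2\t-\tchr2\n";
open( my $fh, '<', \$data ) or die "Cannot open in-memory file: $!";

my %SQ;
# <$fh> reads one line per iteration and returns undef at end of file.
while ( my $line = <$fh> ) {
    chomp $line;
    my ( $qname, $strand, $rname ) = split /\t/, $line;
    $SQ{$rname} = 1;    # remember each reference name seen
}
close $fh;

print join( ',', sort keys %SQ ), "\n";
```

The same change applies to the copy loop near the end of
_bowtie_to_sam, which would read with <$sam_tmp_h>.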


On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
> Hey Dan-- what does your constructor look like? I wonder if
> something's getting lost in new() and _initialize() chaining
> spaghetti- MAJ
> 


From dan.kortschak at adelaide.edu.au  Thu Jan 14 00:35:48 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:48 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
	<B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
Message-ID: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>

I've had a bit of a play with that, but no luck.

Dan

On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
> I've found that rearranging the items in the 'use base' array can
> sometimes recover lost methods. I don't know enough of the arcana to
> know why it works. (Sometimes, java starts looking pretty good from
> here...)
> 


From maj at fortinbras.us  Thu Jan 14 00:38:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:38:00 -0500
Subject: [Bioperl-l] Fw:  not able to use Bio::Root::IO method
Message-ID: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>

up to list
----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
Sent: Thursday, January 14, 2010 12:36 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method


> Aha-- check out the pod for Bio::Root::IO:
> 
> "This module provides methods that will usually be needed for any sort
> of file- or stream-related input/output, e.g., keeping track of a file
> handle, transient printing and reading from the file handle, a close
> method, automatically closing the handle on garbage collection, etc.
> 
> To use this for your own code you will either want to inherit from
> this module, or instantiate an object for every file or stream you are
> dealing with. In the first case this module will most likely not be
> the first class off which your class inherits; therefore you need to
> call _initialize_io() with the named parameters in order to set file
> handle, open file, etc automatically."
> 
> I think you're wanting a call to $self->_initialize_io(). (There is no io() 
> method explicitly defined in any of the base classes.)
> MAJ
> ----- Original Message ----- 
> From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, January 13, 2010 11:26 PM
> Subject: [Bioperl-l] not able to use Bio::Root::IO method
> 
> 
>> Hi All,
>> 
>> I'm having a stupid problem that for some reason I just can't figure
>> out. I'm putting together a B:A:IO:bowtie module to wrap around the
>> B:A:IO:sam module so bowtie output can be used as an assembly start
>> point.
>> 
>> For some reason that is escaping me I can't create tempfiles!
>> 
>> What should be the relevant code in the module:
>> 
>> package Bio::Assembly::IO::bowtie;
>> use strict;
>> use warnings;
>> 
>> # Object preamble - inherits from Bio::Root::Root
>> 
>> use Bio::SeqIO;
>> use Bio::Tools::Run::Samtools;
>> use Bio::Assembly::IO;
>> use Carp;
>> use Bio::Root::Root;
>> use Bio::Root::IO;
>> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>> 
>> 
>> and the line (there are a couple of others that are like to fail in the
>> same way, but I've not got that far)
>> 
>> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
>> $self->tempdir(), -suffix => '.sam' );
>> 
>> Which dies with:
>> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
>> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.
>> 
>> Relevant environment vars:
>>  DB<10> x @ISA 
>> 0  'Bio::Root::Root'
>> 1  'Bio::Root::IO'
>> 2  'Bio::Assembly::IO'
>> 
>> DB<11> x $self
>> 0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
>>   '_no_head' => undef
>>   '_no_sq' => undef
>>   '_root_verbose' => 0
>> 
>> 
>> 
>> Can someone suggest what I'm missing?
>> 
>> cheers
>> Dan
>> 


From maj at fortinbras.us  Thu Jan 14 00:50:11 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:50:11 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
	<B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
	<1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <82BFF47099684EF496DB3875D39DCA14@NewLife>

For the benefit of the list, I categorically deny ever making the 
statement about java below....
MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 14, 2010 12:35 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method


> I've had a bit of a play with that, but no luck.
> 
> Dan
> 
> On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
>> I've found that rearranging the items in the 'use base' array can
>> sometimes recover lost methods. I don't know enough of the arcana to
>> know why it works. (Sometimes, java starts looking pretty good from
>> here...)
>> 
> 
>


From cjfields at illinois.edu  Thu Jan 14 02:23:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:23:41 -0600
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>

You can remove the separate 'use' directives for any module that is already listed in 'use base' (use base loads it for you).  Also, Bio::Root::IO inherits from Bio::Root::Root, and Bio::Assembly::IO should inherit from Bio::Root::IO, so the only base module you should need is Bio::Assembly::IO.  It's possible that having all three is confusing the interpreter.

chris
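
To see what a single base class buys you, here is a self-contained
sketch with made-up Demo:: packages standing in for the real hierarchy
(Root, IO, Assembly::IO, bowtie). It is not the BioPerl code, just the
same shape: tempfile() is reachable through the chain with only one
parent declared, and io() is not defined anywhere in it.

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; provides mro::get_linear_isa

# Made-up stand-ins for the real classes; only the shape matters here.
{ package Demo::Root;       sub new { return bless {}, shift } }
{ package Demo::IO;         our @ISA = ('Demo::Root');
                            sub tempfile { return 'tempfile() found in Demo::IO' } }
{ package Demo::AssemblyIO; our @ISA = ('Demo::IO'); }
{ package Demo::Bowtie;     our @ISA = ('Demo::AssemblyIO'); }   # one parent only

package main;

# The order in which Perl searches classes for a method.
my @order = @{ mro::get_linear_isa('Demo::Bowtie') };
print join( ' -> ', @order ), "\n";

my $obj = Demo::Bowtie->new;
print $obj->tempfile, "\n";
print 'io() available? ', ( $obj->can('io') ? 'yes' : 'no' ), "\n";
```

Running the same get_linear_isa call on Bio::Assembly::IO::bowtie should
show whether Bio::Root::IO really is in the search path.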

On Jan 13, 2010, at 11:35 PM, Dan Kortschak wrote:

> Thanks Mark, I'm not sure about that, since @ISA still includes
> Bio::Root::IO at the point of the call, but it might be.
> 
> cheers
> Dan
> 
> On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
>> Hey Dan-- what does your constructor look like? I wonder if
>> something's getting 
>> lost in new() and _initialize() chaining spaghetti- MAJ
>> 
> 


From cjfields at illinois.edu  Thu Jan 14 02:25:05 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:25:05 -0600
Subject: [Bioperl-l] Fw:  not able to use Bio::Root::IO method
In-Reply-To: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
References: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
Message-ID: <1DB926E1-9C6F-4B96-8D7E-28317DD7DE42@illinois.edu>

Yes, that's true.  The io() call is a Bio::Tools::Run::WrapperBase thing (there, io() returns a Bio::Root::IO instance).

chris
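
The difference can be sketched without BioPerl at all. Below,
My::IOHelper is a made-up stand-in for Bio::Root::IO, My::Wrapper for a
WrapperBase-style class that keeps a helper behind io(), and My::Parser
for a class that inherits from the helper instead. File::Temp does the
actual work; that is a choice for this sketch, not a claim about what
Bio::Root::IO does internally. In the inheriting case the method is
called directly on the object, which suggests trying
$self->tempfile(...) in the bowtie module, assuming Bio::Root::IO's
tempfile() is inherited there.

```perl
use strict;
use warnings;

{
    package My::IOHelper;            # stand-in for Bio::Root::IO
    use File::Temp ();
    sub new { return bless {}, shift }
    sub tempfile {
        my ( $self, %opt ) = @_;
        # Returns (handle, filename); the file is removed at exit.
        return File::Temp::tempfile( %opt, UNLINK => 1 );
    }
}
{
    package My::Wrapper;             # "has a" helper, reached via io()
    sub new { return bless {}, shift }
    sub io {
        my $self = shift;
        $self->{_io} ||= My::IOHelper->new;
        return $self->{_io};
    }
}
{
    package My::Parser;              # "is a" helper, so no io() needed
    our @ISA = ('My::IOHelper');
}

package main;

my ( $fh1, $file1 ) = My::Wrapper->new->io->tempfile( SUFFIX => '.sam' );
my ( $fh2, $file2 ) = My::Parser->new->tempfile( SUFFIX => '.sam' );
print "wrapper style: $file1\nparser style:  $file2\n";

# Calling io() on the inheriting class reproduces the error in this thread.
my $ok = eval { My::Parser->new->io; 1 };
print "io() on the inheriting class fails: $@" unless $ok;
```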

On Jan 13, 2010, at 11:38 PM, Mark A. Jensen wrote:

> up to list
> ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
> To: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
> Sent: Thursday, January 14, 2010 12:36 AM
> Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method
> 
> 
>> Aha-- check out the pod for Bio::Root::IO:
>> "This module provides methods that will usually be needed for any sort
>> of file- or stream-related input/output, e.g., keeping track of a file
>> handle, transient printing and reading from the file handle, a close
>> method, automatically closing the handle on garbage collection, etc.
>> To use this for your own code you will either want to inherit from
>> this module, or instantiate an object for every file or stream you are
>> dealing with. In the first case this module will most likely not be
>> the first class off which your class inherits; therefore you need to
>> call _initialize_io() with the named parameters in order to set file
>> handle, open file, etc automatically."
>> I think you're wanting a call to $self->_initialize_io(). (There is no io() method explicitly defined in any of the base classes.)
>> MAJ
>> ----- Original Message ----- From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, January 13, 2010 11:26 PM
>> Subject: [Bioperl-l] not able to use Bio::Root::IO method
>>> Hi All,
>>> I'm having a stupid problem that for some reason I just can't figure
>>> out. I'm putting together a B:A:IO:bowtie module to wrap around the
>>> B:A:IO:sam module so bowtie output can be used as an assembly start
>>> point.
>>> For some reason that is escaping me I can't create tempfiles!
>>> What should be the relevant code in the module:
>>> package Bio::Assembly::IO::bowtie;
>>> use strict;
>>> use warnings;
>>> # Object preamble - inherits from Bio::Root::Root
>>> use Bio::SeqIO;
>>> use Bio::Tools::Run::Samtools;
>>> use Bio::Assembly::IO;
>>> use Carp;
>>> use Bio::Root::Root;
>>> use Bio::Root::IO;
>>> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>>> and the line (there are a couple of others that are likely to fail in the
>>> same way, but I've not got that far)
>>> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
>>> $self->tempdir(), -suffix => '.sam' );
>>> Which dies with:
>>> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
>>> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.
>>> Relevant environment vars:
>>> DB<10> x @ISA 0  'Bio::Root::Root'
>>> 1  'Bio::Root::IO'
>>> 2  'Bio::Assembly::IO'
>>> DB<11> x $self
>>> 0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
>>>  '_no_head' => undef
>>>  '_no_sq' => undef
>>>  '_root_verbose' => 0
>>> Can someone suggest what I'm missing?
>>> cheers
>>> Dan


From dan.kortschak at adelaide.edu.au  Thu Jan 14 02:59:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 18:29:20 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
	<B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>
Message-ID: <1263455960.4630.3.camel@epistle>

Thanks Chris,

I've done that, and since the inheritance is direct (rather than being a
constructed attribute in the object hash) the calls are $obj->temp*
rather than the $obj->io->temp* that I was using.

It works now, and the module is much clearer now that I have removed
many of the declarations.
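
The difference between the two call styles can be sketched without
BioPerl. The classes below (Demo::TempHelper, Demo::Inherits,
Demo::Delegates) are hypothetical names made up for illustration, not
BioPerl modules; the helper's tempfile() is only an assumed stand-in for
the real method.

```perl
use strict;
use warnings;

# Hypothetical helper class standing in for an IO utility (not BioPerl).
package Demo::TempHelper;
use File::Temp ();

sub new { my ($class) = @_; return bless {}, $class }

# Create a temp file and return (handle, filename).
sub tempfile {
    my ($self, %args) = @_;
    my $fh = File::Temp->new( SUFFIX => $args{-suffix} // '.tmp', UNLINK => 1 );
    push @{ $self->{_keep} }, $fh;   # keep the handle alive so the file persists
    return ( $fh, $fh->filename );
}

# Style 1: inherit the helper; tempfile() is called directly on the object.
package Demo::Inherits;
our @ISA = ('Demo::TempHelper');

# Style 2: hold a helper instance and expose it through an io() accessor.
package Demo::Delegates;
sub new { my ($class) = @_; return bless { _io => Demo::TempHelper->new() }, $class }
sub io  { return $_[0]{_io} }

package main;

my $inherits  = Demo::Inherits->new();
my $delegates = Demo::Delegates->new();

my (undef, $file_a) = $inherits->tempfile( -suffix => '.sam' );        # direct call
my (undef, $file_b) = $delegates->io->tempfile( -suffix => '.bam' );   # via io()

print "inherited: $file_a\ndelegated: $file_b\n";
```

Calling io() on the inheriting class fails with the same "Can't locate
object method" error, because nothing in its inheritance chain defines
that accessor.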

cheers
Dan

On Thu, 2010-01-14 at 01:23 -0600, Chris Fields wrote:
> You can remove separate 'use' directives if they are declared with
> 'use base' (they will be imported then).  Also, Bio::Root::IO inherits
> Bio::Root::Root, and Bio::Assembly::IO should inherit from
> Bio::Root::IO, so the only base module you should need is
> Bio::Assembly::IO.  It's possible having all three is confusing the
> interpreter.
> 
> chris


From marcelo011982 at gmail.com  Thu Jan 14 08:44:25 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:44:25 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <C85EC8A05E884B328AFDAA055341E9E2@NewLife>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
	<C85EC8A05E884B328AFDAA055341E9E2@NewLife>
Message-ID: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>

Thanks Mark.
I think most of you already know this,
but I'll post it for new users:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while ( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while ( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      $aln = $hsp->get_aln;
      $alnIO->write_aln($aln);

    }
  }
}


On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> Marcelo-
> Yes-- look at the code snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
> combined with the snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> (using -format => 'clustalw')
> cheers MAJ
> ----- Original Message ----- From: "Marcelo Iwata" <
> marcelo011982 at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, January 13, 2010 1:12 PM
> Subject: [Bioperl-l] Blast to Clustalw Format
>
>
>  Hi..
>> I have an simple Blast result, such as blastn.
>> Is there an  scrip  to transform such result to Clustalw format in Bioperl
>> ?(.aln)
>>
>> Thanx for any help.


From marcelo011982 at gmail.com  Thu Jan 14 08:46:21 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:46:21 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
	<C85EC8A05E884B328AFDAA055341E9E2@NewLife>
	<1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
Message-ID: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>

Sorry, the correct code is:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while ( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while ( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      $aln = $hsp->get_aln;
      $alnIO->write_aln($aln);

    }
  }
}


On Thu, Jan 14, 2010 at 11:44 AM, Marcelo Iwata <marcelo011982 at gmail.com>wrote:

> Thanks Mark.
> I think that most of you already know it.
> But , i'll put it for new users:
>
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SearchIO;
> use Bio::AlignIO;
>
> my $in = new Bio::SearchIO(-format => 'blast',
>                            -file   => '
> ../../fontes/exemplos/blat/teste2/output.blast ');
> my $aln;
> my $alnIO;
> $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
> while ( my $result = $in->next_result ) {
>   ## $result is a Bio::Search::Result::ResultI compliant object
>   while ( my $hit = $result->next_hit ) {
>     ## $hit is a Bio::Search::Hit::HitI compliant object
>     while ( my $hsp = $hit->next_hsp ) {
>       ## $hsp is a Bio::Search::HSP::HSPI compliant object
>       $aln = $hsp->get_aln;
>       $alnIO->write_aln($aln);
>
>
>     }
>   }
> }
>
>
> On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>> Marcelo-
>> Yes-- look at the code snip at
>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
>> combined with the snip at
>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>> (using -format => 'clustalw')
>> cheers MAJ
>> ----- Original Message ----- From: "Marcelo Iwata" <
>> marcelo011982 at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, January 13, 2010 1:12 PM
>> Subject: [Bioperl-l] Blast to Clustalw Format
>>
>>
>>  Hi..
>>> I have an simple Blast result, such as blastn.
>>> Is there an  scrip  to transform such result to Clustalw format in
>>> Bioperl
>>> ?(.aln)
>>>
>>> Thanx for any help.
>>>
>>>
>


From maj at fortinbras.us  Thu Jan 14 08:54:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 08:54:31 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com><C85EC8A05E884B328AFDAA055341E9E2@NewLife><1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
	<1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
Message-ID: <1B8891488AA746F49BCAAB531FBE4D0B@NewLife>

Thanks Marcelo-- code snips always appreciated! MAJ
----- Original Message ----- 
From: "Marcelo Iwata" <marcelo011982 at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 14, 2010 8:46 AM
Subject: Re: [Bioperl-l] Blast to Clustalw Format


> Sorry , the correct code is:
>
>
>
> #!/usr/bin/perl -w
>
> use strict;
> use Bio::SearchIO;
> use Bio::AlignIO;
>
> my $in = new Bio::SearchIO(-format => 'blast',
>                           -file   => '
> ../../fontes/exemplos/blat/teste2/output.blast ');
> my $aln;
> my $alnIO;
> $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
> while ( my $result = $in->next_result ) {
>  ## $result is a Bio::Search::Result::ResultI compliant object
>  while ( my $hit = $result->next_hit ) {
>    ## $hit is a Bio::Search::Hit::HitI compliant object
>    while ( my $hsp = $hit->next_hsp ) {
>      ## $hsp is a Bio::Search::HSP::HSPI compliant object
>      $aln = $hsp->get_aln;
>      $alnIO->write_aln($aln);
>
>    }
>  }
> }
>
>
> On Thu, Jan 14, 2010 at 11:44 AM, Marcelo Iwata 
> <marcelo011982 at gmail.com>wrote:
>
>> Thanks Mark.
>> I think that most of you already know it.
>> But , i'll put it for new users:
>>
>>
>> #!/usr/bin/perl -w
>>
>> use strict;
>> use Bio::SearchIO;
>> use Bio::AlignIO;
>>
>> my $in = new Bio::SearchIO(-format => 'blast',
>>                            -file   => '
>> ../../fontes/exemplos/blat/teste2/output.blast ');
>> my $aln;
>> my $alnIO;
>> $alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
>> while ( my $result = $in->next_result ) {
>>   ## $result is a Bio::Search::Result::ResultI compliant object
>>   while ( my $hit = $result->next_hit ) {
>>     ## $hit is a Bio::Search::Hit::HitI compliant object
>>     while ( my $hsp = $hit->next_hsp ) {
>>       ## $hsp is a Bio::Search::HSP::HSPI compliant object
>>       $aln = $hsp->get_aln;
>>       $alnIO->write_aln($aln);
>>
>>
>>     }
>>   }
>> }
>>
>>
>> On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>> Marcelo-
>>> Yes-- look at the code snip at
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
>>> combined with the snip at
>>> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
>>> (using -format => 'clustalw')
>>> cheers MAJ
>>> ----- Original Message ----- From: "Marcelo Iwata" <
>>> marcelo011982 at gmail.com>
>>> To: <bioperl-l at lists.open-bio.org>
>>> Sent: Wednesday, January 13, 2010 1:12 PM
>>> Subject: [Bioperl-l] Blast to Clustalw Format
>>>
>>>
>>>  Hi..
>>>> I have an simple Blast result, such as blastn.
>>>> Is there an  scrip  to transform such result to Clustalw format in
>>>> Bioperl
>>>> ?(.aln)
>>>>
>>>> Thanx for any help.


From sidd.basu at gmail.com  Thu Jan 14 14:15:04 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 13:15:04 -0600
Subject: [Bioperl-l] reading blast report
Message-ID: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>

Hi, 
I have a script that reads a tblastn report (13000 records) and loads it
into a Chado database (using the Bio::Chado::Schema module), but the
machine runs out of memory. Leaving the database loading aside, I am
trying to figure out whether reading the report with the SearchIO module
could itself consume a lot of memory. So, when I am reading a BLAST file
and getting the result object ....

while (my $result = $searchio->next_result)

* Does the SearchIO object load a huge chunk of the file into memory, or
  does each iteration read only part of the report?

* Would indexing the BLAST report and then reading from the index be much
  faster, and why? And is there any way I could iterate through each
  record in the index? Would that be helpful?

-siddhartha


From jason at bioperl.org  Thu Jan 14 14:53:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 11:53:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
Message-ID: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>

What aspects of the report are you loading?  You might consider the
blast report as tab-delimited (-m 8 format) if you are only interested
in start/end positions and scores of alignments; that is a simpler,
reduced dataset that gives the parser a lower memory footprint.

SearchIO defaults to -format => blast.  You can try -format =>
blast_pull instead, which parses lazily to create objects and will
reduce memory consumption.
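
To illustrate the tab-delimited route, here is a minimal sketch in plain
Perl that streams -m 8 output one line at a time. The sub names
(parse_m8_line, each_hit) and the sample line are made up for this
example; the column order is the standard twelve-column -m 8 layout.
Bio::SearchIO also has a parser for this layout (-format =>
'blasttable'), which may be the better choice if you want SearchIO
objects rather than plain hashes.

```perl
use strict;
use warnings;

# Column order of legacy BLAST tabular output (-m 8).
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Parse one tab-delimited line into a hash reference. Returns undef for
# blank lines, '#' comment lines, and lines with an unexpected column count.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^\s*(?:#|$)/;
    my @cols = split /\t/, $line;
    return undef unless @cols == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @cols;
    return \%hit;
}

# Stream from a filehandle: only one line is held in memory at a time.
sub each_hit {
    my ($fh, $callback) = @_;
    while ( my $line = <$fh> ) {
        my $hit = parse_m8_line($line) or next;
        $callback->($hit);
    }
    return;
}

# Made-up sample line standing in for a real report file.
my $data = join( "\t", qw(q1 chr2 98.50 200 3 0 1 200 901 1500 1e-50 380) ) . "\n";
open my $fh, '<', \$data or die "Cannot open in-memory sample: $!";
each_hit( $fh, sub {
    my ($h) = @_;
    print join( "\t", @{$h}{qw(query subject s_start s_end bit_score)} ), "\n";
} );
close $fh;
```

Each parsed hash can be handed to the database loader inside the
callback and then dropped, so memory use stays flat regardless of the
report size.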

-jason
On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:

> Hi,
> I have a script that reads a tblastn report(13000 records) and loads  
> in
> a chado database(Bio::Chado::Schema module),  however the machine  
> runs of memory. I am trying to figure
> out other than loading the database stuff
> if it the reading of SearchIO module could consume a lot of memory.  
> So,
> when i am reading a blast file and getting the result object ....
>
> while (my $result = $searchio->next_result)
>
> * Does the searchio object loads a huge chunk of file in the memory or
>  for each iteration it only reads a part of the result.
>
> * Does doing an index on blast report and then reading from it be much
>  faster and why. And is there any way i could iterate through each
>  record in the index,  will that be helpful.
>
> -siddhartha
>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From sidd.basu at gmail.com  Thu Jan 14 15:15:45 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 14:15:45 -0600
Subject: [Bioperl-l]  Re: reading blast report
In-Reply-To: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
Message-ID: <4b4f7b74.5744f10a.7087.4813@mx.google.com>

On Thu, 14 Jan 2010, Jason Stajich wrote:

> What aspects of the report are you loading?  You might consider the blast 
> report as tab-delimited (-m 8 format) if you only are interested in 
> start/end positions and scores of ailgnments which is a simpler and reduced 
> dataset that has lower memory footprint by the parser.

I think this would be the better approach, since I am mostly interested
in start/end/score data only.

>
> Searchio (default) -format => blast - you can try the BLAST -format => 
> blast_pull instead which lazy parses to create objects and will reduce 
> memory consumption.

That's another good option, though. But just out of curiosity: does the
regular blast parser load the entire file into memory when the output
consists of multiple Results concatenated together into a single file?
Could anybody clarify?

thanks, 
-siddhartha


>
> -jason
> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
>
> > Hi,
> > I have a script that reads a tblastn report(13000 records) and loads in
> > a chado database(Bio::Chado::Schema module),  however the machine runs of 
> > memory. I am trying to figure
> > out other than loading the database stuff
> > if it the reading of SearchIO module could consume a lot of memory. So,
> > when i am reading a blast file and getting the result object ....
> >
> > while (my $result = $searchio->next_result)
> >
> > * Does the searchio object loads a huge chunk of file in the memory or
> >  for each iteration it only reads a part of the result.
> >
> > * Does doing an index on blast report and then reading from it be much
> >  faster and why. And is there any way i could iterate through each
> >  record in the index,  will that be helpful.
> >
> > -siddhartha
> >
> >
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>


From jason at bioperl.org  Thu Jan 14 16:28:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 13:28:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
Message-ID: <CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>


On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
>
>> What aspects of the report are you loading?  You might consider the  
>> blast
>> report as tab-delimited (-m 8 format) if you only are interested in
>> start/end positions and scores of ailgnments which is a simpler and  
>> reduced
>> dataset that has lower memory footprint by the parser.
>
> I think this would be a better approach i am mostly interested in
> start/end/score data only.
>
>>
>> Searchio (default) -format => blast - you can try the BLAST -format  
>> =>
>> blast_pull instead which lazy parses to create objects and will  
>> reduce
>> memory consumption.
>
> It's another good option though. But just out of curosity,  so the
> regular blast parser do load the entire file in the memory consider  
> the
> output consist of multiple Results concatenated together into a
> single file. Could anybody clarify.
>
> thanks,
> -siddhartha

Each result is parsed (1 result per query), and all of its hits and HSPs
are parsed and brought into memory with the standard (non-pull)
approach.
SearchIO iterates at the level of the result; that is why you call
next_result, which parses one result at a time.
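
The record-at-a-time pattern can be sketched in plain Perl. This is only
an illustration of the iteration idea, not how SearchIO is implemented;
make_record_iterator is a made-up name, and the assumption that each
record begins with a line starting with "TBLASTN" is for the example
only.

```perl
use strict;
use warnings;

# Return an iterator (closure) that yields one record at a time from a
# filehandle. A new record is assumed to start at a line matching $start_re.
sub make_record_iterator {
    my ($fh, $start_re) = @_;
    my $pending;                      # first line of the next record, if already read
    return sub {
        my @record = defined $pending ? ($pending) : ();
        undef $pending;
        while ( my $line = <$fh> ) {
            if ( $line =~ $start_re && @record ) {
                $pending = $line;     # this line belongs to the next record
                return \@record;
            }
            push @record, $line;
        }
        return @record ? \@record : undef;
    };
}

# Made-up data: two concatenated records of 3 and 4 lines.
my $data = "TBLASTN 2.2.22\nQuery= a\nhit 1\n"
         . "TBLASTN 2.2.22\nQuery= b\nhit 2\nhit 3\n";
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my $next_record = make_record_iterator( $fh, qr/^TBLASTN/ );
my @sizes;
while ( my $record = $next_record->() ) {
    push @sizes, scalar @$record;     # $record is released after each pass
}
close $fh;
print "lines per record: @sizes\n";
```

Only the current record is held at any moment, which is the property
that keeps memory bounded by the largest single result rather than by
the whole file.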

>
>
>>
>> -jason
>> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
>>
>>> Hi,
>>> I have a script that reads a tblastn report(13000 records) and  
>>> loads in
>>> a chado database(Bio::Chado::Schema module),  however the machine  
>>> runs of
>>> memory. I am trying to figure
>>> out other than loading the database stuff
>>> if it the reading of SearchIO module could consume a lot of  
>>> memory. So,
>>> when i am reading a blast file and getting the result object ....
>>>
>>> while (my $result = $searchio->next_result)
>>>
>>> * Does the searchio object loads a huge chunk of file in the  
>>> memory or
>>> for each iteration it only reads a part of the result.
>>>
>>> * Does doing an index on blast report and then reading from it be  
>>> much
>>> faster and why. And is there any way i could iterate through each
>>> record in the index,  will that be helpful.
>>>
>>> -siddhartha
>>>
>>>
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From sidd.basu at gmail.com  Thu Jan 14 16:40:42 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 15:40:42 -0600
Subject: [Bioperl-l]  Re: reading blast report
In-Reply-To: <CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
	<CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>
Message-ID: <4b4f8f5d.5644f10a.2be2.47dc@mx.google.com>

Thanks, Jason, for the clarification.

On Thu, 14 Jan 2010, Jason Stajich wrote:

>
> On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote:
>
> > On Thu, 14 Jan 2010, Jason Stajich wrote:
> >
> >> What aspects of the report are you loading?  You might consider the blast
> >> report as tab-delimited (-m 8 format) if you only are interested in
> >> start/end positions and scores of ailgnments which is a simpler and 
> >> reduced
> >> dataset that has lower memory footprint by the parser.
> >
> > I think this would be a better approach i am mostly interested in
> > start/end/score data only.
> >
> >>
> >> Searchio (default) -format => blast - you can try the BLAST -format =>
> >> blast_pull instead which lazy parses to create objects and will reduce
> >> memory consumption.
> >
> > It's another good option though. But just out of curosity,  so the
> > regular blast parser do load the entire file in the memory consider the
> > output consist of multiple Results concatenated together into a
> > single file. Could anybody clarify.
> >
> > thanks,
> > -siddhartha
>
> Each result is parsed (1 result per query) and all the hits and HSPs are 
> parsed and brought into memory with the standard (non-pull) approach.
> The SearchIO iterates at the level of result - that is why you call 
> next_result which parses each one at a time.
>
> >
> >
> >>
> >> -jason
> >> On Jan 14, 2010, at 11:15 AM, Siddhartha Basu wrote:
> >>
> >>> Hi,
> >>> I have a script that reads a tblastn report(13000 records) and loads in
> >>> a chado database(Bio::Chado::Schema module),  however the machine runs 
> >>> of
> >>> memory. I am trying to figure
> >>> out other than loading the database stuff
> >>> if it the reading of SearchIO module could consume a lot of memory. So,
> >>> when i am reading a blast file and getting the result object ....
> >>>
> >>> while (my $result = $searchio->next_result)
> >>>
> >>> * Does the searchio object loads a huge chunk of file in the memory or
> >>> for each iteration it only reads a part of the result.
> >>>
> >>> * Does doing an index on blast report and then reading from it be much
> >>> faster and why. And is there any way i could iterate through each
> >>> record in the index,  will that be helpful.
> >>>
> >>> -siddhartha
> >>>
> >>>
> >>
> >> --
> >> Jason Stajich
> >> jason.stajich at gmail.com
> >> jason at bioperl.org
> >> http://fungalgenomes.org/
> >>
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>


From SMarkel at accelrys.com  Thu Jan 14 17:58:06 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 14 Jan 2010 14:58:06 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>

We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
from our customers.  Due to network irregularities (not sure what else
to call it) users see the getting of remote BLAST results as somewhat
random.  When results come back the hits are fine, but sometimes no
information comes back at all.  Retrying helps.

In looking at RemoteBlast.pm there are four "return -1" cases.

* $status eq 'ERROR'      (return on line 614)
* $line =~ /ERROR/i       (return on line 628)
* !$got_content           (return on line 648)
* !$response->is_success  (return on line 655)

In the case of no content we'd like to retry remote BLAST.  We're happy
to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
module, but we only want to retry in that case, not the other three.

What would happen if that third "return -1" changed to a different
return value?
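
As a rough sketch of the wrapper side, assuming the "no content" case
were given its own value: the code -2 below is purely hypothetical (the
module currently returns -1 for all four cases), and $fetch stands in
for whatever call retrieves the BLAST results.

```perl
use strict;
use warnings;

# Hypothetical status codes; RemoteBlast.pm currently uses -1 for every case.
use constant {
    STATUS_ERROR      => -1,   # error cases: do not retry
    STATUS_NO_CONTENT => -2,   # assumed new value for the "no content" case
};

# True only for the plain scalar STATUS_NO_CONTENT, never for result objects.
sub is_no_content {
    my ($value) = @_;
    return defined $value && !ref $value && $value == STATUS_NO_CONTENT;
}

# Call $fetch up to $max_tries times, retrying only when no content came back.
sub fetch_with_retry {
    my ($fetch, $max_tries, $pause) = @_;
    my $result;
    for my $attempt ( 1 .. $max_tries ) {
        $result = $fetch->();
        last unless is_no_content($result);
        sleep $pause if $pause && $attempt < $max_tries;
    }
    return $result;
}

# Simulated fetcher: empty response twice, then a (fake) result.
my @responses = ( STATUS_NO_CONTENT, STATUS_NO_CONTENT, { hits => 3 } );
my $calls     = 0;
my $result    = fetch_with_retry( sub { $calls++; return shift @responses }, 5, 0 );
print "got a result after $calls calls\n";
```

With a distinct code the retry decision lives entirely in the caller,
and the other three failure paths still stop after the first attempt.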

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


From nickjd at gmail.com  Wed Jan 13 08:18:12 2010
From: nickjd at gmail.com (NickJD)
Date: Wed, 13 Jan 2010 05:18:12 -0800 (PST)
Subject: [Bioperl-l] Parsing PSI-BLAST results with SearchIO
Message-ID: <65554589-081b-4297-ab68-9ddfbd3d9944@c34g2000yqn.googlegroups.com>

I am trying to parse PSI-BLAST results using SearchIO and some very
basic code just to read the number of hits, number of hsps, etc.  I
have done 10 rounds on 1 input sequence and parsed it but it seems to
treat each round as a separate result, so round/iteration is always 1
and new_hits is always the total list, not just the hits that are new to
that round.  Does anyone have any experience of this?

Thanks,

Nick


From dsidote at waksman.rutgers.edu  Wed Jan 13 10:08:48 2010
From: dsidote at waksman.rutgers.edu (David J Sidote)
Date: Wed, 13 Jan 2010 10:08:48 -0500
Subject: [Bioperl-l] Bioinformatician position - Waksman Institute
Message-ID: <4b42af671001130708i703ecce0u47348484321714f@mail.gmail.com>

Bioinformatician - Research Assistant Professor


The Waksman Institute of Microbiology located on the New Brunswick campus of
Rutgers University is seeking a highly motivated and talented bioinformatics
scientist for a Research Assistant Professor appointment.  The successful
candidate will analyze genome, transcriptome, and epigenome data generated
on the Life Sciences 454, Illumina, and AB SOLiD high-throughput sequencing
platforms. Excellent communication and teamwork skills are essential as the
successful candidate will work closely with individual research groups to
develop software to facilitate the visualization, quantification, and
interpretation of the data. The successful candidate will be expected to
contribute to the publication of scientific literature and to present at
seminars and conferences.


Qualifications:


-       PhD in molecular biology, genetics, bioinformatics, systems biology
or other related fields; candidates with a PhD in physics, mathematics, or
computer science with some working knowledge of biology and experience are
encouraged to apply.

-       Demonstrated scientific track record

-       Highly proficient in perl, python, or ruby programming, linux/unix
scripting, and SQL.

-       Experience with R is desirable but not required

-       Experience with high-throughput sequencing, microarrays, or other
high-throughput biological platforms

-       Excellent communication and organizational skills


How to Apply:


Please send a cover letter stating your current research interests, why you
are interested in this position, and how your skill set complements this
position along with a curriculum vitae, and the names and contact
information of three references to hr at waksman.rutgers.edu. Please include
"Bioinformatics Assistant Research Professor" in the subject line. Rutgers
is an equal opportunity employer.


For more information about this position please contact:

Dr. David Sidote (dsidote at waksman.rutgers.edu)


From albezg at gmail.com  Wed Jan 13 20:57:27 2010
From: albezg at gmail.com (albezg)
Date: Wed, 13 Jan 2010 20:57:27 -0500
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with
 negative PDB ranges
In-Reply-To: <49C405F0.5050100@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com>
Message-ID: <4B4E7A07.7070805@gmail.com>

Hi all,

I have a problem using AlignIO to read the Pfam database:
ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
The database is in STOCKHOLM 1.0 format. AlignIO reads the alignments
fine until it reaches PF00331.13. There it crashes with the following
message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: '1-344' is not an integer.

STACK: Error::throw
STACK: Bio::Root::Root::throw 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
STACK: Bio::Range::end 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
STACK: Bio::Annotation::Target::new 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
STACK: Bio::AlignIO::stockholm::next_aln 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
STACK: /home/albezg/scripts/pfam2fasta.pl:22
-----------------------------------------------------------

It appears this is caused by this entry:
#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;

I don't care about residues in PDB, so I have just removed minus signs 
from the ranges. This seems to have fixed the crashing.
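
For anyone wanting to apply the same workaround in a script, a minimal
sketch follows. The sub name strip_negative_pdb_range is made up, and
the regex assumes the range is the last ';'-terminated field of a
"#=GS ... DR PDB;" line, as in the entry above. Note that this changes
the stored PDB coordinates, so it is only suitable when those residues
are not needed.

```perl
use strict;
use warnings;

# Remove minus signs from the residue range of a "#=GS ... DR PDB; ...;"
# line. All other lines are returned unchanged.
sub strip_negative_pdb_range {
    my ($line) = @_;
    if ( $line =~ /^#=GS\s+\S+\s+DR\s+PDB;/ ) {
        # The range is the last field, e.g. " -1-344;" becomes " 1-344;".
        $line =~ s/(;\s*)-?(\d+)--?(\d+)(;\s*)$/$1$2-$3$4/;
    }
    return $line;
}

my @lines = (
    "# STOCKHOLM 1.0\n",
    "#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;\n",
);
for my $line (@lines) {
    my $fixed = strip_negative_pdb_range($line);
    print $fixed;
}
```

Run as a filter over the decompressed seed file, this would leave every
line except the affected DR PDB ranges untouched.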

Is it a known problem? Is there a solution for it?

Thanks,
Alexandr


On 03/20/2009 05:09 PM, albezg wrote:
>
> I'm trying to change FASTA header(display_id) for a sequence in an
> alignment(SimpleAlign).
>
> There are no issues when I print it, however when I use AlignIO to write
> the alignment to a FASTA file, it does not work. Is this behavior intended?
>
> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>
> The error:
> ------------- EXCEPTION -------------
> MSG: No sequence with name [1/1-11]
> STACK Bio::SimpleAlign::displayname
> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
> STACK Bio::AlignIO::fasta::write_aln
> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
> STACK toplevel ./demo.pl:14
> -------------------------------------
>
> Alexandr


From mitch_skinner at berkeley.edu  Thu Jan 14 17:10:53 2010
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Thu, 14 Jan 2010 14:10:53 -0800
Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory
Message-ID: <4B4F966D.3030300@berkeley.edu>

Hi,

Some people haven't been getting all of the features in their GFF3 into 
JBrowse, and a nice test case that James Casbon posted to the list 
helped me track it down.

Here's an example of the behavior I was seeing with BioPerl 1.6.1 (using 
Devel::REPL):

==============
$ use Bio::DB::SeqFeature::Store

$ my $db = Bio::DB::SeqFeature::Store->new(-adaptor=>"memory", 
-dsn=>"casbon.gff3")
$Bio_DB_SeqFeature_Store_memory1 = 
Bio::DB::SeqFeature::Store::memory=HASH(0xa27ceec);

$ $db->features(-seq_id=>"CYP2C8")
$ARRAY1 = [
             Feature:src(41),
             region(CYP2C8),
             Feature:src(37),
             Feature:src(39),
             Feature:src(42),
             Feature:src(40),
             Feature:src(38)
           ];
==============

I expected to also see the features with IDs 43 and 44 (the gff3 file is 
attached).

I think there's a problem in the filter_by_location method.  If start 
and end parameters aren't passed to the method, it sets default start 
and end values that lead it to examine all of the bins in its index.  
But the end value that it creates is at the beginning of the last bin, 
and I think it should be at the end of the last bin instead.  The 
attached patch changes it to be at the end of the last bin.
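
The effect described above can be illustrated with a toy bin index. This is a sketch under stated assumptions, not the Bio::DB::SeqFeature::Store::memory code: the bin size, the helper names and the two sample features are all invented for the example.

```perl
use strict;
use warnings;

# Illustrative only: $BIN_SIZE and every helper below are assumptions
# made for this sketch, not names from the BioPerl module.
my $BIN_SIZE = 1000;

sub bin_of { my ($pos) = @_; return int($pos / $BIN_SIZE) }

# Default end placed at the START of the last bin (the suspected problem).
sub default_end_buggy { my ($last_bin) = @_; return $last_bin * $BIN_SIZE }

# Default end placed at the END of the last bin (what the patch aims for).
sub default_end_fixed { my ($last_bin) = @_; return ($last_bin + 1) * $BIN_SIZE - 1 }

# Keep the features that overlap the interval [start, end].
sub filter_features {
    my ($features, $start, $end) = @_;
    return grep { $_->{start} <= $end && $_->{end} >= $start } @$features;
}

my @features = (
    { id => 'a', start => 100,  end => 900  },
    { id => 'b', start => 2500, end => 2900 },   # lies wholly inside the last bin
);
my $last_bin = bin_of(2900);

my @buggy = filter_features(\@features, 0, default_end_buggy($last_bin));
my @fixed = filter_features(\@features, 0, default_end_fixed($last_bin));
printf "buggy: %d feature(s), fixed: %d feature(s)\n", scalar @buggy, scalar @fixed;
```

With the end at the start of the last bin, any feature that begins past that point is dropped even though its bin was examined, which matches the symptom of features missing from the end of the reference sequence.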

Regards,
Mitch
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: casbon.gff3
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100114/8494aaa7/attachment-0006.pl>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bdsfsm-filter_by_location.patch
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100114/8494aaa7/attachment-0007.pl>

From jason at bioperl.org  Thu Jan 14 19:20:43 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 16:20:43 -0800
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <4B4E7A07.7070805@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
Message-ID: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>

Seems like improper data really -- "-1" is an improper coordinate as
far as the parser is concerned. You may want to tell Pfam that there
is a possible error in the dumper, since that was the only record that
had this problem?

-jason
On Jan 13, 2010, at 5:57 PM, albezg wrote:

> Hi all,
>
> I have a problem using AlignIO to read Pfam database:
> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
> The database is in STOCKHOLM 1.0 format. AlignIO can read the  
> alignment OK until the alignment PF00331.13. There it crashes with  
> the following message:
>
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: '1-344' is not an integer.
>
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /home/albezg/lib/perl5/site_perl/ 
> 5.10.0/Bio/Root/Root.pm:368
> STACK: Bio::Range::end /home/albezg/lib/perl5/site_perl/5.10.0/Bio/ 
> Range.pm:228
> STACK: Bio::Annotation::Target::new /home/albezg/lib/perl5/site_perl/ 
> 5.10.0/Bio/Annotation/Target.pm:82
> STACK:  
> Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target /home/ 
> albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ 
> GenericAlignHandler.pm:293
> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler / 
> home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ 
> GenericAlignHandler.pm:73
> STACK: Bio::AlignIO::stockholm::next_aln /home/albezg/lib/perl5/ 
> site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
> STACK: /home/albezg/scripts/pfam2fasta.pl:22
> -----------------------------------------------------------
>
> It appears this is caused by this entry:
> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>
> I don't care about residues in PDB, so I have just removed minus  
> signs from the ranges. This seems to have fixed the crashing.
>
> Is it a known problem? Is there a solution for it?
>
> Thanks,
> Alexandr
>
>
> On 03/20/2009 05:09 PM, albezg wrote:
>>
>> I'm trying to change FASTA header(display_id) for a sequence in an
>> alignment(SimpleAlign).
>>
>> There are no issues when I print it, however when I use AlignIO to  
>> write
>> the alignment to a FASTA file, it does not work. Is this behavior  
>> intended?
>>
>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>
>> The error:
>> ------------- EXCEPTION -------------
>> MSG: No sequence with name [1/1-11]
>> STACK Bio::SimpleAlign::displayname
>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>> STACK Bio::AlignIO::fasta::write_aln
>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>> STACK toplevel ./demo.pl:14
>> -------------------------------------
>>
>> Alexandr
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From maj at fortinbras.us  Thu Jan 14 21:00:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 21:00:31 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
Message-ID: <CD613D33411040F8921DE3098FD6DF41@NewLife>

How about returning 1, 2, 4 for the non-zero cases, with some
error constants set for convenience? MAJ
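
A minimal sketch of that idea, assuming one constant per failure mode and a caller that retries only on the "no content" code. The constant names, their mapping to the cases listed in the quoted message, and `fetch_with_retry` are all hypothetical; nothing here is defined by RemoteBlast.pm.

```perl
use strict;
use warnings;

# Hypothetical constants; names and values are assumptions for illustration.
use constant {
    BLAST_OK         => 0,
    BLAST_ERROR      => 1,   # server reported an error
    BLAST_NO_CONTENT => 2,   # response arrived but was empty
    BLAST_HTTP_FAIL  => 4,   # the HTTP request itself failed
};

# Call $fetch up to $max_tries times, retrying only on "no content".
sub fetch_with_retry {
    my ($fetch, $max_tries) = @_;
    my $code;
    for my $try (1 .. $max_tries) {
        $code = $fetch->();
        return $code unless $code == BLAST_NO_CONTENT;
    }
    return $code;
}

# Stand-in for the remote call: empty twice, then succeeds.
my @replies = (BLAST_NO_CONTENT, BLAST_NO_CONTENT, BLAST_OK);
my $calls   = 0;
my $status  = fetch_with_retry(sub { $calls++; shift @replies }, 5);
print "status=$status after $calls call(s)\n";
```

Powers of two leave room for combining codes with bitwise OR later, although the sketch only ever compares them for equality.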
----- Original Message ----- 
From: "Scott Markel" <SMarkel at accelrys.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Thursday, January 14, 2010 5:58 PM
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes


> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
> from our customers.  Due to network irregularities (not sure what else
> to call it) users see the getting of remote BLAST results as somewhat
> random.  When results come back the hits are fine, but sometimes no
> information comes back at all.  Retrying helps.
> 
> In looking at RemoteBlast.pm there are four "return -1" cases.
> 
> * $status eq 'ERROR'      (return on line 614)
> * $line =~ /ERROR/I       (return on line 628)
> * !$got_content           (return on line 648)
> * !$response->is_success  (return on line 655)
> 
> In the case of no content we'd like to retry remote BLAST.  We're happy
> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
> module, but we only want to retry in that case, not the other three.
> 
> What would happen if that third "return -1" changed to a different
> return value?
> 
> Scott
> 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
> 
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>    International Society for Computational Biology
> Chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics


From cjfields at illinois.edu  Thu Jan 14 19:42:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 18:42:31 -0600
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
Message-ID: <0B76CCA7-C37C-4E24-BBDF-C8FD805DBBF2@illinois.edu>


On Jan 14, 2010, at 2:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
> 
>> What aspects of the report are you loading?  You might consider writing
>> the BLAST report as tab-delimited (-m 8 format) if you are only interested
>> in start/end positions and scores of alignments; it is a simpler, reduced
>> dataset that gives the parser a lower memory footprint.
> 
> I think this would be the better approach, since I am mostly interested
> in start/end/score data only.
> 
>> SearchIO (default) -format => blast - you can try -format =>
>> blast_pull instead, which parses lazily to create objects and will reduce
>> memory consumption.
> 
> That's another good option, though. But just out of curiosity: does the
> regular blast parser load the entire file into memory, given that the
> output consists of multiple Results concatenated together into a
> single file? Could anybody clarify?

Yes, the original SearchIO parsers all load the data into objects.  This was based on the presumption that one wouldn't want very large BLAST reports, but this assumption probably doesn't hold today.  The pull parser is one answer to that, in that it pulls the data only upon request (creating the objects on the fly), so it should be more amenable to parsing very large BLAST reports.
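
For the tab-delimited route mentioned in the quoted text, a line-at-a-time reader keeps memory use flat regardless of report size. The sketch below uses plain Perl rather than a BioPerl parser; `parse_m8_line` is a hypothetical helper and the sample line is made up, but the column order is the standard twelve-field layout of legacy BLAST `-m 8` output.

```perl
use strict;
use warnings;

# Column order of legacy BLAST tabular output (-m 8).
my @COLUMNS = qw(query subject pct_identity length mismatches gap_opens
                 q_start q_end s_start s_end evalue bit_score);

# Parse one line into a hash reference.
# Returns undef for blank lines, comment lines, or lines with the wrong
# number of fields.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*(?:#|$)/;
    my @fields = split /\t/, $line;
    return unless @fields == @COLUMNS;
    my %hit;
    @hit{@COLUMNS} = @fields;
    return \%hit;
}

# Made-up example line; in practice, call parse_m8_line inside a
# while (my $line = <$fh>) loop so only one hit is held at a time.
my $hit = parse_m8_line("q1\ts1\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-100\t370\n");
print "$hit->{query} vs $hit->{subject}: ",
      "$hit->{q_start}-$hit->{q_end}, score $hit->{bit_score}\n";
```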

> thanks, 
> -siddhartha
> 
>> -jason

chris


From cjfields at illinois.edu  Fri Jan 15 01:33:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 00:33:50 -0600
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
Message-ID: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>

Scott,

I think this is fine (to change the third condition and retry with a specific code).  The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance).
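
The exception-based alternative could look roughly like the following. It is a sketch only: `retry_on_no_content`, the `NO_CONTENT` message prefix and the stand-in fetcher are hypothetical, and a real version would wrap the actual RemoteBlast retrieval call.

```perl
use strict;
use warnings;

# Retry $fetch when it dies with a message starting "NO_CONTENT";
# rethrow any other exception immediately.
sub retry_on_no_content {
    my ($fetch, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
        my $result;
        my $ok = eval { $result = $fetch->(); 1 };
        return $result if $ok;
        die $@ unless $@ =~ /^NO_CONTENT/;   # other failures are not retried
    }
    die "NO_CONTENT: gave up after $max_tries tries\n";
}

# Stand-in fetcher: fails with "no content" twice, then returns a report.
my $attempts = 0;
my $report = retry_on_no_content(sub {
    die "NO_CONTENT: empty response\n" if ++$attempts < 3;
    return "hits";
}, 5);
print "$report after $attempts attempt(s)\n";
```

Compared with distinct return codes, this keeps the normal return value free for the result itself, at the cost of requiring callers to wrap the call in eval.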

One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3.  Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.

chris

PS - BTW, nice to finally meet you at GMOD!

On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:

> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
> from our customers.  Due to network irregularities (not sure what else
> to call it) users see the getting of remote BLAST results as somewhat
> random.  When results come back the hits are fine, but sometimes no
> information comes back at all.  Retrying helps.
> 
> In looking at RemoteBlast.pm there are four "return -1" cases.
> 
> * $status eq 'ERROR'      (return on line 614)
> * $line =~ /ERROR/I       (return on line 628)
> * !$got_content           (return on line 648)
> * !$response->is_success  (return on line 655)
> 
> In the case of no content we'd like to retry remote BLAST.  We're happy
> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
> module, but we only want to retry in that case, not the other three.
> 
> What would happen if that third "return -1" changed to a different
> return value?
> 
> Scott


From cjfields1 at gmail.com  Fri Jan 15 01:35:35 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Fri, 15 Jan 2010 00:35:35 -0600
Subject: [Bioperl-l] filter_by_location in
	Bio::DB::SeqFeature::Store::memory
In-Reply-To: <4B4F966D.3030300@berkeley.edu>
References: <4B4F966D.3030300@berkeley.edu>
Message-ID: <992796AC-B85B-4555-88A1-36000C0A2002@gmail.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100115/b772ee67/attachment-0003.html>

From David.Messina at sbc.su.se  Fri Jan 15 10:17:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 16:17:14 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
Message-ID: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>

Hi everybody,

I'm having a little trouble with names in Bio::Species objects.

According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method:

my $my_species_obj = Bio::Species->new();
$my_species_obj->species('Homo sapiens');

print $my_species_obj->species;     # 'Homo sapiens'


That works fine if I create the Bio::Species object myself.

But if I try to get that string back out from a Bio::Species object created by SeqIO from a genbank file, I get just 'sapiens' back:

my $io = Bio::SeqIO->new('-format' => 'genbank',
                         '-file'   => 'hoxa2.gb');
my $seq_obj = $io->next_seq;
my $io_species_obj = $seq_obj->species;

print $io_species_obj->species;     # 'sapiens'


I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom phylum order, etc). So the genus is stored separately.

Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object:

print $my_species_obj->binomial;    # 'Homosapiens'
print $io_species_obj->binomial;    # 'Homo sapiens'


I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way?

If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite.


Thanks,
Dave


From maj at fortinbras.us  Fri Jan 15 10:31:16 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 10:31:16 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>

I'm not that familiar with Bio::Species either, but this looks
like conflicting semantics between Bio::Species and Bio::SeqIO.
Bio::SeqIO sets the species accessor to the 'species' element of
the lineage array, I believe.
FWIW, I'd prefer "binomial" = "genus" . "species"
MAJ
----- Original Message ----- 
From: "Dave Messina" <David.Messina at sbc.su.se>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 10:17 AM
Subject: [Bioperl-l] getting/setting species names with Bio::Species


> Hi everybody,
>
> I'm having a little trouble with names in Bio::Species objects.
>
> According to the Bio::Species documentation, if I have a species name as a 
> string, like "Homo sapiens", I can get and set that using the species method:
>
> my $my_species_obj = Bio::Species->new();
> $my_species_obj->species('Homo sapiens');
>
> print $my_species_obj->species;     # 'Homo sapiens'
>
>
> That works fine if I create the Bio::Species object myself.
>
> But if I try to get that string back out from a Bio::Species object created by 
> SeqIO from a genbank file, I get just 'sapiens' back:
>
> my $io = Bio::SeqIO->new('-format' => 'genbank',
>                         '-file'   => 'hoxa2.gb');
> my $seq_obj = $io->next_seq;
> my $io_species_obj = $seq_obj->species;
>
> print $io_species_obj->species;     # 'sapiens'
>
>
> I think that happens because genbank records have more taxonomic info about 
> the species name, like the genus (and in fact the whole taxonomic 
> categorization: kingdom phylum order, etc). So the genus is stored separately.
>
> Poking around a bit more in Bio::Species, I turned up the method 'binomial', 
> which appears to do the right thing, returning genus and species in both 
> cases. Except, as you can see, the space is stripped out for my 
> species-name-is-just-a-string object:
>
> print $my_species_obj->binomial;    # 'Homosapiens'
> print $io_species_obj->binomial;    # 'Homo sapiens'
>
>
> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I 
> using it correctly above, or is there a better way?
>
> If not, this kinda looks like a bug to me. I've got a patch which works and 
> passes the BioPerl test suite.
>
>
> Thanks,
> Dave


From maj at fortinbras.us  Fri Jan 15 10:24:06 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 10:24:06 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
	<E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
Message-ID: <F1C8FA379C5746FB8987C1D41905C3F3@NewLife>

True -- blast+ allows remote dbs. I just committed a patch that makes
this easy in StandAloneBlastPlus: specify '-remote => 1' in the
factory, and downstream command calls will take care of it.
MAJ

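# ex...
use strict;
use warnings;
use Bio::Tools::Run::StandAloneBlastPlus;
use Bio::Seq;

# directory holding the BLAST+ executables (placeholder; set to your install)
my $where_it_is = '/path/to/blast+/bin';
$ENV{BLASTPLUSDIR} = $where_it_is;

# '-remote => 1' asks the factory to run the search against the remote db
my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'wgs',
    -remote  => 1
    );
my $result = $fac->blastn(
    -query => Bio::Seq->new(
        -seq => 'ggcaacaaacctggtaaagaagacggcaacaagcctggtaaagaagatggcaacaagcct',
        -id  => "proteinA")
    );


1;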

----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Scott Markel" <smarkel at accelrys.com>
Cc: <Bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 1:33 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes


> Scott,
>
> I think this is fine (to change the third condition and retry with a specific 
> code).  The other possibility is to simply throw different exceptions under 
> each of these circumstances, which can be caught via eval to allow a retry 
> under only certain conditions (no content, for instance).
>
> One interesting bit: I think (though I'm not sure) the new BLAST+ allows 
> remote BLAST queries from command line, similar to the legacy blastcl3.  Mark 
> just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.
>
> chris
>
> PS - BTW, nice to finally meet you at GMOD!
>
> On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:
>
>> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
>> from our customers.  Due to network irregularities (not sure what else
>> to call it) users see the getting of remote BLAST results as somewhat
>> random.  When results come back the hits are fine, but sometimes no
>> information comes back at all.  Retrying helps.
>>
>> In looking at RemoteBlast.pm there are four "return -1" cases.
>>
>> * $status eq 'ERROR'      (return on line 614)
>> * $line =~ /ERROR/I       (return on line 628)
>> * !$got_content           (return on line 648)
>> * !$response->is_success  (return on line 655)
>>
>> In the case of no content we'd like to retry remote BLAST.  We're happy
>> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
>> module, but we only want to retry in that case, not the other three.
>>
>> What would happen if that third "return -1" changed to a different
>> return value?
>>
>> Scott


From SMarkel at accelrys.com  Fri Jan 15 10:40:31 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 07:40:31 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
	<E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>

Chris,

It was nice meeting you and Scott C., too.  And seeing Jason again.

If you and Mark

> How about returning 1, 2, 4 for the non-zero cases, with some
> error constants set for convenience? MAJ

are okay with adding more return values, that works best for us in
Pipeline Pilot.

I'll add a Bugzilla entry.

Scott


-----Original Message-----
From: Chris Fields [mailto:cjfields at illinois.edu] 
Sent: Thursday, 14 January 2010 10:34 PM
To: Scott Markel
Cc: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes

Scott,

I think this is fine (to change the third condition and retry with a specific code).  The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance).

One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3.  Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.

chris

PS - BTW, nice to finally meet you at GMOD!

On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:

> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
> from our customers.  Due to network irregularities (not sure what else
> to call it) users see the getting of remote BLAST results as somewhat
> random.  When results come back the hits are fine, but sometimes no
> information comes back at all.  Retrying helps.
> 
> In looking at RemoteBlast.pm there are four "return -1" cases.
> 
> * $status eq 'ERROR'      (return on line 614)
> * $line =~ /ERROR/I       (return on line 628)
> * !$got_content           (return on line 648)
> * !$response->is_success  (return on line 655)
> 
> In the case of no content we'd like to retry remote BLAST.  We're happy
> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
> module, but we only want to retry in that case, not the other three.
> 
> What would happen if that third "return -1" changed to a different
> return value?
> 
> Scott


From cjfields at illinois.edu  Fri Jan 15 11:00:21 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 10:00:21 -0600
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
	<C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
Message-ID: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>

> FWIW, I'd prefer "binomial" = "genus" . "species"


That's the way Bio::Species is supposed to work, at least when it was refactored by Sendu.  But just a note: Bio::Species was considered deprecated in favor of Bio::Taxon (scheduled for the 1.7 release IIRC), for many very good reasons.  First and foremost among these is the fact that we cannot consistently parse out the genus/species/strain/variant/etc for every organism in GenBank without knowing its full lineage, which means including some taxonomic information.  And even then it's highly problematic.

We've had several heated discussions on list about how to handle this in a somewhat backwards-compatible way, and the main solution was to forgo compatibility and eventually deprecate Bio::Species altogether in favor of Bio::Taxon, a class that doesn't make the same assumptions.  Bio::Species, in the interim, is-a Bio::Taxon.  You'll note that a minimal Bio::DB::Taxonomy instance is constructed from the classification scheme in some instances, but if one had a proper DB link one could link to Entrez Taxonomy or a local indexed flat-file DB and grab the info.  Bio::Taxon (correct me if I'm wrong on this, Sendu, if you're out there) eschews various methods (species, etc) for simpler, consistent ones based on Taxonomy, and doesn't force us to handle every exception to getting the genus/species out of a name.  That is left up to the user, at their peril.

For either one, if you are reproducing the fully qualified name, you probably should use something like node_name() for consistency.  Bio::Species also has scientific_name().  With a true Bio::Taxon one would need to check that this is performed on the species node.

chris

On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote:

> I'm not that familiar with Bio::Species either, but this looks
> like conflicting semantics between Bio::Species and Bio::SeqIO.
> Bio::SeqIO sets the species accessor to the 'species' element of
> the lineage array, I believe.
> FWIW, I'd prefer "binomial" = "genus" . "species"
> MAJ
> ----- Original Message ----- From: "Dave Messina" <David.Messina at sbc.su.se>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 15, 2010 10:17 AM
> Subject: [Bioperl-l] getting/setting species names with Bio::Species
> 
> 
>> Hi everybody,
>> 
>> I'm having a little trouble with names in Bio::Species objects.
>> 
>> According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method:
>> 
>> my $my_species_obj = Bio::Species->new();
>> $my_species_obj->species('Homo sapiens');
>> 
>> print $my_species_obj->species;     # 'Homo sapiens'
>> 
>> 
>> That works fine if I create the Bio::Species object myself.
>> 
>> But if I try to get that string back out from a Bio::Species object created by SeqIO from a genbank file, I get just 'sapiens' back:
>> 
>> my $io = Bio::SeqIO->new('-format' => 'genbank',
>>                        '-file'   => 'hoxa2.gb');
>> my $seq_obj = $io->next_seq;
>> my $io_species_obj = $seq_obj->species;
>> 
>> print $io_species_obj->species;     # 'sapiens'
>> 
>> 
>> I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom phylum order, etc). So the genus is stored separately.
>> 
>> Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object:
>> 
>> print $my_species_obj->binomial;    # 'Homosapiens'
>> print $io_species_obj->binomial;    # 'Homo sapiens'
>> 
>> 
>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way?
>> 
>> If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite.
>> 
>> 
>> Thanks,
>> Dave


From SMarkel at accelrys.com  Fri Jan 15 11:10:34 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 08:10:34 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <FE85CD2526044E8797D5A1A248AF6866@NewLife>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net><E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
	<5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
	<FE85CD2526044E8797D5A1A248AF6866@NewLife>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B30A7@EXCH1-COLO.accelrys.net>

Mark,

Thank you.

Scott


-----Original Message-----
From: Mark A. Jensen [mailto:maj at fortinbras.us] 
Sent: Friday, 15 January 2010 8:10 AM
To: Scott Markel; Chris Fields
Cc: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes

can do Scott-- cheers MAJ
----- Original Message ----- 
From: "Scott Markel" <SMarkel at accelrys.com>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: <Bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 10:40 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes


> Chris,
>
> It was nice meeting you and Scott C., too.  And seeing Jason again.
>
> If you and Mark
>
>> How about returning 1, 2, 4 for the non-zero cases, with some
>> error constants set for convenience? MAJ
>
> are okay with adding more return values, that works best for us in
> Pipeline Pilot.
>
> I'll add a Bugzilla entry.
>
> Scott
>
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thursday, 14 January 2010 10:34 PM
> To: Scott Markel
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
>
> Scott,
>
> I think this is fine (to change the third condition and retry with a specific 
> code).  The other possibility is to simply throw different exceptions under 
> each of these circumstances, which can be caught via eval to allow a retry 
> under only certain conditions (no content, for instance).
>
> One interesting bit: I think (though I'm not sure) the new BLAST+ allows 
> remote BLAST queries from command line, similar to the legacy blastcl3.  Mark 
> just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.
>
> chris
>
> PS - BTW, nice to finally meet you at GMOD!
>
> On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:
>
>> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
>> from our customers.  Due to network irregularities (not sure what else
>> to call it) users see the getting of remote BLAST results as somewhat
>> random.  When results come back the hits are fine, but sometimes no
>> information comes back at all.  Retrying helps.
>>
>> In looking at RemoteBlast.pm there are four "return -1" cases.
>>
>> * $status eq 'ERROR'      (return on line 614)
>> * $line =~ /ERROR/i       (return on line 628)
>> * !$got_content           (return on line 648)
>> * !$response->is_success  (return on line 655)
>>
>> In the case of no content we'd like to retry remote BLAST.  We're happy
>> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
>> module, but we only want to retry in that case, not the other three.
>>
>> What would happen if that third "return -1" changed to a different
>> return value?
>>
>> Scott
>>
>> Scott Markel, Ph.D.
>> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
>> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
>> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
>> San Diego, CA 92121                 fax:    +1 858 799 5222
>> USA                                 web:    http://www.accelrys.com
>>
>> http://www.linkedin.com/in/smarkel
>> Vice President, Board of Directors:
>>    International Society for Computational Biology
>> Chair: ISCB Publications Committee
>> Associate Editor: PLoS Computational Biology
>> Editorial Board: Briefings in Bioinformatics
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
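
A minimal sketch of the return-code idea discussed in this thread, in plain Perl with no BioPerl dependency. The constant names, the mapping of 1, 2, 4 to particular failures, and the retry_on_no_content helper are all assumptions made for illustration. The thread only proposes distinct non-zero values and does not say which value means what, and this is not RemoteBlast's actual code.

```perl
use strict;
use warnings;

# Hypothetical constants: the thread proposes "1, 2, 4" but does not
# assign them to specific failures.
use constant {
    RB_OK           => 0,
    RB_STATUS_ERROR => 1,   # e.g. $status eq 'ERROR' or an ERROR line
    RB_NO_CONTENT   => 2,   # e.g. !$got_content
    RB_HTTP_FAILURE => 4,   # e.g. !$response->is_success
};

# Call $fetch up to $max_tries times, retrying only on "no content".
sub retry_on_no_content {
    my ($fetch, $max_tries) = @_;
    my $rc;
    for my $try (1 .. $max_tries) {
        $rc = $fetch->();
        last unless $rc == RB_NO_CONTENT;
    }
    return $rc;
}

# Stand-in for a remote retrieval: no content twice, then success.
my @outcomes = (RB_NO_CONTENT, RB_NO_CONTENT, RB_OK);
my $calls    = 0;
my $rc = retry_on_no_content(sub { $calls++; shift @outcomes }, 5);
print "return code $rc after $calls attempts\n";
```

Because only one value triggers a retry, the other failures return to the caller on the first attempt, which is the behaviour asked for in the original question.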


From maj at fortinbras.us  Fri Jan 15 11:09:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:09:38 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net><E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
	<5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
Message-ID: <FE85CD2526044E8797D5A1A248AF6866@NewLife>

can do Scott-- cheers MAJ
----- Original Message ----- 
From: "Scott Markel" <SMarkel at accelrys.com>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: <Bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 10:40 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes


> Chris,
>
> It was nice meeting you and Scott C., too.  And seeing Jason again.
>
> If you and Mark
>
>> How about returning 1, 2, 4 for the non-zero cases, with some
>> error constants set for convenience? MAJ
>
> are okay with adding more return values, that works best for us in
> Pipeline Pilot.
>
> I'll add a Bugzilla entry.
>
> Scott
>
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thursday, 14 January 2010 10:34 PM
> To: Scott Markel
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
>
> Scott,
>
> I think this is fine (to change the third condition and retry with a specific 
> code).  The other possibility is to simply throw different exceptions under 
> each of these circumstances, which can be caught via eval to allow a retry 
> under only certain conditions (no content, for instance).
>
> One interesting bit: I think (though I'm not sure) the new BLAST+ allows 
> remote BLAST queries from command line, similar to the legacy blastcl3.  Mark 
> just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.
>
> chris
>
> PS - BTW, nice to finally meet you at GMOD!
>
> On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:
>
>> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
>> from our customers.  Due to network irregularities (not sure what else
>> to call it) users see the getting of remote BLAST results as somewhat
>> random.  When results come back the hits are fine, but sometimes no
>> information comes back at all.  Retrying helps.
>>
>> In looking at RemoteBlast.pm there are four "return -1" cases.
>>
>> * $status eq 'ERROR'      (return on line 614)
>> * $line =~ /ERROR/i       (return on line 628)
>> * !$got_content           (return on line 648)
>> * !$response->is_success  (return on line 655)
>>
>> In the case of no content we'd like to retry remote BLAST.  We're happy
>> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
>> module, but we only want to retry in that case, not the other three.
>>
>> What would happen if that third "return -1" changed to a different
>> return value?
>>
>> Scott
> 
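
The exception-based alternative mentioned in this thread (throw a different exception per failure and catch it with eval) might look roughly like the following. This is a plain-Perl sketch: fetch_results and the "NO_CONTENT" message prefix are hypothetical stand-ins, not anything RemoteBlast is known to provide.

```perl
use strict;
use warnings;

# Hypothetical fetch routine that dies with a distinct message per failure.
# Here it reports "no content" on the first two calls, then succeeds.
my $attempt = 0;
sub fetch_results {
    $attempt++;
    die "NO_CONTENT: empty response\n" if $attempt < 3;
    return "results for attempt $attempt";
}

my ($result, $max_tries) = (undef, 5);
for my $try (1 .. $max_tries) {
    $result = eval { fetch_results() };
    last if defined $result;              # success, stop retrying
    die $@ unless $@ =~ /^NO_CONTENT/;    # any other failure stays fatal
    warn "try $try: no content, retrying\n";
}
print defined $result ? "$result\n" : "gave up after $max_tries tries\n";
```

Matching on the message text is the simplest form; exception classes would be more robust but need more scaffolding than fits in a short example.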


From maj at fortinbras.us  Fri Jan 15 11:10:02 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:10:02 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se><C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
	<16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
Message-ID: <C4C0A0697FCE4CFD897AD58FA7FD58AA@NewLife>

excellent summary--thanks!!
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 11:00 AM
Subject: Re: [Bioperl-l] getting/setting species names with Bio::Species


>> FWIW, I'd prefer "binomial" = "genus" . "species"
>
>
> That's the way Bio::Species is supposed to work, at least when it was 
> refactored by Sendu.  But just a note: Bio::Species was considered deprecated 
> (scheduled for the 1.7 release IIRC) for many very good reasons in favor of 
> Bio::Taxon.  First and foremost among these is the fact we cannot consistently 
> parse out the genus/species/strain/variant/etc for every organism in GenBank 
> w/o knowing its full lineage, which means including some taxonomic 
> information.  And even then it's highly problematic.
>
> We've had several heated discussions on list about how to handle this in a 
> somewhat backwards-compatible way, and the main solution was to forego 
> compatibility issues altogether and eventually deprecate Bio::Species 
> altogether in favor of Bio::Taxon, a class that doesn't make the same 
> assumptions.  Bio::Species, in the interim, is-a Bio::Taxon.  You'll note that 
> a minimal Bio::DB::Taxonomy instance is constructed from the classification 
> scheme in some instances, but if one had a proper DB link one could link to 
> Entrez Taxonomy or a local flat-file indexed DB and grab the info.  Bio::Taxon 
> (correct me if I'm wrong on this Sendu, if you're out there) eschews various 
> methods (species, etc) for simpler consistent ones based on Taxonomy, and 
> doesn't force us to handle every exception to getting the genus/species out of 
> a name.  That is left up to the user, at their peril.
>
> For either one, if you are reproducing the fully qualified name, you probably 
> should use something like node_name() for consistency.  Bio::Species also has 
> scientific_name().  With a true Bio::Taxon one would need to check that this is 
> performed on the species node.
>
> chris
>
> On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote:
>
>> I'm not that familiar with Bio::Species either, but this looks
>> like conflicting semantics between Bio::Species and Bio::SeqIO.
>> Bio::SeqIO sets the species accessor to the 'species' element of
>> the lineage array, I believe.
>> FWIW, I'd prefer "binomial" = "genus" . "species"
>> MAJ
>> ----- Original Message ----- From: "Dave Messina" <David.Messina at sbc.su.se>
>> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 15, 2010 10:17 AM
>> Subject: [Bioperl-l] getting/setting species names with Bio::Species
>>
>>
>>> Hi everybody,
>>>
>>> I'm having a little trouble with names in Bio::Species objects.
>>>
>>> According to the Bio::Species documentation, if I have a species name as a 
>>> string, like "Homo sapiens", I can get and set that using the species 
>>> method:
>>>
>>> use Bio::Species;
>>>
>>> my $my_species_obj = Bio::Species->new();
>>> $my_species_obj->species('Homo sapiens');
>>>
>>> print $my_species_obj->species;     # 'Homo sapiens'
>>>
>>>
>>> That works fine if I create the Bio::Species object myself.
>>>
>>> But if I try to get that string back out from a Bio::Species object created 
>>> by SeqIO from a genbank file, I get just 'sapiens' back:
>>>
>>> use Bio::SeqIO;
>>>
>>> my $io = Bio::SeqIO->new('-format' => 'genbank',
>>>                        '-file'   => 'hoxa2.gb');
>>> my $seq_obj = $io->next_seq;
>>> my $io_species_obj = $seq_obj->species;
>>>
>>> print $io_species_obj->species;     # 'sapiens'
>>>
>>>
>>> I think that happens because genbank records have more taxonomic info about 
>>> the species name, like the genus (and in fact the whole taxonomic 
>>> categorization: kingdom, phylum, order, etc.). So the genus is stored 
>>> separately.
>>>
>>> Poking around a bit more in Bio::Species, I turned up the method 'binomial', 
>>> which appears to do the right thing, returning genus and species in both 
>>> cases. Except, as you can see, the space is stripped out for my 
>>> species-name-is-just-a-string object:
>>>
>>> print $my_species_obj->binomial;    # 'Homosapiens'
>>> print $io_species_obj->binomial;    # 'Homo sapiens'
>>>
>>>
>>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I 
>>> using it correctly above, or is there a better way?
>>>
>>> If not, this kinda looks like a bug to me. I've got a patch which works and 
>>> passes the BioPerl test suite.
>>>
>>>
>>> Thanks,
>>> Dave
> 
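
As a rough illustration of the point made in this thread that genus and species cannot be parsed reliably from a name without lineage information, here is a plain-Perl sketch. The naive_genus_species helper is hypothetical and is not how Bio::Species or Bio::Taxon work; it just takes the first two words of the name.

```perl
use strict;
use warnings;

# Naive split: treat the first word as the genus and the second as the
# species. Illustration only, not the Bio::Species or Bio::Taxon logic.
sub naive_genus_species {
    my ($name) = @_;
    my ($genus, $species) = split ' ', $name;
    return ($genus, $species);
}

for my $name ('Homo sapiens',
              'Escherichia coli str. K-12 substr. MG1655',
              'uncultured bacterium') {
    my ($genus, $species) = naive_genus_species($name);
    printf "%-45s genus=%s species=%s\n", $name, $genus, $species;
}
```

The first name splits as expected, the second silently loses its strain designation, and the third reports "uncultured" as a genus, which is the kind of exception a name-only parser has to special-case.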


From hlapp at drycafe.net  Fri Jan 15 12:04:43 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 15 Jan 2010 12:04:43 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>


On Jan 15, 2010, at 10:17 AM, Dave Messina wrote:

> According to the Bio::Species documentation, if I have a species  
> name as a string, like "Homo sapiens", I can get and set that using  
> the species method:
>
> my $my_species_obj = Bio::Species->new();
> $my_species_obj->species('Homo sapiens');


If that's really what the documentation says, it's wrong. It is the  
binomial() method that does this (as getter and setter).

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From David.Messina at sbc.su.se  Fri Jan 15 13:37:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 19:37:17 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
	<2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
Message-ID: <24798E45-CF24-47D9-AB39-E66C35A5FA8B@sbc.su.se>

Thanks guys.

Well, looks like I ignored the deprecation warnings at my own peril. :)

I'll reimplement my code using Bio::Taxon directly instead. I made a little test using the node_name() method as Chris suggested, and it seems to do the trick nicely.


> If that's really what the documentation says, it's wrong.

I'm afraid so. In the POD
>  Title   : species
>  Usage   : $self->species( $species );
>            $species = $self->species();
>  Function: Get or set the scientific species name.
>  Example : $self->species('Homo sapiens');
>  Returns : Scientific species name as string
>  Args    : Scientific species name as string

and the HOWTO 
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object
> # legible and long
> my $species_object = $seq_object->species;
> my $species_string = $species_object->species;
> 
> # Perlish
> my $species_string = $seq_object->species->species;
> # either way, $species_string is "Homo sapiens"


Unless there's objection, I'll fix both of those.


> It is the binomial() method that does this (as getter and setter).

Great, thanks for the clarification, Hilmar.


From bhakti.dwivedi at gmail.com  Sun Jan 17 11:02:47 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 11:02:47 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
Message-ID: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>

Hi

Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
&& hit1 -> query1)  from a blast table report?

Thanks

BD
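
For reference, the reciprocal-best-hit test described above (query1 -> hit1 && hit1 -> query1) can be sketched in plain Perl without any BioPerl module. The sketch assumes the 12-column tab-separated BLAST table layout (query in column 1, subject in column 2, bit score in column 12) and uses made-up gene names and scores. The helpers best_hits and reciprocal_best_hits are hypothetical names, and "best" is taken to mean the highest bit score.

```perl
use strict;
use warnings;

# Keep the highest-bit-score subject for each query.
sub best_hits {
    my (@lines) = @_;
    my (%best, %score);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blanks
        my @f = split /\t/, $line;
        my ($query, $subject, $bits) = @f[0, 1, 11];
        next if $query eq $subject;               # skip self hits
        if (!exists $score{$query} || $bits > $score{$query}) {
            $best{$query}  = $subject;
            $score{$query} = $bits;
        }
    }
    return \%best;
}

# A pair is reciprocal when each member is the other's best hit.
sub reciprocal_best_hits {
    my ($best) = @_;
    my @pairs;
    for my $q (sort keys %$best) {
        my $s = $best->{$q};
        next unless exists $best->{$s} && $best->{$s} eq $q;
        push @pairs, [$q, $s] if $q lt $s;        # report each pair once
    }
    return @pairs;
}

# Made-up all-vs-all lines, for illustration only.
my @table = map { join "\t", @$_ } (
    [qw(geneA1 geneB1 98.0 200 4 0 1 200 1 200 1e-100 380)],
    [qw(geneA1 geneB2 60.0 180 70 2 1 180 5 184 1e-30 120)],
    [qw(geneB1 geneA1 98.0 200 4 0 1 200 1 200 1e-100 380)],
    [qw(geneB2 geneA1 60.0 180 70 2 5 184 1 180 1e-30 120)],
);
print join("\t", @$_), "\n" for reciprocal_best_hits(best_hits(@table));
```

With a single all-vs-all table the best hit of a gene may be a paralog from the same genome, so real use would need to restrict hits to the other species first; that filtering is left out here.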


From cjfields at illinois.edu  Sun Jan 17 12:45:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 11:45:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <4FC546A8-079F-4A17-AB96-D4A0060904D6@illinois.edu>

It's probably not best to use BioPerl directly for this.  Have you tried OrthoMCL, or InParanoid? 

chris

On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote:

> Hi
> 
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?
> 
> Thanks
> 
> BD


From maj at fortinbras.us  Sun Jan 17 16:03:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 17 Jan 2010 16:03:24 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <B602C24552CF42C58F80F3883198121C@NewLife>

re Chris's answer, check out this archived post:
http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
cheers MAJ
----- Original Message ----- 
From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 17, 2010 11:02 AM
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?


> Hi
> 
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?
> 
> Thanks
> 
> BD


From bhakti.dwivedi at gmail.com  Sun Jan 17 16:10:03 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 16:10:03 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <B602C24552CF42C58F80F3883198121C@NewLife>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<B602C24552CF42C58F80F3883198121C@NewLife>
Message-ID: <b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>

Thank you!


On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> re Chris's answer, check out this archived post:
> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
> cheers MAJ
> ----- Original Message ----- From: "Bhakti Dwivedi" <
> bhakti.dwivedi at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Sunday, January 17, 2010 11:02 AM
> Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
>
>
>  Hi
>>
>> Is there a Bio-perl module to parse the reciprocal best hits (query1->
>> hit1
>> && hit1 -> query1)  from a blast table report?
>>
>> Thanks
>>
>> BD


From cjfields at illinois.edu  Sun Jan 17 17:00:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 16:00:02 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<B602C24552CF42C58F80F3883198121C@NewLife>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
Message-ID: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>

OrthoMCL has updated to v2 and no longer uses BioPerl, just plain perl.  Database is available here:

http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi

Package (you'll need a few other things to get it working):

http://orthomcl.org/common/downloads/software/

chris

On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote:

> Thank you!
> 
> 
> On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> 
>> re Chris's answer, check out this archived post:
>> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
>> cheers MAJ
>> ----- Original Message ----- From: "Bhakti Dwivedi" <
>> bhakti.dwivedi at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Sunday, January 17, 2010 11:02 AM
>> Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
>> 
>> 
>> Hi
>>> 
>>> Is there a Bio-perl module to parse the reciprocal best hits (query1->
>>> hit1
>>> && hit1 -> query1)  from a blast table report?
>>> 
>>> Thanks
>>> 
>>> BD


From tristan.lefebure at gmail.com  Sun Jan 17 18:12:56 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 18:12:56 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
	<392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
Message-ID: <201001171812.56238.tristan.lefebure@gmail.com>

The transition to orthoMCL v2 being a bit painful (you need 
a MySQL database), I recently switched directly to MCL and 
the accompanying mclblastline and co programs. Modular, 
simple and very fast. Following some simulations, it gives 
better results with incomplete genomes than orthoMCL v1.x 
...

http://micans.org/mcl/

--Tristan

On Sunday 17 January 2010 17:00:02 Chris Fields wrote:
> OrthoMCL has updated to v2 and no longer uses BioPerl,
>  just plain perl.  Database is available here:
> 
> http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi
> 
> Package (you'll need a few other things to get it
>  working):
> 
> http://orthomcl.org/common/downloads/software/
> 
> chris
> 
> On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote:
> > Thank you!
> >
> > On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> >> re Chris's answer, check out this archived post:
> >> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
> >> cheers MAJ
> >> ----- Original Message ----- From: "Bhakti Dwivedi" <
> >> bhakti.dwivedi at gmail.com>
> >> To: <bioperl-l at lists.open-bio.org>
> >> Sent: Sunday, January 17, 2010 11:02 AM
> >> Subject: [Bioperl-l] Reciprocal best hits using
> >> Bioperl?
> >>
> >>
> >> Hi
> >>
> >>> Is there a Bio-perl module to parse the reciprocal
> >>> best hits (query1-> hit1
> >>> && hit1 -> query1)  from a blast table report?
> >>>
> >>> Thanks
> >>>
> >>> BD


From jason at bioperl.org  Sun Jan 17 18:59:05 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 17 Jan 2010 15:59:05 -0800
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <201001171812.56238.tristan.lefebure@gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
	<392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
	<201001171812.56238.tristan.lefebure@gmail.com>
Message-ID: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>

yes - but mcl alone is something slightly different in that it doesn't  
correct for inparalogs, but for incomplete genomes this is probably  
okay.

orthomcl2 does correct the major memory-hog problem and the parsing
inefficiencies of the previous version by relying on the db for the
indexing and lookup of the reciprocal hits.

-jason
On Jan 17, 2010, at 3:12 PM, Tristan Lefebure wrote:

> The transition to orthoMCL v2 being a bit painful (you need
> a MySQL database), I recently switched directly to MCL and
> the accompanying mclblastline and co programs. Modular,
> simple and very fast. Following some simulations, it gives
> better results with incomplete genomes than orthoMCL v1.x
> ...
>
> http://micans.org/mcl/
>
> --Tristan
>
> On Sunday 17 January 2010 17:00:02 Chris Fields wrote:
>> OrthoMCL has updated to v2 and no longer uses BioPerl,
>> just plain perl.  Database is available here:
>>
>> http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi
>>
>> Package (you'll need a few other things to get it
>> working):
>>
>> http://orthomcl.org/common/downloads/software/
>>
>> chris
>>
>> On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote:
>>> Thank you!
>>>
>>> On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>>> re Chris's answer, check out this archived post:
>>>> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
>>>> cheers MAJ
>>>> ----- Original Message ----- From: "Bhakti Dwivedi" <
>>>> bhakti.dwivedi at gmail.com>
>>>> To: <bioperl-l at lists.open-bio.org>
>>>> Sent: Sunday, January 17, 2010 11:02 AM
>>>> Subject: [Bioperl-l] Reciprocal best hits using
>>>> Bioperl?
>>>>
>>>>
>>>> Hi
>>>>
>>>>> Is there a Bio-perl module to parse the reciprocal
>>>>> best hits (query1-> hit1
>>>>> && hit1 -> query1)  from a blast table report?
>>>>>
>>>>> Thanks
>>>>>
>>>>> BD

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From tristan.lefebure at gmail.com  Sun Jan 17 20:36:38 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 20:36:38 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
Message-ID: <201001172036.39032.tristan.lefebure@gmail.com>

On Sunday 17 January 2010 18:59:05 Jason Stajich wrote:
> yes - but mcl alone is something slightly different in
>  that it doesn't   correct for inparalogs, but for
>  incomplete genomes this is probably okay.

Interestingly, my experience with not-too-divergent 
bacterial genomes (same genus) does not support the 
normalization used in orthoMCL (which, as far as I 
understand, is a standardization of the -Log10(evalue) per 
taxon combination, including a taxon with itself). MCL, which 
does not do any normalization (just -Log10(evalue)), gives 
about the same number of false negatives (i.e. missed 
orthologs), but far fewer false positives (false orthologs). 
In other words, you get many fake singletons. I don't know 
exactly if the problem lies in the normalization process or 
the fact that orthoMCLv1.x is using a very old version of 
MCL. What I do know is that many false positives are made of 
short or incomplete proteins that are very common in draft 
genomes and automatic annotations... Things might be 
completely different with more divergent and globally longer 
proteins. Testing orthoMCLv2 on the same data set would 
probably give the answer.

--Tristan
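
For readers unfamiliar with the -Log10(evalue) weighting mentioned above, a minimal plain-Perl sketch follows. The evalue_to_weight helper and the cap of 200 for e-values reported as 0 are assumptions for illustration; they are not taken from MCL or orthoMCL, and no per-taxon normalization is attempted.

```perl
use strict;
use warnings;

# Convert a BLAST e-value to a -log10 edge weight. An e-value of 0 has no
# finite logarithm, so it is given a cap; 200 is an assumed value.
sub evalue_to_weight {
    my ($evalue, $cap) = @_;
    $cap = 200 unless defined $cap;
    return $cap if $evalue <= 0;
    my $w = -log($evalue) / log(10);   # Perl's log() is the natural log
    $w = 0    if $w < 0;               # e-values above 1 carry no weight
    $w = $cap if $w > $cap;
    return $w;
}

printf "%-8s => %.1f\n", $_, evalue_to_weight($_) for qw(1e-100 2e-5 0 10);
```

Smaller e-values give larger weights, so stronger hits become heavier edges in the similarity graph that the clustering step works on.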


From robert.bradbury at gmail.com  Mon Jan 18 05:20:33 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 18 Jan 2010 05:20:33 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <201001172036.39032.tristan.lefebure@gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
Message-ID: <deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>

My comment might be that the problem with OrthoMCL is that it covers
primarily lower organisms.  The problem with Ensembl (and some other
databases) is that it covers primarily higher organisms (though they do
include Drosophila, C. elegans and yeast).

The problem arises when one wants to cross those boundaries.  For
example the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
many of the mitochondrial (ETC) proteins, the ribosomal rRNAs &
tRNAs, and the fundamental biochemistry (EC) proteins are homologous
all the way from the most ancient bacteria through H. sapiens.  The
only way to play in the mixed arena of prokaryotes and eukaryotes
involving fundamental vectors in evolution is to either construct one's
own databases (which presumably means getting involved with MySQL, and
probably spending some $$$ on hardware) or to develop some BioPerl
modules that can do the SpeciesX vs. SpeciesY comparisons on demand
using some part of the cloud.  This problem isn't going to get smaller,
it's only going to get larger, now that the cost of sequencing
(pseudo-resequencing) a vertebrate genome is starting to come in under
$10,000 and people are starting to seriously talk about 10,000
vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
people are going to undertake very soon.

Robert
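
To put a number on the product quoted above, here is the arithmetic in plain Perl. It simply multiplies the three figures as written in the message (genomes x genomes x genes) and makes no claim about how an actual all-against-all comparison would be organised.

```perl
use strict;
use warnings;

my ($genomes, $genes) = (10_000, 20_000);

# The product as written in the message: genomes x genomes x genes.
my $as_written = $genomes * $genomes * $genes;
printf "%.1e comparisons\n", $as_written;
```

That is 2 x 10^12, i.e. two trillion comparisons at the level of detail given in the message.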


On 1/17/10, Tristan Lefebure <tristan.lefebure at gmail.com> wrote:
> On Sunday 17 January 2010 18:59:05 Jason Stajich wrote:
>> yes - but mcl alone is something slightly different in
>>  that it doesn't   correct for inparalogs, but for
>>  incomplete genomes this is probably okay.
>
> Interestingly, my experience with not-too-divergent
> bacterial genomes (same genus) does not support the
> normalization used in orthoMCL (which, as far as I
> understand, is a standardization of the -Log10(evalue) per
> taxon combination, including a taxon with itself). MCL, which
> does not do any normalization (just -Log10(evalue)), gives
> about the same number of false negatives (i.e. missed
> orthologs), but far fewer false positives (false orthologs).
> In other words, you get many fake singletons. I don't know
> exactly if the problem lies in the normalization process or
> the fact that orthoMCLv1.x is using a very old version of
> MCL. What I do know is that many false positives are made of
> short or incomplete proteins that are very common in draft
> genomes and automatic annotations... Things might be
> completely different with more divergent and globally longer
> proteins. Testing orthoMCLv2 on the same data set would
> probably give the answer.
>
> --Tristan


From ghhu at sibs.ac.cn  Sun Jan 17 21:34:23 2010
From: ghhu at sibs.ac.cn (Guohong Hu)
Date: Mon, 18 Jan 2010 10:34:23 +0800
Subject: [Bioperl-l] Bioperl 1.6
Message-ID: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>

Hi there,

 
I was trying to install BioPerl on Windows using ppm, following the
instructions at
"http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
the repositories and searched for BioPerl packages. The latest version
available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
install it, a number of prerequisite modules were installed too, including
Bioperl 1.4. Then an error message showed up during installation:

"ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
BioPerl has already installed a file that package bioperl wants to install."

It looks to me as though BioPerl 1.6.1 had installed a file that bioperl 1.4
wanted to install again. I don't know why bioperl 1.4 was one of the
prerequisites for 1.6.1. If I install just 1.4, it installs without
errors. But I need a newer version, because some modules (like
Bio::Tools::HMM) are not included in 1.4.

I saw on the internet that somebody had the same problem when trying to
install BioPerl 1.5, but I didn't find the solution.

Does anybody have a clue about that? Thank you for your time.

 
GH

 
From cjfields at illinois.edu  Mon Jan 18 10:30:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 09:30:20 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
Message-ID: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>

Guohong, 

1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.  Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

IIRC the order of the PPM repositories can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused).  Just curious, but where is the v 1.4 PPM located?  If it is local to our PPM repo I can physically remove it to prevent this from happening.

chris

On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:

> Hi there,
> 
> 
> 
> I was trying to install BioPerl in windows using ppm, by following the
> instruction in
> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
> the repositories, and did the search of Bioperl packages. The latest version
> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
> install it, a number of prerequisite modules were being installed too, which
> include Bioperl 1.4. Then an error message showed up during installation:
> 
> 
> 
> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
> BioPerl has already installed a file that package bioperl wants to install."
> 
> 
> 
> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
> wanted to install again. I don't know why bioperl 1.4 was one of the
> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
> errors. But I need a newer version, because some modules (like
> 
> Bio::Tools::HMM) is not included in 1.4.
> 
> 
> 
> I saw on internet that somebody had the same problem when he was trying to
> install BioPerl 1.5, but I didn't find the solution.
> 
> 
> 
> Anybody has a clue on that? Thank you for your time.
> 
> 
> 
> GH
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Jan 18 11:12:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 10:12:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
	<deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
Message-ID: <B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>

(my small rant on this)

On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote:

> My comment might be that the problem with OrthoMCL is that it is
> primarily lower organisms.  The problem with Ensembl (and some other
> databases) is that it is primarliy higher organisms (though they do
> include Drosophila, C. elegans and Yeast).

OrthoMCL v2 handles both lower and higher organisms; I've used it for both, with decent success.  Most other ortholog tools do as well (if I'm not mistaken, Ensembl also uses MCL under the hood, unless that's changed).  I don't believe one should be completely bound to one toolset, particularly in this case (there are lots of nice ortholog clustering tools using various means of comparison out there), but I do think OrthoMCL is very good as an initial pass.  If anything, I would like a set of (possibly bioperl-based, definitely DB-based) modules that can deal with this information.

The more imperative issue in my opinion is that one is prisoner to the gene models for those specific organisms of interest, and this may vary widely depending on the source of those gene models (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc).  For instance, if gene models are poorly curated or rarely updated, the comparisons may be significantly flawed.  Some of these issues may also be (somewhat) alleviated once more transcriptome data is available that helps clear up gene model ambiguities, but that won't be true for all organisms, at least initially.

Note this isn't meant as a slam on any specific DBs or MODs in general; the problem is one born of the fact that there isn't a single, centralized, trusted, consistently updated source for this data, specifically something that will handle moderated third-party annotation.  That's a very difficult problem to solve effectively.  Some of these very issues cropped up at the GMOD conference, and there appears to be consensus that a real attempt is needed to address this.

I don't know, maybe it's just unicorns and rainbows.  Personally I do think the situation will improve, as there seems to be great demand for it, but it requires time, resources, manpower, money, cat herding, etc.

> The problem arises when one wants to cross those boundaries.  For
> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
> many of the mitochondrial (ETC) proteins, the ribosomal rRNA's &
> tRNAs, and the fundamental biochemistry (EC) proteins are homologous
> all the way from the most ancient bacteria through H. sapiens.  The
> only way to play in the mixed arena of prokaryotes and eukaryotes
> involving fundamental vectors in evolution is to either construct ones
> own databases (which presumably means getting involved with MySQL, and
> probably spending some $$$ on hardware) or to develop some BioPerl
> modules that can do the  SpeciesX vs. SpeciesY comparisons on demand
> using some part of the cloud.  This problem isn't going to get smaller
> its only going to get larger, now that the cost of sequencing
> (pseudo-resequencing) a vertebrate genome is starting to come in under
> $10,000 and people are starting to seriously talk about 10,000
> vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
> people are going to undertake very soon.
> 
> Robert

They're already undertaking it now using a broad range of organisms, in and out of the cloud.  In most cases one can amend a prior recip. comparative analysis with new data fairly easily, if one takes care to do so early on (i.e. set up the BLAST databases with a specified defined size for comparative stats between separate analyses).  OrthoMCL v2 describes a procedure to do this, and I believe others have similar methodology.  

I could also see possible ways one can further optimize this, for instance in cases where two very closely-related organisms are compared, where translated seqs are 100% identical, etc.  IIRC, the OrthoMCL DB site already has a way to upload custom sets of protein data for mapping to (already pre-run) clusters.  Just the fact that the tools are available as OS, they're semi-automated, and can be generically applied to data of personal interest is a great boon.  Not sure I see the downside of that, and I'm pretty confident the scalability issues will be addressed in some way.

chris


From maj at fortinbras.us  Mon Jan 18 11:33:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 11:33:12 -0500
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
Message-ID: <6093E45F17B543438AC02E6C626439E1@NewLife>

this issue's come up before, see this thread
http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Guohong Hu" <ghhu at sibs.ac.cn>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 10:30 AM
Subject: Re: [Bioperl-l] Bioperl 1.6


> Guohong,
>
> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed 
> first.  Make sure the repos are set according to the Windows installation 
> instructions on the BioPerl wiki:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> IIRC the actual order of the PPM repository can be critical (PPM pulls based 
> on highest version, first repo, but sometimes it gets confused).  Just curious 
> but where is the v 1.4 PPM located?  If it is local to our PPM repo I can 
> physically remove it to prevent this from happening.
>
> chris
>
> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
>
>> Hi there,
>>
>>
>>
>> I was trying to install BioPerl in windows using ppm, by following the
>> instruction in
>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>> the repositories, and did the search of Bioperl packages. The latest version
>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>> install it, a number of prerequisite modules were being installed too, which
>> include Bioperl 1.4. Then an error message showed up during installation:
>>
>>
>>
>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>> BioPerl has already installed a file that package bioperl wants to install."
>>
>>
>>
>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>> wanted to install again. I don't know why bioperl 1.4 was one of the
>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
>> errors. But I need a newer version, because some modules (like
>>
>> Bio::Tools::HMM) is not included in 1.4.
>>
>>
>>
>> I saw on internet that somebody had the same problem when he was trying to
>> install BioPerl 1.5, but I didn't find the solution.
>>
>>
>>
>> Anybody has a clue on that? Thank you for your time.
>>
>>
>>
>> GH
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From cjfields at illinois.edu  Mon Jan 18 12:18:34 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 11:18:34 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <6093E45F17B543438AC02E6C626439E1@NewLife>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<6093E45F17B543438AC02E6C626439E1@NewLife>
Message-ID: <E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>

Mark,

Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this?  Regardless, it's problematic for me to test this out directly, at least for the next few days.  Maybe someone could try it?

Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this).

chris

On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote:

> this issue's come up before, see this thread
> http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html
> MAJ
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "Guohong Hu" <ghhu at sibs.ac.cn>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Monday, January 18, 2010 10:30 AM
> Subject: Re: [Bioperl-l] Bioperl 1.6
> 
> 
>> Guohong,
>> 
>> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.  Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki:
>> 
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>> 
>> IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused).  Just curious but where is the v 1.4 PPM located?  If it is local to our PPM repo I can physically remove it to prevent this from happening.
>> 
>> chris
>> 
>> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
>> 
>>> Hi there,
>>> 
>>> 
>>> 
>>> I was trying to install BioPerl in windows using ppm, by following the
>>> instruction in
>>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>>> the repositories, and did the search of Bioperl packages. The latest version
>>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>>> install it, a number of prerequisite modules were being installed too, which
>>> include Bioperl 1.4. Then an error message showed up during installation:
>>> 
>>> 
>>> 
>>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>>> BioPerl has already installed a file that package bioperl wants to install."
>>> 
>>> 
>>> 
>>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>>> wanted to install again. I don't know why bioperl 1.4 was one of the
>>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
>>> errors. But I need a newer version, because some modules (like
>>> 
>>> Bio::Tools::HMM) is not included in 1.4.
>>> 
>>> 
>>> 
>>> I saw on internet that somebody had the same problem when he was trying to
>>> install BioPerl 1.5, but I didn't find the solution.
>>> 
>>> 
>>> 
>>> Anybody has a clue on that? Thank you for your time.
>>> 
>>> 
>>> 
>>> GH
>>> 
>>> 
>>> 
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From clarsen at vecna.com  Mon Jan 18 12:42:13 2010
From: clarsen at vecna.com (Chris Larsen)
Date: Mon, 18 Jan 2010 12:42:13 -0500
Subject: [Bioperl-l] Reciprocal best blast hits using BioPerl?
In-Reply-To: <B7BD7C2F-4A70-49B5-9074-7EBAF5094AE9@vecna.com>
References: <B0218AEF-3CEB-4E06-B8DF-7B302D024797@vecna.com>
	<B7BD7C2F-4A70-49B5-9074-7EBAF5094AE9@vecna.com>
Message-ID: <ED172CDA-A8C3-4488-9648-1FBA7036BAD6@vecna.com>

Bhakti, (and Chris, Mark)--

Yes there is some perl available to parse reciprocal best blast hits.

Mark's referenced / archived post was mine; we were looking to do what
you wanted. Here we proceed with the thread.

We ended up implementing OrthoMCL 1.4, as Chris F pointed to, and then
made a simple perl parser that would take the raw OrthoMCL output, do
splits, and spit out a delimited table of all the orthologs in a
group, for, say, the genus Mycobacterium, so you could stuff it into DBLoader.

The link to the script, SOP, and method is at:
http://www.biohealthbase.org/brcDocs/documents/BHB_ORTHOLOG_SOP.pdf

Giving e.g.:

Francisella 1 110321310
Francisella 1 110321361
Francisella 1 56707275
Francisella 1 56707366
Francisella 1 56707462

Five members of Ortholog Group 1, with just their gi number.  And you  
can see the results of that parsing, supported by a database, being  
used to load BioHealthbase with all the reciprocal best blast hits  
plus other OrthoMCL parsing, for mycobacterial PolA at:

http://www.biohealthbase.org/brc/details.do?locus=MAV_3155&decorator=mycobacterium

See? Pretty? We were just interested in making ortholog groups on the
basis of paralog-conscious reciprocal blast stuff. Like you. This
package and doc I've made does what you want, I think, as long as you
stay in prokaryotes. But--careful...garbage in, garbage out. We
started with clean Genuses. (. o O Genii?). You'll get more junky HUGE
and TINY ortholog groups if you put in different Orders of microbes.
It's taxa sensitive. OrthoMCL author David Roos is great at it though
and designed it with higher unicellular euks in mind too...comb the docs
for that; sorry, I was doing bacterial work at the time and can't guide
you if that's what you want. If you end up installing OrthoMCL 1.4, you
can pipe the output to this method and get out usable stuff.
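
For the parsing step described above, here is a rough sketch of the idea.
It is not the script from the SOP. It assumes the OrthoMCL 1.4 cluster
file has one group per line in the form
"ORTHOMCL<n>(<x> genes,<y> taxa):" followed by whitespace-separated
"id(taxon)" members; check that against your own output first. The sub
name orthomcl_rows, the "Ftu" taxon tag, and the 'Francisella' label are
made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: turn OrthoMCL cluster lines into rows of
# [label, group number, member id].
# Assumed input, one cluster per line, e.g.
#   ORTHOMCL1(3 genes,1 taxa):   110321310(Ftu) 110321361(Ftu) ...
sub orthomcl_rows {
    my ($label, @lines) = @_;
    my @rows;
    for my $line (@lines) {
        # Split the cluster header from the member list.
        my ($header, $members) = split /:\s*/, $line, 2;
        next unless defined $members;
        next unless $header =~ /^ORTHOMCL(\d+)/;
        my $group = $1;
        for my $member (split ' ', $members) {
            # Drop the trailing "(taxon)" tag, keeping the id itself.
            (my $id = $member) =~ s/\([^)]*\)$//;
            push @rows, [$label, $group, $id];
        }
    }
    return @rows;
}

# Illustrative input only; the ids are the ones from the table above.
my @out = (
    "ORTHOMCL1(3 genes,1 taxa):\t 110321310(Ftu) 110321361(Ftu) 56707275(Ftu)\n",
    "ORTHOMCL2(2 genes,1 taxa):\t 56707366(Ftu) 56707462(Ftu)\n",
);

# One delimited row per group member, ready for a database loader.
print join("\t", @$_), "\n" for orthomcl_rows('Francisella', @out);
```

In practice you would read the lines from the cluster file instead of
the hard-coded @out.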

Hope it works for you.

Cheers,

Chris L

-- 

Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525


From maj at fortinbras.us  Mon Jan 18 14:37:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 14:37:43 -0500
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<6093E45F17B543438AC02E6C626439E1@NewLife>
	<E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>
Message-ID: <61F331117B7C4E2282684FA240B9710F@NewLife>

I will play around with it-- in the meantime, Guohong, please look at the 
following
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
where there is a workaround for this issue, using the ppm-shell--
cheers,
Mark
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Guohong Hu" <ghhu at sibs.ac.cn>; <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 12:18 PM
Subject: Re: [Bioperl-l] Bioperl 1.6


Mark,

Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing 
this?  Regardless, it's problematic for me to test this out directly, at least 
for the next few days.  Maybe someone could try it?

Also, there is the Strawberry Perl alternative, which uses CPAN (I think 
ActiveState also supports this).

chris

On Jan 18, 2010, at 10:33 AM, Mark A. Jensen wrote:

> this issue's come up before, see this thread
> http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html
> MAJ
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "Guohong Hu" <ghhu at sibs.ac.cn>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Monday, January 18, 2010 10:30 AM
> Subject: Re: [Bioperl-l] Bioperl 1.6
>
>
>> Guohong,
>>
>> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed 
>> first.  Make sure the repos are set according to the Windows installation 
>> instructions on the BioPerl wiki:
>>
>> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>>
>> IIRC the actual order of the PPM repository can be critical (PPM pulls based 
>> on highest version, first repo, but sometimes it gets confused).  Just 
>> curious but where is the v 1.4 PPM located?  If it is local to our PPM repo I 
>> can physically remove it to prevent this from happening.
>>
>> chris
>>
>> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
>>
>>> Hi there,
>>>
>>>
>>>
>>> I was trying to install BioPerl in windows using ppm, by following the
>>> instruction in
>>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>>> the repositories, and did the search of Bioperl packages. The latest version
>>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>>> install it, a number of prerequisite modules were being installed too, which
>>> include Bioperl 1.4. Then an error message showed up during installation:
>>>
>>>
>>>
>>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>>> BioPerl has already installed a file that package bioperl wants to install."
>>>
>>>
>>>
>>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>>> wanted to install again. I don't know why bioperl 1.4 was one of the
>>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
>>> errors. But I need a newer version, because some modules (like
>>>
>>> Bio::Tools::HMM) is not included in 1.4.
>>>
>>>
>>>
>>> I saw on internet that somebody had the same problem when he was trying to
>>> install BioPerl 1.5, but I didn't find the solution.
>>>
>>>
>>>
>>> Anybody has a clue on that? Thank you for your time.
>>>
>>>
>>>
>>> GH
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From jason at bioperl.org  Mon Jan 18 15:24:33 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Jan 2010 12:24:33 -0800
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
	<deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
	<B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>
Message-ID: <68DF70A5-63A6-428D-A7F1-7B3D01528375@bioperl.org>


On Jan 18, 2010, at 8:12 AM, Chris Fields wrote:

> (my small rant on this)
>
> On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote:
>
>> My comment might be that the problem with OrthoMCL is that it is
>> primarily lower organisms.  The problem with Ensembl (and some other
>> databases) is that it is primarliy higher organisms (though they do
>> include Drosophila, C. elegans and Yeast).
>
> OrthoMCL v2 handles both lower and higher organism; I've used it for  
> both, with decent success.  Most other ortholog tools do as well (if  
> I'm not mistaken, ensembl also uses MCL under the hood, unless  
> that's changed).  I don't believe one should be completely bound to  
> one toolset, particularly in this case (there are lots of nice  
> ortholog clustering tools using various moeans of comparison out  
> there), but I do think OrthoMCL is very good as an initial pass.  If  
> anything, I would like a set of (possibly bioperl-based, definitely  
> DB-based) modules that can deal with this information.
>
> The more imperative issue in my opinion is that one is prisoner to  
> the gene models for those specific organisms of interest, and this  
> may vary widely depending on the source of those gene models  
> (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc).  For  
> instance, if gene models are poorly curated or rarely updated, the  
> comparisons may be significantly flawed.  Some of these issues may  
> also be (somewhat) alleviated once more transcriptome data is  
> available that helps clear up gene model ambiguities, but that won't  
> be true for all organisms, at least initially.
>
> Note this isn't meant as a slam on any specific DBs or MODs in  
> general, the problem is one born of the fact that there isn't a  
> single, centralized, trusted, consistently updated source for this  
> data, specifically something that will handle moderated third-party  
> annotation.  That's a very difficult problem to solve effectively.   
> Some of these very issues crept up at the GMOD conference, and there  
> appears to be consensus that a real attempt is needed to address this.
>
> I don't know, maybe it's just unicorns and rainbows.  Personally I  
> do think the situation will improve, as there seems to be great  
> demand for it, but it requires time, resources, manpower, money, cat  
> herding, etc.
>
>> The problem arises when one wants to cross those boundaries.  For
>> example the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
>> many of the mitochondrial (ETC) proteins, the ribosomal rRNA's &
>> tRNAs, and the fundamental biochemistry (EC) proteins are homologous
>> all the way from the most ancient bacteria through H. sapiens.  The
>> only way to play in the mixed arena of prokaryotes and eukaryotes
>> involving fundamental vectors in evolution is to either construct  
>> ones
>> own databases (which presumably means getting involved with MySQL,  
>> and
>> probably spending some $$$ on hardware) or to develop some BioPerl
>> modules that can do the  SpeciesX vs. SpeciesY comparisons on demand
>> using some part of the cloud.  This problem isn't going to get  
>> smaller
>> its only going to get larger, now that the cost of sequencing
>> (pseudo-resequencing) a vertebrate genome is starting to come in  
>> under
>> $10,000 and people are starting to seriously talk about 10,000
>> vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
>> people are going to undertake very soon.
>>
>> Robert
>
> They're already undertaking it now using a broad range of organisms,  
> in and out of the cloud.  In most cases one can amend a prior recip.  
> comparative analysis with new data fairly easily, if one takes care  
> to do so early on (i.e. set up the BLAST databases with a specified  
> defined size for comparative stats between separate analyses).   
> OrthoMCL v2 describes a procedure to do this, and I believe others  
> have similar methodology.
>
> I could also see possible ways one can further optimize this, for  
> instance in cases where two very closely-related organisms are  
> compared, where translated seqs are 100% identical, etc.  IIRC, the  
> OrthoMCL DB site already has a way to upload custom sets of protein  
> data for mapping to (already pre-run) clusters.  Just the fact that  
> the tools are available as OS, they're semi-automated, and can be  
> generically applied to data of personal interest is a great boon.   
> Not sure I see the downside of that, and I'm pretty confident the  
> scalability issues will be addressed in some way.


I think that the approach that Paul Thomas's group at SRI
(http://www.ai.sri.com/esb/) is taking is really what you'd want to focus
on if you are only interested in a particular set of gene families rather
than de novo clustering. That or the PhyloFacts approach
(http://phylogenomics.berkeley.edu/phylofacts/). That is where HMMs are
more appropriate, focusing on your initial seed set of families of
proteins: HMMs for your families, with some automated clustering initially
to get better resolution. Once you start throwing multiple 10^6 proteins
at it, the unsupervised clustering approach may not be able to give as
accurate or timely results, but it can be a good initial filtering step
depending on how much initial knowledge you are starting with. Using HMMs
won't be as computationally expensive either if you are compute limited.

TreeFam (http://www.treefam.org/) also provides curated phylogenies of gene
families that span the opisthokonts, in that a few fungi are sprinkled in.

Also things like http://boinc.bio.wzw.tum.de/boincsimap/ provide ways  
to use distributed computing to calculate the matrix of similarities  
among proteins if you are interested in the exhaustive approach.

-jason

>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jay at jays.net  Mon Jan 18 18:36:20 2010
From: jay at jays.net (Jay Hannah)
Date: Mon, 18 Jan 2010 17:36:20 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <9AA13F94-3336-4CC1-89C4-249D0EB7C857@jays.net>

On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote:
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?

If all the advice and resources in this thread have not dissuaded you from writing your own, you could glance at cross_blast() here as reference:

   https://clabsvn.ist.unomaha.edu/anonsvn/user/jhannah/UNO/seqlab/seqlab/tutorial.pod

About the (abandoned) project:

   http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29

I wrote that in 2006 for clustering a few hundred proteins based on custom criteria.
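
For anyone rolling their own, the core of a reciprocal-best-hit check is
small. A minimal sketch, not the cross_blast() code: it assumes 12-column
tab-delimited BLAST output (query in column 1, subject in column 2, bit
score in column 12) and takes "best" to mean highest bit score. The sub
names and the toy ids are invented for illustration.

```perl
use strict;
use warnings;

# For each query, keep the subject with the highest bit score.
sub best_hits {
    my (@lines) = @_;
    my (%best, %score);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;    # comments, blank lines
        my @f = split /\t/, $line;
        next unless @f >= 12;                      # not a full tabular row
        my ($query, $subject, $bits) = @f[0, 1, 11];
        if (!exists $score{$query} or $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $subject;
        }
    }
    return \%best;
}

# A pair is reciprocal when A's best hit among B is B, and B's best hit
# among A is A again.
sub reciprocal_best_hits {
    my ($a_vs_b, $b_vs_a) = @_;
    my @pairs;
    for my $query (sort keys %$a_vs_b) {
        my $hit = $a_vs_b->{$query};
        push @pairs, [$query, $hit]
            if defined $b_vs_a->{$hit} and $b_vs_a->{$hit} eq $query;
    }
    return @pairs;
}

# Toy data: genome A searched against genome B, and the reverse search.
my @ab = (
    join("\t", qw(a1 b1 98.0 200 4 0 1 200 1 200 1e-100 380)),
    join("\t", qw(a1 b2 60.0 180 70 2 1 180 5 184 1e-40 150)),
    join("\t", qw(a2 b2 90.0 150 15 0 1 150 1 150 1e-70 270)),
);
my @ba = (
    join("\t", qw(b1 a1 98.0 200 4 0 1 200 1 200 1e-100 380)),
    join("\t", qw(b2 a1 95.0 180 9 0 1 180 1 180 1e-80 300)),
    join("\t", qw(b2 a2 90.0 150 15 0 1 150 1 150 1e-70 270)),
);

print join("\t", @$_), "\n"
    for reciprocal_best_hits(best_hits(@ab), best_hits(@ba));
```

Here only a1/b1 comes out as a pair: a2's best hit is b2, but b2's best
hit is a1. Ties and e-value or coverage cutoffs are left out on purpose.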

Cheers,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jay at jays.net  Mon Jan 18 19:22:48 2010
From: jay at jays.net (Jay Hannah)
Date: Mon, 18 Jan 2010 18:22:48 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
Message-ID: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>

I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.

   http://github.com/jhannah/bio-broodcomb

It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. 

The first two functions I stuck in the framework:

Find subsequences (Bio::BroodComb::SubSeq):

   use Bio::BroodComb;
   my $bc = Bio::BroodComb->new();
   $bc->load_large_seq(file => "large_seq.fasta");
   $bc->load_small_seq(file => "small_seq.fasta");
   $bc->find_subseqs();
   print $bc->subseq_report1;

In-silico PCR (Bio::BroodComb::PCR):

  use Bio::BroodComb;
  my $bc = Bio::BroodComb->new();
  $bc->load_large_seq(file => "large_seq.fasta");
  $bc->add_primerset(
     description    => "U5/R",   # however you want it reported
     forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
     reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
  );
  $bc->find_pcr_hits();
  $bc->find_pcr_products();
  print $bc->pcr_report1;

I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.

Suggestions, contributions welcome.   :)

   http://github.com/jhannah/bio-broodcomb

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From ocornejo at gmail.com  Mon Jan 18 19:46:10 2010
From: ocornejo at gmail.com (Omar Cornejo)
Date: Mon, 18 Jan 2010 16:46:10 -0800 (PST)
Subject: [Bioperl-l] installing bioperl for mac
Message-ID: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>

Dear People,
  I have tried to install BioPerl on my new MacBook, which carries
the latest perl distribution (5.10.0), and for some reason I can't
(using fink) make it recognize this version of perl.
  I have tried:
fink install bioperl-pm510
fink install bioperl-pm5100

but neither one works.  Is it fine installing bioperl for perl v 5.9?

thank you,
Omar Cornejo


From jason at bioperl.org  Mon Jan 18 20:04:31 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Jan 2010 17:04:31 -0800
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <4B5502D9.2010706@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
	<4B5502D9.2010706@gmail.com>
Message-ID: <F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>

Alexandr -

Thanks for getting back to us - I am guessing the parser needs to
recognize negative coordinates around line 370 in
Bio/AlignIO/Handler/GenericAlignHandler.pm, which assumes a split on '-'
will be sufficient.

Can you post it as a bug to bugzilla, attaching a record and a script
that replicate the problem so a test can be written for this?
http://bugzilla.open-bio.org/
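
To illustrate the kind of change that might be needed (a sketch only, not
a patch against GenericAlignHandler.pm; the sub name parse_range is
hypothetical), a pattern that lets each coordinate carry its own optional
minus sign avoids the problem a plain split on '-' has with "-1-344":

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: parse a "start-end" range
# where either coordinate may itself be negative, e.g. "-1-344".
sub parse_range {
    my ($range) = @_;
    # An optional minus sign belongs to each number; the hyphen between
    # the two captured numbers is the separator.
    if ($range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*$/) {
        return ($1, $2);
    }
    return;    # empty list: not a recognizable range
}

for my $r ('263-608', '-1-344', '-5--2') {
    my ($start, $end) = parse_range($r);
    print "$r => start=$start end=$end\n";
}
```

Whether downstream code such as Bio::Range accepts a negative start is a
separate question that the test case should cover.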

-jason
On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote:

> I have contacted Pfam, and I have been told that The PDB file actually
> does include a reference to residue "-1":
>
> DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>
> DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>
>
> Since negative numbers are allowed in PDB, the data should probably be
> considered valid.
>
> There are quite a few records like this, so this is not an isolated  
> issue.
>
> Alexandr
>
> On 1/14/2010 7:20 PM, Jason Stajich wrote:
>> Seems like improper data really -- "-1" is an improper coordinate  
>> as far
>> as the parser is concerned. You may want to tell Pfam that there is
>> possible error in the dumper since that was the only record that had
>> this problem?
>>
>> -jason
>> On Jan 13, 2010, at 5:57 PM, albezg wrote:
>>
>>> Hi all,
>>>
>>> I have a problem using AlignIO to read Pfam database:
>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>>> alignment OK until the alignment PF00331.13. There it crashes with  
>>> the
>>> following message:
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: '1-344' is not an integer.
>>>
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>>> STACK: Bio::Range::end
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>>> STACK: Bio::Annotation::Target::new
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ 
>>> GenericAlignHandler.pm:293
>>>
>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/ 
>>> GenericAlignHandler.pm:73
>>>
>>> STACK: Bio::AlignIO::stockholm::next_aln
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>>> -----------------------------------------------------------
>>>
>>> It appears this is caused by this entry:
>>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>>
>>> I don't care about residues in PDB, so I have just removed minus  
>>> signs
>>> from the ranges. This seems to have fixed the crashing.
>>>
>>> Is it a known problem? Is there a solution for it?
>>>
>>> Thanks,
>>> Alexandr
>>>
>>>
>>> On 03/20/2009 05:09 PM, albezg wrote:
>>>>
>>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>>> alignment(SimpleAlign).
>>>>
>>>> There are no issues when I print it, however when I use AlignIO  
>>>> to write
>>>> the alignment to a FASTA file, it does not work. Is this behavior
>>>> intended?
>>>>
>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>>
>>>> The error:
>>>> ------------- EXCEPTION -------------
>>>> MSG: No sequence with name [1/1-11]
>>>> STACK Bio::SimpleAlign::displayname
>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>>> STACK Bio::AlignIO::fasta::write_aln
>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>>> STACK toplevel ./demo.pl:14
>>>> -------------------------------------
>>>>
>>>> Alexandr
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> -- 
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From cjfields at illinois.edu  Mon Jan 18 21:19:30 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 20:19:30 -0600
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
	<4B5502D9.2010706@gmail.com>
	<F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>
Message-ID: <46FD172A-69C0-436C-A005-AC38668C3347@illinois.edu>

Alexandr,

Posting the bug report would be great; it should be an easy enough fix.

chris

On Jan 18, 2010, at 7:04 PM, Jason Stajich wrote:

> Alexandr -
> 
> Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient.
> 
> Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/
> 
> -jason
> On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote:
> 
>> I have contacted Pfam, and I have been told that the PDB file actually
>> does include a reference to residue "-1":
>> 
>> DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>> 
>> DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>> 
>> 
>> Since negative numbers are allowed in PDB, the data should probably be
>> considered valid.
>> 
>> There are quite a few records like this, so this is not an isolated issue.
>> 
>> Alexandr
>> 
>> On 1/14/2010 7:20 PM, Jason Stajich wrote:
>>> Seems like improper data really -- "-1" is an improper coordinate as far
>>> as the parser is concerned. You may want to tell Pfam that there is
>>> possible error in the dumper since that was the only record that had
>>> this problem?
>>> 
>>> -jason
>>> On Jan 13, 2010, at 5:57 PM, albezg wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I have a problem using AlignIO to read Pfam database:
>>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>>>> alignment OK until the alignment PF00331.13. There it crashes with the
>>>> following message:
>>>> 
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: '1-344' is not an integer.
>>>> 
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>>>> STACK: Bio::Range::end
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>>>> STACK: Bio::Annotation::Target::new
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>>> 
>>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>>> 
>>>> STACK: Bio::AlignIO::stockholm::next_aln
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>>>> -----------------------------------------------------------
>>>> 
>>>> It appears this is caused by this entry:
>>>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>>> 
>>>> I don't care about residues in PDB, so I have just removed minus signs
>>>> from the ranges. This seems to have fixed the crashing.
>>>> 
>>>> Is it a known problem? Is there a solution for it?
>>>> 
>>>> Thanks,
>>>> Alexandr
>>>> 
>>>> 
>>>> On 03/20/2009 05:09 PM, albezg wrote:
>>>>> 
>>>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>>>> alignment(SimpleAlign).
>>>>> 
>>>>> There are no issues when I print it, however when I use AlignIO to write
>>>>> the alignment to a FASTA file, it does not work. Is this behavior
>>>>> intended?
>>>>> 
>>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>>> 
>>>>> The error:
>>>>> ------------- EXCEPTION -------------
>>>>> MSG: No sequence with name [1/1-11]
>>>>> STACK Bio::SimpleAlign::displayname
>>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>>>> STACK Bio::AlignIO::fasta::write_aln
>>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>>>> STACK toplevel ./demo.pl:14
>>>>> -------------------------------------
>>>>> 
>>>>> Alexandr
>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> -- 
>>> Jason Stajich
>>> jason.stajich at gmail.com
>>> jason at bioperl.org
>>> http://fungalgenomes.org/
>>> 
>> 
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Jan 18 21:20:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 20:20:31 -0600
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
Message-ID: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>

On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:

> Dear People,
>  I have tried to install Bioperl in my new Mac Book, which carries
> the latest perl distribution (5.10.0) and for some reason I can't
> (using fink) make it recognize this version or perl.
>  I have tried:
> fink install bioperl-pm510
> fink install bioperl-pm5100
> 
> but neither one works.  Is it fine installing bioperl for perl v 5.9?
> 
> thank you,
> Omar Cornejo

fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix

chris


From dan.kortschak at adelaide.edu.au  Mon Jan 18 21:47:47 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 19 Jan 2010 13:17:47 +1030
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie
 now available BETA
Message-ID: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>

Hi All,

A wrapper and output parser for bowtie 'ultrafast, memory-efficient
short read aligner' are now available in the bioperl-live and
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
bioperl-run/trunk at 16726). Bowtie details are available here:

http://bowtie-bio.sourceforge.net/index.shtml

The modules can return a Bio::Assembly::Scaffold object (operating via
MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
uses large amounts of memory - the test suite works for me with >=2GB
but not with 1GB due to this. (Is there a disk file system based tool
for this for large projects?)

Bowtie (>0.12.0) can align in colour space, but this is not currently
supported by the wrapper though it should not be difficult to add. If
someone can point me to a small set of colour space reads and a
reference sequence I will be able to use these for testing.

Thanks to the core devs for helping me with many of my problems in
putting this together.

Dan


From maj at fortinbras.us  Mon Jan 18 22:31:36 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 22:31:36 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
	Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <D26A5B3DAFDA4068863C7735BAF7894B@NewLife>

Excellent Dan! Thanks for all this work-- MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 9:47 PM
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now 
available BETA


> Hi All,
>
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
>
> http://bowtie-bio.sourceforge.net/index.shtml
>
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
>
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
>
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
>
> Dan
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


From cjfields at illinois.edu  Mon Jan 18 22:36:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 21:36:12 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
	Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <CD36CE88-DC05-4A17-86A7-17A85C14F67A@illinois.edu>

On Jan 18, 2010, at 8:47 PM, Dan Kortschak wrote:

> Hi All,
> 
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
> 
> http://bowtie-bio.sourceforge.net/index.shtml
> 
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
> 
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
> 
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
> 
> Dan

And (on behalf of the core devs) thank you for putting this together!

chris


From scott at scottcain.net  Mon Jan 18 22:41:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Mon, 18 Jan 2010 22:41:43 -0500
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
Message-ID: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>

But make sure you have the developer tools installed before the first
time you run the cpan shell; it will make your life easier.

Scott


On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
>
>> Dear People,
>>  I have tried to install Bioperl in my new Mac Book, which carries
>> the latest perl distribution (5.10.0) and for some reason I can't
>> (using fink) make it recognize this version or perl.
>>  I have tried:
>> fink install bioperl-pm510
>> fink install bioperl-pm5100
>>
>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
>>
>> thank you,
>> Omar Cornejo
>
> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>
> chris
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Mon Jan 18 23:04:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 22:04:57 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <009801c8b957$2af4f8d0$80deea70$@ac.cn>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<009801c8b957$2af4f8d0$80deea70$@ac.cn>
Message-ID: <79D53148-1FDA-4025-99A6-77A7F124E6BD@illinois.edu>

Hmm, the Trouchelle repo is the only one that had a working DB_File for perl 5.10 (not sure, but I think 5.8.9 was fine).  It is probably worth contacting them to see if they can drop the (way out-of-date) 1.4 distribution.

chris

On May 18, 2008, at 9:22 PM, Guohong Hu wrote:

> Thank you all. The problem is solved. The bioperl 1.4 version is from
> the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I
> added all the repos according to the bioperl wiki instructions, somehow 1.4
> became a prerequisite for 1.6. Chris's question reminded me of this, so I
> removed the Trouchelle repo, and the installation proceeded without errors. I
> suggest we put a note on the wiki page, since this looks like an odd issue
> that may affect others, not just me.
> 
> Best,
> Guohong
> 
> 
> 
> _________________________________________
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: January 18, 2010 23:30
> To: Guohong Hu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bioperl 1.6
> 
> Guohong, 
> 
> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed
> first.  Make sure the repos are set according to the Windows installation
> instructions on the BioPerl wiki:
> 
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> 
> IIRC the actual order of the PPM repository can be critical (PPM pulls based
> on highest version, first repo, but sometimes it gets confused).  Just
> curious but where is the v 1.4 PPM located?  If it is local to our PPM repo
> I can physically remove it to prevent this from happening.
> 
> chris
> 
> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
> 
>> Hi there,
>> 
>> 
>> 
>> I was trying to install BioPerl in windows using ppm, by following the
>> instruction in
>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>> the repositories, and did the search of Bioperl packages. The latest
> version
>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>> install it, a number of prerequisite modules were being installed too,
> which
>> include Bioperl 1.4. Then an error message showed up during installation:
>> 
>> 
>> 
>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>> BioPerl has already installed a file that package bioperl wants to
> install."
>> 
>> 
>> 
>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>> wanted to install again. I don't know why bioperl 1.4 was one of the
>> prerequisites for 1.6.1. If I just install 1.4, it will be installed
> without
>> errors. But I need a newer version, because some modules (like
>> 
>> Bio::Tools::HMM) is not included in 1.4.
>> 
>> 
>> 
>> I saw on internet that somebody had the same problem when he was trying to
>> install BioPerl 1.5, but I didn't find the solution.
>> 
>> 
>> 
>> Anybody has a clue on that? Thank you for your time.
>> 
>> 
>> 
>> GH
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From ocornejo at gmail.com  Mon Jan 18 23:18:00 2010
From: ocornejo at gmail.com (Omar Eduardo Cornejo Ordaz)
Date: Mon, 18 Jan 2010 23:18:00 -0500
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
	<4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
	<5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>
Message-ID: <ddd346a41001182018o5952415fx7930d85a9430453@mail.gmail.com>

I see.
  thank you Scott and Chris.
  I had already installed the latest version of the Xcode Developer Tools.
  I will go the cpan way then.

have a nice one,
Omar

On Mon, Jan 18, 2010 at 10:58 PM, Chris Fields <cjfields at illinois.edu> wrote:

> Yes, definitely!
>
> -c
>
> On Jan 18, 2010, at 9:41 PM, Scott Cain wrote:
>
> > But make sure you have the developers tools installed before the first
> > time you run the cpan shell; it will make your life easier.
> >
> > Scott
> >
> >
> > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu>
> wrote:
> >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
> >>
> >>> Dear People,
> >>>  I have tried to install Bioperl in my new Mac Book, which carries
> >>> the latest perl distribution (5.10.0) and for some reason I can't
> >>> (using fink) make it recognize this version or perl.
> >>>  I have tried:
> >>> fink install bioperl-pm510
> >>> fink install bioperl-pm5100
> >>>
> >>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
> >>>
> >>> thank you,
> >>> Omar Cornejo
> >>
> >> fink doesn't have a package for perl 5.10.  You can install it using
> CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX
> installation instructions on the wiki:
> >>
> >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
> >>
> >> chris
> >> _______________________________________________
> >> Bioperl-l mailing list
> >> Bioperl-l at lists.open-bio.org
> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   scott at scottcain
> dot net
> > GMOD Coordinator (http://gmod.org/)                     216-392-3087
> > Ontario Institute for Cancer Research
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From cjfields at illinois.edu  Mon Jan 18 22:58:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 21:58:36 -0600
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
	<4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
Message-ID: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>

Yes, definitely!

-c

On Jan 18, 2010, at 9:41 PM, Scott Cain wrote:

> But make sure you have the developers tools installed before the first
> time you run the cpan shell; it will make your life easier.
> 
> Scott
> 
> 
> On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
>> 
>>> Dear People,
>>>  I have tried to install Bioperl in my new Mac Book, which carries
>>> the latest perl distribution (5.10.0) and for some reason I can't
>>> (using fink) make it recognize this version or perl.
>>>  I have tried:
>>> fink install bioperl-pm510
>>> fink install bioperl-pm5100
>>> 
>>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
>>> 
>>> thank you,
>>> Omar Cornejo
>> 
>> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
>> 
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>> 
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From albezg at gmail.com  Mon Jan 18 19:54:49 2010
From: albezg at gmail.com (Alexandr Bezginov)
Date: Mon, 18 Jan 2010 19:54:49 -0500
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
 with negative PDB ranges
In-Reply-To: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
Message-ID: <4B5502D9.2010706@gmail.com>

I have contacted Pfam, and I have been told that the PDB file actually
does include a reference to residue "-1":

DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611

DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611


Since negative numbers are allowed in PDB, the data should probably be
considered valid.

There are quite a few records like this, so this is not an isolated issue.
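
One way to list the affected cross-references is to scan the Stockholm file
for "#=GS ... DR PDB;" lines whose range has a negative coordinate. The sketch
below is in Python and is only an illustration: the line shape is assumed from
the single entry quoted in this thread, `negative_pdb_xrefs` and `PDB_XREF_RE`
are hypothetical names, and the second sample line is made up for contrast.

```python
import re

# Assumed line shape, taken from the one entry quoted in this thread:
#   #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
PDB_XREF_RE = re.compile(
    r"^#=GS\s+(\S+)\s+DR\s+PDB;\s*(\S+)\s+(\S+);\s*(-?\d+)-(-?\d+);"
)


def negative_pdb_xrefs(lines):
    """Yield (seq_name, pdb_id, chain, start, end) for negative ranges."""
    for line in lines:
        match = PDB_XREF_RE.match(line)
        if match is None:
            continue  # not a PDB cross-reference in the assumed shape
        name, pdb_id, chain = match.group(1, 2, 3)
        start, end = int(match.group(4)), int(match.group(5))
        if start < 0 or end < 0:
            yield name, pdb_id, chain, start, end


sample = [
    "#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;",
    # Made-up line with a non-negative range, for contrast only.
    "#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n A; 1-344;",
]
hits = list(negative_pdb_xrefs(sample))
```

Any iterable of text lines can be passed in, so an open file handle on the
uncompressed seed file would work in place of `sample`.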

Alexandr

On 1/14/2010 7:20 PM, Jason Stajich wrote:
> Seems like improper data really -- "-1" is an improper coordinate as far
> as the parser is concerned. You may want to tell Pfam that there is
> possible error in the dumper since that was the only record that had
> this problem?
> 
> -jason
> On Jan 13, 2010, at 5:57 PM, albezg wrote:
> 
>> Hi all,
>>
>> I have a problem using AlignIO to read Pfam database:
>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>> alignment OK until the alignment PF00331.13. There it crashes with the
>> following message:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: '1-344' is not an integer.
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>> STACK: Bio::Range::end
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>> STACK: Bio::Annotation::Target::new
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>
>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>
>> STACK: Bio::AlignIO::stockholm::next_aln
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>> -----------------------------------------------------------
>>
>> It appears this is caused by this entry:
>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>
>> I don't care about residues in PDB, so I have just removed minus signs
>> from the ranges. This seems to have fixed the crashing.
>>
>> Is it a known problem? Is there a solution for it?
>>
>> Thanks,
>> Alexandr
>>
>>
>> On 03/20/2009 05:09 PM, albezg wrote:
>>>
>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>> alignment(SimpleAlign).
>>>
>>> There are no issues when I print it, however when I use AlignIO to write
>>> the alignment to a FASTA file, it does not work. Is this behavior
>>> intended?
>>>
>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>
>>> The error:
>>> ------------- EXCEPTION -------------
>>> MSG: No sequence with name [1/1-11]
>>> STACK Bio::SimpleAlign::displayname
>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>> STACK Bio::AlignIO::fasta::write_aln
>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>> STACK toplevel ./demo.pl:14
>>> -------------------------------------
>>>
>>> Alexandr
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 


From ghhu at sibs.ac.cn  Mon Jan 18 21:22:19 2010
From: ghhu at sibs.ac.cn (Guohong Hu)
Date: Tue, 19 Jan 2010 02:22:19 -0000
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
Message-ID: <009801c8b957$2af4f8d0$80deea70$@ac.cn>

Thank you all. The problem is solved. The bioperl 1.4 version is from
the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I
added all the repos according to the bioperl wiki instructions, somehow 1.4
became a prerequisite for 1.6. Chris's question reminded me of this, so I
removed the Trouchelle repo, and the installation proceeded without errors. I
suggest we put a note on the wiki page, since this looks like an odd issue
that may affect others, not just me.

Best,
Guohong


_________________________________________
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: January 18, 2010 23:30
To: Guohong Hu
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bioperl 1.6

Guohong, 

1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed
first.  Make sure the repos are set according to the Windows installation
instructions on the BioPerl wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

IIRC the actual order of the PPM repository can be critical (PPM pulls based
on highest version, first repo, but sometimes it gets confused).  Just
curious but where is the v 1.4 PPM located?  If it is local to our PPM repo
I can physically remove it to prevent this from happening.

chris

On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:

> Hi there,
> 
> 
> 
> I was trying to install BioPerl in windows using ppm, by following the
> instruction in
> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
> the repositories, and did the search of Bioperl packages. The latest
version
> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
> install it, a number of prerequisite modules were being installed too,
which
> include Bioperl 1.4. Then an error message showed up during installation:
> 
> 
> 
> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
> BioPerl has already installed a file that package bioperl wants to
install."
> 
> 
> 
> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
> wanted to install again. I don't know why bioperl 1.4 was one of the
> prerequisites for 1.6.1. If I just install 1.4, it will be installed
without
> errors. But I need a newer version, because some modules (like
> 
> Bio::Tools::HMM) is not included in 1.4.
> 
> 
> 
> I saw on internet that somebody had the same problem when he was trying to
> install BioPerl 1.5, but I didn't find the solution.
> 
> 
> 
> Anybody has a clue on that? Thank you for your time.
> 
> 
> 
> GH
> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jw12 at sanger.ac.uk  Tue Jan 19 05:41:12 2010
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 19 Jan 2010 10:41:12 +0000
Subject: [Bioperl-l] DAS Workshop Registrations now Open (workshop date 7-9
	April 2010)
Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk>

If you don't know about DAS and wish to know how to distribute your
latest biological annotation to the world, then the upcoming DAS
workshop may be for you.
If you know about DAS and are perhaps a DAS client developer, then the
upcoming DAS workshop is for you (as you will need to know about the
upcoming DAS 1.6 Specification and how it may affect your software).

For information on the workshop and registration please go to:

http://www.ebi.ac.uk/training/handson/DAS_070410.html


Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk




From SMarkel at accelrys.com  Tue Jan 19 13:00:22 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 19 Jan 2010 10:00:22 -0800
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
 Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>

Dan,

Life Tech has sample data for E. coli at

http://solidsoftwaretools.com/gf/project/ecoli2x50/

and

http://solidsoftwaretools.com/gf/project/dh10bfrag/.

Reference sequences are included.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
Sent: Monday, 18 January 2010 6:48 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA

Hi All,

A wrapper and output parser for bowtie 'ultrafast, memory-efficient
short read aligner' are now available in the bioperl-live and
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
bioperl-run/trunk at 16726). Bowtie details are available here:

http://bowtie-bio.sourceforge.net/index.shtml

The modules can return a Bio::Assembly::Scaffold object (operating via
MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
uses large amounts of memory - the test suite works for me with >=2GB
but not with 1GB due to this. (Is there a disk file system based tool
for this for large projects?)

Bowtie (>0.12.0) can align in colour space, but this is not currently
supported by the wrapper though it should not be difficult to add. If
someone can point me to a small set of colour space reads and a
reference sequence I will be able to use these for testing.

Thanks to the core devs for helping me with many of my problems in
putting this together.

Dan



From dan.kortschak at adelaide.edu.au  Tue Jan 19 16:18:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 07:48:20 +1030
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
 Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
	<5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>
Message-ID: <1263935900.4813.0.camel@epistle>

Great.

Thanks, Scott.

Dan

On Tue, 2010-01-19 at 10:00 -0800, Scott Markel wrote:
> Dan,
> 
> Life Tech has sample data for E. coli at
> 
> http://solidsoftwaretools.com/gf/project/ecoli2x50/
> 
> and
> 
> http://solidsoftwaretools.com/gf/project/dh10bfrag/.
> 
> Reference sequences are included.
> 
> Scott
> 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
> 
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>     International Society for Computational Biology
> Chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
> Sent: Monday, 18 January 2010 6:48 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA
> 
> Hi All,
> 
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
> 
> http://bowtie-bio.sourceforge.net/index.shtml
> 
> The modules can return a Bio::Assembly::Scaffold object (operating via
> MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
> 
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
> 
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
> 
> Dan
> 
> 


From dan.kortschak at adelaide.edu.au  Wed Jan 20 00:32:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 16:02:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
Message-ID: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris (or others),

I've been looking at ways to do large assemblies (really rnaseq/readseq
comparisons for coverage) with maq/bowtie output and it's clear that for
the size of project that I'm working on the space complexity is too
nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
go.

I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF

This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
read through the docs, and it's not entirely clear (I'm hoping I've
interpreted it the right way), but does this result in the return of
features such that overlapping features are returned as a single feature
while non-overlapping features come back separately? If this is the
case, it would satisfy my requirements perfectly.

thanks for your time
Dan
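
For reference, the aggregation asked about here (overlapping features
collapsed into one region, non-overlapping ones kept apart) can be sketched
in plain Perl. This only illustrates the desired result; it does not describe
what the -merge option of Bio::DB::GFF actually does, which is the open
question. The merge_intervals helper and the read coordinates are
hypothetical.

```perl
use strict;
use warnings;

# Merge overlapping [start, end] intervals (1-based, inclusive coordinates).
# Intervals that merely abut (e.g. 1-10 and 11-20) are kept separate here;
# that is a choice made for this sketch.
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $iv (@sorted) {
        if (@merged && $iv->[0] <= $merged[-1][1]) {
            # Overlaps the last merged region: extend its end if needed.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            push @merged, [ @$iv ];   # copy, so the input is not modified
        }
    }
    return @merged;
}

# Hypothetical read alignments on one reference sequence.
my @reads   = ( [100, 135], [120, 155], [150, 185], [400, 435] );
my @regions = merge_intervals(@reads);
print join(', ', map { "$_->[0]-$_->[1]" } @regions), "\n";
```

Sorting first means each interval only has to be compared with the most
recently merged region, so the whole pass is a sort plus one linear scan.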


From jason at bioperl.org  Wed Jan 20 01:35:24 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 19 Jan 2010 22:35:24 -0800
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>

Are you looking at the bowtie features file or the SAM?
-jason
On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote:

> Hi Chris (or others),
>
> I've been looking at ways to do large assemblies (really rnaseq/readseq
> comparisons for coverage) with maq/bowtie output and it's clear that for
> the size of project that I'm working on the space complexity is too
> nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
> go.
>
> I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF
>
> This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
> read through the docs, and it's not entirely clear (I'm hoping I've
> interpreted it the right way), but does this result in the return of
> features such that overlapping features are returned as a single feature
> while non-overlapping features come back separately? If this is the
> case, it would satisfy my requirements perfectly.
>
> thanks for your time
> Dan
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From dan.kortschak at adelaide.edu.au  Wed Jan 20 02:19:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 17:49:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
	<C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>
Message-ID: <1263971945.4582.2.camel@epistle>

It doesn't really matter, they are largely inter-convertible. The
problem is not really the upstream processing, but the aggregation of
reads into read-assigned regions (unless I've misunderstood your
question).

Dan

On Tue, 2010-01-19 at 22:35 -0800, Jason Stajich wrote:
> Are you looking at the bowtie features file or the SAM?
> -jason
> On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote:
> 
> > Hi Chris (or others),
> >
> > I've been looking at ways to do large assemblies (really rnaseq/readseq
> > comparisons for coverage) with maq/bowtie output and it's clear that for
> > the size of project that I'm working on the space complexity is too
> > nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
> > go.
> >
> > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF
> >
> > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
> > read through the docs, and it's not entirely clear (I'm hoping I've
> > interpreted it the right way), but does this result in the return of
> > features such that overlapping features are returned as a single feature
> > while non-overlapping features come back separately? If this is the
> > case, it would satisfy my requirements perfectly.
> >
> > thanks for your time
> > Dan
> >
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/

-- 
Dan Kortschak <dan.kortschak at adelaide.edu.au>


From ajmackey at gmail.com  Wed Jan 20 07:59:38 2010
From: ajmackey at gmail.com (Aaron Mackey)
Date: Wed, 20 Jan 2010 07:59:38 -0500
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>

I would advise using BEDtools or the R IRanges package for this kind of
aggregation/merging work, rather than trying to reinvent this particular
wheel.

-Aaron

On Wed, Jan 20, 2010 at 12:32 AM, Dan Kortschak <
dan.kortschak at adelaide.edu.au> wrote:

> Hi Chris (or others),
>
> I've been looking at ways to do large assemblies (really rnaseq/readseq
> comparisons for coverage) with maq/bowtie output and it's clear that for
> the size of project that I'm working on the space complexity is too
> nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
> go.
>
> I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF
>
> This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
> read through the docs, and it's not entirely clear (I'm hoping I've
> interpreted it the right way), but does this result in the return of
> features such that overlapping features are returned as a single feature
> while non-overlapping features come back separately? If this is the
> case, it would satisfy my requirements perfectly.
>
> thanks for your time
> Dan
>
>


From dan.kortschak at adelaide.edu.au  Wed Jan 20 16:16:39 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 21 Jan 2010 07:46:39 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
	<24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>
Message-ID: <1264022199.4688.29.camel@epistle>

Thanks for that, I'll look into those. BEDtools looks like what I want.

cheers
Dan

On Wed, 2010-01-20 at 07:59 -0500, Aaron Mackey wrote:
> I would advise using BEDtools or the R IRanges package for this kind
> of aggregation/merging work, rather than trying to reinvent this
> particular wheel.
> 
> -Aaron


From biopython at maubp.freeserve.co.uk  Thu Jan 21 07:33:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 21 Jan 2010 12:33:53 +0000
Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in
	BioSQL
Message-ID: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>

Hi all,

This is cross posted to try and ensure relevant people see it.
I suggest we continue the discussion on the BioSQL list
(for how to serialise structured annotation to BioSQL), and/or
the OpenBio list (for things like file format naming conventions).

I am hoping we (Bio*) can be consistent in how we parse and load
into BioSQL the SwissProt DE lines (known as "swiss" format in
both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
equivalent UniProt XML tags (which we are tentatively going to
call the "uniprot" format in Biopython's SeqIO - comments?).

Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
files and load them into BioSQL. Biopython currently treats the DE
comment lines as a long string, as BioPerl used to:

http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html

I understand that BioPerl now turns the SwissProt DE lines into a
TagTree, and for storing this in BioSQL this gets serialised as XML.
I would like Biopython to handle this the same way (although rather
than a Perl TagTree, we'd use a Python structure of course), and
would appreciate clarification of what exactly was implemented
(e.g. which bit of the BioPerl source code should we look at,
and could you show a worked example?).

Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
Open-Bio lists yet) has started work on parsing UniProt XML
files for Biopython. Here the DE comment lines are already
provided broken up with XML markup. Hopefully their nested
structure matches what BioPerl was doing with the SwissProt
DE lines.

Regards,

Peter


From cjfields at illinois.edu  Thu Jan 21 08:34:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 07:34:12 -0600
Subject: [Bioperl-l] [Open-bio-l] SwissProt DE lines and UniProt XML /
	TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <A6F5F623-2750-4BB0-91F7-5A87BABE367B@illinois.edu>

Peter,

The relevant code is in Bio::Annotation::TagTree in bioperl-live, which is a decorator for Data::Stag:

http://search.cpan.org/~cmungall/Data-Stag-0.11/Data/Stag.pm

This is where the text output is derived from.  It's a bit of a heavyweight solution to the problem, but it's capable of round-tripping the DE data and parses out the data in a way that's approachable.  We could probably abstract out the serialization backend there and allow a pure bioperl solution (or the current solution) as a fallback. 

If the plain-text DE info is represented in a hierarchy already in UniProt XML, we should probably conform as closely as possible to that (using a standard format like XML, JSON, etc.).  

chris

On Jan 21, 2010, at 6:33 AM, Peter wrote:

> Hi all,
> 
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> 
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> 
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
> 
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
> 
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> 
> Regards,
> 
> Peter


From sharmashalu.bio at gmail.com  Thu Jan 21 09:25:44 2010
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Thu, 21 Jan 2010 09:25:44 -0500
Subject: [Bioperl-l] sequence orientation
Message-ID: <465b5a661001210625j3d84a165u69d8c8d21d2fe7ac@mail.gmail.com>

Hi All,
This is not a Perl/BioPerl query, but I thought this would be the best
place to ask.
I have some pyro reads (from CAMERA) and I want to find out their 5' and 3'
ends. Is there any way I can do this?

I would really appreciate it if anyone could help me out.

Thanks
Shalu


From rtbio.2009 at gmail.com  Thu Jan 21 13:28:43 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Thu, 21 Jan 2010 19:28:43 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <196889DF87964224ACDB948681BA7F86@NewLife>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<4C2E8133F916495B876628EF3E8FCBB2@NewLife>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
Message-ID: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>

Hello Mark,

This is Roopa again. I have another small problem. I am working on remote
BLAST. The program runs well: it accesses the server and gets the output
correctly. I am trying to put the result sequences into an array, and I
found that the first sequence among the result sequences is always
missing. The code is

my $str = Bio::SeqIO->new('-file' => $nuc, '-format' => 'fasta',
                          '-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq()) {
    # Blast a sequence against a database:
    # Alternatively, you could pass in a file with many
    # sequences rather than loop through sequence one at a time
    # Remove the loop starting 'while (my $input = $str->next_seq())'
    # and swap the two lines below for an example of that.

    open(OUTFILE, '>', $debugfile);
    print OUTFILE $input;
    close(OUTFILE);

    my $r = $factory->submit_blast($input);

    open(OUTFILE, '>', $debugfile);
    # print OUTFILE $r;
    close(OUTFILE);

    print STDERR "waiting...." if ($v > 0);

    while (my @rids = $factory->each_rid) {
        open(OUTFILE, '>', $debugfile);
        # print OUTFILE "while entered";
        close(OUTFILE);

        foreach my $rid (@rids) {
            open(OUTFILE, '>', $debugfile);
            # print OUTFILE "foreach entered";
            close(OUTFILE);

            my $rc = $factory->retrieve_blast($rid);

            if (!ref($rc)) {
                if ($rc < 0) {
                    $factory->remove_rid($rid);
                }
                open(OUTFILE, '>', $debugfile);
                # print OUTFILE "if entered";
                close(OUTFILE);
                print STDERR "." if ($v > 0);
                sleep 5;
            }
            else {
                open(OUTFILE, '>', $debugfile);
                # print OUTFILE "else entered";
                close(OUTFILE);

                my $result = $rc->next_result();
                # save the output
                $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

                open(BLASTDEBUGFILE, '>', $blastdebugfile);
                print BLASTDEBUGFILE $result->next_hit();
                close(BLASTDEBUGFILE);

                my $filename = $serverpath."/blastdata_".time()."\.out";

                # open(DEBUGFILE,'>',$debugfile);
                # open(new,'>',$filename);
                # @arra=<new>;
                # print DEBUGFILE @arra;
                # close(DEBUGFILE);
                # close(new);

                $factory->save_output($filename);

                # open(BLASTDEBUGFILE,'>',$debugfile);
                # print BLASTDEBUGFILE  "Hello $rid";
                # close(BLASTDEBUGFILE);

                $factory->remove_rid($rid);

                open(BLASTDEBUGFILE, '>', $blastdebugfile);
                print BLASTDEBUGFILE $organism;
                close(BLASTDEBUGFILE);

                # open(OUTFILE,'>',$outfile);
                # print OUTFILE "Test2 $result->database_name()";
                # close(OUTFILE);

                # $hit = $result->next_hit;
                # open(new,'>',$debugfile);
                # print $hit;
                # close(new);
                $dummy = 0;
                while (my $hit = $result->next_hit) {
                    next unless ($v >= 0);

                    # open(OUTFILE,'>',$debugfile);
                    # print OUTFILE "$hit in while hits";
                    # close(OUTFILE);
                    my $sequ = $gb->get_Seq_by_version($hit->name);
                    my $dna  = $sequ->seq();   # get the sequence as a string
                    $dummy++;
                    open(OUTFILE, '>', $debugfile);
                    # print OUTFILE $dummy;
                    close(OUTFILE);
                    push(@seqs, $dna);
                }
            }
        }
    }
}

$warum = @seqs;
open(OUTFILE, '>', $debugfile);
# print OUTFILE $warum;
print OUTFILE @seqs;
close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the count of the array was 0 and nothing was printed. Likewise, when the
output contained 3 sequences, the count of the array was 2 and only two
sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.
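
One likely cause, offered as a sketch rather than a confirmed diagnosis: the
debug statement "print BLASTDEBUGFILE $result->next_hit();" takes the first
hit from the result before the "while (my $hit = $result->next_hit)" loop
starts, so the loop only sees the remaining hits. That would match the counts
reported (1 hit gives 0, 3 hits give 2). The plain-Perl example below mimics
this with a hypothetical make_hit_iterator closure standing in for the
BioPerl result object; no BioPerl calls are made.

```perl
use strict;
use warnings;

# Stand-in for a result object: returns one hit per call, then undef.
# (Hypothetical helper; it only mimics the one-pass behaviour of next_hit.)
sub make_hit_iterator {
    my @hits = @_;
    return sub { return shift @hits };
}

# Pattern from the code above: a debug statement pulls one hit first.
my $next_hit = make_hit_iterator(qw(hitA hitB hitC));
my $debug    = $next_hit->();          # consumes hitA
my @seqs;
while ( my $hit = $next_hit->() ) {
    push @seqs, $hit;
}
print scalar(@seqs), " hits collected: @seqs\n";

# Without the extra call, every hit reaches the loop.
$next_hit = make_hit_iterator(qw(hitA hitB hitC));
my @all;
while ( my $hit = $next_hit->() ) {
    push @all, $hit;
}
print scalar(@all), " hits collected: @all\n";
```

If the debug output is still wanted, one option is to print each hit inside
the loop instead of calling next_hit a second time outside it.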


On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen <maj at fortinbras.us> wrote:

>  Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
> *To:* Mark A. Jensen <maj at fortinbras.us>
> *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>> There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from looking for a member of an array called @organ. This
>> would have shown up if 'use strict;' had been in place. I still don't know
>> whether this would work precisely; can you send me the query sequence so
>> I can reproduce your output?
>> thanks MAJ
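
The quoting difference described above can be checked in isolation. A minimal
sketch, assuming a hypothetical user-supplied organism name; the variable
names other than $organ are illustrative only.

```perl
use strict;
use warnings;

my $organ = 'Trypanosoma brucei';   # hypothetical user input

my $literal = '$organ[ORGN]';        # single quotes: nothing is interpolated
my $escaped = "$organ\[ORGN]";       # backslash stops [ORGN] being read as an array index
my $joined  = $organ . '[ORGN]';     # concatenation avoids the question entirely

print "$literal\n$escaped\n$joined\n";
```

The second and third forms both produce the Entrez-style term
"Trypanosoma brucei[ORGN]"; the first stays as the literal text
"$organ[ORGN]".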
>>
>>  ----- Original Message -----
>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes but still i got the same o/p with sequences
>> from different species.
>>
>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813  0.0
>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622  0.0
>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773  0.0
>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749  0.0
>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551  3e-154
>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551  3e-154
>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542  2e-151
>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538  2e-150
>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538  2e-150
>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196  3e-47
>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190  1e-45
>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181  7e-43
>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179  2e-42
>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178  8e-42
>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170  1e-39
>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   169  4e-39
>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167  1e-38
>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150  1e-33
>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145  5e-32
>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143  2e-31
>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143  2e-31
>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138  7e-30
>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138  7e-30
>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136  2e-29
>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136  2e-29
>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134  9e-29
>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132  3e-28
>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132  3e-28
>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132  3e-28
>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131  1e-27
>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131  1e-27
>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129  4e-27
>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129  4e-27
>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129  4e-27
>> ref|NM_001003470.1|  Danio rerio protein kinase, cAMP-dependen...   129  4e-27
>> ref|XM_001141503.1|  PREDICTED: Pan troglodytes verus protein ...   127  1e-26
>> ref|XM_001145269.1|  PREDICTED: Pan troglodytes protein kinase...   127  1e-26
>> ref|XM_512434.2|  PREDICTED: Pan troglodytes cAMP-dependent pr...   127  1e-26
>> ref|XM_001171457.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127  1e-26
>> ref|XM_001171437.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127  1e-26
>> ref|XM_847420.1|  PREDICTED: Canis familiaris similar to Serin...   127  1e-26
>> ref|NM_207518.1|  Homo sapiens protein kinase, cAMP-dependent,...   127  1e-26
>> ref|NM_002730.3|  Homo sapiens protein kinase, cAMP-dependent,...   127  1e-26
>>
>>
>> Thanks in advance.
>>
>> Roopa.
>>
>> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>>  I understand you. Put in the double quotes and see what happens.
>>>
>>>  ----- Original Message -----
>>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>> *To:* Mark A. Jensen <maj at fortinbras.us>
>>>   *Sent:* Saturday, January 09, 2010 1:40 PM
>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>
>>> Hi Mark,
>>>
>>> Thanks for your reply. It works when I specifically use the name of
>>> the organism, Trypanosoma brucei, in the code, but my idea is to introduce
>>> a variable $organ that takes the organism given by the user, whatever it
>>> is: Pseudomonas, Drosophila, Trypanosoma, Leishmania etc. I should get the
>>> sequences related to only that organism.
>>>
>>> I.e., if the user enters Pseudomonas, the $organ parameter of the code
>>> takes Pseudomonas, runs BLAST, and returns only those sequences that
>>> produce significant alignment with Pseudomonas (only). But this is not
>>> what happens.
>>>
>>> Please help me in this regard.
>>>
>>> Thanks in advance
>>> Roopa
>>>
>>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>
>>>>  Hi Roopa-- You may get what you want if you make the change.
>>>> With single quotes, ENTREZ_QUERY is set to the literal string
>>>>
>>>>  $organ[ORGN]
>>>>
>>>> while, with double quotes, the variable value will be substituted,
>>>> and the parameter should be set to
>>>>
>>>>  Trypanosoma brucei[ORGN]
>>>>
>>>> I'm guessing that it worked because the database ignored the strange
>>>> parameter and returned all the matches. Try this, and if it doesn't
>>>> work I'll look harder.
>>>> cheers,
>>>> Mark
>>>>
>>>>  ----- Original Message -----
>>>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>>>> *Sent:* Saturday, January 09, 2010 1:24 PM
>>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>>
>>>> hello Mark,
>>>>
>>>> Thanks for your reply. It was working without enclosing $organ[ORGN] in
>>>> double quotation marks, but I would like to have only those sequences
>>>> which are specific to my organism, i.e., I need sequences only from the
>>>> organism that I entered.
>>>>
>>>> When the organism is Trypanosoma brucei, I get even Leishmania and
>>>> other species among the similar sequences. But I want to get only
>>>> Trypanosoma brucei sequences.
>>>>
>>>> Could you please help me out in this regard?
>>>>
>>>> Roopa.
>>>>
>>>> My output
>>>>
>>>> I/P organism: Trypanosoma brucei
>>>>
>>>> O/P:-
>>>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813    0.0
>>>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622    0.0
>>>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...  773    0.0
>>>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...  749    0.0
>>>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...  551    3e-154
>>>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...  551    3e-154
>>>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...  542    2e-151
>>>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...  538    2e-150
>>>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...  538    2e-150
>>>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...  196    3e-47
>>>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...  190    1e-45
>>>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...  181    7e-43
>>>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...  179    2e-42
>>>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...  178    8e-42
>>>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...  170    1e-39
>>>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...  168    4e-39
>>>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...  167    1e-38
>>>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...  150    1e-33
>>>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...  145    5e-32
>>>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...  143    2e-31
>>>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...  143    2e-31
>>>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...  138    7e-30
>>>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...  138    7e-30
>>>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...  136    2e-29
>>>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...  136    2e-29
>>>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...  134    9e-29
>>>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...  132    3e-28
>>>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...  132    3e-28
>>>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...  132    3e-28
>>>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...  131    1e-27
>>>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...  131    1e-27
>>>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...  129    4e-27
>>>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...  129    4e-27
>>>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...  129    4e-27
>>>>
>>>> Roopa.
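If the server-side Entrez restriction is not taking effect, one fallback is to filter the hit list on the client. A minimal sketch, assuming each hit description starts with the species name as in the listing above; hits_for_organism is a name made up for this illustration and is not part of BioPerl. The sample descriptions are copied from the output.

```perl
use strict;
use warnings;

# Keep only descriptions that begin with the requested organism name
# (case-insensitive). Hypothetical helper, not a BioPerl function.
sub hits_for_organism {
    my ($organism, @descriptions) = @_;
    return grep { /^\Q$organism\E\b/i } @descriptions;
}

my @descriptions = (
    'Trypanosoma brucei TREU927 protein kinase A ...',
    'Trypanosoma cruzi strain CL Brener protein k...',
    'Leishmania major strain Friedlin protein kin...',
    'Drosophila willistoni GK20594 (Dwil\GK205...',
);

my @kept = hits_for_organism('Trypanosoma brucei', @descriptions);
print "$_\n" for @kept;
```

With real results the strings would come from each hit object rather than a hard-coded list. Entries whose description starts with a prefix such as "PREDICTED:" would be missed by this simple anchor, so the pattern may need loosening.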
>>>>
>>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>>>
>>>>> I see it immediately (from making same bug many times) :
>>>>>
>>>>>
>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => '$organ[ORGN]');
>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY => "$organ\[ORGN]");
>>>>>
>>>>>
>>>>> MAJ
>>>>>
>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>> rtbio.2009 at gmail.com>
>>>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>>>> Cc: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Saturday, January 09, 2010 11:57 AM
>>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>>
>>>>>> Thanks a lot for your reply, Mark. It was working with Trypanosoma
>>>>>> brucei as the organism parameter, but when I tried to take the organism
>>>>>> parameter from the user, it was not working, i.e., I was unable to get
>>>>>> the target sequences.
>>>>>> Please help me in this regard. My code is
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>>
>>>>>> #path for extra camel module
>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>> use Roopablast;
>>>>>>
>>>>>>
>>>>>> use Bio::SearchIO;
>>>>>> use Bio::Search::Result::BlastResult;
>>>>>> use Bio::Perl;
>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::DB::GenBank;
>>>>>>
>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>> my $outstring ="";
>>>>>>
>>>>>> &parse_form;
>>>>>>
>>>>>> print "Content-type: text/html\n\n";
>>>>>> print "<HTML>\n";
>>>>>> print "<head><title>RNAi Result</title>";
>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>> print "</head>\n";
>>>>>> print "<body>\n";
>>>>>> print " Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>> print "</BODY>\n";
>>>>>> print "</HTML>\n";
>>>>>>
>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>> exit if $pid;
>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>
>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>
>>>>>> print OUTFILE "<HTML>\n
>>>>>> <head><title>RNAi Result</title>
>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>> </head>\n
>>>>>> <body>\n
>>>>>>  Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>> wait......<br>
>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>> </BODY>\n
>>>>>> </HTML>\n";
>>>>>>
>>>>>> close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
>>>>>>
>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>
>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>> $in{'Threshold'});
>>>>>>
>>>>>>
>>>>>> sub blastcode
>>>>>> {
>>>>>>
>>>>>> $inpu1= $_[0];
>>>>>>
>>>>>> $organ= $_[1];
>>>>>>
>>>>>> open(NUC,'>',$nuc);
>>>>>> print NUC $inpu1,"\n";
>>>>>> close(NUC);
>>>>>>
>>>>>> my $prog = 'blastn';
>>>>>> my $db   = 'refseq_rna';
>>>>>> my $e_val= '1e-10';
>>>>>> my $organism= $organ;
>>>>>>
>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>
>>>>>> my @params = ( '-prog' => $prog,
>>>>>>        '-data' => $db,
>>>>>>        '-expect' => $e_val,
>>>>>>        '-readmethod' => 'SearchIO',
>>>>>>       '-Organism'   => $organism );
>>>>>>
>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>            print OUTFILE $inpu1;
>>>>>>             close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>>> =>
>>>>>> '$organ[ORGN]');
>>>>>>
>>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>
>>>>>>  #change a parameter
>>>>>>
>>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>
>>>>>> #change a parameter
>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>>>>>
>>>>>>  my $v = 1;
>>>>>>  #$v is just to turn on and off the messages
>>>>>>
>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>> '-organism' => $organ );
>>>>>>
>>>>>>
>>>>>> while (my $input = $str->next_seq())
>>>>>> {
>>>>>>  #Blast a sequence against a database:
>>>>>>   #Alternatively, you could  pass in a file with many
>>>>>>   #sequences rather than loop through sequence one at a time
>>>>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>   #and swap the two lines below for an example of that.
>>>>>>
>>>>>>            #open(OUTFILE,'>',$debugfile);
>>>>>>             # print OUTFILE $input;
>>>>>>             #close(OUTFILE);
>>>>>>
>>>>>>
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>
>>>>>>               open(OUTFILE,'>',$debugfile);
>>>>>>            #   print OUTFILE $r;
>>>>>>               close(OUTFILE);
>>>>>>
>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>  #   open(OUTFILE,'>',$debugfile);
>>>>>>   #           print OUTFILE "while entered";
>>>>>>    #         close(OUTFILE);
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>
>>>>>>     #         open(OUTFILE,'>',$debugfile);
>>>>>>      #        print OUTFILE "foreach entered";
>>>>>>       #      close(OUTFILE);
>>>>>>
>>>>>>       my $rc = $factory->retrieve_blast($rid);
>>>>>>
>>>>>>       if( !ref($rc) )
>>>>>>       {
>>>>>>       if( $rc < 0 )
>>>>>>       {
>>>>>>       $factory->remove_rid($rid);
>>>>>>       }
>>>>>>        open(OUTFILE,'>',$debugfile);
>>>>>>        #      print OUTFILE "if entered";
>>>>>>             close(OUTFILE);
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>       }
>>>>>>      else {
>>>>>>         #    open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "else entered";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>         my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>         print BLASTDEBUGFILE $result->next_hit();
>>>>>>         close(BLASTDEBUGFILE);
>>>>>>
>>>>>>       my $filename =
>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>
>>>>>>        # open(DEBUGFILE,'>',$debugfile);
>>>>>>        # open(new,'>',$filename);
>>>>>>        # @arra=<new>;
>>>>>>        # print DEBUGFILE @arra;
>>>>>>        # close(DEBUGFILE);
>>>>>>        # close(new);
>>>>>>
>>>>>>        $factory->save_output($filename);
>>>>>>  # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>      # close(BLASTDEBUGFILE);
>>>>>>
>>>>>>      $factory->remove_rid($rid);
>>>>>>
>>>>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>      print BLASTDEBUGFILE  $organism;
>>>>>>       close(BLASTDEBUGFILE);
>>>>>>
>>>>>>   # open(OUTFILE,'>',$outfile);
>>>>>>   # print OUTFILE "Test2 $result->database_name()";
>>>>>>   # close(OUTFILE);
>>>>>>
>>>>>> #$hit = $result->next_hit;
>>>>>> #open(new,'>',$debugfile);
>>>>>> #print $hit;
>>>>>> #close(new);
>>>>>>
>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>
>>>>>>           next unless ( $v > 0);
>>>>>>
>>>>>>         #     open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "$hit in while hits";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>                 push(@seqs,$dna);
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>  }
>>>>>>
>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>  #print OUTFILE $seqs[0];
>>>>>>  #close(OUTFILE);
>>>>>>
>>>>>> return(@seqs);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Roopa.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Roopa--
>>>>>>>
>>>>>>> I got your code to work with the following changes:
>>>>>>>
>>>>>>> +# the input should be a valid FASTA file...
>>>>>>> ...
>>>>>>> open(NUC,'>',$nuc);
>>>>>>> +print NUC ">seq (need a name line for valid fasta)\n";
>>>>>>> print NUC $inpu1, "\n";
>>>>>>> close(NUC);
>>>>>>> ...
>>>>>>>
>>>>>>> +# you can set these header parms in the call itself...
>>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>>>> +     -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');
>>>>>>>
>>>>>>>  #change a parameter
>>>>>>> +# commented this out...
>>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>>
>>>>>>> MAJ
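The first change above (writing a name line before the sequence) can be made conditional, so that input which already carries a header is not given a second one. A minimal sketch; ensure_fasta_header and the default name 'query' are illustrative choices, not part of the original script.

```perl
use strict;
use warnings;

# Return the text as FASTA: prepend a ">name" line only when the
# text does not already start with a header line.
sub ensure_fasta_header {
    my ($text, $name) = @_;
    $name = 'query' unless defined $name;
    $text =~ s/^\s+//;      # drop leading blank lines and spaces
    $text =~ s/\s+\z//;     # drop trailing whitespace
    return $text =~ /^>/ ? "$text\n" : ">$name\n$text\n";
}

print ensure_fasta_header("ATGCATGC");
print ensure_fasta_header(">seq1\nATGCATGC\n");
```

The result would be what gets written to the NUC file handle in place of the raw form input.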
>>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>>>> rtbio.2009 at gmail.com
>>>>>>> >
>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>> Sent: Friday, January 08, 2010 10:00 AM
>>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>>>>>>
>>>>>>>
>>>>>>>  Hello all,
>>>>>>>
>>>>>>>>
>>>>>>>> I was trying remote BLAST using BioPerl. My input data is a
>>>>>>>> Trypanosoma brucei sequence in FASTA format. When I submit it to
>>>>>>>> BLAST using the step
>>>>>>>> $r=$factory->submit_blast($input)
>>>>>>>> it does not return anything, which I checked by debugging the code.
>>>>>>>> It is not blasting my input sequence even though I set all the
>>>>>>>> parameters. I have pasted the code below.
>>>>>>>>
>>>>>>>> Please help me solve this problem. It is very urgent.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Roopa.
>>>>>>>>
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #path for extra camel module
>>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>>>> use Roopablast;
>>>>>>>>
>>>>>>>>
>>>>>>>> use Bio::SearchIO;
>>>>>>>> use Bio::Search::Result::BlastResult;
>>>>>>>> use Bio::Perl;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use Bio::Seq;
>>>>>>>> use Bio::SeqIO;
>>>>>>>> use Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>> my $outstring ="";
>>>>>>>>
>>>>>>>> &parse_form;
>>>>>>>>
>>>>>>>> print "Content-type: text/html\n\n";
>>>>>>>> print "<HTML>\n";
>>>>>>>> print "<head><title>RNAi Result</title>";
>>>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>>>> print "</head>\n";
>>>>>>>> print "<body>\n";
>>>>>>>> print " Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>>>> print "</BODY>\n";
>>>>>>>> print "</HTML>\n";
>>>>>>>>
>>>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>>>> exit if $pid;
>>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>>>> </head>\n
>>>>>>>> <body>\n
>>>>>>>>  Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>>>> wait......<br>
>>>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>>>> </BODY>\n
>>>>>>>> </HTML>\n";
>>>>>>>>
>>>>>>>> close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> @compseqs = blastcode($in{'Inputseq'});
>>>>>>>>
>>>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>>>
>>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>>>> $in{'Threshold'});
>>>>>>>>
>>>>>>>>
>>>>>>>> sub blastcode
>>>>>>>> {
>>>>>>>>
>>>>>>>> $inpu1= $_[0];
>>>>>>>>
>>>>>>>> #$organ= $_[1];
>>>>>>>>
>>>>>>>> open(NUC,'>',$nuc);
>>>>>>>> print NUC $inpu1;
>>>>>>>> close(NUC);
>>>>>>>>
>>>>>>>> my $prog = 'blastn';
>>>>>>>> my $db   = 'refseq_rna';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> my $organism= 'Trypanosoma Brucei';
>>>>>>>>
>>>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>       '-data' => $db,
>>>>>>>>       '-expect' => $e_val,
>>>>>>>>       '-readmethod' => 'SearchIO',
>>>>>>>>       '-Organism'   => $organism );
>>>>>>>>
>>>>>>>>          # open(OUTFILE,'>',$debugfile);
>>>>>>>>           #  print OUTFILE @params;
>>>>>>>>           # close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>
>>>>>>>>  #change a parameter
>>>>>>>>
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>>>
>>>>>>>> #change a parameter
>>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>>>>>>>
>>>>>>>>  my $v = 1;
>>>>>>>>  #$v is just to turn on and off the messages
>>>>>>>>
>>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>>>> '-organism' => 'Trypanosoma Brucei' );
>>>>>>>>
>>>>>>>>
>>>>>>>> while (my $input = $str->next_seq())
>>>>>>>> {
>>>>>>>>  #Blast a sequence against a database:
>>>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>>>  #and swap the two lines below for an example of that.
>>>>>>>>
>>>>>>>>           open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE $input;
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  # The program stops here: submit_blast does not return any value and
>>>>>>>>  # the while loop below is never entered. Please help me in this regard.
>>>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>>>              open(OUTFILE,'>',$debugfile);
>>>>>>>>              print OUTFILE $r;
>>>>>>>>              close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>>>
>>>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>>>    open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "while entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>
>>>>>>>>             open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "foreach entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>>>
>>>>>>>>      if( !ref($rc) )
>>>>>>>>      {
>>>>>>>>      if( $rc < 0 )
>>>>>>>>      {
>>>>>>>>      $factory->remove_rid($rid);
>>>>>>>>      }
>>>>>>>>       open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "if entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>       sleep 5;
>>>>>>>>      }
>>>>>>>>     else {
>>>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "else entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>        my $result = $rc->next_result();
>>>>>>>>       #save the output
>>>>>>>>      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>        print BLASTDEBUGFILE $result->next_hit();
>>>>>>>>        close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>      my $filename =
>>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>>>
>>>>>>>>       # open(DEBUGFILE,'>',$debugfile);
>>>>>>>>       # open(new,'>',$filename);
>>>>>>>>       # @arra=<new>;
>>>>>>>>       # print DEBUGFILE @arra;
>>>>>>>>       # close(DEBUGFILE);
>>>>>>>>       # close(new);
>>>>>>>>
>>>>>>>>       $factory->save_output($filename);
>>>>>>>>
>>>>>>>>     # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>>>     # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>>>     # close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>     $factory->remove_rid($rid);
>>>>>>>>
>>>>>>>>     open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>     print BLASTDEBUGFILE  $organism;
>>>>>>>>      close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>  # open(OUTFILE,'>',$outfile);
>>>>>>>>  # print OUTFILE "Test2 $result->database_name()";
>>>>>>>>  # close(OUTFILE);
>>>>>>>>
>>>>>>>> #$hit = $result->next_hit;
>>>>>>>> #open(new,'>',$debugfile);
>>>>>>>> #print $hit;
>>>>>>>> #close(new);
>>>>>>>>
>>>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>>>
>>>>>>>>          next unless ( $v > 0);
>>>>>>>>
>>>>>>>>        #     open(OUTFILE,'>',$debugfile);
>>>>>>>>         #    print OUTFILE "$hit in while hits";
>>>>>>>>          #  close(OUTFILE);
>>>>>>>>
>>>>>>>>     my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>>>         my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>>>                push(@seqs,$dna);
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>>>  #print OUTFILE $seqs[0];
>>>>>>>>  #close(OUTFILE);
>>>>>>>>
>>>>>>>> return(@seqs);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile) || die ;
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>>>>>>> <body>\n
>>>>>>>> <p><font face=\"Courier, monospace font set\">
>>>>>>>> Inputsequence: <br>";
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      print OUTFILE " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      print OUTFILE "<br>\n";
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> print OUTFILE "</font> <p>";
>>>>>>>>
>>>>>>>> $z=@compseqs;
>>>>>>>>
>>>>>>>> for($k=1;$k<$z;$k++) {
>>>>>>>>  print OUTFILE "<font face=\"Courier, monospace font
>>>>>>>> set\"><p>Compare
>>>>>>>> Sequence: <br>";
>>>>>>>>
>>>>>>>>  for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>>>>>>
>>>>>>>>      print OUTFILE substr ($compseqs[$k], $i, 1);
>>>>>>>>
>>>>>>>>      if ( ($i+1)%10==0){
>>>>>>>>          print OUTFILE " ";
>>>>>>>>      }
>>>>>>>>      if ( ($i+1)%60==0){
>>>>>>>>          print OUTFILE "<br>\n";
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  print OUTFILE "<p></font>";
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<p>
>>>>>>>> Window: <br>$in{'Windowsize'}
>>>>>>>> <p>
>>>>>>>> <p>
>>>>>>>> Threshold: <br>$in{'Threshold'}
>>>>>>>> <p>";
>>>>>>>> my $j=0;
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>>>>>>      if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>>>>>>          $j=$in{'Windowsize'};
>>>>>>>>      }
>>>>>>>>      $height=$out[$i]->{similar}*5;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ($j>0) {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>      $j--;
>>>>>>>>  }
>>>>>>>>  else {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      $outstring .= " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      $outstring .= "<br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%800==0){
>>>>>>>>      print OUTFILE "<br><br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>>>>>>> set\">$outstring</font>";
>>>>>>>>
>>>>>>>> #foreach (@out) {
>>>>>>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matches<p>";
>>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>>>>>>
>>>>>>>> #    }
>>>>>>>> #}
>>>>>>>>
>>>>>>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>>>>>>
>>>>>>>> close OUTFILE;
>>>>>>>>
>>>>>>>> #nameprint();
>>>>>>>>
>>>>>>>> sub parse_form {
>>>>>>>>  local ($buffer, @pairs, $pair, $name, $value);
>>>>>>>>  # Read in text
>>>>>>>>  $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>>>>>>  if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>>>>>>  {
>>>>>>>>      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>>>>>>  }
>>>>>>>>  else
>>>>>>>>  {
>>>>>>>>      $buffer = $ENV{'QUERY_STRING'};
>>>>>>>>  }
>>>>>>>>  @pairs = split(/&/, $buffer);
>>>>>>>>  foreach $pair (@pairs)
>>>>>>>>  {
>>>>>>>>      ($name, $value) = split(/=/, $pair);
>>>>>>>>      $value =~ tr/+/ /;
>>>>>>>>      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>>>>>>      $in{$name} = $value;
>>>>>>>>  }
>>>>>>>> }
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.web at gmail.com  Thu Jan 21 13:37:18 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 21 Jan 2010 19:37:18 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
	<c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
Message-ID: <716af09c1001211037p59b19a29l1967f1e514469e79@mail.gmail.com>

Hi,

Regarding RemoteBlast, may I add a query?
It seems that Bio::Tools::Run::RemoteBlast is sending each sequence
separately to the NCBI (at least in BP 1.5.2).
This means that for each sequence a RID has to be checked. Is this
indeed the case?
The BLAST URL-API or batch interface supports sending multiple
sequences at once.

Regards,
Bernd

On Thu, Jan 21, 2010 at 7:28 PM, Roopa Raghuveer <rtbio.2009 at gmail.com> wrote:
> Hello Mark,
>
> This is Roopa again. I have a small problem again. I am working on Remote
> blast. The program works well. But the problem is this. The program
> accesses the server and gets the output correctly. I am trying to send the
> result sequences into an array and I found that always the first sequence
> among the Result sequences is missing. The code is
>
>  my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => "$organ\[ORGN]");
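
On the missing first sequence in the quoted message: one likely cause, assuming the hit loop is the same as in the earlier version of the script, is that a debug statement calls $result->next_hit() before the while loop does. next_hit is an iterator, so that call uses up the first hit. The sketch below reproduces the effect with a small stand-in class (ToyResult is invented for this illustration and is not BioPerl code).

```perl
use strict;
use warnings;

# Stand-in for a result object with an internal hit cursor.
package ToyResult;
sub new      { my ($class, @hits) = @_; return bless { hits => [@hits], pos => 0 }, $class }
sub next_hit { my $self = shift; return $self->{hits}[ $self->{pos}++ ] }
sub rewind   { my $self = shift; $self->{pos} = 0; return }

package main;

my $result = ToyResult->new(qw(hitA hitB hitC));

my $debug = $result->next_hit;    # debug peek: consumes hitA

my @seen;
while ( my $hit = $result->next_hit ) { push @seen, $hit }
print "without rewind: @seen\n";

$result->rewind;                  # reset the cursor before looping
my @all;
while ( my $hit = $result->next_hit ) { push @all, $hit }
print "with rewind:    @all\n";
```

The simplest fix in the real script would be to drop the debug print, or to print from inside the loop. The BioPerl result classes also document a way to reset the hit iterator, but check the documentation for the installed version before relying on it.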


From cjfields at illinois.edu  Thu Jan 21 23:31:25 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 22:31:25 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
Message-ID: <BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>

Jay,

Did you want to release it to CPAN?  I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

chris

On Jan 18, 2010, at 6:22 PM, Jay Hannah wrote:

> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.
> 
>   http://github.com/jhannah/bio-broodcomb
> 
> It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. 
> 
> The first two functions I stuck in the framework:
> 
> Find subsequences (Bio::BroodComb::SubSeq):
> 
>   use Bio::BroodComb;
>   my $bc = Bio::BroodComb->new();
>   $bc->load_large_seq(file => "large_seq.fasta");
>   $bc->load_small_seq(file => "small_seq.fasta");
>   $bc->find_subseqs();
>   print $bc->subseq_report1;
> 
> In-silico PCR (Bio::BroodComb::PCR):
> 
>  use Bio::BroodComb;
>  my $bc = Bio::BroodComb->new();
>  $bc->load_large_seq(file => "large_seq.fasta");
>  $bc->add_primerset(
>     description    => "U5/R",   # however you want it reported
>     forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
>     reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
>  );
>  $bc->find_pcr_hits();
>  $bc->find_pcr_products();
>  print $bc->pcr_report1;
> 
> I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.
> 
> Suggestions, contributions welcome.   :)
> 
>   http://github.com/jhannah/bio-broodcomb
> 
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jason at bioperl.org  Fri Jan 22 01:17:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 21 Jan 2010 22:17:14 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
Message-ID: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>

I'm considering putting in an allowable initialization parameter (and
get/set) for Bio::AlignIO that would allow setting of the alphabet.  This
is then passed to Bio::LocatableSeq creation so that _guess_alphabet
isn't called. This will allow removal of warnings about empty
sequences because _guess_alphabet won't be called on a sequence if we
have explicitly set the alphabet.

This worked great on my local install and tests pass.  Any objections  
or concerns?

basically it means when you make an AlignIO you can specify the  
alphabet i.e.

my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
                           -file => 'genome.fasaln');

I have some alignments with empty sequences and I think turning off
the warnings is appropriate where I force the alphabet choice. It
should also give a very modest speedup.
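
To see why an explicit alphabet sidesteps the warning, here is a toy model of the idea. It is not BioPerl's _guess_alphabet; guess_alphabet and alphabet_for below are simplified stand-ins written for this note, and the rule "only ACGTN means dna" is an assumption made for the example.

```perl
use strict;
use warnings;

# Toy guesser: warns on empty input, otherwise calls anything made up
# only of ACGTN 'dna' and everything else 'protein'.
sub guess_alphabet {
    my ($seq) = @_;
    ( my $residues = $seq ) =~ s/[-.?]//g;    # ignore gap symbols
    if ( $residues eq '' ) {
        warn "cannot guess alphabet of an empty sequence\n";
        return;
    }
    return $residues =~ /^[ACGTN]+$/i ? 'dna' : 'protein';
}

# Use the caller's alphabet when given; guess only as a fallback.
sub alphabet_for {
    my ($seq, $forced) = @_;
    return defined $forced ? $forced : guess_alphabet($seq);
}

print alphabet_for( 'ACGT--ACGT', undef ), "\n";    # guessed from residues
print alphabet_for( '----------', 'dna' ), "\n";    # forced, guesser never runs
```

Because the guesser is skipped entirely when an alphabet is supplied, the all-gap row produces no warning and no per-sequence scan, which is where the small speedup would come from.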

-jason
--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From rtbio.2009 at gmail.com  Fri Jan 22 04:54:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 22 Jan 2010 10:54:32 +0100
Subject: [Bioperl-l] Fwd:  Regarding blast in Bioperl
In-Reply-To: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
	<c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
Message-ID: <c7cac1601001220154r4f92651ejb79663898e0b8fc2@mail.gmail.com>

---------- Forwarded message ----------
From: Roopa Raghuveer <rtbio.2009 at gmail.com>
Date: Thu, Jan 21, 2010 at 7:28 PM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: bioperl-l at lists.open-bio.org


Hello Mark,

This is Roopa again, with another small problem. I am working on remote
BLAST. The program works well: it accesses the server and gets the output
correctly. I am trying to collect the result sequences in an array, but I
find that the first sequence among the results is always missing. The code
is:

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the count of the array was 0, and when I tried to print the sequence I got
nothing. It was the same when the number of output sequences was 3: the
count of the array was 2 and only two sequences were printed.
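
One thing that may be worth checking (a guess from reading the code, not a
confirmed diagnosis): next_hit() works as an iterator, so the debug line
"print BLASTDEBUGFILE $result->next_hit();" would itself use up the first
hit before the while loop starts. The sketch below reproduces that pattern
in plain Perl without BioPerl. The make_hit_iterator helper and the hit
names are made up for illustration and only stand in for a result object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a BLAST result: returns a closure that hands
# out one hit per call and undef once the list is exhausted.
sub make_hit_iterator {
    my @hits = @_;
    return sub { return shift @hits };
}

# Pattern from the script: one iterator call for debugging, then the loop.
sub collect_with_debug_peek {
    my $next_hit = make_hit_iterator(@_);
    my $debug    = $next_hit->();    # this call consumes the first hit
    my @seqs;
    while ( my $hit = $next_hit->() ) {
        push @seqs, $hit;
    }
    return @seqs;
}

# Same loop without the extra call: every hit reaches the array.
sub collect_all {
    my $next_hit = make_hit_iterator(@_);
    my @seqs;
    while ( my $hit = $next_hit->() ) {
        push @seqs, $hit;
    }
    return @seqs;
}

my @names  = ( 'hit1', 'hit2', 'hit3' );
my @peeked = collect_with_debug_peek(@names);
my @all    = collect_all(@names);
printf "with debug call:    %d of %d hits\n", scalar @peeked, scalar @names;
printf "without debug call: %d of %d hits\n", scalar @all,    scalar @names;
```

If this is what is happening, removing that debug line, or printing the hit
inside the while loop instead, should bring the first sequence back.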

Please help me in sorting out this problem.

Regards,
Roopa.


On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen <maj at fortinbras.us> wrote:

>  Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
> *To:* Mark A. Jensen <maj at fortinbras.us>
>  *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>> There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from looking for an element of an array called @organ. This
>> would have shown up if 'use strict;' had been in place. I still don't know
>> whether this will work precisely; can you send me the query sequence so I
>> can reproduce your output?
>> thanks MAJ
>>
>>  ----- Original Message -----
>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes, but I still got the same output, with
>> sequences from different species.
>>
>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813
>> 0.0
>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622
>> 0.0
>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773
>> 0.0
>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749
>> 0.0
>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551
>> 3e-154
>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551
>> 3e-154
>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542
>> 2e-151
>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538
>> 2e-150
>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538
>> 2e-150
>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196
>> 3e-47
>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190
>> 1e-45
>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181
>> 7e-43
>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179
>> 2e-42
>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178
>> 8e-42
>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170
>> 1e-39
>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   169
>> 4e-39
>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167
>> 1e-38
>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150
>> 1e-33
>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145
>> 5e-32
>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143
>> 2e-31
>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143
>> 2e-31
>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138
>> 7e-30
>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138
>> 7e-30
>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136
>> 2e-29
>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136
>> 2e-29
>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134
>> 9e-29
>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132
>> 3e-28
>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132
>> 3e-28
>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132
>> 3e-28
>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131
>> 1e-27
>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131
>> 1e-27
>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129
>> 4e-27
>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129
>> 4e-27
>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129
>> 4e-27
>> ref|NM_001003470.1|  Danio rerio protein kinase, cAMP-dependen...   129
>> 4e-27
>> ref|XM_001141503.1|  PREDICTED: Pan troglodytes verus protein ...   127
>> 1e-26
>> ref|XM_001145269.1|  PREDICTED: Pan troglodytes protein kinase...   127
>> 1e-26
>> ref|XM_512434.2|  PREDICTED: Pan troglodytes cAMP-dependent pr...   127
>> 1e-26
>> ref|XM_001171457.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127
>> 1e-26
>> ref|XM_001171437.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127
>> 1e-26
>> ref|XM_847420.1|  PREDICTED: Canis familiaris similar to Serin...   127
>> 1e-26
>> ref|NM_207518.1|  Homo sapiens protein kinase, cAMP-dependent,...   127
>> 1e-26
>> ref|NM_002730.3|  Homo sapiens protein kinase, cAMP-dependent,...   127
>> 1e-26
>>
>>
>> Thanks in advance.
>>
>> Roopa.
>>
>> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>>  I understand you. Put in the double quotes and see what happens.
>>>
>>>  ----- Original Message -----
>>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>> *To:* Mark A. Jensen <maj at fortinbras.us>
>>>   *Sent:* Saturday, January 09, 2010 1:40 PM
>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>
>>> Hi Mark,
>>>
>>> Thanks for your reply. It works when I hard-code the organism name
>>> Trypanosoma brucei, but my idea is to introduce a variable $organ that
>>> takes the organism given by the user. Whatever it is (Pseudomonas,
>>> Drosophila, Trypanosoma, Leishmania, etc.), I should get sequences from
>>> that organism only.
>>>
>>> For example, if the user enters Pseudomonas, $organ should take the value
>>> Pseudomonas, and BLAST should return only those sequences from Pseudomonas
>>> that produce significant alignments. But this is not what happens.
>>>
>>> Please help me in this regard.
>>>
>>> Thanks in advance
>>> Roopa
>>>
>>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>
>>>>  Hi Roopa-- You may get what you want if you make the change.
>>>> With single quotes, ENTREZ_QUERY is set to the literal string
>>>>
>>>>  $organ[ORGN]
>>>>
>>>> while, with double quotes, the variable value will be substituted,
>>>> and the parameter should be set to
>>>>
>>>>  Trypanosoma brucei[ORGN]
>>>>
>>>> I'm guessing that it worked because the database ignored the strange
>>>> parameter and returned all the matches. Try this, and if it doesn't work
>>>> I'll look harder.
>>>> cheers,
>>>> Mark
>>>>
>>>>  ----- Original Message -----
>>>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>>>> *Sent:* Saturday, January 09, 2010 1:24 PM
>>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>>
>>>> hello Mark,
>>>>
>>>> Thanks for your reply. It was working without enclosing $organ[ORGN] in
>>>> double quotes, but I would like to get only sequences from the organism
>>>> that I entered.
>>>>
>>>> When the organism is Trypanosoma brucei, I also get Leishmania and other
>>>> species among the similar sequences. But I want to get only Trypanosoma
>>>> brucei sequences.
>>>>
>>>> Could you please help me out in this regard?
>>>>
>>>> Roopa.
>>>>
>>>> My output
>>>>
>>>> I/P organism: Trypanosoma brucei
>>>>
>>>> O/P:-
>>>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...
>>>> 1813    0.0
>>>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...
>>>> 1622    0.0
>>>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...
>>>> 773    0.0
>>>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...
>>>> 749    0.0
>>>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...
>>>> 551    3e-154
>>>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...
>>>> 551    3e-154
>>>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...
>>>> 542    2e-151
>>>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...
>>>> 538    2e-150
>>>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...
>>>> 538    2e-150
>>>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...
>>>> 196    3e-47
>>>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...
>>>> 190    1e-45
>>>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...
>>>> 181    7e-43
>>>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...
>>>> 179    2e-42
>>>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...
>>>> 178    8e-42
>>>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...
>>>> 170    1e-39
>>>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...
>>>> 168    4e-39
>>>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...
>>>> 167    1e-38
>>>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...
>>>> 150    1e-33
>>>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...
>>>> 145    5e-32
>>>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...
>>>> 143    2e-31
>>>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...
>>>> 143    2e-31
>>>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...
>>>> 138    7e-30
>>>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...
>>>> 138    7e-30
>>>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...
>>>> 136    2e-29
>>>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...
>>>> 136    2e-29
>>>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...
>>>> 134    9e-29
>>>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...
>>>> 132    3e-28
>>>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...
>>>> 132    3e-28
>>>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...
>>>> 132    3e-28
>>>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...
>>>> 131    1e-27
>>>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...
>>>> 131    1e-27
>>>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...
>>>> 129    4e-27
>>>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...
>>>> 129    4e-27
>>>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...
>>>> 129    4e-27
>>>>
>>>> Roopa.
>>>>
>>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>>
>>>>> I see it immediately (from making the same bug many times):
>>>>>
>>>>>
>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>> =>
>>>>> - '$organ[ORGN]');
>>>>> +"$organ[ORGN]");
>>>>>
>>>>>
>>>>> MAJ
>>>>>
>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>> rtbio.2009 at gmail.com>
>>>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>>>> Cc: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Saturday, January 09, 2010 11:57 AM
>>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>>
>>>>>> Thanks a lot for your reply, Mark. It works with Trypanosoma brucei as
>>>>>> the organism parameter, but when I tried to use the organism supplied
>>>>>> by the user it did not work, i.e., I was unable to get the target
>>>>>> sequences. Please help me in this regard. My code is:
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>>
>>>>>> #path for extra camel module
>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>> use Roopablast;
>>>>>>
>>>>>>
>>>>>> use Bio::SearchIO;
>>>>>> use Bio::Search::Result::BlastResult;
>>>>>> use Bio::Perl;
>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::DB::GenBank;
>>>>>>
>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>> my $outstring ="";
>>>>>>
>>>>>> &parse_form;
>>>>>>
>>>>>> print "Content-type: text/html\n\n";
>>>>>> print "<HTML>\n";
>>>>>> print "<head><title>RNAi Result</title>";
>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>> print "</head>\n";
>>>>>> print "<body>\n";
>>>>>> print " Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>> print "</BODY>\n";
>>>>>> print "</HTML>\n";
>>>>>>
>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>> exit if $pid;
>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>
>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>
>>>>>> print OUTFILE "<HTML>\n
>>>>>> <head><title>RNAi Result</title>
>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>> </head>\n
>>>>>> <body>\n
>>>>>>  Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>> wait......<br>
>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>> </BODY>\n
>>>>>> </HTML>\n";
>>>>>>
>>>>>> close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
>>>>>>
>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>
>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>> $in{'Threshold'});
>>>>>>
>>>>>>
>>>>>> sub blastcode
>>>>>> {
>>>>>>
>>>>>> $inpu1= $_[0];
>>>>>>
>>>>>> $organ= $_[1];
>>>>>>
>>>>>> open(NUC,'>',$nuc);
>>>>>> print NUC $inpu1,"\n";
>>>>>> close(NUC);
>>>>>>
>>>>>> my $prog = 'blastn';
>>>>>> my $db   = 'refseq_rna';
>>>>>> my $e_val= '1e-10';
>>>>>> my $organism= $organ;
>>>>>>
>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>
>>>>>> my @params = ( '-prog' => $prog,
>>>>>>        '-data' => $db,
>>>>>>        '-expect' => $e_val,
>>>>>>        '-readmethod' => 'SearchIO',
>>>>>>       '-Organism'   => $organism );
>>>>>>
>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>            print OUTFILE $inpu1;
>>>>>>             close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>>> =>
>>>>>> '$organ[ORGN]');
>>>>>>
>>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>
>>>>>>  #change a parameter
>>>>>>
>>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>
>>>>>> #change a parameter
>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>>>>>
>>>>>>  my $v = 1;
>>>>>>  #$v is just to turn on and off the messages
>>>>>>
>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>> '-organism' => $organ );
>>>>>>
>>>>>>
>>>>>> while (my $input = $str->next_seq())
>>>>>> {
>>>>>>  #Blast a sequence against a database:
>>>>>>   #Alternatively, you could  pass in a file with many
>>>>>>   #sequences rather than loop through sequence one at a time
>>>>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>   #and swap the two lines below for an example of that.
>>>>>>
>>>>>>            #open(OUTFILE,'>',$debugfile);
>>>>>>             # print OUTFILE $input;
>>>>>>             #close(OUTFILE);
>>>>>>
>>>>>>
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>
>>>>>>               open(OUTFILE,'>',$debugfile);
>>>>>>            #   print OUTFILE $r;
>>>>>>               close(OUTFILE);
>>>>>>
>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>  #   open(OUTFILE,'>',$debugfile);
>>>>>>   #           print OUTFILE "while entered";
>>>>>>    #         close(OUTFILE);
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>
>>>>>>     #         open(OUTFILE,'>',$debugfile);
>>>>>>      #        print OUTFILE "foreach entered";
>>>>>>       #      close(OUTFILE);
>>>>>>
>>>>>>       my $rc = $factory->retrieve_blast($rid);
>>>>>>
>>>>>>       if( !ref($rc) )
>>>>>>       {
>>>>>>       if( $rc < 0 )
>>>>>>       {
>>>>>>       $factory->remove_rid($rid);
>>>>>>       }
>>>>>>        open(OUTFILE,'>',$debugfile);
>>>>>>        #      print OUTFILE "if entered";
>>>>>>             close(OUTFILE);
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>       }
>>>>>>      else {
>>>>>>         #    open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "else entered";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>         my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>         print BLASTDEBUGFILE $result->next_hit();
>>>>>>         close(BLASTDEBUGFILE);
>>>>>>
>>>>>>       my $filename =
>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>
>>>>>>        # open(DEBUGFILE,'>',$debugfile);
>>>>>>        # open(new,'>',$filename);
>>>>>>        # @arra=<new>;
>>>>>>        # print DEBUGFILE @arra;
>>>>>>        # close(DEBUGFILE);
>>>>>>        # close(new);
>>>>>>
>>>>>>        $factory->save_output($filename);
>>>>>>  # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>      # close(BLASTDEBUGFILE);
>>>>>>
>>>>>>      $factory->remove_rid($rid);
>>>>>>
>>>>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>      print BLASTDEBUGFILE  $organism;
>>>>>>       close(BLASTDEBUGFILE);
>>>>>>
>>>>>>   # open(OUTFILE,'>',$outfile);
>>>>>>   # print OUTFILE "Test2 $result->database_name()";
>>>>>>   # close(OUTFILE);
>>>>>>
>>>>>> #$hit = $result->next_hit;
>>>>>> #open(new,'>',$debugfile);
>>>>>> #print $hit;
>>>>>> #close(new);
>>>>>>
>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>
>>>>>>           next unless ( $v > 0);
>>>>>>
>>>>>>         #     open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "$hit in while hits";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>                 push(@seqs,$dna);
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>  }
>>>>>>
>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>  #print OUTFILE $seqs[0];
>>>>>>  #close(OUTFILE);
>>>>>>
>>>>>> return(@seqs);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Roopa.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Roopa--
>>>>>>>
>>>>>>> I got your code to work with the following changes:
>>>>>>>
>>>>>>> +# the input should be a valid FASTA file...
>>>>>>> ...
>>>>>>> open(NUC,'>',$nuc);
>>>>>>> +print NUC ">seq (need a name line for valid fasta)\n";
>>>>>>> print NUC $inpu1, "\n";
>>>>>>> close(NUC);
>>>>>>> ...
>>>>>>>
>>>>>>> +# you can set these header params in the call itself...
>>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>>>> +     -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');
>>>>>>>
>>>>>>>  #change a parameter
>>>>>>> +# commented this out...
>>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>>
>>>>>>> MAJ
>>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>>>> rtbio.2009 at gmail.com
>>>>>>> >
>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>> Sent: Friday, January 08, 2010 10:00 AM
>>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>>>>>>
>>>>>>>
>>>>>>>  Hello all,
>>>>>>>
>>>>>>>>
>>>>>>>> I was trying remote BLAST using BioPerl. My input data is a
>>>>>>>> Trypanosoma brucei sequence in FASTA format. When I tried to submit
>>>>>>>> it to BLAST using the step
>>>>>>>> $r=$factory->submit_blast($input)
>>>>>>>> it did not return anything, which I checked by debugging the code.
>>>>>>>> It is not blasting my input sequence even though I specified all the
>>>>>>>> parameters. I have pasted the code below.
>>>>>>>>
>>>>>>>> Please help me in solving this problem. It is very urgent.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Roopa.
>>>>>>>>
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #path for extra camel module
>>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>>>> use Roopablast;
>>>>>>>>
>>>>>>>>
>>>>>>>> use Bio::SearchIO;
>>>>>>>> use Bio::Search::Result::BlastResult;
>>>>>>>> use Bio::Perl;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use Bio::Seq;
>>>>>>>> use Bio::SeqIO;
>>>>>>>> use Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>> my $outstring ="";
>>>>>>>>
>>>>>>>> &parse_form;
>>>>>>>>
>>>>>>>> print "Content-type: text/html\n\n";
>>>>>>>> print "<HTML>\n";
>>>>>>>> print "<head><title>RNAi Result</title>";
>>>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>>>> print "</head>\n";
>>>>>>>> print "<body>\n";
>>>>>>>> print " Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>>>> print "</BODY>\n";
>>>>>>>> print "</HTML>\n";
>>>>>>>>
>>>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>>>> exit if $pid;
>>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>>>> </head>\n
>>>>>>>> <body>\n
>>>>>>>>  Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>>>> wait......<br>
>>>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>>>> </BODY>\n
>>>>>>>> </HTML>\n";
>>>>>>>>
>>>>>>>> close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> @compseqs = blastcode($in{'Inputseq'});
>>>>>>>>
>>>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>>>
>>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>>>> $in{'Threshold'});
>>>>>>>>
>>>>>>>>
>>>>>>>> sub blastcode
>>>>>>>> {
>>>>>>>>
>>>>>>>> $inpu1= $_[0];
>>>>>>>>
>>>>>>>> #$organ= $_[1];
>>>>>>>>
>>>>>>>> open(NUC,'>',$nuc);
>>>>>>>> print NUC $inpu1;
>>>>>>>> close(NUC);
>>>>>>>>
>>>>>>>> my $prog = 'blastn';
>>>>>>>> my $db   = 'refseq_rna';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> my $organism= 'Trypanosoma Brucei';
>>>>>>>>
>>>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>       '-data' => $db,
>>>>>>>>       '-expect' => $e_val,
>>>>>>>>       '-readmethod' => 'SearchIO',
>>>>>>>>       '-Organism'   => $organism );
>>>>>>>>
>>>>>>>>          # open(OUTFILE,'>',$debugfile);
>>>>>>>>           #  print OUTFILE @params;
>>>>>>>>           # close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>
>>>>>>>>  #change a parameter
>>>>>>>>
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>>>
>>>>>>>> #change a parameter
>>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>>>>>>>
>>>>>>>>  my $v = 1;
>>>>>>>>  #$v is just to turn on and off the messages
>>>>>>>>
>>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>>>> '-organism' => 'Trypanosoma Brucei' );
>>>>>>>>
>>>>>>>>
>>>>>>>> while (my $input = $str->next_seq())
>>>>>>>> {
>>>>>>>>  #Blast a sequence against a database:
>>>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>>>  #and swap the two lines below for an example of that.
>>>>>>>>
>>>>>>>>           open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE $input;
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  # The program stops here: it does not return any value and it
>>>>>>>>  # does not enter the while loop. Please help me in this regard.
>>>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>>>              open(OUTFILE,'>',$debugfile);
>>>>>>>>              print OUTFILE $r;
>>>>>>>>              close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>>>
>>>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>>>    open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "while entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>
>>>>>>>>             open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "foreach entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>>>
>>>>>>>>      if( !ref($rc) )
>>>>>>>>      {
>>>>>>>>      if( $rc < 0 )
>>>>>>>>      {
>>>>>>>>      $factory->remove_rid($rid);
>>>>>>>>      }
>>>>>>>>       open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "if entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>       sleep 5;
>>>>>>>>      }
>>>>>>>>     else {
>>>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "else entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>        my $result = $rc->next_result();
>>>>>>>>       #save the output
>>>>>>>>      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>        print BLASTDEBUGFILE $result->next_hit();
>>>>>>>>        close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>      my $filename =
>>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>>>
>>>>>>>>       # open(DEBUGFILE,'>',$debugfile);
>>>>>>>>       # open(new,'>',$filename);
>>>>>>>>       # @arra=<new>;
>>>>>>>>       # print DEBUGFILE @arra;
>>>>>>>>       # close(DEBUGFILE);
>>>>>>>>       # close(new);
>>>>>>>>
>>>>>>>>       $factory->save_output($filename);
>>>>>>>>
>>>>>>>>     # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>>>     # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>>>     # close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>     $factory->remove_rid($rid);
>>>>>>>>
>>>>>>>>     open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>     print BLASTDEBUGFILE  $organism;
>>>>>>>>      close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>  # open(OUTFILE,'>',$outfile);
>>>>>>>>  # print OUTFILE "Test2 $result->database_name()";
>>>>>>>>  # close(OUTFILE);
>>>>>>>>
>>>>>>>> #$hit = $result->next_hit;
>>>>>>>> #open(new,'>',$debugfile);
>>>>>>>> #print $hit;
>>>>>>>> #close(new);
>>>>>>>>
>>>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>>>
>>>>>>>>          next unless ( $v > 0);
>>>>>>>>
>>>>>>>>        #     open(OUTFILE,'>',$debugfile);
>>>>>>>>         #    print OUTFILE "$hit in while hits";
>>>>>>>>          #  close(OUTFILE);
>>>>>>>>
>>>>>>>>     my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>>>         my $dna = $sequ->seq();        # get the sequence as a
>>>>>>>> string
>>>>>>>>                push(@seqs,$dna);
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>>>  #print OUTFILE $seqs[0];
>>>>>>>>  #close(OUTFILE);
>>>>>>>>
>>>>>>>> return(@seqs);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile) || die ;
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>>>>>>> <body>\n
>>>>>>>> <p><font face=\"Courier, monospace font set\">
>>>>>>>> Inputsequence: <br>";
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      print OUTFILE " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      print OUTFILE "<br>\n";
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> print OUTFILE "</font> <p>";
>>>>>>>>
>>>>>>>> $z=@compseqs;
>>>>>>>>
>>>>>>>> for($k=1;$k<$z;$k++) {
>>>>>>>>  print OUTFILE "<font face=\"Courier, monospace font
>>>>>>>> set\"><p>Compare
>>>>>>>> Sequence: <br>";
>>>>>>>>
>>>>>>>>  for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>>>>>>
>>>>>>>>      print OUTFILE substr ($compseqs[$k], $i, 1);
>>>>>>>>
>>>>>>>>      if ( ($i+1)%10==0){
>>>>>>>>          print OUTFILE " ";
>>>>>>>>      }
>>>>>>>>      if ( ($i+1)%60==0){
>>>>>>>>          print OUTFILE "<br>\n";
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  print OUTFILE "<p></font>";
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<p>
>>>>>>>> Window: <br>$in{'Windowsize'}
>>>>>>>> <p>
>>>>>>>> <p>
>>>>>>>> Threshold: <br>$in{'Threshold'}
>>>>>>>> <p>";
>>>>>>>> my $j=0;
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>>>>>>      if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>>>>>>          $j=$in{'Windowsize'};
>>>>>>>>      }
>>>>>>>>      $height=$out[$i]->{similar}*5;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ($j>0) {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>      $j--;
>>>>>>>>  }
>>>>>>>>  else {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      $outstring .= " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      $outstring .= "<br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%800==0){
>>>>>>>>      print OUTFILE "<br><br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>>>>>>> set\">$outstring</font>";
>>>>>>>>
>>>>>>>> #foreach (@out) {
>>>>>>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar}
>>>>>>>> matchs<p>";
>>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>>>>>>
>>>>>>>> #    }
>>>>>>>> #}
>>>>>>>>
>>>>>>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>>>>>>
>>>>>>>> close OUTFILE;
>>>>>>>>
>>>>>>>> #nameprint();
>>>>>>>>
>>>>>>>> sub parse_form {
>>>>>>>>  local ($buffer, @pairs, $pair, $name, $value);
>>>>>>>>  # Read in text
>>>>>>>>  $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>>>>>>  if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>>>>>>  {
>>>>>>>>      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>>>>>>  }
>>>>>>>>  else
>>>>>>>>  {
>>>>>>>>      $buffer = $ENV{'QUERY_STRING'};
>>>>>>>>  }
>>>>>>>>  @pairs = split(/&/, $buffer);
>>>>>>>>  foreach $pair (@pairs)
>>>>>>>>  {
>>>>>>>>      ($name, $value) = split(/=/, $pair);
>>>>>>>>      $value =~ tr/+/ /;
>>>>>>>>      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>>>>>>      $in{$name} = $value;
>>>>>>>>  }
>>>>>>>> }
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Fri Jan 22 07:34:59 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 07:34:59 -0500
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
Message-ID: <BB6A0E3FAC154E8FB690E5749375A1BC@NewLife>

I'm down with that.

----- Original Message ----- 
From: "Jason Stajich" <jason at bioperl.org>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 1:17 AM
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO


> I'm considering putting in an allowable initialization parameter (and
> get/set) for Bio::AlignIO that would allow setting of the alphabet.  This
> is then passed to Bio::LocatableSeq creation so that _guess_alphabet
> isn't called. This will allow removal of warnings about empty
> sequences because _guess_alphabet won't be called on a sequence if we
> have explicitly set the alphabet.
> 
> This worked great on my local install and tests pass.  Any objections  
> or concerns?
> 
> basically it means when you make an AlignIO you can specify the  
> alphabet i.e.
> 
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
>                            -file => 'genome.fasaln');
> 
> I have some alignments with empty sequences and I think turning off  
> the warnings is appropriate where I force the alphabet choice. It  
> should also have a very modest speedup benefit too.
> 
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip


From avilella at gmail.com  Fri Jan 22 08:07:26 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 13:07:26 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
Message-ID: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>

Hi,

I would like to write a script that merges fragments in a Bio::SimpleAlign
object on the basis of
some $seq->display_name rule.

I basically want to start with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.234     QWERTYU-------------------
seq2.345     ----------ASDFGH----------
seq2.456     -------------------ZXCVBNM

And end with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM

Can people suggest any Bio::SimpleAlign methods that would help here?

Cheers,

Albert.


From maj at fortinbras.us  Fri Jan 22 08:31:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 08:31:54 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
Message-ID: <EF1FEC1B43C146B6BBF827EA56171777@NewLife>

Here's one of my favorite tricks for this: XOR mask on gap symbol.
MAJ

use Bio::SeqIO;
use Bio::Seq;
use strict; 

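my $seqio = Bio::SeqIO->new( -fh => \*DATA, -format => 'fasta' );

# The mask must be as long as the sequences: Perl pads the shorter
# operand of a string ^ with NULs, so a bare '-' masks only column 1.
my $first = $seqio->next_seq->seq;
my $mask  = '-' x length($first);
my $acc   = $first ^ $mask;
while ( my $seq = $seqio->next_seq ) {
    $acc ^= ( $seq->seq ^ $mask );
}
my $mrg = Bio::Seq->new( -id => 'merged',
    -seq => $acc ^ $mask );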
1;


__END__
>seq2.234     
QWERTYU-------------------
>seq2.345     
----------ASDFGH----------
>seq2.456     
-------------------ZXCVBNM
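
For anyone who wants to see why the XOR trick works without BioPerl in the way, here is a minimal plain-Perl sketch of the same idea. The function name merge_fragments is made up for this example, and it assumes that all fragments have the same aligned length and that at most one fragment has a residue in any column (overlapping residues would XOR into garbage). XORing a fragment with a full-length gap mask turns gap columns into NUL bytes, so accumulating with ^= keeps whichever residue is present, and a final XOR with the mask restores residues and gaps.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): merge gapped fragments that
# share one alignment coordinate system.
# Assumes equal lengths and no column with residues in two fragments.
sub merge_fragments {
    my ( $gap, @frags ) = @_;
    my $len  = length( $frags[0] );
    my $mask = $gap x $len;      # full-length gap mask
    my $acc  = "\0" x $len;      # all-NUL string is the identity for XOR
    for my $frag (@frags) {
        die "fragments differ in length\n" unless length($frag) == $len;
        # Gap columns become NUL; residue columns become residue XOR gap.
        $acc ^= ( $frag ^ $mask );
    }
    # Unmask: NUL turns back into the gap symbol, masked residues into residues.
    return $acc ^ $mask;
}

my $merged = merge_fragments(
    '-',
    'QWERTYU-------------------',
    '----------ASDFGH----------',
    '-------------------ZXCVBNM',
);
print "$merged\n";
```

Starting the accumulator from an all-NUL string means the result does not depend on whether the number of fragments is odd or even.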

----- Original Message ----- 
From: "Albert Vilella" <avilella at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 8:07 AM
Subject: [Bioperl-l] Merging fragments in a simplealign


> Hi,
> 
> I would like to write a script that merges fragments in a Bio::SimpleAlign
> object on the basis of
> some $seq->display_name rule.
> 
> I basically want to start with something like this:
> 
> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.234     QWERTYU-------------------
> seq2.345     ----------ASDFGH----------
> seq2.456     -------------------ZXCVBNM
> 
> And end with something like this:
> 
> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
> 
> Can people suggest any Bio::SimpleAlign methods that would help here?
> 
> Cheers,
> 
> Albert.


From cjfields at illinois.edu  Fri Jan 22 08:34:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:34:07 -0600
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
Message-ID: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>

Sounds good to me.  The warnings are a bit too tight on this module anyway.

I still think we have plans towards refactoring some of this, not sure how far along they are:

http://www.bioperl.org/wiki/Align_Refactor

chris

On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:

> I'm considering putting in an allowable initialization parameter (and get/set) for Bio::AlignIO that would allow setting of the alphabet.  This is then passed to Bio::LocatableSeq creation so that _guess_alphabet isn't called. This will allow removal of warnings about empty sequences because _guess_alphabet won't be called on a sequence if we have explicitly set the alphabet.
> 
> This worked great on my local install and tests pass.  Any objections or concerns?
> 
> basically it means when you make an AlignIO you can specify the alphabet i.e.
> 
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna', -file => 'genome.fasaln');
> 
> I have some alignments with empty sequences and I think turning off the warnings is appropriate where I force the alphabet choice. It should also have a very modest speedup benefit too.
> 
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip


From cjfields at illinois.edu  Fri Jan 22 08:40:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:40:57 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <EF1FEC1B43C146B6BBF827EA56171777@NewLife>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
Message-ID: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>

Maybe something for the cookbook/scrapbook?

chris

On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:

> Here's one of my favorite tricks for this: XOR mask on gap symbol.
> MAJ
> 
> use Bio::SeqIO;
> use Bio::Seq;
> use strict; 
> my $seqio = Bio::SeqIO->new( -fh => \*DATA );
> 
> my $acc = $seqio->next_seq->seq ^ '-';
> while ($_ = $seqio->next_seq ) {
>   $acc ^= ($_->seq ^ '-');
> }
> my $mrg = Bio::Seq->new( -id => 'merged',
>   -seq => $acc ^ '-' );
> 1;
> 
> 
> __END__
>> seq2.234     
> QWERTYU-------------------
>> seq2.345     
> ----------ASDFGH----------
>> seq2.456     
> -------------------ZXCVBNM
> 
> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 22, 2010 8:07 AM
> Subject: [Bioperl-l] Merging fragments in a simplealign
> 
> 
>> Hi,
>> I would like to write a script that merges fragments in a Bio::SimpleAlign
>> object on the basis of
>> some $seq->display_name rule.
>> I basically want to start with something like this:
>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> seq2.234     QWERTYU-------------------
>> seq2.345     ----------ASDFGH----------
>> seq2.456     -------------------ZXCVBNM
>> And end with something like this:
>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>> Can people suggest any Bio::SimpleAlign methods that would help here?
>> Cheers,
>> Albert.


From holland at eaglegenomics.com  Fri Jan 22 05:51:52 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 22 Jan 2010 10:51:52 +0000
Subject: [Bioperl-l] [BioSQL-l] SwissProt DE lines and UniProt XML /
	TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <8FECCBDE-2DE1-40EE-B5A4-73BDAC893E2D@eaglegenomics.com>

Nice idea. Currently, BioJava just stores the complete section as a string without parsing it, but it provides a parser module for converting it into useful tag/value format within a user's program (but not to be stored in BioSQL).

On 21 Jan 2010, at 12:33, Peter wrote:

> Hi all,
> 
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> 
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> 
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
> 
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
> 
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> 
> Regards,
> 
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andrea at biocomp.unibo.it  Fri Jan 22 07:18:32 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 22 Jan 2010 13:18:32 +0100 (CET)
Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML
	in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <2b6e30c4628585042366646a7b46386e.squirrel@lipid.biocomp.unibo.it>

I think the point here can be a little broader, since the SwissProt DE
lines are not the only fields that carry complex, structured data.
Defining a common, language-independent way to store structured data in
the comment and *_qualifier_value tables of the current BioSQL schema could
be very useful.
XML looks like a good candidate to me, and the UniProt XML format can be
used as a reference or as a template to start from.
Each Bio* project would then parse this structured data and expose it in
its own programming language's data structures.
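
As a rough illustration of what "structured data serialised as XML" could mean for a DE block, here is a small plain-Perl sketch. Everything in it is an assumption made for the example: the element names only echo the RecName/AltName and Full/Short keys of the SwissProt DE lines, the protein names are invented, and the output is neither the UniProt XML schema nor what BioPerl's TagTree actually writes. The helper names node_to_xml and xml_escape are hypothetical.

```perl
use strict;
use warnings;

# Escape the characters that are special in XML text content.
sub xml_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# A node is [ name, value ], where value is either a plain string or an
# array reference holding child nodes.
sub node_to_xml {
    my ($node) = @_;
    my ( $name, $value ) = @$node;
    my $inner =
        ref($value) eq 'ARRAY'
        ? join( '', map { node_to_xml($_) } @$value )
        : xml_escape($value);
    return "<$name>$inner</$name>";
}

# Invented example data, shaped like a parsed DE block.
my $de = [
    DE => [
        [ RecName => [ [ Full => 'Example protein 1' ], [ Short => 'EP-1' ] ] ],
        [ AltName => [ [ Full => 'Example & sample protein' ] ] ],
    ]
];

my $xml = node_to_xml($de);
print "$xml\n";
```

The resulting string is what would go into a single text column; each Bio* project could then parse it back into whatever nested structure suits its language.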

Andrea


> Hi all,
>
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
>
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
>
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
>
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
>
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
>
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
>
> Regards,
>
> Peter
>


From avilella at gmail.com  Fri Jan 22 11:04:13 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 16:04:13 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
Message-ID: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>

Is there, or should there be, a 'have_pairwise_overlap' method along these lines?

# $seq1 and $seq3 have matching ids
my $seq1 = $aln->each_seq_by_id($seq1->display_id);
my $seq3 = $aln->each_seq_by_id($seq3->display_id);

my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
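
To make concrete what such a check would compute, here is a minimal plain-Perl sketch that works on the gapped strings directly. It is only an illustration: have_pairwise_overlap and presence_mask are hypothetical names rather than existing Bio::SimpleAlign methods, the gap symbol is assumed to be '-', and both strings are assumed to come from the same alignment (equal length).

```perl
use strict;
use warnings;

# Turn a gapped string into a presence mask:
# "\x01" where there is a residue, "\x00" where there is a gap.
sub presence_mask {
    my ($seq) = @_;
    ( my $mask = $seq ) =~ tr/-/\x00/;    # gaps -> NUL
    $mask =~ tr/\x00/\x01/c;              # every non-NUL character -> \x01
    return $mask;
}

# Hypothetical helper: true if both sequences have a residue in at least
# one common alignment column.
sub have_pairwise_overlap {
    my ( $s1, $s2 ) = @_;
    die "sequences differ in length\n" unless length($s1) == length($s2);
    my $both = presence_mask($s1) & presence_mask($s2);    # bytewise AND
    return ( $both =~ tr/\x01// ) ? 1 : 0;                 # count shared columns
}

print have_pairwise_overlap( 'QWERTYU---', '-------ASD' ), "\n";
print have_pairwise_overlap( 'QWERTYU---', '------UAS-' ), "\n";
```

With sequence objects pulled from an alignment, the same function could be fed their seq() strings; fragments that do not overlap are the ones that are safe to merge.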

On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu> wrote:

> May be something for the cook/scrapbook?
>
> chris
>
> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>
> > Here's one of my favorite tricks for this: XOR mask on gap symbol.
> > MAJ
> >
> > use Bio::SeqIO;
> > use Bio::Seq;
> > use strict;
> > my $seqio = Bio::SeqIO->new( -fh => \*DATA );
> >
> > my $acc = $seqio->next_seq->seq ^ '-';
> > while ($_ = $seqio->next_seq ) {
> >   $acc ^= ($_->seq ^ '-');
> > }
> > my $mrg = Bio::Seq->new( -id => 'merged',
> >   -seq => $acc ^ '-' );
> > 1;
> >
> >
> > __END__
> >> seq2.234
> > QWERTYU-------------------
> >> seq2.345
> > ----------ASDFGH----------
> >> seq2.456
> > -------------------ZXCVBNM
> >
> > ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com>
> > To: <bioperl-l at lists.open-bio.org>
> > Sent: Friday, January 22, 2010 8:07 AM
> > Subject: [Bioperl-l] Merging fragments in a simplealign
> >
> >
> >> Hi,
> >> I would like to write a script that merges fragments in a
> Bio::SimpleAlign
> >> object on the basis of
> >> some $seq->display_name rule.
> >> I basically want to start with something like this:
> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> >> seq2.234     QWERTYU-------------------
> >> seq2.345     ----------ASDFGH----------
> >> seq2.456     -------------------ZXCVBNM
> >> And end with something like this:
> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
> >> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
> >> Can people suggest any Bio::SimpleAlign methods that would help here?
> >> Cheers,
> >> Albert.


From maj at fortinbras.us  Fri Jan 22 11:02:55 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 11:02:55 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
Message-ID: <BE7957A2791345DAB092D997A4656AA8@NewLife>

http://www.bioperl.org/wiki/Merge_gapped_sequences_across_a_common_region
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Albert Vilella" <avilella at gmail.com>; <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 8:40 AM
Subject: Re: [Bioperl-l] Merging fragments in a simplealign


> May be something for the cook/scrapbook?
> 
> chris
> 
> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
> 
>> Here's one of my favorite tricks for this: XOR mask on gap symbol.
>> MAJ
>> 
>> use Bio::SeqIO;
>> use Bio::Seq;
>> use strict; 
>> my $seqio = Bio::SeqIO->new( -fh => \*DATA );
>> 
>> my $acc = $seqio->next_seq->seq ^ '-';
>> while ($_ = $seqio->next_seq ) {
>>   $acc ^= ($_->seq ^ '-');
>> }
>> my $mrg = Bio::Seq->new( -id => 'merged',
>>   -seq => $acc ^ '-' );
>> 1;
>> 
>> 
>> __END__
>>> seq2.234     
>> QWERTYU-------------------
>>> seq2.345     
>> ----------ASDFGH----------
>>> seq2.456     
>> -------------------ZXCVBNM
>> 
>> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 22, 2010 8:07 AM
>> Subject: [Bioperl-l] Merging fragments in a simplealign
>> 
>> 
>>> Hi,
>>> I would like to write a script that merges fragments in a Bio::SimpleAlign
>>> object on the basis of
>>> some $seq->display_name rule.
>>> I basically want to start with something like this:
>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>> seq2.234     QWERTYU-------------------
>>> seq2.345     ----------ASDFGH----------
>>> seq2.456     -------------------ZXCVBNM
>>> And end with something like this:
>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>>> Can people suggest any Bio::SimpleAlign methods that would help here?
>>> Cheers,
>>> Albert.


From avilella at gmail.com  Fri Jan 22 12:50:57 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 17:50:57 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
	<358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
Message-ID: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>

Or, to rephrase my question: what is the closest existing equivalent to the
code below?

On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:

> Is there/should be a 'have_pairwise_overlap' method similar to this?
>
> # $seq1 and $seq3 have matching ids
> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>
> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>
>
> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu>wrote:
>
>> May be something for the cook/scrapbook?
>>
>> chris
>>
>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>>
>> > Here's one of my favorite tricks for this: XOR mask on gap symbol.
>> > MAJ
>> >
>> > use Bio::SeqIO;
>> > use Bio::Seq;
>> > use strict;
>> > my $seqio = Bio::SeqIO->new( -fh => \*DATA );
>> >
>> > my $acc = $seqio->next_seq->seq ^ '-';
>> > while ($_ = $seqio->next_seq ) {
>> >   $acc ^= ($_->seq ^ '-');
>> > }
>> > my $mrg = Bio::Seq->new( -id => 'merged',
>> >   -seq => $acc ^ '-' );
>> > 1;
>> >
>> >
>> > __END__
>> >> seq2.234
>> > QWERTYU-------------------
>> >> seq2.345
>> > ----------ASDFGH----------
>> >> seq2.456
>> > -------------------ZXCVBNM
>> >
>> > ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com
>> >
>> > To: <bioperl-l at lists.open-bio.org>
>> > Sent: Friday, January 22, 2010 8:07 AM
>> > Subject: [Bioperl-l] Merging fragments in a simplealign
>> >
>> >
>> >> Hi,
>> >> I would like to write a script that merges fragments in a
>> Bio::SimpleAlign
>> >> object on the basis of
>> >> some $seq->display_name rule.
>> >> I basically want to start with something like this:
>> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> >> seq2.234     QWERTYU-------------------
>> >> seq2.345     ----------ASDFGH----------
>> >> seq2.456     -------------------ZXCVBNM
>> >> And end with something like this:
>> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> >> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>> >> Can people suggest any Bio::SimpleAlign methods that would help here?
>> >> Cheers,
>> >> Albert.


From jay at jays.net  Fri Jan 22 13:30:57 2010
From: jay at jays.net (Jay Hannah)
Date: Fri, 22 Jan 2010 12:30:57 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
	<BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>
Message-ID: <EAD0FFCE-6DDF-4723-8D08-70ECF157FAAA@jays.net>

On Jan 21, 2010, at 10:31 PM, Chris Fields wrote:
> Did you want to release it to CPAN?  I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

Yes, I was thinking I would. No one has (yet) told me it's the worst idea ever, so I'm feeling encouraged.  :)

Given smallish inputs / databases (up to a few million rows) where some lightweight schema + SQLite + BioPerl can get the job done, it's nice to have a little easy-to-run toolbox. New tables and Roles bolt on easily, so I'll be adding them as they surface at $work[1]. 

Thanks for your interest.   :)

Jay Hannah
http://github.com/jhannah/bio-broodcomb
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From dalalhina at gmail.com  Fri Jan 22 12:31:09 2010
From: dalalhina at gmail.com (hina dalal)
Date: Fri, 22 Jan 2010 17:31:09 +0000
Subject: [Bioperl-l] Bioperl installation failed
Message-ID: <425f75df1001220931t49f5c768j97d91d2dd1757f19@mail.gmail.com>

Hi


I have installed Perl from ActiveState and am now trying to install BioPerl,
but cannot do it, neither from PPM (which shows the error "Ppm install failed:
404 not found") nor via CPAN / manual installation. I am also unable
to download nmake; it reports that "the version of this file is not compatible
with the version of windows you are running. Check your computer system
information to see whether you need 32 bit or 64 bit of this program." I am
using Windows Vista.


Please help.


Regards


Hina


From H.Dalal at sms.ed.ac.uk  Fri Jan 22 12:34:55 2010
From: H.Dalal at sms.ed.ac.uk (Hina Dalal)
Date: Fri, 22 Jan 2010 17:34:55 +0000
Subject: [Bioperl-l] BioPerl installation failed: please help
Message-ID: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>

Hi

I have installed Perl from ActiveState and am now trying to install
BioPerl, but cannot do it, neither from PPM (which shows the error "Ppm
install failed: 404 not found") nor via CPAN manual installation. I am
also unable to download nmake; it reports that "the version of
this file is not compatible with the version of windows you are
running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program."

Please help.

Regards

Hina


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From jason at bioperl.org  Fri Jan 22 14:18:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 22 Jan 2010 11:18:30 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
	<55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
Message-ID: <59EC9331-FB2F-4338-AD58-2D501A528A18@bioperl.org>

Done, as of r16739. Look forward to the refactor work too.

-jason
On Jan 22, 2010, at 5:34 AM, Chris Fields wrote:

> Sounds good to me.  The warnings are a bit too tight on this module  
> anyway.
>
> I still think we have plans towards refactoring some of this, not  
> sure how far along they are:
>
> http://www.bioperl.org/wiki/Align_Refactor
>
> chris
>
> On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:
>
>> I'm considering putting in an allowable initialization parameter (and
>> get/set) for Bio::AlignIO that would allow setting of the
>> alphabet.  This is then passed to Bio::LocatableSeq creation so
>> that _guess_alphabet isn't called. This will allow removal of
>> warnings about empty sequences because _guess_alphabet won't be
>> called on a sequence if we have explicitly set the alphabet.
>>
>> This worked great on my local install and tests pass.  Any  
>> objections or concerns?
>>
>> basically it means when you make an AlignIO you can specify the  
>> alphabet i.e.
>>
>> my $in = Bio::AlignIO->new(-format   => 'fasta',
>>                            -alphabet => 'dna',
>>                            -file     => 'genome.fasaln');
>>
>> I have some alignments with empty sequences and I think turning off  
>> the warnings is appropriate where I force the alphabet choice. It  
>> should also have a very modest speedup benefit too.
>>
>> -jason
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>> http://twitter.com/hyphaltip
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From cjfields at illinois.edu  Fri Jan 22 14:22:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 13:22:43 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
	<358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
	<358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
Message-ID: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>

This could exist, but should go into a general Utilities module.  Part of the Align refactoring was to pull a good number of the methods into a general utilities module, so this would fit into that category.
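
Until then, the check itself is simple enough to do on the raw gapped strings (what $seq->seq returns for each row). A minimal sketch, assuming '-' is the only gap symbol and that both rows come from the same alignment; the sub name is borrowed from Albert's proposal and is hypothetical, not an existing Bio::SimpleAlign method:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): true if two rows of the same
# alignment share at least one column where neither row has a gap.
# Assumes '-' is the only gap symbol.
sub have_pairwise_overlap {
    my ($row1, $row2) = @_;
    die "rows must have equal length\n"
        unless length($row1) == length($row2);
    for my $i (0 .. length($row1) - 1) {
        return 1
            if substr($row1, $i, 1) ne '-'
            && substr($row2, $i, 1) ne '-';
    }
    return 0;
}

# Rows rebuilt from the example alignment in this thread.
my %row = (
    'seq1.123' => 'QWERTYUIOPASDFGHJKLZXCVBNM',
    'seq2.234' => 'QWERTYU' . ('-' x 19),
    'seq2.345' => ('-' x 10) . 'ASDFGH' . ('-' x 10),
);

printf "seq1.123 vs seq2.234: %d\n",
    have_pairwise_overlap($row{'seq1.123'}, $row{'seq2.234'});
printf "seq2.234 vs seq2.345: %d\n",
    have_pairwise_overlap($row{'seq2.234'}, $row{'seq2.345'});
```

The first pair shares columns and the second does not, which is the property a fragment merger would want to test before combining two rows.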

chris

On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote:

> Or to rephrase my answer, what is the closest way for the code below that
> already exists?
> 
> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:
> 
>> Is there/should be a 'have_pairwise_overlap' method similar to this?
>> 
>> # $seq1 and $seq3 have matching ids
>> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
>> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>> 
>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>> 
>> 
>> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu>wrote:
>> 
>>> May be something for the cook/scrapbook?
>>> 
>>> chris
>>> 
>>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>>> 
>>>> Here's one of my favorite tricks for this: XOR mask on gap symbol.
>>>> MAJ
>>>> 
>>>> use Bio::SeqIO;
>>>> use Bio::Seq;
>>>> use strict;
>>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA );
>>>> 
>>>> my $acc = $seqio->next_seq->seq ^ '-';
>>>> while ($_ = $seqio->next_seq ) {
>>>>  $acc ^= ($_->seq ^ '-');
>>>> }
>>>> my $mrg = Bio::Seq->new( -id => 'merged',
>>>>  -seq => $acc ^ '-' );
>>>> 1;
>>>> 
>>>> 
>>>> __END__
>>>>> seq2.234
>>>> QWERTYU-------------------
>>>>> seq2.345
>>>> ----------ASDFGH----------
>>>>> seq2.456
>>>> -------------------ZXCVBNM
>>>> 
>>>> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com
>>>> 
>>>> To: <bioperl-l at lists.open-bio.org>
>>>> Sent: Friday, January 22, 2010 8:07 AM
>>>> Subject: [Bioperl-l] Merging fragments in a simplealign
>>>> 
>>>> 
>>>>> Hi,
>>>>> I would like to write a script that merges fragments in a
>>> Bio::SimpleAlign
>>>>> object on the basis of
>>>>> some $seq->display_name rule.
>>>>> I basically want to start with something like this:
>>>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>>>> seq2.234     QWERTYU-------------------
>>>>> seq2.345     ----------ASDFGH----------
>>>>> seq2.456     -------------------ZXCVBNM
>>>>> And end with something like this:
>>>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>>>> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>>>>> Can people suggest any Bio::SimpleAlign methods that would help here?
>>>>> Cheers,
>>>>> Albert.
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>> 
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> 
>> 


From maj at fortinbras.us  Fri Jan 22 14:29:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 14:29:07 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com><EF1FEC1B43C146B6BBF827EA56171777@NewLife><058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu><358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com><358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
	<14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>
Message-ID: <0F7B7E5FE70D4C5CB34B27045561823C@NewLife>

I'd recommend making an enhancement request via Bugzilla, so we don't forget-
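
In the meantime, a caveat on the XOR trick quoted below: Perl's string XOR treats the shorter operand as if it were padded with NULs, so `$seq ^ '-'` only touches the first column. As far as I can tell, the three-fragment example still comes out right only because the number of rows is odd. A minimal sketch with a full-length gap mask, in plain Perl without BioPerl objects, which should hold for any number of non-overlapping fragments (merge_fragments is a made-up name, and '-' as the only gap symbol is an assumption):

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): merge non-overlapping
# fragments of equal length by XOR-ing each row against a gap mask that
# is as long as the row. Assumes '-' is the only gap symbol.
sub merge_fragments {
    my @rows = @_;
    my $len  = length($rows[0]);
    my $mask = '-' x $len;
    my $acc  = "\0" x $len;
    for my $row (@rows) {
        die "rows must have equal length\n" unless length($row) == $len;
        # Gap columns become NUL; a residue r becomes r ^ '-'.
        $acc ^= ($row ^ $mask);
    }
    # XOR with the mask again: NUL turns back into '-', r ^ '-' into r.
    return $acc ^ $mask;
}

# The three seq2 fragments from the example in this thread.
my @fragments = (
    'QWERTYU' . ('-' x 19),
    ('-' x 10) . 'ASDFGH' . ('-' x 10),
    ('-' x 19) . 'ZXCVBNM',
);
print merge_fragments(@fragments), "\n";   # QWERTYU---ASDFGH---ZXCVBNM
```

If two fragments do overlap, the overlapping columns come out as garbage, so an overlap check belongs in front of this.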
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Albert Vilella" <avilella at gmail.com>
Cc: "bioperl-l" <Bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 2:22 PM
Subject: Re: [Bioperl-l] Merging fragments in a simplealign


> This could exist, but should go into a general Utilities module.  Part of the 
> Align refactoring was to pull a good number of the methods into a general 
> utilities module, so this would fit into that category.
>
> chris
>
> On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote:
>
>> Or to rephrase my answer, what is the closest way for the code below that
>> already exists?
>>
>> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:
>>
>>> Is there/should be a 'have_pairwise_overlap' method similar to this?
>>>
>>> # $seq1 and $seq3 have matching ids
>>> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
>>> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>>>
>>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>>>
>>>
>>> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu>wrote:
>>>
>>>> May be something for the cook/scrapbook?
>>>>
>>>> chris
>>>>
>>>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>>>>
>>>>> Here's one of my favorite tricks for this: XOR mask on gap symbol.
>>>>> MAJ
>>>>>
>>>>> use Bio::SeqIO;
>>>>> use Bio::Seq;
>>>>> use strict;
>>>>> my $seqio = Bio::SeqIO->new( -fh => \*DATA );
>>>>>
>>>>> my $acc = $seqio->next_seq->seq ^ '-';
>>>>> while ($_ = $seqio->next_seq ) {
>>>>>  $acc ^= ($_->seq ^ '-');
>>>>> }
>>>>> my $mrg = Bio::Seq->new( -id => 'merged',
>>>>>  -seq => $acc ^ '-' );
>>>>> 1;
>>>>>
>>>>>
>>>>> __END__
>>>>>> seq2.234
>>>>> QWERTYU-------------------
>>>>>> seq2.345
>>>>> ----------ASDFGH----------
>>>>>> seq2.456
>>>>> -------------------ZXCVBNM
>>>>>
>>>>> ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com
>>>>>
>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Friday, January 22, 2010 8:07 AM
>>>>> Subject: [Bioperl-l] Merging fragments in a simplealign
>>>>>
>>>>>
>>>>>> Hi,
>>>>>> I would like to write a script that merges fragments in a
>>>> Bio::SimpleAlign
>>>>>> object on the basis of
>>>>>> some $seq->display_name rule.
>>>>>> I basically want to start with something like this:
>>>>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>>>>> seq2.234     QWERTYU-------------------
>>>>>> seq2.345     ----------ASDFGH----------
>>>>>> seq2.456     -------------------ZXCVBNM
>>>>>> And end with something like this:
>>>>>> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>>>>>> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>>>>>> Can people suggest any Bio::SimpleAlign methods that would help here?
>>>>>> Cheers,
>>>>>> Albert.
>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>>
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From maj at fortinbras.us  Fri Jan 22 14:33:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 14:33:41 -0500
Subject: [Bioperl-l] BioPerl installation failed: please help
In-Reply-To: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>
References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>
Message-ID: <2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife>

Hina-- 
See the protocol at 
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
for ActiveState installation. If it doesn't work, please let us know at which 
step the failure happened.
cheers, MAJ
----- Original Message ----- 
From: "Hina Dalal" <H.Dalal at sms.ed.ac.uk>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 12:34 PM
Subject: [Bioperl-l] BioPerl installation failed: please help


Hi

I have installed Perl from ActiveState and am now trying to install
bioperl but can not do it . Neither from PPM (it is showing error "Ppm
install failed: 404 not found") nor from CPAN manual installation. It
is not allowing me to download nmake, showing that "the version of
this file is not compatible with the version of windows you are
running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program."

Please help.

Regards

Hina


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.




From maj at fortinbras.us  Fri Jan 22 15:13:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 15:13:15 -0500
Subject: [Bioperl-l] BioPerl installation failed: please help
In-Reply-To: <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk><2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife>
	<20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
Message-ID: <9E5DE384E2C8416B8373E390ABDB7DFE@NewLife>

Ok Hina,
I'm not seeing any issues with the presence or availability of
http://bioperl.org/DIST
from my machine. Can you access that URL in a browser? If not, the king of the
King's Buildings may not be allowing access. Also, can you do the following:

C:> ppm-shell
ppm> repo list

Note the number of the repo that corresponds to bioperl (if any) and do

ppm> repo describe n

where 'n' is that number, and send the output along.

cheers, MAJ

----- Original Message ----- 
From: "Hina Dalal" <H.Dalal at sms.ed.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Sent: Friday, January 22, 2010 3:01 PM
Subject: Re: [Bioperl-l] BioPerl installation failed: please help


Hi Mark

warm regards

I was following that protocol, but when I try to do it from PPM and
reach the step "install BioPerl", it shows the error "Ppm
install failed: 404 not found" at the end. When I tried the CPAN
/manual installation, I couldn't download nmake; it shows that "the
version of
this file is not compatible with the version of windows you are
running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program and then contact the software
publisher."


What should I do? Please help.

Regards

Hina


Quoting "Mark A. Jensen" <maj at fortinbras.us>:

> Hina-- See the protocol at
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
> for ActiveState installation. If it doesn't work, please let us know at
> which step the failure happened.
> cheers, MAJ
> ----- Original Message ----- From: "Hina Dalal" <H.Dalal at sms.ed.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 22, 2010 12:34 PM
> Subject: [Bioperl-l] BioPerl installation failed: please help
>
>
> Hi
>
> I have installed Perl from ActiveState and am now trying to install
> bioperl but can not do it . Neither from PPM (it is showing error "Ppm
> install failed: 404 not found") nor from CPAN manual installation. It
> is not allowing me to download nmake, showing that "the version of
> this file is not compatible with the version of windows you are
> running. Check your computer system information to see whether you
> need 32 bit or 64 bit of this program."
>
> Please help.
>
> Regards
>
> Hina
>
>
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
>
>


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From pengyu.ut at gmail.com  Sun Jan 24 20:29:59 2010
From: pengyu.ut at gmail.com (Peng Yu)
Date: Sun, 24 Jan 2010 19:29:59 -0600
Subject: [Bioperl-l] Transcribe in bioperl
Message-ID: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>

I found the function 'translate' in bioperl. But I don't find
'transcribe'. Is there such a function?


From jason at bioperl.org  Sun Jan 24 21:06:48 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 24 Jan 2010 18:06:48 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
Message-ID: <BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>

What exactly do you want to do?
spliced_seq for a feature would be the closest thing...

-jason
On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:

> I found the function 'translate' in bioperl. But I don't find
> 'transcribe'. Is there such a function?

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From pengyu.ut at gmail.com  Sun Jan 24 21:22:12 2010
From: pengyu.ut at gmail.com (Peng Yu)
Date: Sun, 24 Jan 2010 20:22:12 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
	<BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
Message-ID: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>

To convert from T to U. I could use perl's builtin function. But it is
semantically far away from 'transcribe'. If there is a function with
name 'transcribe', it will be better.

On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
> What exactly do you want to do?
> spliced_seq for a feature would be the closest thing...
>
> -jason
> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>
>> I found the function 'translate' in bioperl. But I don't find
>> 'transcribe'. Is there such a function?
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
>
>


From maj at fortinbras.us  Sun Jan 24 21:48:33 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 24 Jan 2010 21:48:33 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
Message-ID: <FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>

Not a bad idea, a semantics-preserving/checking thing. 
transcribe() could return an object with alphabet == 'rna'
and the T's flipped, or bork if called against an object with alphabet != 'dna'.
I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to 
be stashed), if desired.
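
Roughly the behaviour I have in mind, as a stand-alone sketch on a plain hash rather than a real Bio::PrimarySeqI object (the hash layout and the error text are made up for illustration):

```perl
use strict;
use warnings;

# Stand-alone illustration, not BioPerl code: a "sequence" here is just a
# hashref with 'seq' and 'alphabet' keys.
sub transcribe {
    my ($dna) = @_;
    die "transcribe() only makes sense for alphabet 'dna'\n"
        unless $dna->{alphabet} eq 'dna';
    (my $rna = $dna->{seq}) =~ tr/Tt/Uu/;   # flip the T's, keep the case
    return { seq => $rna, alphabet => 'rna' };
}

my $rna = transcribe({ seq => 'ATGGCTTAA', alphabet => 'dna' });
print "$rna->{seq} ($rna->{alphabet})\n";   # AUGGCUUAA (rna)

# Calling it on a non-DNA sequence dies instead of silently mangling it.
my $ok = eval { transcribe({ seq => 'MAS', alphabet => 'protein' }); 1 };
print "protein input rejected\n" unless $ok;
```

Compared with a bare tr/T/U/, the alphabet check is the whole point: the call documents intent and refuses input it cannot sensibly transcribe.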

----- Original Message ----- 
From: "Peng Yu" <pengyu.ut at gmail.com>
To: "Jason Stajich" <jason at bioperl.org>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 24, 2010 9:22 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


> To convert from T to U. I could use perl's builtin function. But it is
> semantically far away from 'transcribe'. If there is a function with
> name 'transcribe', it will be better.
> 
> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
>> What exactly do you want to do?
>> spliced_seq for a feature would be the closest thing...
>>
>> -jason
>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>
>>> I found the function 'translate' in bioperl. But I don't find
>>> 'transcribe'. Is there such a function?
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>> http://twitter.com/hyphaltip
>>
>>


From cjfields at illinois.edu  Sun Jan 24 23:39:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 22:39:43 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
	<FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
Message-ID: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>

I think the main reason there hasn't been a transcribe() is that very few users ask for it.  Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() and/or translate() (i.e. they don't care about the intermediate mRNA).  I don't have a problem with adding a transcribe method to PrimarySeq, but (and Mark has already picked up on this) it should be constrained to DNA only and return RNA.  And there might be a case for adding the analogous reverse_translate().  

Also worth adding this to the proper interface class (PrimarySeqI, I think) so all Seq/PrimarySeq will have it (or have to implement their own).

chris

On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote:

> Not a bad idea, a semantics-preserving/checking thing. transcribe() could return an object with alphabet == 'rna'
> and the T's flipped, or bork if called against an object with alphabet != 'dna'.
> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to be stashed), if desired.
> 
> ----- Original Message ----- From: "Peng Yu" <pengyu.ut at gmail.com>
> To: "Jason Stajich" <jason at bioperl.org>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Sunday, January 24, 2010 9:22 PM
> Subject: Re: [Bioperl-l] Transcribe in bioperl
> 
> 
>> To convert from T to U. I could use perl's builtin function. But it is
>> semantically far away from 'transcribe'. If there is a function with
>> name 'transcribe', it will be better.
>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
>>> What exactly do you want to do?
>>> spliced_seq for a feature would be the closest thing...
>>> 
>>> -jason
>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>> 
>>>> I found the function 'translate' in bioperl. But I don't find
>>>> 'transcribe'. Is there such a function?
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> 
>>> --
>>> Jason Stajich
>>> jason.stajich at gmail.com
>>> jason at bioperl.org
>>> http://fungalgenomes.org/
>>> http://twitter.com/hyphaltip
>>> 
>>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 


From cjfields at illinois.edu  Sun Jan 24 23:43:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 22:43:07 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
	<FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
	<B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
Message-ID: <489E0B85-0BC3-45DB-8660-494CF69F35FF@illinois.edu>


On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:

> ...And there might be a case for adding the analogous reverse_translate().  

Bah.  Meant reverse_transcribe().  Ah well.

chris


From dan.kortschak at adelaide.edu.au  Mon Jan 25 00:33:28 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 25 Jan 2010 16:03:28 +1030
Subject: [Bioperl-l] BEDTools module
Message-ID: <1264397608.4898.9.camel@epistle>

Hi All,

A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
and Ira Hall is now available in the bioperl-run subversion repository
(bioperl-run/trunk r16754).

Using BEDTools you can, among other things:

      * Intersect two BED files in search of overlapping features.
      * Merge overlapping features.
      * Screen for paired-end (PE) overlaps between PE sequences and
        existing genomic features.
      * Calculate the depth and breadth of sequence coverage across
        defined "windows" in a genome.

(see <http://code.google.com/p/bedtools/> for manuals and downloads).

BEDTools is a suite of 17 command-line executables. The module attempts to
provide their options comprehensively and can return Bio::SeqIO or
Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
where specific handling has not been implemented; please give feedback
on desired features for this).

cheers
Dan


From cjfields at illinois.edu  Mon Jan 25 00:35:06 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 23:35:06 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
Message-ID: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>

Just a quick question for those using DNAStatistics.  I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:

>seq1
GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq2
GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq3
GGTACCAGCAGGTGGTCCGCCTA------------------------------
>seq4
--------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC

Since seq3 and seq4 don't overlap, the distance can't be calculated.  In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.  Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere?

chris


From jason at bioperl.org  Mon Jan 25 00:58:03 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 24 Jan 2010 21:58:03 -0800
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
In-Reply-To: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
Message-ID: <B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>

It could also return -1 which is used as place holder for NA in other  
programs that generate distance matrices.
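
To make the convention concrete, a minimal sketch of an uncorrected p-distance that falls back to -1 when two rows share no columns. This is plain Perl written for illustration, not the Bio::Align::DNAStatistics code; the sub name and the choice of '-' as the only gap symbol are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper, not the Bio::Align::DNAStatistics implementation:
# fraction of differing residues over the columns where both rows have a
# residue, or -1 if there are no such columns.
sub p_distance {
    my ($row1, $row2) = @_;
    die "rows must have equal length\n"
        unless length($row1) == length($row2);
    my ($compared, $diffs) = (0, 0);
    for my $i (0 .. length($row1) - 1) {
        my $x = uc substr($row1, $i, 1);
        my $y = uc substr($row2, $i, 1);
        next if $x eq '-' || $y eq '-';   # skip columns with a gap
        $compared++;
        $diffs++ if $x ne $y;
    }
    # Guard against the division by zero for non-overlapping rows.
    return $compared ? $diffs / $compared : -1;
}

# seq3 and seq4 from Chris's example: no shared columns.
my $seq3 = 'GGTACCAGCAGGTGGTCCGCCTA' . ('-' x 30);
my $seq4 = ('-' x 26) . 'CGCACGCGCGTGTTTGCGGGCAGCCGC';
print p_distance($seq3, $seq4), "\n";   # -1
```

A numeric sentinel keeps the matrix purely numeric, but callers then have to treat negative values as missing rather than feed them into tree building.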
-jason
On Jan 24, 2010, at 9:35 PM, Chris Fields wrote:

> Just a quick question for those using DNAStatistics.  I just fixed a  
> bug in Bio::Align::DNAStatistics that failed with a div by zero  
> error (bug 2901) on this data:
>
>> seq1
> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>> seq2
> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>> seq3
> GGTACCAGCAGGTGGTCCGCCTA------------------------------
>> seq4
> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC
>
> Since seq3 and seq4 don't overlap, the distance can't be  
> calculated.  In our case, I replace the score with 'NA' as a  
> placeholder, but I'm worried about downstream app breakage.  Anyone  
> have an objection to using 'NA' here, or know of ways this may lead  
> to problems elsewhere?
>
> chris

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From maj at fortinbras.us  Mon Jan 25 08:17:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 08:17:54 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org><366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com><FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
	<B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
Message-ID: <ED0F320909EF4DB99FF0C91423F83209@NewLife>

transcribe() and rev_transcribe added to Bio::PrimarySeqI, plus tests in 
t/Seq.t, @ r16757
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Peng Yu" <pengyu.ut at gmail.com>
Sent: Sunday, January 24, 2010 11:39 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


>I think the main reason there hasn't been a transcribe() is that very few users 
>ask for it.  Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() 
>and/or translate() (i.e. they don't care about the intermediate mRNA).  I don't 
>have a problem with adding a transcribe method to PrimarySeq, but (and Mark has 
>already picked up on this) it should be constrained to DNA only and return RNA. 
>And there might be a case for adding the analogous reverse_translate().
>
> Also worth adding this to the proper interface class (PrimarySeqI, I think) so 
> all Seq/PrimarySeq will have it (or have to implement their own).
>
> chris
>
> On Jan 24, 2010, at 8:48 PM, Mark A. Jensen wrote:
>
>> Not a bad idea, a semantics-preserving/checking thing. transcribe() could 
>> return an object with alphabet == 'rna'
>> and the T's flipped, or bork if called against an object with alphabet != 
>> 'dna'.
>> I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to 
>> be stashed), if desired.
>>
>> ----- Original Message ----- From: "Peng Yu" <pengyu.ut at gmail.com>
>> To: "Jason Stajich" <jason at bioperl.org>
>> Cc: <bioperl-l at lists.open-bio.org>
>> Sent: Sunday, January 24, 2010 9:22 PM
>> Subject: Re: [Bioperl-l] Transcribe in bioperl
>>
>>
>>> To convert from T to U. I could use perl's builtin function. But it is
>>> semantically far away from 'transcribe'. If there is a function with
>>> name 'transcribe', it will be better.
>>> On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
>>>> What exactly do you want to do?
>>>> spliced_seq for a feature would be the closest thing...
>>>>
>>>> -jason
>>>> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>>>>
>>>>> I found the function 'translate' in bioperl. But I don't find
>>>>> 'transcribe'. Is there such a function?
>>>>> _______________________________________________
>>>>> Bioperl-l mailing list
>>>>> Bioperl-l at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>
>>>> --
>>>> Jason Stajich
>>>> jason.stajich at gmail.com
>>>> jason at bioperl.org
>>>> http://fungalgenomes.org/
>>>> http://twitter.com/hyphaltip
>>>>
>>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From cjfields at illinois.edu  Mon Jan 25 08:23:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 07:23:12 -0600
Subject: [Bioperl-l] BEDTools module
In-Reply-To: <1264397608.4898.9.camel@epistle>
References: <1264397608.4898.9.camel@epistle>
Message-ID: <0F5CE93E-0E6C-4317-806B-A463A9B0917E@illinois.edu>

Great work Dan!  

chris

On Jan 24, 2010, at 11:33 PM, Dan Kortschak wrote:

> Hi All,
> 
> A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
> and Ira Hall is now available in the bioperl-run subversion repository
> (bioperl-run/trunk r16754).
> 
> Using BEDTools you can, among other things:
>
>      * Intersect two BED files in search of overlapping features.
>      * Merge overlapping features.
>      * Screen for paired-end (PE) overlaps between PE sequences and
>        existing genomic features.
>      * Calculate the depth and breadth of sequence coverage across
>        defined "windows" in a genome.
>
> (see <http://code.google.com/p/bedtools/> for manuals and downloads).
>
> BEDTools is a suite of 17 command-line executables. The module attempts to
> provide their options comprehensively and can return Bio::SeqIO or
> Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
> where specific handling has not been implemented; please give feedback
> on desired features for this).
> 
> cheers
> Dan
> 


From cjfields at illinois.edu  Mon Jan 25 08:27:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 07:27:26 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
In-Reply-To: <B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>
References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
	<B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>
Message-ID: <D46CA8B2-780B-4AA5-B9E3-07EADC0D79C1@illinois.edu>

That works for me, just want to ensure we're DTRT.  I'll change it over.

chris

On Jan 24, 2010, at 11:58 PM, Jason Stajich wrote:

> It could also return -1 which is used as place holder for NA in other programs that generate distance matrices.
> -jason
> On Jan 24, 2010, at 9:35 PM, Chris Fields wrote:
> 
>> Just a quick question for those using DNAStatistics.  I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:
>> 
>>> seq1
>> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>>> seq2
>> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>>> seq3
>> GGTACCAGCAGGTGGTCCGCCTA------------------------------
>>> seq4
>> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC
>> 
>> Since seq3 and seq4 don't overlap, the distance can't be calculated.  In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.  Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere?
>> 
>> chris
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
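
The placeholder idea discussed in this thread can be sketched in plain
Perl. This is only an illustration and not the Bio::Align::DNAStatistics
code: the function name pairwise_p_distance and the toy sequences are made
up for the example, and it computes a simple uncorrected proportion of
differences rather than any of the module's distance models. The -1 return
value follows Jason's suggestion of a numeric stand-in for NA.

```perl
use strict;
use warnings;

# Hypothetical helper: uncorrected p-distance over the alignment columns
# where both sequences have a base. Returns -1 (a placeholder for "not
# available") when the two sequences share no such column.
sub pairwise_p_distance {
    my ($seq_a, $seq_b) = @_;
    die "aligned sequences must have the same length\n"
        unless length($seq_a) == length($seq_b);

    my ($compared, $differences) = (0, 0);
    for my $i (0 .. length($seq_a) - 1) {
        my $base_a = uc substr($seq_a, $i, 1);
        my $base_b = uc substr($seq_b, $i, 1);
        next if $base_a eq '-' || $base_b eq '-';   # skip gap columns
        $compared++;
        $differences++ if $base_a ne $base_b;
    }

    return -1 if $compared == 0;    # no overlap: avoid dividing by zero
    return $differences / $compared;
}

# Toy data: the first pair does not overlap at all, the second differs
# at one of ten columns.
my $no_overlap = pairwise_p_distance('ACGTAC----', '------GTCA');
my $one_diff   = pairwise_p_distance('ACGTACGTCA', 'ACGTTCGTCA');

print "non-overlapping pair: $no_overlap\n";
print "overlapping pair:     $one_diff\n";
```

A numeric placeholder keeps the matrix parseable by programs that expect
numbers, at the cost that downstream code has to know -1 is not a real
distance.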


From maj at fortinbras.us  Mon Jan 25 08:41:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 08:41:38 -0500
Subject: [Bioperl-l] BEDTools module
In-Reply-To: <1264397608.4898.9.camel@epistle>
References: <1264397608.4898.9.camel@epistle>
Message-ID: <8D494783F87E4C32BD797008E260C3C2@NewLife>

Rock 'n' roll, Dan!
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 12:33 AM
Subject: [Bioperl-l] BEDTools module


> Hi All,
> 
> A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
> and Ira Hall is now available in the bioperl-run subversion repository
> (bioperl-run/trunk r16754).
> 
> Using BEDTools you can, among other things:
> 
>      * Intersect two BED files in search of overlapping features.
>      * Merge overlapping features.
>      * Screen for paired-end (PE) overlaps between PE sequences and
>        existing genomic features.
>      * Calculate the depth and breadth of sequence coverage across
>        defined "windows" in a genome.
> 
> (see <http://code.google.com/p/bedtools/> for manuals and downloads).
> 
> BEDTools is a suite of 17 command-line executables. The module attempts to
> provide the commands and options comprehensively and can return Bio::SeqIO or
> Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
> where specific handling has not been implemented - please give feedback
> on desired features for this).
> 
> cheers
> Dan
> 


From rtbio.2009 at gmail.com  Mon Jan 25 08:43:19 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 14:43:19 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>

Hello Mark,Chris and all,

This is Roopa again, with another small problem. I am working on remote
BLAST. The program runs, accesses the server, and gets the output
correctly. However, when I collect the result sequences into an array, the
first sequence among the results is always missing. The code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".
time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the count of the array was 0 and nothing was printed. Likewise, when the
output contained 3 sequences, the count of the array was 2 and only two
sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.
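
One likely cause of the "always one short" symptom above is the debug line
that prints $result->next_hit() before the main while loop: next_hit is an
iterator, so each call hands out a hit and moves on. The sketch below shows
the effect with a toy class in plain Perl. ToyResult and collect_hits are
hypothetical names invented for the illustration and are not BioPerl code;
the toy rewind method only stands for the idea of resetting the iterator,
so check the Bio::Search::Result documentation for the real equivalent.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a result object: next_hit() hands out each
# hit exactly once and then returns undef, as an iterator does.
package ToyResult;

sub new {
    my ($class, @hits) = @_;
    return bless { hits => [@hits], pos => 0 }, $class;
}

sub next_hit {
    my $self = shift;
    return undef if $self->{pos} >= @{ $self->{hits} };
    return $self->{hits}[ $self->{pos}++ ];
}

sub rewind {
    my $self = shift;
    $self->{pos} = 0;
}

package main;

# Drain whatever the iterator still has to offer.
sub collect_hits {
    my ($result) = @_;
    my @names;
    while ( defined( my $hit = $result->next_hit ) ) {
        push @names, $hit;
    }
    return @names;
}

my $result = ToyResult->new(qw(hitA hitB hitC));

my $peeked     = $result->next_hit;       # a "debug" call: consumes hitA
my @after_peek = collect_hits($result);   # only the remaining hits

$result->rewind;                          # start again from the first hit
my @all = collect_hits($result);

print "debug call returned: $peeked\n";
print "collected after the debug call: @after_peek\n";
print "collected after rewind: @all\n";
```

If the debug output is still wanted, fetching the hit once inside the loop
and printing that same variable avoids the extra next_hit call.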


From rtbio.2009 at gmail.com  Mon Jan 25 08:44:57 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 14:44:57 +0100
Subject: [Bioperl-l] remote blast bioperl
Message-ID: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>

Hello all,

I have another small problem. I am working on remote BLAST. The program
runs, accesses the server, and gets the output correctly. However, when I
collect the result sequences into an array, the first sequence among the
results is always missing. The code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".
time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the count of the array was 0 and nothing was printed. Likewise, when the
output contained 3 sequences, the count of the array was 2 and only two
sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.


From cjfields at illinois.edu  Mon Jan 25 09:05:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 08:05:44 -0600
Subject: [Bioperl-l] remote blast bioperl
In-Reply-To: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>
References: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>
Message-ID: <7E402CC5-9C66-4315-B437-7C4EC2317371@illinois.edu>

Roopa,

We have received all 4+ of your posts.  There is absolutely no need for you to keep repeatedly posting the same thing to the list.  Be patient, we'll try to get to you as soon as we can!

chris



From jiann-jy at hotmail.com  Sun Jan 24 21:03:55 2010
From: jiann-jy at hotmail.com (JY)
Date: Sun, 24 Jan 2010 18:03:55 -0800 (PST)
Subject: [Bioperl-l] how to retrieve accession number by taxon id??
Message-ID: <4cef88b5-fa53-4e63-9167-30075c10a058@k19g2000yqc.googlegroups.com>

I need to retrieve accession numbers and sequences to complete one part of
my project. How can I retrieve accession numbers by taxon id?


From lpaulet at ual.es  Mon Jan 25 15:25:55 2010
From: lpaulet at ual.es (Lorenzo Carretero-Paulet)
Date: Mon, 25 Jan 2010 21:25:55 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <4B5DFE53.2000201@ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and
returns the corresponding reports in txt, xml and html format. I'm
experiencing problems with the latter, as the program returns the
following error message:

"Can't call method "next_result" without a package or object reference 
at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e 
$E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e 
$E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                 -file   => ">$outputfilenameH");
while( my $result = _$blast_report_->next_result ) { # get a result from 
Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


From lpaulet at ual.es  Mon Jan 25 15:31:08 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 21:31:08 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and
returns the corresponding reports in txt, xml and html format. I'm
experiencing problems with the latter, as the program returns the
following error message:

"Can't call method "next_result" without a package or object reference at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my $blast_report =qx/$blast -p blastp -d $database -i $query -e  
$E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e  
$E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                  -file   => ">$outputfilenameH");
while( my $result = $blast_report->next_result ) { # get a result from  
Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


From dan.kortschak at adelaide.edu.au  Mon Jan 25 16:00:37 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 07:30:37 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
Message-ID: <1264453237.4552.3.camel@epistle>

A reverse_translate to IUPAC degenerate codes is not a bad idea,
particularly for PCR primer design.

Dan

On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org
wrote:
> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:
> 
> > ...And there might be a case for adding the analogous
> reverse_translate().  
> 
> Bah.  Meant reverse_transcribe().  Ah well.
> 
> chris


From maj at fortinbras.us  Mon Jan 25 16:07:49 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 16:07:49 -0500
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
References: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
Message-ID: <F5772AAC495D475DBEEEF2311B16F941@NewLife>

Lorenzo--
your $blast_report is set to be (some of) the text returned
by a system call of a blast program; this isn't going to be
an object of any kind, and so no functions can be
called from it (as at "$blast_report->next_result"). You need
to parse the text generated by the blast call using Bio::SearchIO
to get a Bio::Search::Result::BlastResult object.
you could do

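my @blast_lines = qx/ ...your blast call... /;
open my $bf, '>', 'my.blast' or die "can't write my.blast: $!";
print $bf @blast_lines;   # no comma after the filehandle
close $bf;
my $blast_io = Bio::SearchIO->new(-file   => 'my.blast',
                                  -format => 'blast');
# $blast_io->next_result then returns the result objects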

and carry on from there. But why not look at
Bio::Tools::Run::StandAloneBlast or
Bio::Tools::Run::StandAloneBlastPlus
to run your blasts within perl? These wrap the blast
programs and deliver BioPerl objects, rather than
plain text output.
cheers MAJ
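
The point about plain text versus objects can be checked without BLAST at
all. The sketch below is plain Perl and makes one assumption that is not
stated in the thread: because the blastall call writes its report to a file
with -o, the backticks probably capture an empty string. The variable names
are invented for the illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in for what backticks return when the command prints nothing to
# standard output (an assumption about the blastall call with -o).
my $captured = '';

# blessed() returns undef for anything that is not an object.
my $is_object = defined blessed($captured) ? 1 : 0;

# Calling a method on the plain string fails at run time; eval traps the
# error so that it can be inspected instead of ending the script.
my $error = '';
eval { $captured->next_result; 1 } or $error = $@;

print "is an object: $is_object\n";
print "error: $error";
```

The trapped message is the same kind of error reported in the original
post, which is why the text first has to be parsed into result objects.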


From David.Messina at sbc.su.se  Mon Jan 25 16:09:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 25 Jan 2010 22:09:24 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <4B5DFE53.2000201@ual.es>
References: <4B5DFE53.2000201@ual.es>
Message-ID: <FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>

> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;

> while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory


_$blast_report_ is not a valid variable name, as far as I know. Plus there's a space between report and the final '_' in the first of the above two lines.

Does this code compile?

Dave


From Russell.Smithies at agresearch.co.nz  Mon Jan 25 16:14:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 26 Jan 2010 10:14:15 +1300
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
References: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>

That's a fair mix of incomplete code you've supplied!!
Did you read the documentation for RemoteBlast? The example there will do 99% of what you want.
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm

I'm not entirely sure what you're trying to do (as you've left out a bit of your code) but I assume you're trying to retrieve and print the sequence for each hit.

Here's something that works, not sure exactly what/why you want to print but it should get you a bit further.

--Russell


================================
#!perl -w

use strict;

use Bio::SeqIO;
use Bio::Tools::Run::RemoteBlast;
use Bio::DB::GenBank;

use CGI ':standard';

my $q = CGI->new;

my @params = (
               -prog         => 'blastn',
               -data         => 'nr',
               -expect       => '1e-30',
               -entrez_query => 'Homo sapiens [ORGN]',
               -readmethod   => 'SearchIO'
);

my $gb = Bio::DB::GenBank->new;

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#$v is just to turn on and off the messages
my $v = 1;

my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" );

while ( my $input = $str->next_seq() ) {

  my $r = $factory->submit_blast($input);

  print STDERR "waiting..." if ( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid (@rids) {
      my @seqs = ();
      my $rc   = $factory->retrieve_blast($rid);
      if ( !ref($rc) ) {
        if ( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      }
      else {
        my $result = $rc->next_result();

        #save the blast output
        my $filename = $result->query_accession . '.out';
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {

          # store the hit sequences
          push @seqs, $gb->get_Seq_by_version( $hit->name );

          next unless ( $v > 0 );
          print "\thit name is ", $hit->name, "\n";
          while ( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }

        ## print the seqs you've retrieved??
        open( OUTFILE, '>', $result->query_accession . '.htm' )
          or die "Cannot write HTML report: $!";
        print OUTFILE $q->start_html('RNAi Result'),
          $q->h1('RNAi Result'),
          $q->h2('Input'),
          $q->pre( toString($input) ),
          $q->h2('Output');

        foreach (@seqs) {

          #there's probably a better way of printing the seq
          print OUTFILE $q->pre( toString($_) );
        }
        print OUTFILE $q->end_html;
        close OUTFILE;
      }
    }
  }
}

sub toString {
  my $s = shift;
  return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq;
}




From biopython at maubp.freeserve.co.uk  Mon Jan 25 16:24:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 25 Jan 2010 21:24:33 +0000
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
Message-ID: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>

On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak
<dan.kortschak at adelaide.edu.au> wrote:
> A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.

I would say it could be a bad idea. For any protein string there are
multiple possible back translations, and this cannot be captured
fully as a nucleotide string even using the IUPAC ambiguity chars.

We debated this back and forth for Biopython, and decided to leave it
out. It wasn't possible for a simple back translate to a simple string to
handle the use cases we considered, and other options like returning
a regular expression covering all possible back translations were too
complex (for a core sequence method/function).

Peter
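
One commonly cited illustration of this limitation, not necessarily the
case discussed for Biopython, is leucine in the standard genetic code: its
six codons collapse column by column to a single IUPAC codon that also
matches codons for another amino acid. The sketch below is plain Perl; the
helper names consensus_codon and expand_codon are made up for the example
and it does not use Bio::Tools::CodonTable.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes, keyed by the sorted set of bases they stand for.
my %iupac_for = (
    A   => 'A', C   => 'C', G   => 'G', T   => 'T',
    AG  => 'R', CT  => 'Y', CG  => 'S', AT  => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);
my %bases_for = reverse %iupac_for;    # IUPAC code => bases it covers

# Collapse a list of codons into one IUPAC codon, column by column.
sub consensus_codon {
    my @codons    = @_;
    my $consensus = '';
    for my $pos (0 .. 2) {
        my %seen;
        $seen{ substr($_, $pos, 1) } = 1 for @codons;
        $consensus .= $iupac_for{ join '', sort keys %seen };
    }
    return $consensus;
}

# Expand an IUPAC codon back into every concrete codon it matches.
sub expand_codon {
    my ($codon) = @_;
    my @expanded = ('');
    for my $code (split //, $codon) {
        my @longer;
        for my $prefix (@expanded) {
            push @longer, $prefix . $_ for split //, $bases_for{$code};
        }
        @expanded = @longer;
    }
    return @expanded;
}

my @leucine   = qw(CTA CTC CTG CTT TTA TTG);    # standard genetic code
my $consensus = consensus_codon(@leucine);

my %is_leucine;
$is_leucine{$_} = 1 for @leucine;
my @extra = grep { !$is_leucine{$_} } expand_codon($consensus);

print "consensus codon: $consensus\n";
print "matched codons that are not leucine: @extra\n";
```

The extra matches are phenylalanine codons, so a single ambiguity string
describes a superset of the true back translations rather than exactly
that set. Whether that matters depends on the use, for example degenerate
primer design versus exact enumeration.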


From jason at bioperl.org  Mon Jan 25 16:26:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 25 Jan 2010 13:26:55 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <98995830-DC7F-4404-A216-874EF5799DB6@bioperl.org>

It was already implemented several years ago -- reverse translation is
Bio::Tools::CodonTable -> revtranslate


   my $myCodonTable = Bio::Tools::CodonTable->new();
   my $seqobj    = Bio::PrimarySeq->new(-seq => 'FHGERHEL');
   my $iupac_str = $myCodonTable->reverse_translate_all($seqobj);


Chris had meant to say reverse_transcribe of RNA -> DNA FWIW.

-jason
On Jan 25, 2010, at 1:24 PM, Peter wrote:

> On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak
> <dan.kortschak at adelaide.edu.au> wrote:
>> A reverse_translate to IUPAC degenerate codes is not a bad idea,
>> particularly for PCR primer design.
>
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.
>
> We debated this back and forth for Biopython, and decided to leave it
> out. It wasn't possible for a simple back translate to a simple  
> string to
> handle the use cases we considered, and other options like returning
> a regular expression covering all possible back translations were too
> complex (for a core sequence method/function).
>
> Peter

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From maj at fortinbras.us  Mon Jan 25 16:19:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 16:19:24 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
Message-ID: <72B106F0D5FF4F1E858CC9BD1EF33142@NewLife>

I think we have that functionality in Bio::Tools::SeqPattern, 
courtesy of Bruno V---
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 4:00 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


>A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.
> 
> Dan
> 
> On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org
> wrote:
>> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:
>> 
>> > ...And there might be a case for adding the analogous
>> reverse_translate().  
>> 
>> Bah.  Meant reverse_transcribe().  Ah well.
>> 
>> chris
> 


From dan.kortschak at adelaide.edu.au  Mon Jan 25 16:38:44 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 08:08:44 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <1264455524.4552.23.camel@epistle>

Good to see that these ideas have been considered.

I'd be interested to see this discussion, or at least the point dealing
with the problems that might arise. I'm at a loss as to how ambiguity
codes can't completely describe all possible coding sequences for any
given codon table (via Bio::Tools::CodonTable - in fact this already has
the revtranslate that could be fitted into a Bio::PrimarySeq method - to
answer Mark and Jason's comments, I think that /if/ a reverse_translate
method exists, it makes logical sense to have it tied to a sequence
object, calling the B:T:CT method on the seq object itself rather than
only in Bio::Tools, 2?). Pete, can you provide an example of the
problems?

thanks
Dan

On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.
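
A small worked example of the kind of problem Peter describes, offered as a sketch and assuming the standard genetic code. Leucine is encoded by CTN and TTR. The tightest single IUPAC string that covers all six codons is YTN, and that string also matches TTT and TTC, which encode phenylalanine. The helper names (consensus, expand) are made up for this illustration and are not BioPerl methods.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to the bases they stand for.
my %iupac = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => 'AG', Y => 'CT', S => 'CG', W => 'AT', K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);
# Reverse lookup: sorted set of bases -> IUPAC code.
my %code_for = map { $iupac{$_} => $_ } keys %iupac;

# Collapse a list of codons into one IUPAC string, position by position.
sub consensus {
    my @codons = @_;
    my $out = '';
    for my $pos (0 .. 2) {
        my %seen = map { substr($_, $pos, 1) => 1 } @codons;
        $out .= $code_for{ join '', sort keys %seen };
    }
    return $out;
}

# Expand an IUPAC string into every unambiguous sequence it matches.
sub expand {
    my ($pattern) = @_;
    my @seqs = ('');
    for my $code (split //, $pattern) {
        @seqs = map { my $s = $_; map { $s . $_ } split //, $iupac{$code} } @seqs;
    }
    return @seqs;
}

# The six leucine codons in the standard genetic code.
my @leu    = qw(CTA CTC CTG CTT TTA TTG);
my %is_leu = map { $_ => 1 } @leu;

my $pattern = consensus(@leu);
my @extra   = grep { !$is_leu{$_} } expand($pattern);

print "Leu consensus: $pattern\n";
print "Non-Leu codons also matched: @extra\n";
```

So a degenerate string can describe a superset of the coding sequences, but for amino acids whose codons differ at more than one position (Leu, Arg, Ser in the standard code) it cannot describe exactly that set. Whether the superset is good enough depends on the use case, e.g. it may be acceptable for primer design.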


From lpaulet at ual.es  Mon Jan 25 16:53:07 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 22:53:07 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>
References: <4B5DFE53.2000201@ual.es>
	<FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>
Message-ID: <20100125225307.2zl2cn2hkcsgccso@webmail.ual.es>

Thanks Dave and Mark.

Quoting Dave Messina <David.Messina at sbc.su.se>:

>> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e   
>> $E_value -b 20000 -o $outputfilenameB/;
>
>> while( my $result = _$blast_report_->next_result ) { # get a result  
>>  from Bio::SearchIO parsing or build it up in memory
>
>
> _$blast_report_ is not a valid variable name, as far as I know. Plus  
>  there's a space between report and the final '_' in the first of  
> the  above two lines.
>
> Does this code compile?
>
> Dave


From rtbio.2009 at gmail.com  Mon Jan 25 17:35:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 23:35:32 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>
References: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>
Message-ID: <c7cac1601001251435k7b75ffbbj64cfa36faf8d89bb@mail.gmail.com>

Hello Russell,

Thank you very much for your reply. My problem is this: the remote BLAST
runs fine with my code, and I get the .out file listing the sequences
producing significant alignments. But when I try to retrieve the sequences
into an array @seqs, I get every sequence except the first hit. If the .out
file shows 3 hits, I can retrieve only 2 of them, i.e. only 2 sequences. If
there is only one significant hit for my sequence, its name and description
appear in the .out file, but I cannot get it into the array: the array count
is 0 and the array is empty.

I hope the problem is clearer now.

Here is my code:

use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds.";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";


open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes <br>
 This page will automatically reload in 30 seconds  <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1,"\n";
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

            # open(OUTFILE,'>',$debugfile);
             #  print OUTFILE @params;
             # close(OUTFILE);


my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
"$organ\[ORGN]");

 #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter

 #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
Brucei[ORGN]';

#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);
 my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {


        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
       print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output

      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);


       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dna;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=scalar(@seqs);
              open(OUTFILE,'>',$debugfile);
               print OUTFILE $warum;
             #  print OUTFILE @seqs;
              close(OUTFILE);
      return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

        print OUTFILE substr ($in{'Inputseq'}, $i, 1);

        if ( ($i+1)%10==0){
                print OUTFILE " ";
        }
        if ( ($i+1)%60==0){
                print OUTFILE "<br>\n";
        }
}


print OUTFILE "</font> <p>";

$z=@compseqs;

for($k=0;$k<$z;$k++) {
        print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
Sequence: <br>";

        for ($i=0; $i<length ($compseqs[$k]); $i++) {

                print OUTFILE substr ($compseqs[$k], $i, 1);

                if ( ($i+1)%10==0){
                        print OUTFILE " ";
                }
                if ( ($i+1)%60==0){
                        print OUTFILE "<br>\n";
                }
        }
        print OUTFILE "<p></font>";
}

print OUTFILE "<p>
Window: <br>$in{'Windowsize'}
<p>
<p>
Threshold: <br>$in{'Threshold'}
<p>";
my $j=0;

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

        if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
                if ($out[$i]->{similar}<=$in{'Threshold'}){
                        $j=$in{'Windowsize'};
                }
                $height=$out[$i]->{similar}*5;
        }

        if ($j>0) {
                print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
height=\"5\">";
                $outstring .= "<font color=\"green\">".substr
($in{'Inputseq'}, $i, 1)."</font>";
                $j--;
        }
        else {
                print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
height=\"5\">";
                $outstring .= "<font color=\"red\">".substr
($in{'Inputseq'}, $i, 1)."</font>";
        }

        if ( ($i+1)%10==0){
                $outstring .= " ";
        }
        if ( ($i+1)%60==0){
                $outstring .= "<br>\n";

        }
        if ( ($i+1)%800==0){
                print OUTFILE "<br><br>\n";

        }
}

print OUTFILE "<br><br><font face=\"Courier, monospace font
set\">$outstring</font>";

#foreach (@out) {
#print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
#if ($_->{similar}<=$in{'Threshold'}){

#       }
#}

print OUTFILE "</BODY>\n</HTML>\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}

Regards,
Roopa.
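
A possible explanation for the missing first hit, offered as a reading of the code above rather than a confirmed diagnosis: the debug statement "print BLASTDEBUGFILE $result->next_hit();" calls the same iterator that the later "while ( my $hit = $result->next_hit )" loop uses, so the first hit would already be consumed before the loop starts. With a single hit nothing is left for the loop, which would match the array count of 0. The sketch below shows the effect with a toy class; ToyResult is hypothetical and only mimics the iterator behaviour, it is not a BioPerl class.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a result object: next_hit() hands out each hit
# once, in order, and rewind() resets the cursor.
package ToyResult;
sub new      { my ($class, @hits) = @_; return bless { hits => [@hits], pos => 0 }, $class }
sub next_hit { my $self = shift; return $self->{hits}[ $self->{pos}++ ] }
sub rewind   { my $self = shift; $self->{pos} = 0; return }

package main;

my $result = ToyResult->new(qw(hit1 hit2 hit3));

# A debug print like this silently uses up the first hit ...
my $peek = $result->next_hit;
print "debug: $peek\n";

# ... so the main loop only sees the remaining two.
my @seqs;
while (my $hit = $result->next_hit) { push @seqs, $hit }
print scalar(@seqs), " hits collected: @seqs\n";

# Resetting the cursor (or dropping the debug call) restores all three.
$result->rewind;
my @all;
while (my $hit = $result->next_hit) { push @all, $hit }
print scalar(@all), " hits collected after rewind: @all\n";
```

In the real script the simplest change would be to remove the debug call, or to move the debug print inside the while loop. BioPerl's result objects document a rewind method for resetting the hit iterator, but check the POD of the installed version before relying on it.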


On Mon, Jan 25, 2010 at 10:14 PM, Smithies, Russell <
Russell.Smithies at agresearch.co.nz> wrote:

> That's a fair mix of incomplete code you've supplied!!
> Did you read the documentation for RemoteBlast? The example there will do
> 99% of what you want.
> http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm
>
> I'm not entirely sure what you're trying to do (as you've left out a bit of
> your code) but I assume you're trying to retrieve and print the sequence for
> each hit.
>
> Here's something that works, not sure exactly what/why you want to print
> but it should get you a bit further.
>
> --Russell
>
>
> ================================
> #!perl -w
>
> use Bio::Tools::Run::RemoteBlast;
> use Bio::DB::GenBank;
> use Bio::SeqIO;
>
> use CGI ':standard';
>
> use strict;
>
> my $q = new CGI;
>
> my @params = (
>               -prog         => 'blastn',
>               -data         => 'nr',
>               -expect       => '1e-30',
>               -entrez_query => 'Homo sapiens [ORGN]',
>               -readmethod   => 'SearchIO'
> );
>
> my $gb = Bio::DB::GenBank->new;
>
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
> #$v is just to turn on and off the messages
> my $v = 1;
>
> my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" );
>
> while ( my $input = $str->next_seq() ) {
>
>   my $r = $factory->submit_blast($input);
>
>   print STDERR "waiting..." if ( $v > 0 );
>  while ( my @rids = $factory->each_rid ) {
>     foreach my $rid (@rids) {
>      my @seqs = ();
>       my $rc   = $factory->retrieve_blast($rid);
>      if ( !ref($rc) ) {
>        if ( $rc < 0 ) {
>          $factory->remove_rid($rid);
>        }
>         print STDERR "." if ( $v > 0 );
>        sleep 5;
>      }
>      else {
>         my $result = $rc->next_result();
>
>         #save the blast output
>        my $filename = $result->query_accession . '.out';
>        $factory->save_output($filename);
>        $factory->remove_rid($rid);
>        print "\nQuery Name: ", $result->query_name(), "\n";
>         while ( my $hit = $result->next_hit ) {
>
>           # store the hit sequences
>          push @seqs, $gb->get_Seq_by_version( $hit->name );
>
>          next unless ( $v > 0 );
>          print "\thit name is ", $hit->name, "\n";
>          while ( my $hsp = $hit->next_hsp ) {
>            print "\t\tscore is ", $hsp->score, "\n";
>          }
>        }
>
>        ## print the seqs you've retrieved??
>        open( OUTFILE, '>', $result->query_accession . '.htm' );
>        print OUTFILE $q->start_html('RNAi Result'),
>          $q->h1('RNAi Result'),
>          $q->h2('Input'),
>          $q->pre( toString($input) ),
>          $q->h2('Output');
>
>        foreach (@seqs) {
>
>          #there's probably a better way of printing the seq
>          print OUTFILE $q->pre( toString($_) );
>        }
>        print OUTFILE $q->end_html;
>        close OUTFILE;
>      }
>    }
>  }
> }
>
> sub toString {
>  my $s = shift;
>  return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq;
> }
>
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>


From ajmackey at gmail.com  Tue Jan 26 08:24:43 2010
From: ajmackey at gmail.com (Aaron Mackey)
Date: Tue, 26 Jan 2010 08:24:43 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264455524.4552.23.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org> 
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> 
	<1264455524.4552.23.camel@epistle>
Message-ID: <24c96eca1001260524s3d46e850hfdcc461e22210972@mail.gmail.com>

There's also Bio::Tools::IUPAC; given a sequence with IUPAC ambiguity codes,
it provides a SeqIO stream that enumerates all the possible unambiguous
realizations.  Not the right solution for every situation, but quite useful
when you need it.
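
To make the idea of an enumerating stream concrete, here is a rough pure-Perl sketch of the technique. It is not the Bio::Tools::IUPAC implementation; iupac_stream is a made-up name, and the code assumes the input contains only valid IUPAC nucleotide codes. It works like an odometer, keeping one counter per position and yielding one unambiguous sequence per call.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes mapped to the bases they stand for.
my %iupac = (
    A => 'A', C => 'C', G => 'G', T => 'T',
    R => 'AG', Y => 'CT', S => 'CG', W => 'AT', K => 'GT', M => 'AC',
    B => 'CGT', D => 'AGT', H => 'ACT', V => 'ACG', N => 'ACGT',
);

# Return a closure that yields one unambiguous sequence per call and
# undef once every combination has been produced.
sub iupac_stream {
    my ($pattern) = @_;
    my @choices = map { [ split //, $iupac{$_} ] } split //, uc $pattern;
    my @index   = (0) x @choices;    # odometer: one counter per position
    my $done    = !@choices;
    return sub {
        return if $done;
        my $seq = join '', map { $choices[$_][ $index[$_] ] } 0 .. $#choices;
        # Advance the odometer, starting from the rightmost position.
        my $pos = $#choices;
        while ($pos >= 0 && ++$index[$pos] == @{ $choices[$pos] }) {
            $index[$pos--] = 0;
        }
        $done = 1 if $pos < 0;
        return $seq;
    };
}

my $next = iupac_stream('ARY');
while (defined(my $seq = $next->())) {
    print "$seq\n";
}
```

Streaming matters because the number of realizations is the product of the choices at each position, so a long degenerate sequence can expand to far more strings than is sensible to hold in an array.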

-Aaron


On Mon, Jan 25, 2010 at 4:38 PM, Dan Kortschak <
dan.kortschak at adelaide.edu.au> wrote:

> Good to see that these ideas have been considered.
>
> I'd be interested to see this discussion, or at least the point dealing
> with the problems that might arise. I'm at a loss as to how ambiguity
> codes can't completely describe all possible coding sequences for any
> given codon table (via Bio::Tools::CodonTable - in fact this already has
> the revtranslate that could be fitted into a Bio::PrimarySeq method - to
> answer Mark and Jason's comments, I think that /if/ a reverse_translate
> method exists, it makes logical sense to have it tied to a sequence
> object, calling the B:T:CT method on the seq object itself rather than
> only in Bio::Tools, 2?). Pete, can you provide an example of the
> problems?
>
> thanks
> Dan
>
> On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> > I would say it could be a bad idea. For any protein string there are
> > multiple possible back translations, and this cannot be captured
> > fully as a nucleotide string even using the IUPAC ambiguity chars.


From nml5566 at gmail.com  Tue Jan 26 16:10:54 2010
From: nml5566 at gmail.com (Nathan Liles)
Date: Tue, 26 Jan 2010 15:10:54 -0600
Subject: [Bioperl-l] SVN access
Message-ID: <4B5F5A5E.2070406@gmail.com>

Does anyone know who I need to talk to for getting developer access for 
the Bioperl SVN? I want to submit a patch to the genbank2gff3 converter.

Thanks,
Nathan


From Russell.Smithies at agresearch.co.nz  Tue Jan 26 20:40:40 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:40:40 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>

Grrrrrr, I hate eutils!!!!

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------


Nice error message though :-)


--Russell
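
For transient failures like the "Connection refused" above, one common workaround is to wrap the remote call in a retry loop with a pause between attempts. The following is only a sketch: with_retry is a made-up helper, and the flaky call is simulated so the example runs without a network connection.

```perl
use strict;
use warnings;

# Call $code up to $opt{tries} times, pausing between failed attempts.
# Returns whatever $code returns; rethrows the last error if every attempt fails.
sub with_retry {
    my ($code, %opt) = @_;
    my $tries = $opt{tries} || 3;
    my $delay = defined $opt{delay} ? $opt{delay} : 3;
    my $error;
    for my $attempt (1 .. $tries) {
        my @result;
        my $ok = eval { @result = $code->(); 1 };
        return @result if $ok;
        $error = $@ || 'unknown error';
        warn "attempt $attempt of $tries failed: $error";
        # Wait a little longer after each failure, except after the last one.
        sleep $delay * $attempt if $delay && $attempt < $tries;
    }
    die $error;
}

# Simulated flaky call: fails twice, then succeeds.
my $calls = 0;
my @ids = with_retry(
    sub {
        die "Connection refused\n" if ++$calls < 3;
        return (101, 102, 103);
    },
    tries => 5,
    delay => 0,    # no pause in this demo; use a few seconds against a real server
);
print "got @ids after $calls calls\n";
```

In a real script the code reference would wrap the call that shows up in the stack trace (get_ids), called in list context. Whether an EUtilities factory can be reused after a failed request is an assumption to verify; building a fresh factory inside the code reference is the cautious option.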

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Monday, 11 January 2010 10:05 a.m.
> To: 'Chris Fields'
> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> been finding that with large queries, chunks of the resulting data are
> missing.
> For example, before Xmas I was creating species-specific databases by
> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> the fasta sequences in chunks of 500.
> Very regularly, in the middle of the fasta there would be a message about
> resource unavailable eg.
>   >test_sequence_1
>   TACGATCATCGCTResource UnavailableTACGACTCTGCT
>   >test_sequence_2
>   TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> 
> Often this wasn't detected until formatdb complained about invalid
> characters.
> Inquiries to NCBI as to why this was happening and what to do about it
> returned stupid answers ("do each sequence manually thru the web
> interface", or "use eUtils").
> As we have a nice fast network connection, I now prefer to download very
> large gzip files (i.e. all of refseq) and extract what I need.
> 
> I can't help but think that NCBI could solve a lot of problems if they
> gzipped the output from eUtils queries - it's something I've requested
> regularly for the last 5 years or so!!
> 
> --Russell
> 
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Monday, 11 January 2010 9:50 a.m.
> > To: Smithies, Russell
> > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> >
> > One could also use Bio::DB::Taxonomy, which indexes the same files or
> > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > details).
> >
> > chris
> >
> > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >
> > > An alternate non-BioPerly way (that may be faster given NCBI's
> flakiness
> > lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> > files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash
> and
> > do lookups.
> > > In that same dir, taxdump.tar.gz contains a file called names.dmp
> which
> > lists taxids and descriptions (and synonyms)
> > >
> > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > could do this:
> > >
> > >   my $taxid  = $gi_taxid_nucl{$accession};
> > >   my $org_name = $names{$taxid};
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >> Bhakti,
> > >> The following example (using EUtilities) may serve your purpose:
> > >>
> > >> use Bio::DB::EUtilities;
> > >>
> > >> my (%taxa, @taxa);
> > >> my (%names, %idmap);
> > >>
> > >> # these are protein ids; nuc ids will work by changing -dbfrom =>
> > >> 'nucleotide',
> > >> # (probably)
> > >>
> > >> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>
> > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>                                       -db => 'taxonomy',
> > >>                                       -dbfrom => 'protein',
> > >>                                       -correspondence => 1,
> > >>                                       -id => \@ids);
> > >>
> > >> # iterate through the LinkSet objects
> > >> while (my $ds = $factory->next_LinkSet) {
> > >>    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >> }
> > >>
> > >> @taxa = @taxa{@ids};
> > >>
> > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>        -db    => 'taxonomy',
> > >>        -id    => \@taxa );
> > >>
> > >> while (local $_ = $factory->next_DocSum) {
> > >>    $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >> ($_->get_contents_by_name('ScientificName'))[0];
> > >> }
> > >>
> > >> foreach (@ids) {
> > >>    $idmap{$_} = $names{$taxa{$_}};
> > >> }
> > >>
> > >> # %idmap is
> > >> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >> #    68536103 => 'Corynebacterium jeikeium K411'
> > >> #    730439 => 'Bacillus caldolyticus'
> > >> #    89318838 => undef    (this record has been removed from the db)
> > >>
> > >> 1;
> > >>
> > >> You probably will need to break up your 30000 into chunks
> > >> (say, 1000-3000 each), and do the above on each chunk with a
> > >>
> > >> sleep 3;
> > >>
> > >> or so separating the queries.
> > >> MAJ
> > >> ----- Original Message -----
> > >> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > >> To: <bioperl-l at lists.open-bio.org>
> > >> Sent: Friday, December 25, 2009 9:46 PM
> > >> Subject: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > >>
> > >>
> > >>> Hi,
> > >>>
> > >>> Does anyone know how to retrieve the "Source" or the "Species name"
> > >> given
> > >>> the accession number using Bioperl.   I have these 30,000 accession
> > >> numbers
> > >>> for which I need to get the source organisms.  Any kind of help will
> > be
> > >>> appreciated.
> > >>>
> > >>> Thanks
> > >>>
> > >>> BD


From cjfields at illinois.edu  Tue Jan 26 20:46:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 19:46:26 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
Message-ID: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>

It's unfortunate but I have heard this problem popping up quite a bit more frequently lately.  Not to push too many buttons but NCBI isn't very forthcoming with help these days; they have become quite insular.  Not sure if they're short-staffed due to budget or if there are other issues.

chris

On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:

> Grrrrrr, I hate eutils!!!!
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
> 
> 
> Nice error message though :-)
> 
> 
> --Russell
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
>> Sent: Monday, 11 January 2010 10:05 a.m.
>> To: 'Chris Fields'
>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>> number?
>> 
>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
>> been finding that with large queries, chunks of the resulting data are
>> missing.
>> For example, before Xmas I was creating species-specific databases by
>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
>> the fasta sequences in chunks of 500.
>> Very regularly, in the middle of the fasta there would be a message about
>> resource unavailable eg.
>>   >test_sequence_1
>>   TACGATCATCGCTResource UnavailableTACGACTCTGCT
>>   >test_sequence_2
>>   TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>> 
>> Often this wasn't detected until formatdb complained about invalid
>> characters.
>> Inquiries to NCBI as to why this was happening and what to do about it
>> returned stupid answers ("do each sequence manually thru the web
>> interface", or "use eUtils").
>> As we have a nice fast network connection, I now prefer to download very
>> large gzip files (i.e. all of refseq) and extract what I need.
>> 
>> I can't help but think that NCBI could solve a lot of problems if they
>> gzipped the output from eUtils queries - it's something I've requested
>> regularly for the last 5 years or so!!
>> 
>> --Russell
>> 
>> 
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Monday, 11 January 2010 9:50 a.m.
>>> To: Smithies, Russell
>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>> number?
>>> 
>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
>>> details).
>>> 
>>> chris
>>> 
>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
>>> 
>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
>>>> do lookups.
>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
>>>> lists taxids and descriptions (and synonyms)
>>>> 
>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
>>>> could do this:
>>>> 
>>>>  # gi_taxid_nucl is keyed by GI number (not accession)
>>>>  my $taxid  = $gi_taxid_nucl{$gi};
>>>>  my $org_name = $names{$taxid};
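
Loading those two files into hashes could look roughly like the sketch
below. It assumes the unpacked gi_taxid_nucl.dmp holds one
"gi<TAB>taxid" pair per line and that names.dmp uses the usual
tab-pipe-tab field separator with a name-class column; both sub names
are invented for illustration. The GI file is very large, so the
optional second argument keeps only the GIs you actually care about.

```perl
use strict;
use warnings;

# Hypothetical helper: read "gi<TAB>taxid" lines from a filehandle.
# If $wanted (a hash ref of GIs) is given, only those GIs are kept.
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %gi_taxid;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        next if $wanted && !exists $wanted->{$gi};
        $gi_taxid{$gi} = $taxid;
    }
    return \%gi_taxid;
}

# Hypothetical helper: read names.dmp and keep one scientific name per
# taxid (synonyms and common names are skipped).
sub load_scientific_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                      # strip the trailing "\t|"
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;
    }
    return \%names;
}

# Usage, with file names assumed:
#   open(my $gfh, '<', 'gi_taxid_nucl.dmp') or die $!;
#   open(my $nfh, '<', 'names.dmp')         or die $!;
#   my $gi_taxid = load_gi_taxid($gfh, { map { $_ => 1 } @my_gis });
#   my $names    = load_scientific_names($nfh);
#   my $org_name = $names->{ $gi_taxid->{$gi} };
```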
>>>> 
>>>> --Russell
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>> number?
>>>>> 
>>>>> Bhakti,
>>>>> The following example (using EUtilities) may serve your purpose:
>>>>> 
>>>>> use Bio::DB::EUtilities;
>>>>> 
>>>>> my (%taxa, @taxa);
>>>>> my (%names, %idmap);
>>>>> 
>>>>> # these are protein ids; nuc ids will work by changing
>>>>> # -dbfrom => 'nucleotide' (probably)
>>>>> 
>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>>>>> 
>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>>>>                                      -db => 'taxonomy',
>>>>>                                      -dbfrom => 'protein',
>>>>>                                      -correspondence => 1,
>>>>>                                      -id => \@ids);
>>>>> 
>>>>> # iterate through the LinkSet objects
>>>>> while (my $ds = $factory->next_LinkSet) {
>>>>>   $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>>>>> }
>>>>> 
>>>>> @taxa = @taxa{@ids};
>>>>> 
>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>>>>       -db    => 'taxonomy',
>>>>>       -id    => \@taxa );
>>>>> 
>>>>> while (local $_ = $factory->next_DocSum) {
>>>>>   $names{($_->get_contents_by_name('TaxId'))[0]} =
>>>>> ($_->get_contents_by_name('ScientificName'))[0];
>>>>> }
>>>>> 
>>>>> foreach (@ids) {
>>>>>   $idmap{$_} = $names{$taxa{$_}};
>>>>> }
>>>>> 
>>>>> # %idmap is
>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
>>>>> #    730439 => 'Bacillus caldolyticus'
>>>>> #    89318838 => undef    (this record has been removed from the db)
>>>>> 
>>>>> 1;
>>>>> 
>>>>> You probably will need to break up your 30000 into chunks
>>>>> (say, 1000-3000 each), and do the above on each chunk with a
>>>>> 
>>>>> sleep 3;
>>>>> 
>>>>> or so separating the queries.
>>>>> MAJ
>>>>> ----- Original Message -----
>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Friday, December 25, 2009 9:46 PM
>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
>>>>> number?
>>>>> 
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
>>>>>> the accession number using Bioperl?  I have these 30,000 accession numbers
>>>>>> for which I need to get the source organisms.  Any kind of help will be
>>>>>> appreciated.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> BD


From Russell.Smithies at agresearch.co.nz  Tue Jan 26 20:59:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:59:15 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>

I've had a wide selection of errors lately:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------

And I never get a good explanation from NCBI or suggestions on how to avoid it.
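
Since these look like transient backend failures, one workaround is to
catch the exception and retry after a pause. This is only a sketch, not
a fix for whatever is wrong at NCBI's end: with_retry is a hypothetical
helper (not part of BioPerl), and the attempt count and delay in the
usage comment are guesses. It relies on the fact that BioPerl's throw
dies, so eval can trap it.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code; if it dies (as Bio::Root::Root::throw
# does), wait and try again, up to $max_tries attempts in total.
# The wait grows linearly: $delay, 2*$delay, 3*$delay, ...
sub with_retry {
    my ($code, $max_tries, $delay) = @_;
    my $last_error;
    for my $attempt (1 .. $max_tries) {
        my @result;
        my $ok = eval { @result = $code->(); 1 };
        return wantarray ? @result : $result[0] if $ok;
        $last_error = $@;
        warn "attempt $attempt of $max_tries failed: $last_error";
        sleep($delay * $attempt) if $attempt < $max_tries;
    }
    die "giving up after $max_tries attempts: $last_error";
}

# Usage, assuming $factory is an existing Bio::DB::EUtilities esearch object:
#   my @ids = with_retry(sub { $factory->get_ids }, 5, 3);
```

Truncated output that does not raise an exception will still slip
through, so this only helps with errors like the one above.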


--Russell
	

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 2:46 p.m.
> To: Smithies, Russell
> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> It's unfortunate but I have heard this problem popping up quite a bit more
> frequently lately.  Not to push too many buttons but NCBI isn't very
> forthcoming with help these days; they have become quite insular.  Not
> sure if they're short-staffed due to budget or if there are other issues.
> 
> chris
> 


From cjfields at illinois.edu  Tue Jan 26 21:42:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 20:42:22 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
Message-ID: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>

Makes me wonder if they're pushing more users towards the SOAP-based services and away from eutils.

chris

On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:

> I've had a wide selection of errors lately:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
> 
> And I never get a good explanation from NCBI or suggestions on how to avoid it.
> 
> 
> --Russell
> 	
> 


From Russell.Smithies at agresearch.co.nz  Tue Jan 26 21:45:58 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 15:45:58 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>

Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).
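
If you go this way, the splitting is easy to script. A rough sketch,
assuming Batch Entrez takes a plain text file with one ID per line; the
sub names and the file naming scheme are made up, and the 100,000 figure
is just what worked above, not a documented limit.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs (array ref) into array refs
# of at most $size IDs each, preserving order.
sub split_into_chunks {
    my ($ids, $size) = @_;
    die "chunk size must be positive\n" unless $size && $size > 0;
    my @copy = @$ids;                       # leave the caller's list intact
    my @chunks;
    push @chunks, [ splice(@copy, 0, $size) ] while @copy;
    return @chunks;
}

# Hypothetical helper: write each chunk to its own file, one ID per
# line (prefix_001.txt, prefix_002.txt, ...), and return the file names.
sub write_chunk_files {
    my ($prefix, @chunks) = @_;
    my @files;
    my $n = 0;
    for my $chunk (@chunks) {
        my $file = sprintf('%s_%03d.txt', $prefix, ++$n);
        open(my $fh, '>', $file) or die "cannot write $file: $!";
        print {$fh} "$_\n" for @$chunk;
        close($fh) or die "cannot close $file: $!";
        push @files, $file;
    }
    return @files;
}

# Usage:
#   my @files = write_chunk_files('ids', split_into_chunks(\@all_ids, 100_000));
```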

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 3:42 p.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Makes me wonder if they're pushing more users towards the SOAP-based
> services and away from eutils.
> 
> chris
> 
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> 
> > I've had a wide selection of errors lately:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource
> temporarily unavailable)
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > STACK: Bio::Tools::EUtilities::parse_data
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > STACK: Bio::Tools::EUtilities::get_ids
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > STACK: Bio::DB::EUtilities::get_ids
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > STACK: get_desc.pl:32
> > -----------------------------------------------------------
> >
> > And I never get a good explanation from NCBI or suggestions on how to
> avoid it.
> >
> >
> > --Russell
> >
> >
> >> -----Original Message-----
> >> From: Chris Fields [mailto:cjfields at illinois.edu]
> >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> >> To: Smithies, Russell
> >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >> number?
> >>
> >> It's unfortunate but I have heard this problem popping up quite a bit
> more
> >> frequently lately.  Not to push too many buttons but NCBI isn't very
> >> forthcoming with help these days; they have become quite insular.  Not
> >> sure if they're short-staffed due to budget or if there are other
> issues.
> >>
> >> chris
> >>
> >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> >>
> >>> Grrrrrr, I hate eutils!!!!
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111
> >> (Connection refused)
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw
> >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> >>> STACK: Bio::Tools::EUtilities::parse_data
> >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> >>> STACK: Bio::Tools::EUtilities::get_ids
> >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> >>> STACK: Bio::DB::EUtilities::get_ids
> >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> >>> STACK: get_desc.pl:32
> >>> -----------------------------------------------------------
> >>>
> >>>
> >>> Nice error message though :-)
> >>>
> >>>
> >>> --Russell
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> >>>> To: 'Chris Fields'
> >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-
> bio.org'
> >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >>>> number?
> >>>>
> >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've
> >> often
> >>>> been finding that with large queries, chunks of the resulting data is
> >>>> missing.
> >>>> For example, before Xmas I was creating species-specific databases by
> >>>> using eUtils to get a list of GI numbers back for a taxid, then
> >> retrieving
> >>>> the fasta sequences in chunks of 500.
> >>>> Very regularly, in the middle of the fasta there would be a message
> >> about
> >>>> resource unavailable eg.
> >>>>> test_sequence_1
> >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> >>>>> test_sequence_2
> >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> >>>>
> >>>> Often this wasn't detected until formatdb complained about invalid
> >>>> characters.
> >>>> Inquiries to NCBI as to why this was happening and what to do about
> it
> >>>> returned stupid answers ("do each sequence manually thru the web
> >>>> interface", or "use eUtils").
> >>>> As we have a nice fast network connection, I now prefer to download
> >> very
> >>>> large gzip files (i.e. all of refseq) and extract what I need.
> >>>>
> >>>> I can't help but think that NCBI could solve a lot of problems if
> they
> >>>> gzipped the output from eUtils queries - it's something I've
> requested
> >>>> regularly for the last 5 years or so!!
> >>>>
> >>>> --Russell
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> >>>>> To: Smithies, Russell
> >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-
> bio.org'
> >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> accession
> >>>>> number?
> >>>>>
> >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files
> or
> >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
> >> the
> >>>>> details).
> >>>>>
> >>>>> chris
> >>>>>
> >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >>>>>
> >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
> >>>> flakiness
> >>>>> lately) would be to download the gi_taxid_nucl.zip or
> >> gi_taxid_prot.zip
> >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a
> hash
> >>>> and
> >>>>> do lookups.
> >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
> >>>> which
> >>>>> lists taxids and descriptions (and synonyms)
> >>>>>>
> >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so
> I
> >>>>> could do this:
> >>>>>>
> >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> >>>>>> my $org_name = $names{$taxid};
> >>>>>>
> >>>>>> --Russell
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> >> accession
> >>>>>>> number?
> >>>>>>>
> >>>>>>> Bhakti,
> >>>>>>> The following example (using EUtilities) may serve your purpose:
> >>>>>>>
> >>>>>>> use Bio::DB::EUtilities;
> >>>>>>>
> >>>>>>> my (%taxa, @taxa);
> >>>>>>> my (%names, %idmap);
> >>>>>>>
> >>>>>>> # these are protein ids; nuc ids will (probably) work by changing
> >>>>>>> # -dbfrom => 'nucleotide'
> >>>>>>>
> >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> >>>>>>>
> >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> >>>>>>>                                     -db => 'taxonomy',
> >>>>>>>                                     -dbfrom => 'protein',
> >>>>>>>                                     -correspondence => 1,
> >>>>>>>                                     -id => \@ids);
> >>>>>>>
> >>>>>>> # iterate through the LinkSet objects
> >>>>>>> while (my $ds = $factory->next_LinkSet) {
> >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> >>>>>>> }
> >>>>>>>
> >>>>>>> @taxa = @taxa{@ids};
> >>>>>>>
> >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> >>>>>>>      -db    => 'taxonomy',
> >>>>>>>      -id    => \@taxa );
> >>>>>>>
> >>>>>>> while (local $_ = $factory->next_DocSum) {
> >>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
> >>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
> >>>>>>> }
> >>>>>>>
> >>>>>>> foreach (@ids) {
> >>>>>>>  $idmap{$_} = $names{$taxa{$_}};
> >>>>>>> }
> >>>>>>>
> >>>>>>> # %idmap is
> >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> >>>>>>> #    730439 => 'Bacillus caldolyticus'
> >>>>>>> #    89318838 => undef    (this record has been removed from the db)
> >>>>>>>
> >>>>>>> 1;
> >>>>>>>
> >>>>>>> You probably will need to break up your 30000 into chunks
> >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> >>>>>>>
> >>>>>>> sleep 3;
> >>>>>>>
> >>>>>>> or so separating the queries.
> >>>>>>> MAJ
> >>>>>>> ----- Original Message -----
> >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> >>>>>>> To: <bioperl-l at lists.open-bio.org>
> >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
> >>>>> number?
> >>>>>>>
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species
> name"
> >>>>>>> given
> >>>>>>>> the accession number using Bioperl.   I have these 30,000
> accession
> >>>>>>> numbers
> >>>>>>>> for which I need to get the source organisms.  Any kind of help
> >> will
> >>>>> be
> >>>>>>>> appreciated.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> BD
> >>>>>>>> _______________________________________________
> >>>>>>>> Bioperl-l mailing list
> >>>>>>>> Bioperl-l at lists.open-bio.org
> >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>
> =======================================================================
> >>>>>> Attention: The information contained in this message and/or
> >>>> attachments
> >>>>>> from AgResearch Limited is intended only for the persons or
> entities
> >>>>>> to which it is addressed and may contain confidential and/or
> >>>> privileged
> >>>>>> material. Any review, retransmission, dissemination or other use
> of,
> >>>> or
> >>>>>> taking of any action in reliance upon, this information by persons
> or
> >>>>>> entities other than the intended recipients is prohibited by
> >>>> AgResearch
> >>>>>> Limited. If you have received this message in error, please notify
> >> the
> >>>>>> sender immediately.
> >>>>>>
> >>>>
> =======================================================================
> >>>>>>


From maj at fortinbras.us  Wed Jan 27 10:14:22 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 27 Jan 2010 10:14:22 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com><1CE23DE1068C4FA2BD543D167A1AA901@NewLife><18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz><F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz><4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <C1C922A99DF24679955608955B2A73B1@NewLife>

Precisely the MO behind SoapEU...get the jump on 'em.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
Cc: <bioperl-l at lists.open-bio.org>; "'Mark A. Jensen'" <maj at fortinbras.us>
Sent: Tuesday, January 26, 2010 9:42 PM
Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?


> Makes me wonder if they're pushing more users towards the SOAP-based services 
> and away from eutils.
>
> chris
>
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
>
>> I've had a wide selection of errors lately:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource 
>> temporarily unavailable)
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>> STACK: Bio::Tools::EUtilities::parse_data 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>> STACK: Bio::Tools::EUtilities::get_ids 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>> STACK: Bio::DB::EUtilities::get_ids 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>> STACK: get_desc.pl:32
>> -----------------------------------------------------------
>>
>> And I never get a good explanation from NCBI or suggestions on how to avoid 
>> it.
>>
>>
>> --Russell
>>
>>
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Wednesday, 27 January 2010 2:46 p.m.
>>> To: Smithies, Russell
>>> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>> number?
>>>
>>> It's unfortunate but I have heard this problem popping up quite a bit more
>>> frequently lately.  Not to push too many buttons but NCBI isn't very
>>> forthcoming with help these days; they have become quite insular.  Not
>>> sure if they're short-staffed due to budget or if there are other issues.
>>>
>>> chris
>>>
>>> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
>>>
>>>> Grrrrrr, I hate eutils!!!!
>>>>
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111
>>> (Connection refused)
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw
>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>>>> STACK: Bio::Tools::EUtilities::parse_data
>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>>>> STACK: Bio::Tools::EUtilities::get_ids
>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>>>> STACK: Bio::DB::EUtilities::get_ids
>>> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>>>> STACK: get_desc.pl:32
>>>> -----------------------------------------------------------
>>>>
>>>>
>>>> Nice error message though :-)
>>>>
>>>>
>>>> --Russell
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
>>>>> Sent: Monday, 11 January 2010 10:05 a.m.
>>>>> To: 'Chris Fields'
>>>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>> number?
>>>>>
>>>>> I've started to go off eUtils recently (not BioPerl's fault) as I've
>>> often
>>>>> been finding that with large queries, chunks of the resulting data is
>>>>> missing.
>>>>> For example, before Xmas I was creating species-specific databases by
>>>>> using eUtils to get a list of GI numbers back for a taxid, then
>>> retrieving
>>>>> the fasta sequences in chunks of 500.
>>>>> Very regularly, in the middle of the fasta there would be a message
>>> about
>>>>> resource unavailable eg.
>>>>>> test_sequence_1
>>>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
>>>>>> test_sequence_2
>>>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>>>>>
>>>>> Often this wasn't detected until formatdb complained about invalid
>>>>> characters.
>>>>> Inquiries to NCBI as to why this was happening and what to do about it
>>>>> returned stupid answers ("do each sequence manually thru the web
>>>>> interface", or "use eUtils").
>>>>> As we have a nice fast network connection, I now prefer to download
>>> very
>>>>> large gzip files (i.e. all of refseq) and extract what I need.
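
The corruption described above (an error string embedded in the middle of a sequence) can be caught before formatdb sees the file. The following is only a minimal sketch, assuming nucleotide FASTA and an IUPAC alphabet; the sub name find_bad_fasta_lines is made up for illustration and is not part of BioPerl or any NCBI tool.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl function): scan FASTA text and report
# sequence lines containing a character outside the IUPAC nucleotide
# alphabet (plus '-' and '*'). Returns [record_id, line_number, bad_char].
sub find_bad_fasta_lines {
    my ($fasta_text) = @_;
    my @problems;
    my $current_id = '(no header)';
    my $line_no    = 0;
    for my $line (split /\n/, $fasta_text) {
        $line_no++;
        $line =~ s/\r$//;              # tolerate DOS line endings
        next if $line =~ /^\s*$/;      # skip blank lines
        if ($line =~ /^>(\S*)/) {      # a header line starts a new record
            $current_id = $1;
            next;
        }
        # Record only the first unexpected character on the line
        if ($line =~ /([^ACGTUNRYSWKMBDHV*-])/i) {
            push @problems, [ $current_id, $line_no, $1 ];
        }
    }
    return @problems;
}

# The corrupted example from the message above
my $fasta = <<'END';
>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
END

for my $p (find_bad_fasta_lines($fasta)) {
    printf "record %s, line %d: unexpected character '%s'\n", @$p;
}
```

Running a check like this on each downloaded chunk would let a script re-fetch only the damaged chunk instead of discovering the problem at database-formatting time. A protein file would need a different alphabet.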
>>>>>
>>>>> I can't help but think that NCBI could solve a lot of problems if they
>>>>> gzipped the output from eUtils queries - it's something I've requested
>>>>> regularly for the last 5 years or so!!
>>>>>
>>>>> --Russell
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>>>>> Sent: Monday, 11 January 2010 9:50 a.m.
>>>>>> To: Smithies, Russell
>>>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>>> number?
>>>>>>
>>>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
>>>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
>>> the
>>>>>> details).
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
>>>>>>
>>>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
>>>>> flakiness
>>>>>> lately) would be to download the gi_taxid_nucl.zip or
>>> gi_taxid_prot.zip
>>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash
>>>>> and
>>>>>> do lookups.
>>>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
>>>>> which
>>>>>> lists taxids and descriptions (and synonyms)
>>>>>>>
>>>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
>>>>>> could do this:
>>>>>>>
>>>>>>> my $taxid  = $gi_taxid_nucl{$accession};
>>>>>>> my $org_name = $names{$taxid};
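
A rough sketch of how the two hashes above might be built follows. It assumes the unpacked gi_taxid_nucl file holds two tab-separated columns (GI, taxid) and that names.dmp uses "tab, pipe, tab" as its field separator with a name-class column; the sub names load_gi_taxid and load_names and the GI numbers in the sample rows are invented for illustration. Note that the first table is keyed by GI number rather than by accession.

```perl
use strict;
use warnings;

# Build GI -> taxid from lines of the form "gi<TAB>taxid".
# If $wanted (a hash ref of GIs) is given, keep only those GIs; the real
# file is very large, so filtering keeps memory use down.
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %gi_taxid;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        next if $wanted && !exists $wanted->{$gi};
        $gi_taxid{$gi} = $taxid;
    }
    return \%gi_taxid;
}

# Build taxid -> scientific name from names.dmp-style lines:
# taxid | name | unique name | name class |
sub load_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                       # strip trailing terminator
        my ($taxid, $name, $unique, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;                   # synonyms are skipped
    }
    return \%names;
}

# Tiny in-memory stand-ins for the real files (GI numbers are made up)
my $gi_text = <<"END";
1001\t9606
1002\t10090
END
my $names_text = <<"END";
9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|
9606\t|\thuman\t|\t\t|\tgenbank common name\t|
10090\t|\tMus musculus\t|\t\t|\tscientific name\t|
END

open my $gi_fh, '<', \$gi_text or die "cannot open GI table: $!";
open my $names_fh, '<', \$names_text or die "cannot open names table: $!";
my $gi_taxid = load_gi_taxid($gi_fh);
my $names    = load_names($names_fh);

for my $gi (1001, 1002, 9999) {
    my $taxid    = $gi_taxid->{$gi};
    my $org_name = defined $taxid ? $names->{$taxid} : undef;
    printf "%s => %s\n", $gi, defined $org_name ? $org_name : '(not found)';
}
```

With the real files, the in-memory strings would be replaced by filehandles opened on the unpacked dumps.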
>>>>>>>
>>>>>>> --Russell
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>>>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
>>>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>>>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
>>> accession
>>>>>>>> number?
>>>>>>>>
>>>>>>>> Bhakti,
>>>>>>>> The following example (using EUtilities) may serve your purpose:
>>>>>>>>
>>>>>>>> use Bio::DB::EUtilities;
>>>>>>>>
>>>>>>>> my (%taxa, @taxa);
>>>>>>>> my (%names, %idmap);
>>>>>>>>
>>>>>>>> # these are protein ids; nuc ids will (probably) work by changing
>>>>>>>> # -dbfrom => 'nucleotide'
>>>>>>>>
>>>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>>>>>>>>
>>>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>>>>>>>                                     -db => 'taxonomy',
>>>>>>>>                                     -dbfrom => 'protein',
>>>>>>>>                                     -correspondence => 1,
>>>>>>>>                                     -id => \@ids);
>>>>>>>>
>>>>>>>> # iterate through the LinkSet objects
>>>>>>>> while (my $ds = $factory->next_LinkSet) {
>>>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>>>>>>>> }
>>>>>>>>
>>>>>>>> @taxa = @taxa{@ids};
>>>>>>>>
>>>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>>>>>>>      -db    => 'taxonomy',
>>>>>>>>      -id    => \@taxa );
>>>>>>>>
>>>>>>>> while (local $_ = $factory->next_DocSum) {
>>>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
>>>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
>>>>>>>> }
>>>>>>>>
>>>>>>>> foreach (@ids) {
>>>>>>>>  $idmap{$_} = $names{$taxa{$_}};
>>>>>>>> }
>>>>>>>>
>>>>>>>> # %idmap is
>>>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>>>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>>>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
>>>>>>>> #    730439 => 'Bacillus caldolyticus'
>>>>>>>> #    89318838 => undef    (this record has been removed from the db)
>>>>>>>>
>>>>>>>> 1;
>>>>>>>>
>>>>>>>> You probably will need to break up your 30000 into chunks
>>>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
>>>>>>>>
>>>>>>>> sleep 3;
>>>>>>>>
>>>>>>>> or so separating the queries.
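
The chunking advice above could be sketched roughly as follows. This is an illustration only: process_in_chunks is a made-up helper, the IDs are placeholders, and the callback is where the elink/esummary lookups from the example would go.

```perl
use strict;
use warnings;

# Hypothetical helper: split a list of IDs into batches of at most $size,
# call $callback on each batch, and pause $pause seconds between batches.
# Returns the number of batches processed.
sub process_in_chunks {
    my ($ids, $size, $pause, $callback) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @queue   = @$ids;        # copy, so the caller's array is untouched
    my $batches = 0;
    while (my @chunk = splice(@queue, 0, $size)) {
        $callback->(\@chunk);
        $batches++;
        sleep $pause if $pause && @queue;   # no pause after the last batch
    }
    return $batches;
}

my @ids = (1 .. 30_000);   # placeholder IDs standing in for the real ones

# Pause is 0 here so the demo finishes quickly; 3 matches the advice above.
my $n = process_in_chunks(\@ids, 1000, 0, sub {
    my ($chunk) = @_;
    # In real use, run the elink/esummary queries on @$chunk here.
    printf "batch of %d ids: %s..%s\n",
        scalar @$chunk, $chunk->[0], $chunk->[-1];
});
print "$n batches\n";
```

Keeping the per-batch work in a callback means the same loop can be reused for the elink step and the esummary step.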
>>>>>>>> MAJ
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>>> Sent: Friday, December 25, 2009 9:46 PM
>>>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
>>>>>> number?
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name"
>>>>>>>> given
>>>>>>>>> the accession number using Bioperl.   I have these 30,000 accession
>>>>>>>> numbers
>>>>>>>>> for which I need to get the source organisms.  Any kind of help
>>> will
>>>>>> be
>>>>>>>>> appreciated.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> BD
>>>>>>>
>
> 


From bhakti.dwivedi at gmail.com  Wed Jan 27 14:42:06 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Wed, 27 Jan 2010 14:42:06 -0500
Subject: [Bioperl-l] Designing primers from multiple sequence alignment of
	amino acid sequences
Message-ID: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>

Hi,

I have to design primers from the multiple sequence alignments of amino acid
sequences.  The sequences I am working with are quite diverged and often the
available primer design programs (such as CODEHOP/iCODEHOP) fail to find any
primer sets.   But, when I look  at the alignment manually, I could see the
regions that I could use to make primers.

So I  designed the degenerate primers the old-fashioned way, starting from
selecting the conserved regions (6-10aa long) from the alignment  to
translating the selected regions to DNA using the appropriate codon usage
table, and then finally checking the primer sets (potential forward and
reverse primers) using tools like OLIGOANALYZER.  In the end, I did find a
few good primer sets, but whether they work in practice is something I will
have to wait and see.

While doing this process manually, I really felt the need to automate it (it
was not just one alignment; I worked with several of them).   I was
wondering whether there is any way bioperl can help me here, or whether
writing a perl script from scratch is the only way to go.

I would appreciate your suggestions/comments.  Thanks!  (Apologies for the
long email.)


Regards
Bhakti


From Kevin.M.Brown at asu.edu  Wed Jan 27 15:23:57 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 27 Jan 2010 13:23:57 -0700
Subject: [Bioperl-l] Designing primers from multiple sequence alignment
	ofamino acid sequences
In-Reply-To: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>
References: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4068498DB@EX02.asurite.ad.asu.edu>

Bioperl is just a collection of tools, not a full blown application.
Most of what you want can be done with the objects available from within
the toolkit, but the application (perl script) would still need to be
written to put the objects to use. You could use clustalw from within
perl to align the sequences (Bio::Tools::Run::Alignment::Clustalw), find
the conserved regions (Bio::SimpleAlign), reverse translate them
(Bio::Tools::CodonTable), then come up with an algorithm for primer
analysis and selection (or even use other apps like primer3
(Bio::Tools::Run::Primer3) from within perl).
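
As a rough illustration of the middle steps (finding conserved regions and reverse translating them), here is a plain-Perl sketch that does not use the BioPerl classes named above. The alignment is a made-up toy, the sub names are hypothetical, and the codon table is the usual IUPAC compression of the standard genetic code (over-inclusive for L, R and S); it is a starting point under those assumptions, not a primer design method.

```perl
use strict;
use warnings;

# Standard genetic code compressed to IUPAC degenerate codons.
# YTN (Leu), MGN (Arg) and WSN (Ser) also cover a few non-target codons.
my %DEGENERATE_CODON = (
    A => 'GCN', C => 'TGY', D => 'GAY', E => 'GAR', F => 'TTY',
    G => 'GGN', H => 'CAY', I => 'ATH', K => 'AAR', L => 'YTN',
    M => 'ATG', N => 'AAY', P => 'CCN', Q => 'CAR', R => 'MGN',
    S => 'WSN', T => 'ACN', V => 'GTN', W => 'TGG', Y => 'TAY',
);

# Number of concrete bases each IUPAC symbol stands for
my %BASES_PER_SYMBOL = (
    A => 1, C => 1, G => 1, T => 1,
    R => 2, Y => 2, M => 2, K => 2, W => 2, S => 2,
    B => 3, D => 3, H => 3, V => 3, N => 4,
);

# Return [start_column, peptide] for every run of at least $min_len
# columns in which all aligned rows carry the same non-gap residue.
sub conserved_blocks {
    my ($min_len, @rows) = @_;
    my $width = length $rows[0];
    my (@blocks, $start);
    for my $col (0 .. $width) {          # one past the end flushes the last run
        my $conserved = 0;
        if ($col < $width) {
            my %seen;
            $seen{ substr($_, $col, 1) } = 1 for @rows;
            my ($only) = keys %seen;
            $conserved = 1 if keys(%seen) == 1 && $only ne '-';
        }
        if ($conserved) {
            $start = $col unless defined $start;
        }
        elsif (defined $start) {
            my $len = $col - $start;
            push @blocks, [ $start, substr($rows[0], $start, $len) ]
                if $len >= $min_len;
            undef $start;
        }
    }
    return @blocks;
}

# Reverse translate a peptide into one degenerate DNA string
sub degenerate_primer {
    my ($peptide) = @_;
    my $dna = '';
    for my $aa (split //, uc $peptide) {
        my $codon = $DEGENERATE_CODON{$aa}
            or die "no codon for residue '$aa'\n";
        $dna .= $codon;
    }
    return $dna;
}

# Count how many distinct sequences a degenerate string represents
sub degeneracy {
    my ($dna) = @_;
    my $n = 1;
    $n *= $BASES_PER_SYMBOL{$_} for split //, $dna;
    return $n;
}

# Toy alignment (invented for this example)
my @alignment = ('MKWDPYCHA-L', 'TRWDPYCHAGL', 'MKWDPYCHS-I');
for my $block (conserved_blocks(6, @alignment)) {
    my ($start, $peptide) = @$block;
    my $dna = degenerate_primer($peptide);
    printf "column %d: %s -> %s (degeneracy %d)\n",
        $start + 1, $peptide, $dna, degeneracy($dna);
}
```

Requiring perfectly identical columns is the strictest possible definition of "conserved"; for diverged sequences a real script would probably relax it, and the degeneracy count gives one simple way to rank candidate regions before handing them to a primer analysis tool.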

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Bhakti Dwivedi
> Sent: Wednesday, January 27, 2010 12:42 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Designing primers from multiple sequence 
> alignment ofamino acid sequences
> 
> Hi,
> 
> I have to design primers from the multiple sequence 
> alignments of amino acid
> sequences.  The sequences I am working with are quite 
> diverged and often the
> available primer design programs (such as CODEHOP/iCODEHOP) 
> fail to find any
> primer sets.   But, when I look  at the alignment manually, I 
> could see the
> regions that I could use to make primers.
> 
> So I  designed the degenerate primers the old-fashioned way, 
> starting from
> selecting the conserved regions (6-10aa long) from the alignment  to
> translating the selected regions to DNA using the appropriate 
> codon usage
> table, and then finally checking the primer sets (potential 
> forward and
> reverse primers) using tools like OLIGOANALYZER.  In the end, 
> I did find few
> good primer sets, but getting them to work in reality is 
> something I will
> have to wait and see.
> 
> While doing this process manually, I really felt the need to 
> automate it (it
> was not just one alignment I did, I worked with several of 
> those).   I was
> wondering if there is anyway bioperl can help me here, or 
> making a perl
> script is the only way to go.
> 
> I would appreciate your suggestions/comments.  Thanks!  
> (apologize for a
> long email..)
> 
> 
> Regards
> Bhakti
> 


From mike.stubbington at bbsrc.ac.uk  Thu Jan 28 10:41:49 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 15:41:49 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
Message-ID: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>

Dear all,

I am attempting to blast some primers against the mouse genome. I have created a local mouse genome blast database and I can search against it using 'blastn' at the command line. 

I have perl code that creates an array of bioperl sequence objects called @primers

I then create a StandAloneBlastPlus factory using the following code...

	my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
		-db_dir => '/Users/stubbing/localBlast/',
		-db_name => 'MouseGenome'
	);

and then attempt to blast my primers using this...

	my @shortPrimers;
	my $count=1;
	foreach (@primers) {
		my $currentSeq = $_;
		print "Checking primer $count/$primerNumber ";
		if ($_->length < 40) {
			push(@shortPrimers,$_);
			print "Too short!\n";
		}
		else {
			print "BLASTing...";
			my $blastResult = $blastFactory->blastn(-query => $currentSeq);
		}
		$count++;
	}

This fails with the following error...

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA> line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike


From maj at fortinbras.us  Thu Jan 28 10:56:14 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 10:56:14 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
Message-ID: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>

Mike - please try updating your bioperl-live (the core) to the latest code 
(revision 16761 or so).
CommandExts is a work in progress; from the stack errors it looks like you've 
got an older version.
Try it then ping us back, if you would--
Thanks
Mark
----- Original Message ----- 
From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
To: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 28, 2010 10:41 AM
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error 
running blastn


Dear all,

I am attempting to blast some primers against the mouse genome. I have created a 
local mouse genome blast database and I can search against it using 'blastn' at 
the command line.

I have perl code that creates an array of bioperl sequence objects called 
@primers

I then create a StandAloneBlastPlus factory using the following code...

	my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
		-db_dir => '/Users/stubbing/localBlast/',
		-db_name => 'MouseGenome'
	);

and then attempt to blast my primers using this...

	my @shortPrimers;
	my $count=1;
	foreach (@primers) {
		my $currentSeq = $_;
		print "Checking primer $count/$primerNumber ";
		if ($_->length < 40) {
			push(@shortPrimers,$_);
			print "Too short!\n";
		}
		else {
			print "BLASTing...";
			my $blastResult = $blastFactory->blastn(-query => $currentSeq);
		}
		$count++;
	}

This fails with the following error...

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
/usr/local/ncbi/blast/bin/blastn : Illegal seek at 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA> 
line 532.

STACK Bio::Tools::Run::WrapperBase::_run 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my 
factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike




From mike.stubbington at bbsrc.ac.uk  Thu Jan 28 11:18:12 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 16:18:12 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
In-Reply-To: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
	<56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
Message-ID: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>

Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:

> Mike - please try updating your bioperl-live (the core) to the latest code 
> (revision 16761 or so).
> CommandExts is a work in progress; from the stack errors it looks like you've 
> got an older version.
> Try it then ping us back, if you would--
> Thanks
> Mark
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 10:41 AM
> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error 
> running blastn
> 
> 
> Dear all,
> 
> I am attempting to blast some primers against the mouse genome. I have created a 
> local mouse genome blast database and I can search against it using 'blastn' at 
> the command line.
> 
> I have perl code that creates an array of bioperl sequence objects called 
> @primers
> 
> I then create a StandAloneBlastPlus factory using the following code...
> 
> 	my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
> 		-db_dir => '/Users/stubbing/localBlast/',
> 		-db_name => 'MouseGenome'
> 	);
> 
> and then attempt to blast my primers using this...
> 
> 	my @shortPrimers;
> 	my $count=1;
> 	foreach (@primers) {
> 		my $currentSeq = $_;
> 		print "Checking primer $count/$primerNumber ";
> 		if ($_->length < 40) {
> 			push(@shortPrimers,$_);
> 			print "Too short!\n";
> 		}
> 		else {
> 			print "BLASTing...";
> 			my $blastResult = $blastFactory->blastn(-query => $currentSeq);
> 		}
> 		$count++;
> 	}
> 
> This fails with the following error...
> 
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
> /usr/local/ncbi/blast/bin/blastn : Illegal seek at 
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA> 
> line 532.
> 
> STACK Bio::Tools::Run::WrapperBase::_run 
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
> 
> Line 63 in my code is (as you might expect) the one that calls blastn on my 
> factory object.
> 
> I'd appreciate any help you might be able to provide to shed light on this.
> 
> Thanks in advance,
> 
> Mike
> 
> 
> 
> 
> 


From maj at fortinbras.us  Thu Jan 28 11:28:52 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 11:28:52 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
	<56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID: <C7FF329BCA044F19B3D690FE67319192@NewLife>

Thanks Mike-- will have a look asap- cheers MAJ
----- Original Message ----- 
From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 28, 2010 11:18 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
error running blastn


Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
/usr/local/ncbi/blast/bin/blastn : Illegal seek at 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> 
line 532.

STACK Bio::Tools::Run::WrapperBase::_run 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:

> Mike - please try updating your bioperl-live (the core) to the latest code
> (revision 16761 or so).
> CommandExts is a work in progress; from the stack errors it looks like you've
> got an older version.
> Try it then ping us back, if you would--
> Thanks
> Mark
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 10:41 AM
> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error
> running blastn
>
>
> Dear all,
>
> I am attempting to blast some primers against the mouse genome. I have created a
> local mouse genome blast database and I can search against it using 'blastn' at
> the command line.
>
> I have perl code that creates an array of bioperl sequence objects called
> @primers.
>
> I then create a StandAloneBlastPlus factory using the following code:
>
> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
> -db_dir => '/Users/stubbing/localBlast/',
> -db_name => 'MouseGenome'
> );
>
> and then attempt to blast my primers using this:
>
> my @shortPrimers;
> my $count=1;
> foreach (@primers) {
> my $currentSeq = $_;
> print "Checking primer $count/$primerNumber ";
> if ($_->length < 40) {
> push(@shortPrimers,$_);
> print "Too short!\n";
> }
> else {
> print "BLASTing...";
> my $blastResult = $blastFactory->blastn(-query => $currentSeq);
> }
> $count++;
> }
>
> This fails with the following error:
>
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem 
> running
> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA>
> line 532.
>
> STACK Bio::Tools::Run::WrapperBase::_run
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
>
> Line 63 in my code is (as you might expect) the one that calls blastn on my
> factory object.
>
> I'd appreciate any help you might be able to provide to shed light on this.
>
> Thanks in advance,
>
> Mike
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>


From cjfields at illinois.edu  Thu Jan 28 13:26:27 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 12:26:27 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
Message-ID: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>

Russell,

Just curious, but have you tried setting the return email parameter
(-email)?  NCBI recently stated that all queries would eventually
require a return email of some sort (not sure if it's validated or not).
I think that was set for around late spring.  I'm changing the code in
svn to require it for that very purpose.

chris  


On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).
> 
> --Russell
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > 
> > Makes me wonder if they're pushing more users towards the SOAP-based
> > services and away from eutils.
> > 
> > chris
> > 
> > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > 
> > > I've had a wide selection of errors lately:
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource
> > temporarily unavailable)
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > STACK: Bio::Tools::EUtilities::parse_data
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > STACK: Bio::Tools::EUtilities::get_ids
> > /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > STACK: Bio::DB::EUtilities::get_ids
> > /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > STACK: get_desc.pl:32
> > > -----------------------------------------------------------
> > >
> > > And I never get a good explanation from NCBI or suggestions on how to
> > avoid it.
> > >
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > >> To: Smithies, Russell
> > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >> It's unfortunate but I have heard this problem popping up quite a bit
> > more
> > >> frequently lately.  Not to push too many buttons but NCBI isn't very
> > >> forthcoming with help these days; they have become quite insular.  Not
> > >> sure if they're short-staffed due to budget or if there are other
> > issues.
> > >>
> > >> chris
> > >>
> > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > >>
> > >>> Grrrrrr, I hate eutils!!!!
> > >>>
> > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111
> > >> (Connection refused)
> > >>> STACK: Error::throw
> > >>> STACK: Bio::Root::Root::throw
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > >>> STACK: Bio::Tools::EUtilities::parse_data
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > >>> STACK: Bio::Tools::EUtilities::get_ids
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > >>> STACK: Bio::DB::EUtilities::get_ids
> > >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > >>> STACK: get_desc.pl:32
> > >>> -----------------------------------------------------------
> > >>>
> > >>>
> > >>> Nice error message though :-)
> > >>>
> > >>>
> > >>> --Russell
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > >>>> To: 'Chris Fields'
> > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-
> > bio.org'
> > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >>>> number?
> > >>>>
> > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> > > >>>> been finding that with large queries, chunks of the resulting data are
> > > >>>> missing.
> > >>>> For example, before Xmas I was creating species-specific databases by
> > >>>> using eUtils to get a list of GI numbers back for a taxid, then
> > >> retrieving
> > >>>> the fasta sequences in chunks of 500.
> > >>>> Very regularly, in the middle of the fasta there would be a message
> > >> about
> > >>>> resource unavailable eg.
> > >>>>> test_sequence_1
> > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > >>>>> test_sequence_2
> > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > >>>>
> > >>>> Often this wasn't detected until formatdb complained about invalid
> > >>>> characters.
> > >>>> Inquiries to NCBI as to why this was happening and what to do about
> > it
> > >>>> returned stupid answers ("do each sequence manually thru the web
> > >>>> interface", or "use eUtils").
> > >>>> As we have a nice fast network connection, I now prefer to download
> > >> very
> > >>>> large gzip files (i.e. all of refseq) and extract what I need.
> > >>>>
> > >>>> I can't help but think that NCBI could solve a lot of problems if
> > they
> > >>>> gzipped the output from eUtils queries - it's something I've
> > requested
> > >>>> regularly for the last 5 years or so!!
> > >>>>
> > >>>> --Russell
> > >>>>
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > >>>>> To: Smithies, Russell
> > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-
> > bio.org'
> > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > accession
> > >>>>> number?
> > >>>>>
> > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files
> > or
> > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
> > >> the
> > >>>>> details).
> > >>>>>
> > >>>>> chris
> > >>>>>
> > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > >>>>>
> > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
> > >>>> flakiness
> > >>>>> lately) would be to download the gi_taxid_nucl.zip or
> > >> gi_taxid_prot.zip
> > >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a
> > hash
> > >>>> and
> > >>>>> do lookups.
> > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
> > >>>> which
> > >>>>> lists taxids and descriptions (and synonyms)
> > >>>>>>
> > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so
> > I
> > >>>>> could do this:
> > >>>>>>
> > >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> > >>>>>> my $org_name = $names{$taxid};
> > >>>>>>
> > >>>>>> --Russell
> > >>>>>>
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > >> accession
> > >>>>>>> number?
> > >>>>>>>
> > >>>>>>> Bhakti,
> > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > >>>>>>>
> > >>>>>>> use Bio::DB::EUtilities;
> > >>>>>>>
> > >>>>>>> my (%taxa, @taxa);
> > >>>>>>> my (%names, %idmap);
> > >>>>>>>
> > > >>>>>>> # these are protein ids; nuc ids will (probably) work by changing
> > > >>>>>>> # -dbfrom => 'nucleotide'
> > >>>>>>>
> > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>>>>>>
> > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>>>>>>                                     -db => 'taxonomy',
> > >>>>>>>                                     -dbfrom => 'protein',
> > >>>>>>>                                     -correspondence => 1,
> > >>>>>>>                                     -id => \@ids);
> > >>>>>>>
> > >>>>>>> # iterate through the LinkSet objects
> > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> @taxa = @taxa{@ids};
> > >>>>>>>
> > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>>>>>>      -db    => 'taxonomy',
> > >>>>>>>      -id    => \@taxa );
> > >>>>>>>
> > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > >>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> foreach (@ids) {
> > >>>>>>>  $idmap{$_} = $names{$taxa{$_}};
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> # %idmap is
> > >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> > >>>>>>> #    730439 => 'Bacillus caldolyticus'
> > >>>>>>> #    89318838 => undef    (this record has been removed from the
> > db)
> > >>>>>>>
> > >>>>>>> 1;
> > >>>>>>>
> > >>>>>>> You probably will need to break up your 30000 into chunks
> > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > >>>>>>>
> > >>>>>>> sleep 3;
> > >>>>>>>
> > >>>>>>> or so separating the queries.
> > >>>>>>> MAJ
> > >>>>>>> ----- Original Message -----
> > >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > >>>>>>> To: <bioperl-l at lists.open-bio.org>
> > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
> > >>>>> number?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species
> > name"
> > >>>>>>> given
> > >>>>>>>> the accession number using Bioperl.   I have these 30,000
> > accession
> > >>>>>>> numbers
> > >>>>>>>> for which I need to get the source organisms.  Any kind of help
> > >> will
> > >>>>> be
> > >>>>>>>> appreciated.
> > >>>>>>>>
> > >>>>>>>> Thanks
> > >>>>>>>>
> > >>>>>>>> BD
> > >>>>>>>> _______________________________________________
> > >>>>>>>> Bioperl-l mailing list
> > >>>>>>>> Bioperl-l at lists.open-bio.org
> > >>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>> _______________________________________________
> > >>>>>>> Bioperl-l mailing list
> > >>>>>>> Bioperl-l at lists.open-bio.org
> > >>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>>>
> > >>>>
> > =======================================================================
> > >>>>>> Attention: The information contained in this message and/or
> > >>>> attachments
> > >>>>>> from AgResearch Limited is intended only for the persons or
> > entities
> > >>>>>> to which it is addressed and may contain confidential and/or
> > >>>> privileged
> > >>>>>> material. Any review, retransmission, dissemination or other use
> > of,
> > >>>> or
> > >>>>>> taking of any action in reliance upon, this information by persons
> > or
> > >>>>>> entities other than the intended recipients is prohibited by
> > >>>> AgResearch
> > >>>>>> Limited. If you have received this message in error, please notify
> > >> the
> > >>>>>> sender immediately.
> > >>>>>>
> > >>>>
> > =======================================================================
> > >>>>>>
> > >>>>>> _______________________________________________
> > >>>>>> Bioperl-l mailing list
> > >>>>>> Bioperl-l at lists.open-bio.org
> > >>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >>>>
> > >>>>
> > >>>> _______________________________________________
> > >>>> Bioperl-l mailing list
> > >>>> Bioperl-l at lists.open-bio.org
> > >>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
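
The chunking advice quoted above (batches of 1000-3000 IDs with a short
sleep between queries) could be sketched roughly as follows. The helper
names chunk_ids and process_in_chunks are made up for illustration and are
not BioPerl functions; the callback is where an actual Bio::DB::EUtilities
request would go.

```perl
use strict;
use warnings;

# Split a list of IDs into chunks of at most $size elements.
sub chunk_ids {
    my ($size, @ids) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @chunks;
    while (@ids) {
        push @chunks, [ splice(@ids, 0, $size) ];
    }
    return @chunks;
}

# Call $callback once per chunk (passing an array ref), sleeping $pause
# seconds between calls. Returns the number of chunks processed.
sub process_in_chunks {
    my ($callback, $size, $pause, @ids) = @_;
    my @chunks = chunk_ids($size, @ids);
    for my $i (0 .. $#chunks) {
        $callback->($chunks[$i]);
        # No need to wait after the final chunk.
        sleep $pause if $pause && $i < $#chunks;
    }
    return scalar @chunks;
}
```

With a callback that builds the elink/esummary factories for each chunk,
process_in_chunks($fetch, 1000, 3, @ids) would issue 30 requests for 30000
IDs, three seconds apart ($fetch being a code ref you supply).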


From maj at fortinbras.us  Thu Jan 28 13:47:04 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 13:47:04 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID: <FD6E9A89F6034CCB856E22553ED893D7@NewLife>

Hi Mike,
Believe I found the real bug causing the problem (was not accounting for
the db_dir parameter). Crashes should now also throw much more helpful
errors. Please try the code at r16774, and shout back.
thanks --
MAJ
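
Until the updated code is installed, one way to keep a primer loop running
past a crashed blastn call is to trap the exception around each call. A
minimal sketch, assuming a hypothetical helper named run_or_report (it is
not part of BioPerl):

```perl
use strict;
use warnings;

# Run $code inside eval. Returns its result, or undef after printing a
# warning that carries the error text. $label only identifies the call.
sub run_or_report {
    my ($label, $code) = @_;
    my $result = eval { $code->() };
    if (my $err = $@) {
        warn "$label failed: $err\n";
        return;
    }
    return $result;
}

# Possible use inside the loop shown in the quoted message below:
#   my $blastResult = run_or_report("primer $count",
#       sub { $blastFactory->blastn(-query => $currentSeq) });
#   next unless defined $blastResult;
```

This only reports the failure and moves on; it does not fix the underlying
db_dir problem.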




From cjfields at illinois.edu  Thu Jan 28 14:00:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:00:26 -0600
Subject: [Bioperl-l] EUtilities policy change
Message-ID: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>

All,

Per NCBI's recent change in eutils user policy (effective June 1):

http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html

Both the tool and email parameters ('-tool', '-email') are now required
when making requests.  Note this will significantly break all modules
requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
and Taxonomy stuff as well, IIRC).  This also applies to web services
(SOAP-based access).  Mark, not sure how this affects your SOAP-based
modules.

I have reconfigured Bio::DB::EUtilities to follow this policy; the
default tool setting has been 'bioperl' and will remain that way.
However, there has been no default email, therefore setting this is now
required for future requests unless we (the bioperl devs) decide there
is a safe default email to utilize.  My gut tells me, however, that
falling back to a default email opens up a can of worms for the devs and
is very likely a 'BAD IDEA'(TM).  

Regardless, be aware that, after June 1, NCBI will very likely exclude
requests with no email and will notify users who are considered to be
violating their policies.

I will likely make further changes to Bio::DB::EUtilities in the
meantime to ensure that using the tools by default will not violate
NCBI's policy (i.e. override this at your own risk).
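
As an illustration of the policy above, a script could refuse to build a
request unless an email is supplied. This is only a sketch: eutil_identity
is a hypothetical helper, not something in Bio::DB::EUtilities, and the
address shown is a placeholder.

```perl
use strict;
use warnings;

# Return the identification arguments for an eutils request. Dies when no
# email is given; the tool name falls back to 'bioperl'.
sub eutil_identity {
    my (%opt) = @_;
    my $email = $opt{email};
    die "an -email value is required for eutils requests\n"
        unless defined $email && length $email;
    my $tool = defined $opt{tool} ? $opt{tool} : 'bioperl';
    return (-email => $email, -tool => $tool);
}

# Possible use (needs BioPerl and network access):
#   use Bio::DB::EUtilities;
#   my $factory = Bio::DB::EUtilities->new(
#       -eutil => 'esummary',
#       -db    => 'taxonomy',
#       -id    => \@taxa,
#       eutil_identity(email => 'you@example.org'),
#   );
```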

chris


From maj at fortinbras.us  Thu Jan 28 14:05:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:05:43 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
Message-ID: <8F49B5ED151143FA86E977B4D4F44265@NewLife>

Thanks Chris-- 
The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
I agree that a default email is a bad idea (tm) (unless maybe it's 
hilmar's...?). I'd say a warning on unset email parameters is a responsible
"there be dragons" sort of treatment.
MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "BioPerl-l" <bioperl-l at lists.open-bio.org>
Cc: "Mark A. Jensen" <maj at fortinbras.us>
Sent: Thursday, January 28, 2010 2:00 PM
Subject: EUtilities policy change


> All,
> 
> Per NCBI's recent change in eutils user policy (effective June 1):
> 
> http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html
> 
> Both the tool and email parameters ('-tool', '-email') are now required
> when making requests.  Note this will significantly break all modules
> requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
> and Taxonomy stuff as well, IIRC).  This also applies to web services
> (SOAP-based access).  Mark, not sure how this affects your SOAP-based
> modules.
> 
> I have reconfigured Bio::DB::EUtilities to follow this policy; the
> default tool setting has been 'bioperl' and will remain that way.
> However, there has been no default email, therefore setting this is now
> required for future requests unless we (the bioperl devs) decide there
> is a safe default email to utilize.  My gut tells me, however, that
> falling back to a default email opens up a can of worms for the devs and
> is very likely a 'BAD IDEA'(TM).  
> 
> Regardless, be aware that, after June 1, NCBI will very likely exclude
> requests with no email and will notify users who are considered to be
> violating their policies.
> 
> I will likely make further changes to Bio::DB::EUtilities in the
> meantime to ensure that using the tools by default will not violate
> NCBI's policy (e.g. override this at your own risk).  
> 
> chris
> 
> 
>


From cjfields at illinois.edu  Thu Jan 28 14:18:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:18:22 -0600
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <8F49B5ED151143FA86E977B4D4F44265@NewLife>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
	<8F49B5ED151143FA86E977B4D4F44265@NewLife>
Message-ID: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>

I think warning is fine for now.  I've reimplemented that so it occurs
lazily (warns only when a request is actually made).
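
For anyone curious what "lazily" means here, a rough sketch follows. It is
not the code in svn: Hypothetical::Agent is an invented class, and warning
only once per object is my assumption for the example.

```perl
use strict;
use warnings;

package Hypothetical::Agent;    # illustrative only, not a BioPerl class

sub new {
    my ($class, %args) = @_;
    # Construction never warns, even when no email is given.
    return bless { email => $args{email}, warned => 0 }, $class;
}

# Called just before a request is sent. Warns at most once per object.
sub check_email {
    my ($self) = @_;
    return 1 if defined $self->{email} && length $self->{email};
    warn "No email set; NCBI may reject this request\n"
        unless $self->{warned}++;
    return 0;
}

sub request {
    my ($self, $what) = @_;
    $self->check_email;
    return "requested $what";   # a real agent would contact the server here
}

package main;
```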

Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
We'll obviously have to address this in the test suite as well in some
way, maybe ask for an email if network tests are requested.

chris 

On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote:
> Thanks Chris-- 
> The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
> I agree that a default email is a bad idea (tm) (unless maybe it's 
> hilmar's...?). I'd say a warning on unset email parameters is a responsible
> "there be dragons" sort of treatment.
> MAJ
> ----- Original Message ----- 
> From: "Chris Fields" <cjfields at illinois.edu>
> To: "BioPerl-l" <bioperl-l at lists.open-bio.org>
> Cc: "Mark A. Jensen" <maj at fortinbras.us>
> Sent: Thursday, January 28, 2010 2:00 PM
> Subject: EUtilities policy change
> 
> 
> > All,
> > 
> > Per NCBI's recent change in eutils user policy (effective June 1):
> > 
> > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html
> > 
> > Both the tool and email parameters ('-tool', '-email') are now required
> > when making requests.  Note this will significantly break all modules
> > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
> > and Taxonomy stuff as well, IIRC).  This also applies to web services
> > (SOAP-based access).  Mark, not sure how this affects your SOAP-based
> > modules.
> > 
> > I have reconfigured Bio::DB::EUtilities to follow this policy; the
> > default tool setting has been 'bioperl' and will remain that way.
> > However, there has been no default email, therefore setting this is now
> > required for future requests unless we (the bioperl devs) decide there
> > is a safe default email to utilize.  My gut tells me, however, that
> > falling back to a default email opens up a can of worms for the devs and
> > is very likely a 'BAD IDEA'(TM).  
> > 
> > Regardless, be aware that, after June 1, NCBI will very likely exclude
> > requests with no email and will notify users who are considered to be
> > violating their policies.
> > 
> > I will likely make further changes to Bio::DB::EUtilities in the
> > meantime to ensure that using the tools by default will not violate
> > NCBI's policy (e.g. override this at your own risk).  
> > 
> > chris
> > 
> > 
> >


From Russell.Smithies at agresearch.co.nz  Thu Jan 28 14:25:38 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 29 Jan 2010 08:25:38 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
	<1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>

Yes, I usually set the 'tool' and 'email' parameters.
I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Friday, 29 January 2010 7:26 a.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Russell,
> 
> Just curious, but have you tried setting the return email parameter
> (-email)?  NCBI recently stated that all queries would eventually
> require a return email of some sort (not sure if it's validated or not).
> I think that was set for around late spring.  I'm changing the code in
> svn to require it for that very purpose.
> 
> chris
> 
> 
> On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
> still works if you don't mind a bit of manual button clicking. It's
> handling chunks of 100,000 records OK (today).
> >
> > --Russell
> >
> > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at illinois.edu]
> > > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > > To: Smithies, Russell
> > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > number?
> > >
> > > Makes me wonder if they're pushing more users towards the SOAP-based
> > > services and away from eutils.
> > >
> > > chris
> > >
> > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > >
> > > > I've had a wide selection of errors lately:
> > > >
> > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > > STACK: Error::throw
> > > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > STACK: get_desc.pl:32
> > > > -----------------------------------------------------------
> > > >
> > > > And I never get a good explanation from NCBI or suggestions on how to
> > > > avoid it.
> > > >
> > > >
> > > > --Russell
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > > >> To: Smithies, Russell
> > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from
> accession
> > > >> number?
> > > >>
> > > >> It's unfortunate but I have heard this problem popping up quite a
> > > >> bit more frequently lately.  Not to push too many buttons but NCBI
> > > >> isn't very forthcoming with help these days; they have become quite
> > > >> insular.  Not sure if they're short-staffed due to budget or if there
> > > >> are other issues.
> > > >>
> > > >> chris
> > > >>
> > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > > >>
> > > >>> Grrrrrr, I hate eutils!!!!
> > > >>>
> > > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > > >>> STACK: Error::throw
> > > >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > >>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > >>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > >>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > >>> STACK: get_desc.pl:32
> > > >>> -----------------------------------------------------------
> > > >>>
> > > >>>
> > > >>> Nice error message though :-)
> > > >>>
> > > >>>
> > > >>> --Russell
> > > >>>
> > > >>>> -----Original Message-----
> > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > > >>>> To: 'Chris Fields'
> > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-
> > > bio.org'
> > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> accession
> > > >>>> number?
> > > >>>>
> > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've
> > > >>>> often been finding that with large queries, chunks of the resulting
> > > >>>> data are missing.
> > > >>>> For example, before Xmas I was creating species-specific databases by
> > > >>>> using eUtils to get a list of GI numbers back for a taxid, then
> > > >>>> retrieving the fasta sequences in chunks of 500.
> > > >>>> Very regularly, in the middle of the fasta there would be a message
> > > >>>> about resource unavailable, e.g.
> > > >>>> >test_sequence_1
> > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > > >>>> >test_sequence_2
> > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > > >>>>
> > > >>>> Often this wasn't detected until formatdb complained about invalid
> > > >>>> characters.
> > > >>>> Inquiries to NCBI as to why this was happening and what to do about it
> > > >>>> returned stupid answers ("do each sequence manually thru the web
> > > >>>> interface", or "use eUtils").
> > > >>>> As we have a nice fast network connection, I now prefer to download
> > > >>>> very large gzip files (e.g. all of refseq) and extract what I need.
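
The stray "Resource Unavailable" text described above can be caught before
formatdb sees it. What follows is only a minimal sketch, assuming nucleotide
FASTA and treating anything outside the IUPAC nucleotide codes (plus '-' and
'*') as corruption; find_bad_records() is a hypothetical helper name, not
something from BioPerl or eUtils.

```perl
use strict;
use warnings;

# Return the headers of FASTA records whose sequence lines contain
# anything other than IUPAC nucleotide codes (plus '-' and '*').
# find_bad_records() is a hypothetical helper, not part of BioPerl.
sub find_bad_records {
    my ($fh) = @_;
    my (@bad, $header, %seen);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;               # tolerate DOS line endings
        next if $line eq '';
        if ($line =~ /^>(.*)/) {        # header line starts a new record
            $header = $1;
            next;
        }
        if ($line !~ /^[ACGTUNRYSWKMBDHV*-]+$/i) {
            my $id = defined $header ? $header : '(no header)';
            push @bad, $id unless $seen{$id}++;   # report each record once
        }
    }
    return @bad;
}

# Example input reproducing the corruption quoted in this thread.
my $fasta = <<'END';
>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
END

open my $fh, '<', \$fasta or die "cannot open in-memory FASTA: $!";
my @bad = find_bad_records($fh);
close $fh;
print "corrupt record: $_\n" for @bad;
```

Running a check like this on each downloaded chunk would let a script re-fetch
just the damaged chunk, rather than finding out at the formatdb stage. Protein
FASTA would need a different alphabet.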
> > > >>>>
> > > >>>> I can't help but think that NCBI could solve a lot of problems if they
> > > >>>> gzipped the output from eUtils queries - it's something I've requested
> > > >>>> regularly for the last 5 years or so!!
> > > >>>>
> > > >>>> --Russell
> > > >>>>
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > > >>>>> To: Smithies, Russell
> > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-
> > > bio.org'
> > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > > accession
> > > >>>>> number?
> > > >>>>>
> > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
> > > >>>>> the details).
> > > >>>>>
> > > >>>>> chris
> > > >>>>>
> > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > > >>>>>
> > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
> > > >>>>>> flakiness lately) would be to download the gi_taxid_nucl.zip or
> > > >>>>>> gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/,
> > > >>>>>> load them into a hash and do lookups.
> > > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
> > > >>>>>> which lists taxids and descriptions (and synonyms)
> > > >>>>>>
> > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > > >>>>>> could do this:
> > > >>>>>>
> > > >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> > > >>>>>> my $org_name = $names{$taxid};
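
A minimal sketch of how those two hashes might be built, under some
assumptions: the gi_taxid dump is taken to be two tab-separated columns (GI,
taxid), and names.dmp rows are taken to be fields separated by tab-pipe-tab
with a name-class column in which 'scientific name' marks the preferred name.
Note that, as far as I know, the gi_taxid files are keyed by GI number rather
than by accession. load_gi_taxid() and load_names() are hypothetical helpers,
and the GI numbers in the sample data are made up.

```perl
use strict;
use warnings;

# Build gi => taxid from a two-column, tab-separated dump.
sub load_gi_taxid {
    my ($fh) = @_;
    my %gi_taxid;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        $gi_taxid{$gi} = $taxid;
    }
    return \%gi_taxid;
}

# Build taxid => scientific name from names.dmp-style rows:
#   tax_id | name_txt | unique name | name class |
sub load_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;             # drop the row terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;         # synonyms and common names are skipped
    }
    return \%names;
}

# Tiny in-memory stand-ins for the real files (GI numbers are invented).
my $gi_data = join '', map { join("\t", @$_) . "\n" }
    ([111, 9606], [222, 10090]);
my $names_data = join '', map { join("\t|\t", @$_) . "\t|\n" }
    ([9606,  'Homo sapiens', '', 'scientific name'],
     [9606,  'human',        '', 'genbank common name'],
     [10090, 'Mus musculus', '', 'scientific name']);

open my $gi_fh,    '<', \$gi_data    or die "cannot open gi data: $!";
open my $names_fh, '<', \$names_data or die "cannot open names data: $!";
my $gi_taxid = load_gi_taxid($gi_fh);
my $names    = load_names($names_fh);

my $taxid    = $gi_taxid->{111};
my $org_name = $names->{$taxid};
print "$taxid\t$org_name\n";
```

The full dumps are large, so in practice it may be worth keeping only the GIs
of interest while reading, rather than loading every row into memory.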
> > > >>>>>>
> > > >>>>>> --Russell
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> -----Original Message-----
> > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > > >> accession
> > > >>>>>>> number?
> > > >>>>>>>
> > > >>>>>>> Bhakti,
> > > >>>>>>> The following example (using EUtilities) may serve your
> purpose:
> > > >>>>>>>
> > > >>>>>>> use strict;
> > > >>>>>>> use warnings;
> > > >>>>>>> use Bio::DB::EUtilities;
> > > >>>>>>>
> > > >>>>>>> my (%taxa, @taxa);
> > > >>>>>>> my (%names, %idmap);
> > > >>>>>>>
> > > >>>>>>> # these are protein ids; nuc ids will (probably) work by changing
> > > >>>>>>> # -dbfrom => 'nucleotide'
> > > >>>>>>>
> > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > > >>>>>>>
> > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > > >>>>>>>                                     -db => 'taxonomy',
> > > >>>>>>>                                     -dbfrom => 'protein',
> > > >>>>>>>                                     -correspondence => 1,
> > > >>>>>>>                                     -id => \@ids);
> > > >>>>>>>
> > > >>>>>>> # iterate through the LinkSet objects: submitted id => taxid
> > > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > > >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0];
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> # hash slice of %taxa; drop ids that came back without a taxid
> > > >>>>>>> @taxa = grep { defined } @taxa{@ids};
> > > >>>>>>>
> > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > > >>>>>>>      -db    => 'taxonomy',
> > > >>>>>>>      -id    => \@taxa );
> > > >>>>>>>
> > > >>>>>>> # taxid => scientific name
> > > >>>>>>> while (my $doc = $factory->next_DocSum) {
> > > >>>>>>>  $names{($doc->get_contents_by_name('TaxId'))[0]} =
> > > >>>>>>>    ($doc->get_contents_by_name('ScientificName'))[0];
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> # submitted id => scientific name (undef if no taxid was found)
> > > >>>>>>> foreach (@ids) {
> > > >>>>>>>  $idmap{$_} = defined $taxa{$_} ? $names{$taxa{$_}} : undef;
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> # %idmap is
> > > >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > > >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > > >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> > > >>>>>>> #    730439 => 'Bacillus caldolyticus'
> > > >>>>>>> #    89318838 => undef    (this record has been removed from the db)
> > > >>>>>>>
> > > >>>>>>> 1;
> > > >>>>>>>
> > > >>>>>>> You probably will need to break up your 30000 into chunks
> > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > > >>>>>>>
> > > >>>>>>> sleep 3;
> > > >>>>>>>
> > > >>>>>>> or so separating the queries.
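
The chunk-and-sleep approach just described could be sketched as follows. This
is an illustration only: for_each_chunk() is a hypothetical helper, the ids
are stand-ins, and the callback is where the elink/esummary calls from the
example above would go.

```perl
use strict;
use warnings;

# Split a list of ids into batches of at most $size and hand each batch
# to a callback, pausing $pause seconds between batches.
# for_each_chunk() is a hypothetical helper, not a BioPerl function.
sub for_each_chunk {
    my ($ids, $size, $pause, $callback) = @_;
    die "chunk size must be positive\n" unless $size > 0;
    my @queue = @$ids;                  # copy so the caller's list is untouched
    my $n = 0;
    while (my @chunk = splice @queue, 0, $size) {
        sleep $pause if $n++ && $pause; # no pause before the first batch
        $callback->(\@chunk);
    }
    return $n;                          # number of batches processed
}

my @ids = (1 .. 7);                     # stand-in ids
my @sizes;
my $batches = for_each_chunk(\@ids, 3, 0, sub { push @sizes, scalar @{ $_[0] } });
print "$batches batches: @sizes\n";
```

For a real run against NCBI, a size of 1000-3000 and a pause of 3 seconds, as
suggested above, would replace the toy values used here.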
> > > >>>>>>> MAJ
> > > >>>>>>> ----- Original Message -----
> > > >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > > >>>>>>> To: <bioperl-l at lists.open-bio.org>
> > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from
> accession
> > > >>>>> number?
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>>> Hi,
> > > >>>>>>>>
> > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name"
> > > >>>>>>>> given the accession number using Bioperl.   I have these 30,000
> > > >>>>>>>> accession numbers for which I need to get the source organisms.
> > > >>>>>>>> Any kind of help will be appreciated.
> > > >>>>>>>>
> > > >>>>>>>> Thanks
> > > >>>>>>>>
> > > >>>>>>>> BD


From cjfields at illinois.edu  Thu Jan 28 14:30:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:30:12 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
	<1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
Message-ID: <1264707012.5473.51.camel@cjfields.igb.uiuc.edu>

Russell,

Okay, just wanted to make sure.  The email/tool requirements weren't
actually enforced until now, so the change is forcing us to rework the
various tools that don't set them by default (or at least to warn users
who are unaware of the requirement).
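
As a rough illustration of what the requirement means at the request level,
here is a minimal sketch that builds an esearch URL by hand and refuses to do
so without tool and email. It does not use BioPerl (there the -tool and -email
parameters discussed in this thread do the same job); esearch_url() and
uri_escape_simple() are hypothetical helpers, and the address is a placeholder.

```perl
use strict;
use warnings;

my $BASE = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';

# Percent-encode everything except unreserved characters (ASCII input assumed).
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Hypothetical helper: assemble an esearch URL, insisting on tool and email.
sub esearch_url {
    my (%p) = @_;
    for my $required (qw(db term tool email)) {
        die "missing required parameter '$required'\n"
            unless defined $p{$required} && length $p{$required};
    }
    my @pairs = map { "$_=" . uri_escape_simple($p{$_}) } sort keys %p;
    return $BASE . '?' . join('&', @pairs);
}

my $url = esearch_url(
    db    => 'protein',
    term  => 'Mycobacterium tuberculosis[orgn]',
    tool  => 'bioperl',
    email => 'you@example.org',         # placeholder address
);
print "$url\n";

# Leaving out the email is refused rather than silently sent.
my $ok = eval { esearch_url(db => 'protein', term => 'x', tool => 'bioperl'); 1 };
print "refused: $@" unless $ok;
```

Whether a missing email should be fatal or only a warning is a policy choice;
the sketch dies, but a warning would be an equally small change.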

And I agree, gzipped archives would be nice!

chris

On Fri, 2010-01-29 at 08:25 +1300, Smithies, Russell wrote:
> Yes, I usually set the 'tool' and 'email' parameters.
> I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...
> 
> --Russell


From maj at fortinbras.us  Thu Jan 28 14:55:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:55:31 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu><8F49B5ED151143FA86E977B4D4F44265@NewLife>
	<1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
Message-ID: <CD70565A9D3F44C4A0D7BA6462E021E0@NewLife>

OK, SoapEU now warns when no email is set, and passes the email on to the
fetch stage during autofetch. Cheers, MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl-l" <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 28, 2010 2:18 PM
Subject: Re: [Bioperl-l] EUtilities policy change


>I think warning is fine for now.  I've reimplemented that so it occurs
> lazily (warns only when a request is actually made).
> 
> Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
> We'll obviously have to address this in the test suite as well in some
> way, maybe ask for an email if network tests are requested.
> 
> chris 
> 
> On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote:
>> Thanks Chris-- 
>> The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
>> I agree that a default email is a bad idea (tm) (unless maybe it's 
>> hilmar's...?). I'd say a warning on unset email parameters is a responsible
>> "there be dragons" sort of treatment.
>> MAJ
>> ----- Original Message ----- 
>> From: "Chris Fields" <cjfields at illinois.edu>
>> To: "BioPerl-l" <bioperl-l at lists.open-bio.org>
>> Cc: "Mark A. Jensen" <maj at fortinbras.us>
>> Sent: Thursday, January 28, 2010 2:00 PM
>> Subject: EUtilities policy change
>> 
>> 
>> > All,
>> > 
>> > Per NCBI's recent change in eutils user policy (effective June 1):
>> > 
>> > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html
>> > 
>> > Both the tool and email parameters ('-tool', '-email') are now required
>> > when making requests.  Note this will significantly break all modules
>> > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
>> > and Taxonomy stuff as well, IIRC).  This also applies to web services
>> > (SOAP-based access).  Mark, not sure how this affects your SOAP-based
>> > modules.
>> > 
>> > I have reconfigured Bio::DB::EUtilities to follow this policy; the
>> > default tool setting has been 'bioperl' and will remain that way.
>> > However, there has been no default email, therefore setting this is now
>> > required for future requests unless we (the bioperl devs) decide there
>> > is a safe default email to utilize.  My gut tells me, however, that
>> > falling back to a default email opens up a can of worms for the devs and
>> > is very likely a 'BAD IDEA'(TM).  
>> > 
>> > Regardless, be aware that, after June 1, NCBI will very likely exclude
>> > requests with no email and will notify users who are considered to be
>> > violating their policies.
>> > 
>> > I will likely make further changes to Bio::DB::EUtilities in the
>> > meantime to ensure that using the tools by default will not violate
>> > NCBI's policy (e.g. override this at your own risk).  
>> > 
>> > chris
>> > 
>> > 
>> >


From chapmanb at 50mail.com  Thu Jan 28 15:35:05 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Jan 2010 15:35:05 -0500
Subject: [Bioperl-l] OpenBio solution challenge: Project updates at BOSC 2010
Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu>

Hello all;
The BOSC 2010 organizing committee is hard at work getting prepared for this
July's meeting in Boston:

http://www.open-bio.org/wiki/BOSC_2010

One of the items we've traditionally had at the conference is a project 
update from each of the OpenBio affiliated groups. This year, we're thinking
about organizing these talks around a central theme: the OpenBio solution
challenge. We would start with a biological question of general interest, and each
of the project talks would focus on how you would solve that problem
using your toolkit and programming language.

This is meant to provide a challenge for OpenBio contributors, a nice
tutorial-style overview of various projects and approaches for other programmers,
and a fun opportunity to compete with and learn from other projects. Conference attendees
will vote on their favorite solution, with the winner receiving fame and
fortune (warning: fortune not guaranteed).

For this to be successful, it of course requires interest and enthusiasm from
y'all fine folks involved with the projects. Specifically:

- Is there interest from your group in participating in the challenge? You'll
  want at least a few people to work on it, and someone to give a presentation 
  at BOSC.

- Do you have suggestions on a good theme or specific biological problem to
  tackle? We'll hope to pick something in a sweet spot that is challenging 
  enough to be of interest, yet reasonable for presentation and preparation.

Let's discuss ideas and get this together. Since the schedule for BOSC is
developing rapidly, please give us an idea if you're interested by
February 12th, and copy responses to the BOSC mailing list as a central 
place for discussion.

bosc at open-bio.org

Thanks,
Brad, Michael, and the BOSC organizing committee


From markw at illuminae.com  Thu Jan 28 16:17:44 2010
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 28 Jan 2010 13:17:44 -0800
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project
 updates at BOSC 2010
In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
Message-ID: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>


Brad, this sounds exciting!

One thing strikes me, though - by asking for the sub-projects to propose
the "grand challenge" themselves the one thing you can guarantee is that
the "grand challenge" is solvable (or more likely, already solved!)

Other "grand challenge" kinds of meetings have an independent third party
pose the problem that has to be solved, and then all groups work toward a
solution and compare their results.  This would, IMO, be more revealing of
the "state of the art" in each Open-Bio project, and point out where the
weaknesses are that we should be focusing on...  Someone (for example,
you!) could act as the moderator to ensure that the "grand challenge" was
at least a reasonable one, within the scope of what an Open-Bio project
*should* be able to solve...

Just my CAD $0.02

Mark




-- 
Mark D Wilkinson, PI Bioinformatics
Assistant Professor, Medical Genetics
The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research
Providence Heart + Lung Institute
University of British Columbia - St. Paul's Hospital
Vancouver, BC, Canada


From HWillis at scripps.edu  Thu Jan 28 20:03:10 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 Jan 2010 20:03:10 -0500
Subject: [Bioperl-l] [Biojava-dev] [MOBY-dev] OpenBio solution
 challenge: Project updates at BOSC 2010
In-Reply-To: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
	<op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu>

Brad

I agree with Mark that a particular problem may be biased towards a toolkit/language. Another approach would be to list a collection of problems and have each group pick one to present. That could be a little more interesting to the audience, as they would be exposed to different problems and the various strengths of each toolkit. It could also help guide future development in the other toolkits, as you would benefit from learning about the API and/or programming language. Each group would register the problem that they are going to present. One of the problems not picked would then become the surprise challenge, where each group has 24 hours to put together either a presentation or an actual solution.

Scooter




From biopython at maubp.freeserve.co.uk  Fri Jan 29 05:36:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 29 Jan 2010 10:36:40 +0000
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project
	updates at BOSC 2010
In-Reply-To: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
	<op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com>

Hi all,

This is a great topic but should we continue it on just the one mailing list?
Is there a suitable BOSC list, or how about the general Open Bio list?

On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson <markw at illuminae.com> wrote:
>
> Brad, this sounds exciting!
>
> One thing strikes me, though - by asking for the sub-projects to propose
> the "grand challenge" themselves the one thing you can guarantee is that
> the "grand challenge" is solvable (or more likely, already solved!)
>
> Other "grand challenge" kinds of meetings have an independent third party
> pose the problem that has to be solved, and then all groups work toward a
> solution and compare their results.  This would, IMO, be more revealing of
> the "state of the art" in each Open-Bio project, and point out where the
> weaknesses are that we should be focusing on...  Someone (for example,
> you!) could act as the moderator to ensure that the "grand challenge" was
> at least a reasonable one, within the scope of what an Open-Bio project
> *should* be able to solve...
>
> Just my CAD $0.02
>
> Mark

One possible problem with having Brad act as moderator is his ties to
Biopython (plus it would be a shame if we'd be one man down for trying
to solve the challenges - grin). Having a project representative "sign off"
on the challenge might work - or simply the whole of the BOSC committee
which is quite balanced. Alternatively some kind of panel of challenges does
seem a good way to reduce individual project bias (as suggested by Scooter),
but there will still need to be a judging committee.

I'm curious what kind of challenges the BOSC committee had in mind -
would something like taking a newly sequenced bacterium and producing
an automated annotation as a GenBank, EMBL, or GFF file be too
ambitious, for example? There are already several major projects
to do this, e.g. RAST http://rast.nmpdr.org/

Peter
(@Biopython)


From mike.stubbington at bbsrc.ac.uk  Fri Jan 29 08:25:25 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Fri, 29 Jan 2010 13:25:25 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
In-Reply-To: <FD6E9A89F6034CCB856E22553ED893D7@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
	<FD6E9A89F6034CCB856E22553ED893D7@NewLife>
Message-ID: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>

Hi Mark,

Thanks for your continued help.

It now fails with this:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::]

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

If I change the factory creation to:
my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
	-db_name => '/Users/stubbing/localBlast/MouseGenome'
);

it fails with 

------------- EXCEPTION -------------
MSG: DB name not valid
STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516
STACK toplevel ./5CTest.pl:45
-------------------------------------

However I can run the following successfully from the command line:

blastn -db  /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta

Is there something wrong with how I'm referring to the blast database when I construct my factory?
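
One way to narrow this down, as a rough sketch only: the search path in the first error suggests blastn was handed the bare name 'MouseGenome' and looked only in the current directory. The Perl below splits the full path into directory and name, checks that an index (.nin) or alias (.nal) file is really there, and exports the directory through the BLASTDB environment variable, which BLAST+ consults when resolving a bare database name. The helper names split_db_path and nucl_db_present are made up for illustration and are not part of BioPerl; the file extensions assume a nucleotide database built with makeblastdb, and this works around the wrapper problem rather than fixing it.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Hypothetical helper: split a full database path into (directory, name).
sub split_db_path {
    my ($path) = @_;
    my ($name, $dir) = fileparse($path);
    return (File::Spec->canonpath($dir), $name);
}

# Hypothetical helper: true if an index (.nin), alias (.nal) or
# first-volume index (.00.nin) file exists for the database.
sub nucl_db_present {
    my ($dir, $name) = @_;
    my $base  = File::Spec->catfile($dir, $name);
    my @found = grep { -e $_ } ("$base.nin", "$base.nal", "$base.00.nin");
    return scalar @found;
}

my ($dir, $name) = split_db_path('/Users/stubbing/localBlast/MouseGenome');

if (nucl_db_present($dir, $name)) {
    # Assumption: blastn resolves bare database names via BLASTDB.
    $ENV{BLASTDB} = $dir;
    print "Found '$name' in $dir; BLASTDB set for child processes\n";
}
else {
    warn "No .nin/.nal file for '$name' in $dir\n";
}
```

With BLASTDB set this way, a factory created with only -db_name => 'MouseGenome' might then locate the database, although that has not been checked against r16774.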

Thanks again,

M


On 28 Jan 2010, at 18:47, Mark A. Jensen wrote:

> Hi Mike,
> Believe I found the real bug causing the problem (was not accounting for
> the db_dir parameter). Crashes should now also throw much more helpful
> errors. Please try the code at r16774, and shout back.
> thanks --
> MAJ
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 11:18 AM
> Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
> error running blastn
> 
> 
> Hi,
> 
> Thanks for the suggestion. Unfortunately it still fails - error as follows:
> 
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
> /usr/local/ncbi/blast/bin/blastn : Illegal seek at 
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> 
> line 532.
> 
> STACK Bio::Tools::Run::WrapperBase::_run 
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
> 
> M
> 
> On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:
> 
>> Mike - please try updating your bioperl-live (the core) to the latest code
>> (revision 16761 or so).
>> CommandExts is a work in progress; from the stack errors it looks like you've
>> got an older version.
>> Try it then ping us back, if you would--
>> Thanks
>> Mark
>> ----- Original Message ----- 
>> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Thursday, January 28, 2010 10:41 AM
>> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error
>> running blastn
>> 
>> 
>> Dear all,
>> 
>> I am attempting to blast some primers against the mouse genome. I have created 
>> a
>> local mouse genome blast database and I can search against it using 'blastn' 
>> at
>> the command line.
>> 
>> I have perl code that creates an array of bioperl sequence objects called
>> @primers
>> 
>> I then create a StandAloneBlastPlus factory using the following code...
>> 
>> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
>> -db_dir => '/Users/stubbing/localBlast/',
>> -db_name => 'MouseGenome'
>> );
>> 
>> and then attempt to blast my primers using this...
>> 
>> my @shortPrimers;
>> my $count=1;
>> foreach (@primers) {
>> my $currentSeq = $_;
>> print "Checking primer $count/$primerNumber ";
>> if ($_->length < 40) {
>> push(@shortPrimers,$_);
>> print "Too short!\n";
>> }
>> else {
>> print "BLASTing...";
>> my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>> }
>> $count++;
>> }
>> 
>> This fails with the following error...
>> 
>> ------------- EXCEPTION -------------
>> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem 
>> running
>> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
>> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA>
>> line 532.
>> 
>> STACK Bio::Tools::Run::WrapperBase::_run
>> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
>> STACK Bio::Tools::Run::StandAloneBlastPlus::run
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
>> STACK toplevel ./5CTest.pl:63
>> -------------------------------------
>> 
>> Line 63 in my code is (as you might expect) the one that calls blastn on my
>> factory object.
>> 
>> I'd appreciate any help you might be able to provide to shed light on this.
>> 
>> Thanks in advance,
>> 
>> Mike
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 


From maj at fortinbras.us  Fri Jan 29 08:36:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:36:54 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
	<FD6E9A89F6034CCB856E22553ED893D7@NewLife>
	<ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
Message-ID: <DF05D2C7E8CC4CF18E6AE56077EB738A@NewLife>

Hi Mike-
Well, at least we're getting more informative errors. I think it's
still my bad; will look again. Both of your calls should work.
(thanks for the positive control too)
Thanks for your patience and the help--
MAJ


From maj at fortinbras.us  Fri Jan 29 08:47:48 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:47:48 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife><05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk><FD6E9A89F6034CCB856E22553ED893D7@NewLife>
	<ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
Message-ID: <2B7BF6CD46AE441AB24203E169D9C503@NewLife>

Mike et al--
I've entered this as Bug #3003 on http://bugzilla.bioperl.org;
we'll do further ping-pongs on this issue via the comment facility
there--
cheers MAJ
----- Original Message ----- 
From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; <Brian at portal.open-bio.org>; "Osborne" 
<bosborne11 at verizon.net>
Sent: Friday, January 29, 2010 8:25 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
error running blastn


Hi Mark,

Thanks for your continued help.

It now fails with this:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
/usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file 
found for nucleotide database [MouseGenome] in search path 
[/Volumes/stubbing/PerlScripts/5CTest/trunk::]

STACK Bio::Tools::Run::WrapperBase::_run 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

If I change the factory creation to:
my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
-db_name => '/Users/stubbing/localBlast/MouseGenome'
);

it fails with

------------- EXCEPTION -------------
MSG: DB name not valid
STACK Bio::Tools::Run::StandAloneBlastPlus::new 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516
STACK toplevel ./5CTest.pl:45
-------------------------------------

However I can run the following successfully from the command line:

blastn -db  /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta

Is there something wrong with how I'm referring to the blast database when I 
construct my factory?

Thanks again,

M


On 28 Jan 2010, at 18:47, Mark A. Jensen wrote:

> Hi Mike,
> Believe I found the real bug causing the problem (was not accounting for
> the db_dir parameter). Crashes should now also throw much more helpful
> errors. Please try the code at r16774, and shout back.
> thanks --
> MAJ
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 11:18 AM
> Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
> error running blastn
>
>
> Hi,
>
> Thanks for the suggestion. Unfortunately it still fails - error as follows:
>
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem 
> running
> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, 
> <DATA>
> line 532.
>
> STACK Bio::Tools::Run::WrapperBase::_run
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
>
> M
>
> On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:
>
>> Mike - please try updating your bioperl-live (the core) to the latest code
>> (revision 16761 or so).
>> CommandExts is a work in progress; from the stack errors it looks like you've
>> got an older version.
>> Try it then ping us back, if you would--
>> Thanks
>> Mark
>> ----- Original Message ----- 
>> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Thursday, January 28, 2010 10:41 AM
>> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
>> error
>> running blastn
>>
>>
>> Dear all,
>>
>> I am attempting to blast some primers against the mouse genome. I have 
>> created
>> a
>> local mouse genome blast database and I can search against it using 'blastn'
>> at
>> the command line.
>>
>> I have perl code that creates an array of bioperl sequence objects called
>> @primers
>>
>> I then create a StandAloneBlastPlus factory using the following code?
>>
>> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
>> -db_dir => '/Users/stubbing/localBlast/',
>> -db_name => 'MouseGenome'
>> );
>>
>> and then attempt to blast my primers using this?
>>
>> my @shortPrimers;
>> my $count=1;
>> foreach (@primers) {
>> my $currentSeq = $_;
>> print "Checking primer $count/$primerNumber ";
>> if ($_->length < 40) {
>> push(@shortPrimers,$_);
>> print "Too short!\n";
>> }
>> else {
>> print "BLASTing...";
>> my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>> }
>> $count++;
>> }
>>
>> This fails with the following error?
>>
>> ------------- EXCEPTION -------------
>> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem
>> running
>> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
>> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, 
>> <DATA>
>> line 532.
>>
>> STACK Bio::Tools::Run::WrapperBase::_run
>> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
>> STACK Bio::Tools::Run::StandAloneBlastPlus::run
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
>> STACK toplevel ./5CTest.pl:63
>> -------------------------------------
>>
>> Line 63 in my code is (as you might expect) the one that calls blastn on my
>> factory object.
>>
>> I'd appreciate any help you might be able to provide to shed light on this.
>>
>> Thanks in advance,
>>
>> Mike

From help at gmod.org  Fri Jan 29 17:03:48 2010
From: help at gmod.org (Dave Clements, GMOD Help Desk)
Date: Fri, 29 Jan 2010 14:03:48 -0800
Subject: [Bioperl-l] 2010 GMOD Summer School - Americas
In-Reply-To: <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
References: <71ee57c71001291351q47994b82w10dffb390dbf2837@mail.gmail.com>
	<71ee57c71001291354m68548823s3e3fbd2e49e9b332@mail.gmail.com>
	<71ee57c71001291356p5e7f1aadi2bf437c93014a393@mail.gmail.com>
	<71ee57c71001291357h67112e2fkcf835687e59f66ae@mail.gmail.com>
	<71ee57c71001291358k74781b08n232534d8895c5ec1@mail.gmail.com>
	<71ee57c71001291400y28e40eb6i112ea91df977dc67@mail.gmail.com>
	<71ee57c71001291400n6133982eh3a02293ff741900b@mail.gmail.com>
	<71ee57c71001291401y505b56baic61c11754d88a444@mail.gmail.com>
	<71ee57c71001291402s23e3f2e9w2562d6acf85bd4ae@mail.gmail.com>
	<71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
Message-ID: <71ee57c71001291403s19be18f3s3a1d5a314c74def@mail.gmail.com>

Hello all,

I am pleased to announce that we are now accepting applications for:

  2010 GMOD Summer School - Americas
    6-9 May 2010
    NESCent, Durham, NC, USA
    http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

This will be a hands-on multi-day course aimed at teaching new GMOD
users/administrators how to get GMOD Components up and running. The
course will introduce participants to the GMOD project and then focus
on installation, configuration and integration of popular GMOD
Components. The course will be held May 6-9, at NESCent in Durham, NC.

These components will be covered:
   * Apollo - genome annotation editor
   * Chado - a modular and extensible database schema
   * Galaxy - workflow system
   * GBrowse - the Generic Genome Browser
   * GBrowse_syn - a generic synteny browser
   * JBrowse - genome browser
   * MAKER - genome annotation pipeline
   * Tripal - web front end for Chado

The deadline for applying is the end of Friday, February 22. Admission
is competitive and is based on the strength of the application
(especially the statement of interest). In 2009 there were over 50
applications for the 25 slots. Any applications received after the
deadline will be placed on the waiting list.

See the course page for details and an application link:
 http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

Thanks,

Dave Clements
GMOD Help Desk

PS: We are also investigating holding a GMOD course in the
Asia/Pacific region, sometime this fall. Watch the GMOD mailing lists
and the GMOD News page/RSS feed for updates.
--
Please keep responses on the list!
http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas
http://gmod.org/wiki/GMOD_News
Was this helpful? http://gmod.org/wiki/Help_Desk_Feedback


From bhakti.dwivedi at gmail.com  Sat Jan 30 17:38:40 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sat, 30 Jan 2010 17:38:40 -0500
Subject: [Bioperl-l] how to map blast results on to the genome?
Message-ID: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>

Does anyone know how I can graphically map BLAST results (-m 8 tabular
format) onto the genome using BioPerl?

Thanks

Bhakti


From jason at bioperl.org  Sat Jan 30 18:56:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 30 Jan 2010 15:56:14 -0800
Subject: [Bioperl-l] how to map blast results on to the genome?
In-Reply-To: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>
References: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>
Message-ID: <68937A7D-291F-419A-9ED7-7A87D9B4C78A@bioperl.org>

Did you try BioGraphics and read the HOWTO on it -- http://bioperl.org/wiki/HOWTO:Graphics
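
As a starting point, here is a rough sketch along the lines of that HOWTO. It assumes the standard 12-column -m 8 layout and that all hits in the file are against a single genome sequence whose length you pass in yourself. The sub names parse_m8_line and render_hits are made up for illustration; only the rendering step needs Bio::Graphics (and GD) installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one line of BLAST tabular (-m 8) output.
# Columns: query, subject, %identity, alignment length, mismatches,
# gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/ || $line !~ /\S/;   # skip comments and blank lines
    my @f = split /\t/, $line;
    return unless @f >= 12;
    my ($s_start, $s_end) = @f[8, 9];
    # Minus-strand hits are reported with s.start > s.end; normalise them.
    my $strand = $s_start <= $s_end ? 1 : -1;
    ($s_start, $s_end) = ($s_end, $s_start) if $strand < 0;
    my %hit = (
        query  => $f[0],   subject => $f[1],
        start  => $s_start, end    => $s_end,
        strand => $strand,  bits   => $f[11],
    );
    return \%hit;
}

# Draw the hits on a ruler of the given length and return PNG data.
sub render_hits {
    my ($hits, $genome_length) = @_;
    require Bio::Graphics;
    require Bio::SeqFeature::Generic;
    my $panel = Bio::Graphics::Panel->new(
        -length   => $genome_length,
        -width    => 800,
        -pad_left => 10, -pad_right => 10,
    );
    # Scale bar spanning the whole sequence.
    my $ruler = Bio::SeqFeature::Generic->new(-start => 1, -end => $genome_length);
    $panel->add_track($ruler, -glyph => 'arrow', -tick => 2,
                      -fgcolor => 'black', -double => 1);
    # One track holding every hit, labelled with the query name.
    my $track = $panel->add_track(-glyph => 'generic', -label => 1,
                                  -bgcolor => 'blue');
    for my $h (@$hits) {
        $track->add_feature(Bio::SeqFeature::Generic->new(
            -display_name => $h->{query},
            -start  => $h->{start}, -end   => $h->{end},
            -strand => $h->{strand}, -score => $h->{bits},
        ));
    }
    return $panel->png;
}

# Usage: perl m8_to_png.pl hits.m8 GENOME_LENGTH > hits.png
if (@ARGV == 2) {
    my ($m8_file, $genome_length) = @ARGV;
    open my $in, '<', $m8_file or die "Cannot open $m8_file: $!";
    my @hits = map { parse_m8_line($_) } <$in>;
    close $in;
    binmode STDOUT;
    print render_hits(\@hits, $genome_length);
}
```

If the file mixes hits against several subject sequences, you would want to filter on the subject column before drawing.
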
On Jan 30, 2010, at 2:38 PM, Bhakti Dwivedi wrote:

> Does anyone know how I can graphically map the blast results (m -8  
> format)
> to the genome using bio-perl?
>
> Thanks
>
> Bhakti

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From David.Messina at sbc.su.se  Sun Jan 31 12:43:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 31 Jan 2010 18:43:52 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
	<18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
Message-ID: <BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet.

I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave


From bluecurio at gmail.com  Sun Jan 31 22:22:37 2010
From: bluecurio at gmail.com (Daniel Renfro)
Date: Sun, 31 Jan 2010 21:22:37 -0600
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects
Message-ID: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>

Hello all,

A colleague and I have been working on a (Bio)Perl package to compare two
Seq objects. This is in response to a need we found in our lab -- we wanted
to see the changes to GenBank files through time, but wanted an automated
way to do this. This led to what I'm calling the SeqDiff.pm package. I
thought it would be a good idea to inform the community and get some
feedback.

The package takes two Seq objects as arguments, arbitrarily called "old" and
"new." It then matches the features from the old object with the new object.
This is done based on some criteria -- in our case we decided the features
must be of the same type (have the same primary_tag) and have at least one
matching database cross-reference (db_xref) in common.  The left-over
features (ones that did not have a match) are dropped into arrays called
"lost" and "gained." The matching is done in about N log N time, as each
matching pair is removed from subsequent searches.
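
To make the matching rule concrete, here is a minimal sketch. It is not the SeqDiff.pm code: it uses a hash index keyed on primary_tag plus db_xref rather than the N log N search described above, and the sub name match_features is made up for illustration. It only relies on the primary_tag(), has_tag() and get_tag_values() methods that Bio::SeqFeatureI objects provide.

```perl
use strict;
use warnings;

# Pair up features that share a primary_tag and at least one db_xref.
# Takes two array refs of Bio::SeqFeatureI-like objects ("old" and "new")
# and returns array refs of matched pairs, lost features and gained features.
sub match_features {
    my ($old_feats, $new_feats) = @_;

    # Index the new features by "primary_tag|db_xref".
    my %index;
    for my $i (0 .. $#$new_feats) {
        my $f = $new_feats->[$i];
        next unless $f->has_tag('db_xref');
        push @{ $index{ join '|', $f->primary_tag, $_ } }, $i
            for $f->get_tag_values('db_xref');
    }

    my (@matched, @lost, %taken);
    OLD: for my $f (@$old_feats) {
        my @xrefs = $f->has_tag('db_xref') ? $f->get_tag_values('db_xref') : ();
        for my $x (@xrefs) {
            for my $i (@{ $index{ join '|', $f->primary_tag, $x } || [] }) {
                next if $taken{$i};          # each new feature is matched once
                $taken{$i} = 1;
                push @matched, [ $f, $new_feats->[$i] ];
                next OLD;
            }
        }
        push @lost, $f;                      # no partner in the new object
    }

    # Whatever was never claimed in the new object counts as gained.
    my @gained = map { $new_feats->[$_] }
                 grep { !$taken{$_} } 0 .. $#$new_feats;
    return (\@matched, \@lost, \@gained);
}
```

With two Bio::Seq objects this could be called as match_features([$old->get_SeqFeatures], [$new->get_SeqFeatures]).
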

The matched features are iterated through and the differences are
calculated. Each feature is examined recursively and any differences are
reported. Optionally you can give the new() method a flag so that everything
is returned (differences and similarities.) You can set callbacks for
different types of objects (like anything that isa('Bio::LocationI')) if you
want a custom comparison for specific BioPerl objects. This comparison step
is the computationally slow part, and currently everything is held in
memory. I think it'd be better to do this piece-meal, using the BioPerl-ish
next() and last() methods.

Maybe this was a little verbose, but that is the SeqDiff package in a
nutshell. I hope to soon release v1.0. If you have any questions or comments
I'd love to hear them.

-Daniel Renfro

Hu Lab Research Associate
Dept. of Biochemistry and Biophysics
2128 TAMU
Texas A&M Univ.
College Station, TX 77843-2128
979-862-4055


From maj at fortinbras.us  Sun Jan 31 22:47:05 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 31 Jan 2010 22:47:05 -0500
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects
In-Reply-To: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>
References: <25c0f63d1001311922u134e9321s10f490a335f4a6e@mail.gmail.com>
Message-ID: <5DC96D65B6A447C3802AF5D745FF4AA4@NewLife>

Daniel-- this sounds interesting and useful, I +1 it. Your intuition about
in-memory vs streaming sounds correct to me; features can be many, and
diffing many (MANY) sequences may bork. Maybe our feature-rich users
can chime in. (...however, I did just hear about a magic spell called 
'File::Map',
might check that out on CPAN.)
cheers- MAJ
----- Original Message ----- 
From: "Daniel Renfro" <bluecurio at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 31, 2010 10:22 PM
Subject: [Bioperl-l] New package to compare two SeqI-implementing objects


> Hello all,
>
> A colleague and I have been working on a (Bio)Perl package to compare two
> Seq objects. This is in response to a need we found in our lab -- we wanted
> to see the changes to GenBank files through time, but wanted an automated
> way to do this. This led to what I'm calling the SeqDiff.pm package. I
> thought it would be a good idea to inform the community and get some
> feedback.
>
> The package takes two Seq objects as arguments, arbitrarily called "old" and
> "new." It then matches the features from the old object with the new object.
> This is done based on some criteria -- in our case we decided the features
> must be of the same type (have the same primary_tag) and have at least one
> matching database cross-reference (db_xref) in common.  The left-over
> features (ones that did not have a match) are dropped into arrays called
> "lost" and "gained." The matching is done in about NlogN time, as each
> matching pair are removed from subsequent searches.
>
> The matched features are iterated through and the differences are
> calculated. Each feature is examined recursively and any differences are
> reported. Optionally you can give the new() method a flag so that everything
> is returned (differences and similarities.) You can set callbacks for
> different types of objects (like anything that isa('Bio::LocationI')) if you
> want a custom comparison for specific BioPerl objects. This comparison step
> is the computationally slow part, and currently everything is held in
> memory. I think it'd be better to do this piece-meal, using the BioPerl-ish
> next() and last() methods.
>
> Maybe this was a little verbose, but that is the SeqDiff package in a
> nutshell. I hope to soon release v1.0. If you have any questions or comments
> I'd love to hear them.
>
> -Daniel Renfro
>
> Hu Lab Research Associate
> Dept. of Biochemistry and Biophysics
> 2128 TAMU
> Texas A&M Univ.
> College Station, TX 77843-2128
> 979-862-4055


From rui.faria at upf.edu  Sun Jan 31 12:17:09 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 18:17:09 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
Message-ID: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>

Hi Dave,

Some time ago we reported the codeml bug that causes errors when the user supplies their own tree file. Have you had a chance to look at it?

We basically wanted to know your opinion on where the problem may be, since we are not the most experienced "perlers" on the planet :) 

I'm asking because we have to deal with it right now. If someone could check where the problem is, and whether it has an easy solution, that would be a great help.

Best,

Rui


-----Original Message-----
From Dave Messina <David.Messina at sbc.su.se>
Sent Thu 31/12/2009 11:55 AM
To Rui Faria <rui.faria at upf.edu>
Cc Jason Stajich <jason at bioperl.org>; sandraneto_ at hotmail.com; bioperl-l List <bioperl-l at lists.open-bio.org>
Subject Re: question about a PAML module

Hi Rui and Sandra,

Could you file this as a bug report at 

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

?

Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:
- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this.

There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon.


Dave


From rui.faria at upf.edu  Sun Jan 31 13:56:56 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 19:56:56 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
	<18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
	<BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>
Message-ID: <11398434.1264964216856.JavaMail.oracle@rif1.s.upf.edu>

Many thanks!

We hope that one day, when we become experts, we can return the favor!

Rui

-----Original Message-----
From Dave Messina <David.Messina at sbc.su.se>
Sent Sun 31/01/2010 06:43 PM
To Rui Faria <rui.faria at upf.edu>
Cc Jason Stajich <jason at bioperl.org>; sandraneto_ at hotmail.com; bioperl-l List <bioperl-l at lists.open-bio.org>
Subject Re: question about a PAML module

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet.

I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave


From avilella at gmail.com  Sat Jan  2 08:57:28 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sat, 2 Jan 2010 08:57:28 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
Message-ID: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>

Hi all and happy 2010 for those that follow the Gregorian calendar,

A question that is a bit in between bioperl and NCBI. I would like to use
bioperl to download sequences from dbEST. For that, my idea is to use
Bio::DB::GenBank and get the sequences by gi id.

Now, I want my script to download sequences for a given NCBI taxonomy clade.

For example, if I want to download all fish (clupeocephala) sequences in dbEST,
I can browse it around with the dbEST webpage using "clupeocephala[taxonomy]",
so I am thinking there should be a way to do it programmatically.

How can I query NCBI dbEST through bioperl to give me the list of GI ids I am
looking for given a taxon id?

Thanks in advance,

Albert.


From jason at bioperl.org  Sat Jan  2 16:35:22 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Jan 2010 08:35:22 -0800
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
Message-ID: <D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>

Did you try Bio::DB::Query::GenBank?
You'd want to use -db => 'nucest' and then you just put in an Entrez
query as per the example.  You can include dates in the query so you
can do updates to your locally retrieved data in a script that runs
periodically.

-jason
On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:

> Hi all and happy 2010 for those that follow the Gregorian calendar,
>
> A question that is a bit in between bioperl and NCBI. I would like  
> to use
> bioperl to download sequences from dbEST. For that, my idea is to use
> Bio::DB::GenBank and get the sequences by gi id.
>
> Now, I want my script to download sequences for a given NCBI  
> taxonomy clade.
>
> For example, if I want to download all fish (clupeocephala)  
> sequences in dbEST,
> I can browse it around with the dbEST webpage using  
> "clupeocephala[taxonomy]",
> so I am thinking there should be a way to do it programmatically.
>
> How can I query NCBI dbEST through bioperl to give me the list of GI  
> ids I am
> looking for given a taxon id?
>
> Thanks in advance,
>
> Albert.

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From avilella at gmail.com  Sun Jan  3 09:08:33 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 3 Jan 2010 09:08:33 +0000
Subject: [Bioperl-l] Downloading from dbEST by taxon range
In-Reply-To: <D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>
References: <358f4d651001020057g21c8497eia6559bfeac0b5544@mail.gmail.com>
	<D736401F-345B-43D0-B203-DB72D252355A@bioperl.org>
Message-ID: <358f4d651001030108p6a92fb27k5fa39be6bebb3a9c@mail.gmail.com>

Thanks Jason!
For the sake of completeness, here is the script I needed:

---------------------
#!/usr/bin/perl
use strict;
use warnings;

use Bio::SeqIO;
use Bio::DB::Query::GenBank;
use Bio::DB::GenBank;
use Getopt::Long;

my $keyword_type = 'EST';
my $outdir = '.';
my $taxon_name = undef;
my $db_type = 'nucest';

GetOptions('keyword_type:s' => \$keyword_type,
           't|taxon_name:s' => \$taxon_name,
           'db_type:s' => \$db_type,
           'outdir:s' => \$outdir);

# A taxon name is required; without it the Entrez query would be meaningless.
die "usage: $0 -t TAXON_NAME [--keyword_type EST] [--db_type nucest] [--outdir DIR]\n"
  unless defined $taxon_name && length $taxon_name;

# Entrez query restricted to the taxon and keyword, limited to 2007-2010.
my $query_string = $taxon_name ."[Organism] AND ". $keyword_type ."[Keyword]";
my $db = Bio::DB::Query::GenBank->new
  (-db => $db_type,
   -query => $query_string,
   -mindate => '2007',
   -maxdate => '2010');

# Output file is named after the taxon, with spaces replaced by underscores.
my $taxon_name_string = $taxon_name; $taxon_name_string =~ s/\ /\_/g;
my $outfile = $outdir . "/" . $taxon_name_string . ".". $db_type . ".fasta";
my $out = Bio::SeqIO->new(-file => ">$outfile", -format => 'fasta');

# Report the number of matching records, then stream them from GenBank.
print $db->count,"\n";
my $gb = Bio::DB::GenBank->new();
my $stream = $gb->get_Stream_by_query($db);
while (my $seq = $stream->next_seq) {
  # Skip reads of 800 bases or fewer
  next unless ($seq->length > 800);
  $out->write_seq($seq);
}
$out->close;
---------------------

On Sat, Jan 2, 2010 at 4:35 PM, Jason Stajich <jason at bioperl.org> wrote:
> Did you try Bio::DB::Query::GenBank?
> You'd want to use -db => 'nucest' and then you just put in an Entrez query
> as per the example.  You can include dates in the query so you can do
> updates to your locally retrieved data in a script that runs periodically.
>
> -jason
> On Jan 2, 2010, at 12:57 AM, Albert Vilella wrote:
>
>> Hi all and happy 2010 for those that follow the Gregorian calendar,
>>
>> A question that is a bit in between bioperl and NCBI. I would like to use
>> bioperl to download sequences from dbEST. For that, my idea is to use
>> Bio::DB::GenBank and get the sequences by gi id.
>>
>> Now, I want my script to download sequences for a given NCBI taxonomy
>> clade.
>>
>> For example, if I want to download all fish (clupeocephala) sequences in
>> dbEST,
>> I can browse it around with the dbEST webpage using
>> "clupeocephala[taxonomy]",
>> so I am thinking there should be a way to do it programmatically.
>>
>> How can I query NCBI dbEST through bioperl to give me the list of GI ids I
>> am
>> looking for given a taxon id?
>>
>> Thanks in advance,
>>
>> Albert.
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
>
>


From Jean-Marc.Frigerio at pierroton.inra.fr  Mon Jan  4 14:12:18 2010
From: Jean-Marc.Frigerio at pierroton.inra.fr (Jean-Marc Frigerio INRA)
Date: Mon, 04 Jan 2010 15:12:18 +0100
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
Message-ID: <4B41F742.2030209@pierroton.inra.fr>

> Message: 1
> Date: Thu, 31 Dec 2009 11:26:45 +1800
> From: Peng Yu <pengyu.ut at gmail.com>
> Subject: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: bioperl-l at lists.open-bio.org
> Message-ID:
> 	<366c6f340912300926k5af5cc88nc3c3babda541fd1 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> With Bio::SeqIO, I can only read in the records in a fasta file one by
> one. This is preferable if there are many records in a file.
> 
> But I also want to read all the records in. I could use a while loop
> to read all records in. But could somebody let me know if there is a
> function in bioperl that can read in all the record at once and return
> me an object?
> 
> http://www.bioperl.org/wiki/HOWTO:SeqIO
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Wed, 30 Dec 2009 13:04:53 -0500
> From: Sean Davis <sdavis2 at mail.nih.gov>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: "bioperl-l at lists.open-bio.org" <bioperl-l at lists.open-bio.org>
> Message-ID:
> 	<264855a00912301004t396e0d4fwf9d291c5d82c3fb9 at mail.gmail.com>
> Content-Type: text/plain; charset=UTF-8
> 
> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and return
>> me an object?
> 
> In perl, you can use an array to store the records.  You could also
> use a hash if you have reasonable keys for the entries.
> 
> Sean
> 
> 
>> http://www.bioperl.org/wiki/HOWTO:SeqIO
>>
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Wed, 30 Dec 2009 11:58:54 -0800
> From: Jason Stajich <jason at bioperl.org>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: Peng Yu <pengyu.ut at gmail.com>
> Cc: BioPerl List <bioperl-l at lists.open-bio.org>
> Message-ID: <3550F192-111F-48A7-B1B7-113FFFAC105B at bioperl.org>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
> 
> or use a database object so you can retrieve sequences that have a  
> particular id. See Bio::DB::Fasta
> On Dec 30, 2009, at 10:04 AM, Sean Davis wrote:
> 
>> On Wed, Dec 30, 2009 at 12:26 PM, Peng Yu <pengyu.ut at gmail.com> wrote:
>>> With Bio::SeqIO, I can only read in the records in a fasta file one  
>>> by
>>> one. This is preferable if there are many records in a file.
>>>
>>> But I also want to read all the records in. I could use a while loop
>>> to read all records in. But could somebody let me know if there is a
>>> function in bioperl that can read in all the record at once and  
>>> return
>>> me an object?
>> In perl, you can use an array to store the records.  You could also
>> use a hash if you have reasonable keys for the entries.
>>
>> Sean
>>
>>
>>> http://www.bioperl.org/wiki/HOWTO:SeqIO
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 
> 
> 
> ------------------------------
> 
> Message: 4
> Date: Wed, 30 Dec 2009 16:20:31 -0500
> From: "Mark A. Jensen" <maj at fortinbras.us>
> Subject: Re: [Bioperl-l] How to read in the whole fasta file in the
> 	memory?
> To: "Peng Yu" <pengyu.ut at gmail.com>, <bioperl-l at lists.open-bio.org>
> Message-ID: <2646F627E6D14AADB412A6E6B51E24DA at NewLife>
> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
> 	reply-type=original
> 
> I think you might want Bio::AlignIO:
> 
> $alnio = Bio::AlignIO->new(-file=> 'my.fas' );
> $aln = $alnio->next_aln;
> @seqs = $aln->each_seq;
> 
> MAJ
> ----- Original Message ----- 
> From: "Peng Yu" <pengyu.ut at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, December 30, 2009 12:26 PM
> Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
> 
> 
>> With Bio::SeqIO, I can only read in the records in a fasta file one by
>> one. This is preferable if there are many records in a file.
>>
>> But I also want to read all the records in. I could use a while loop
>> to read all records in. But could somebody let me know if there is a
>> function in bioperl that can read in all the record at once and return
>> me an object?
>>
>> http://www.bioperl.org/wiki/HOWTO:SeqIO


Hi,

I wrote and currently use a module I named Bio::SeqIO::multifasta, which 
is basically a copy of Bio::SeqIO::fasta plus a few methods:
get_by_id(), get_by_order(), first_seq() and previous_seq()

It would need review, validation etc. Do I submit it to Bugzilla ?

	-- jmf


From jason at bioperl.org  Mon Jan  4 16:03:45 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 4 Jan 2010 08:03:45 -0800
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
	<4B41F742.2030209@pierroton.inra.fr>
Message-ID: <16D7C8C1-E4BE-406F-9D60-379876178CAB@bioperl.org>

We typically think of SeqIO as parsing a stream of data, not being  
reliant on it being a file which is what these methods would be  
implying I think. Sounds a lot like a database - does Bio::DB::Fasta  
not provide some of the functionality you need by these methods?  I  
realize there isn't a by_order() but the get_by_id() is implemented to  
allow random access.

-jason

>
> Hi,
>
> I wrote and currently use a module I named Bio::SeqIO::multifasta,  
> which is basically a copy of Bio::SeqIO::fasta plus a few methods:
> get_by_id(), get_by_order(), first_seq() and previous_seq()
>
> It would need review, validation etc. Do I submit it to Bugzilla ?
>
> 	-- jmf

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From avilella at gmail.com  Mon Jan  4 20:00:24 2010
From: avilella at gmail.com (Albert Vilella)
Date: Mon, 4 Jan 2010 20:00:24 +0000
Subject: [Bioperl-l] indexed fastq files
Message-ID: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>

Hi all,

What is the best way to index fastq files, so that once clustered, I
can provide a list of seq_ids and get
them back in fastq format from the indexed db?

Cheers,

Albert.


From cjfields at illinois.edu  Mon Jan  4 21:59:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Jan 2010 15:59:50 -0600
Subject: [Bioperl-l] indexed fastq files
In-Reply-To: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>
References: <358f4d651001041200m1f715b83k743f4d2ee6b6312b@mail.gmail.com>
Message-ID: <07EBA105-6A34-490C-B0B9-4772DF386CBA@illinois.edu>

Bio::Index::Fastq, maybe?  To tell the truth, I haven't tried it since we refactored FASTQ parsing, so let us know if it doesn't work.

chris

On Jan 4, 2010, at 2:00 PM, Albert Vilella wrote:

> Hi all,
> 
> What is the best way to index fastq files, so that once clustered, I
> can provide a list of seq_ids and get
> them back in fastq format from the indexed db?
> 
> Cheers,
> 
> Albert.


From cjfields at illinois.edu  Tue Jan  5 03:54:03 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 4 Jan 2010 21:54:03 -0600
Subject: [Bioperl-l] How to read in the whole fasta file in the memory?
In-Reply-To: <4B41F742.2030209@pierroton.inra.fr>
References: <mailman.15.1262278806.16038.bioperl-l@lists.open-bio.org>
	<4B41F742.2030209@pierroton.inra.fr>
Message-ID: <1BAE5508-0DB7-41B4-92E3-49256582131F@illinois.edu>

Jean-Marc,

You can do that, yes.  Just curious, but have you looked at the various flat file indexing modules for FASTA?  Bio::DB::Fasta and Bio::Index::Fasta are commonly used and allow lookups by primary ID (and I think in some cases secondary IDs).
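
For what it's worth, a lookup by ID with Bio::DB::Fasta can be as short as the sketch below. The sub name fetch_by_ids and the file and ID arguments are placeholders of mine, and I'm assuming the FASTA file lives in a directory you can write to, since the index is created next to it on first use.

```perl
use strict;
use warnings;
use Bio::DB::Fasta;

# Return a hash ref mapping each requested ID to its sequence string.
# IDs that yield no sequence are left out of the hash.
sub fetch_by_ids {
    my ($fasta_file, @ids) = @_;
    # The first call builds an index for the file; later calls reuse it.
    my $db = Bio::DB::Fasta->new($fasta_file);
    my %seq_for;
    for my $id (@ids) {
        my $seq = $db->seq($id);
        $seq_for{$id} = $seq if defined $seq;
    }
    return \%seq_for;
}

# Usage: perl fetch_by_ids.pl my.fa id1 id2 ...
if (@ARGV >= 2) {
    my ($file, @ids) = @ARGV;
    my $seqs = fetch_by_ids($file, @ids);
    print ">$_\n$seqs->{$_}\n" for grep { exists $seqs->{$_} } @ids;
}
```

That covers get_by_id(); there is no by-order access, as Jason noted.
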

chris

On Jan 4, 2010, at 8:12 AM, Jean-Marc Frigerio INRA wrote:

> ...
> 
> Hi,
> 
> I wrote and currently use a module I named Bio::SeqIO::multifasta, which is basically a copy of Bio::SeqIO::fasta plus a few methods:
> get_by_id(), get_by_order(), first_seq() and previous_seq()
> 
> It would need review, validation etc. Do I submit it to Bugzilla ?
> 
> 	-- jmf


From fs5 at sanger.ac.uk  Wed Jan  6 22:16:13 2010
From: fs5 at sanger.ac.uk (Frank Schwach)
Date: Wed, 06 Jan 2010 22:16:13 +0000
Subject: [Bioperl-l] Bio::DB::Sam strange behaviour for read pairs
Message-ID: <4B450BAD.3050807@sanger.ac.uk>

I'm trying to extract paired reads from a BAM file that span a given 
region. I would then like to get the two read ends of the sequenced 
clone that spans the region.
I use Bio::DB::Sam->get_features_by_location for this and it does give 
me the correct read pairs as a region match but it doesn't give me both 
read pairs in all cases.

Here is the script:

#!/usr/bin/perl
use Bio::DB::Sam;

my $usage = "usage: $0 BAMFILE CHROMOSOME STARTPOS ENDPOS\n" ;
my ($bam_file,$chrom,$start,$end) = @ARGV ;
die $usage unless $bam_file && $chrom && $start && $end;

my $bam = Bio::DB::Sam->new(-bam => $bam_file);

my @pairs = $bam->get_features_by_location(
    -type   => 'read_pair',
    -seq_id => $chrom,
    -start  => $start,
    -end    => $end);

print "region: $chrom:$start..$end\n" ;
foreach my $pair (@pairs) {
  print "  pair: id: ".$pair->id.", start".$pair->start.', end:'.$pair->end."\n";
  my ($first_mate,$second_mate) = $pair->get_SeqFeatures;
  print "    first_mate: start:".$first_mate->start.', end:'.$first_mate->end."\n";
  if ($second_mate){
    print "    second_mate: start:".$second_mate->start.', end:'.$second_mate->end."\n";
  } else {
    print "    no second mate\n";
  }
}

And here are the matching pairs that it produces with one of my files 
for the region tal12:22479..29232:
region: tal12:22479..29232
  pair: id: tal-2446c08, start17496, end:29423
    first_mate: start:28540, end:29423
    no second mate
  pair: id: tal-2463d10, start23534, end:31363
    first_mate: start:23534, end:24448
    no second mate
  pair: id: tal-2371c09, start20860, end:28230
    first_mate: start:27604, end:28230
    no second mate
  pair: id: tal-2440b06, start19232, end:27099
    first_mate: start:26025, end:27099
    no second mate
  pair: id: tal-2327g09, start18909, end:26129
    first_mate: start:25354, 
end:26129                                                                                                               

    no second 
mate                                                                                                                                   

  pair: id: tal-2381b05, start25658, 
end:35054                                                                                                      

    first_mate: start:25658, 
end:26295                                                                                                               

    no second 
mate                                                                                                                                   

  pair: id: tal-2377c11, start20898, 
end:28230                                                                                                      

    first_mate: start:27473, 
end:28230                                                                                                               

    no second 
mate                                                                                                                                   

  pair: id: tal-2426e12, start21975, 
end:27562                                                                                                      

    first_mate: start:21975, 
end:23008                                                                                                               

    second_mate: start:26396, 
end:27562                                                                                                              

  pair: id: tal-2365h10, start22843, 
end:31944                                                                                                      

    first_mate: start:22843, 
end:23184                                                                                                               

    no second 
mate                                                                                                                                   

  pair: id: tal-2388h09, start19016, 
end:28238                                                                                                      

    first_mate: start:27475, 
end:28238                                                                                                               

    no second mate                   


So it finds a lot of pairs that span the region and the start/end from 
the pair is also correct but it only gives me both individual mates in 
one case:
  pair: id: tal-2426e12, start21975, end:27562
    first_mate: start:21975, end:23008
    second_mate: start:26396, end:27562

In this case, both mates are actually inside the query region (at least 
partially), whereas in the other cases one of the mates is not inside, 
e.g. this one:

  pair: id: tal-2388h09, start19016, end:28238
    first_mate: start:27475, end:28238
    no second mate
  
This is the read pair as stored in the BAM file:
$ samtools view clones.bam | grep tal-2388h09

tal-2388h09    99      tal12  19016   205     
36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M      =       
27475   9223    
CTTTGGATGAAATAGTTTTTAAATAATACTTATTAAATATTAAATATATAACACATAAATAAGTATTGATGCAAATTTTAAAGTATTATAGAAAACTAGGTTTGATTATATTGTTATACTGTACTTTAAGAGGAGAGAGATAAGATATCTTTGCTCTTTTAATATATAAATTTAGATAAATATTCGTTAAATTTTCTACATAGTTATTTTTTATCTTATATATTATACTGCTATAGTTATCAATGTATATACATTCAAATAATTTATTAAAAATTCTATATTATATTAATTCTATGATAAAATAATCCTGTTTGTGATTTAAAAAATGATGATTCAATAAAAACTAATAATATAATACGAGTTAATATGGAATAATAAAATGGCATTTAACATGAATTTAGTCTTTAACCTTTTCTTTGTTTGTCAAGTTTTTTAAAACATAAAACCACACATTTCAAAATGGATTTTTAGCAAATATATAAAAATTATACATTTATAATGTATTGTTATGCGTCTTTTCGATAGAATCAATATTTAATTATATGAAGTTTCCACAATAAAATAATATTTAATATTATTTATTAGTAGAGTATTTGATTATATATATAGGCATATAATAATAACTCTAGTTCTATCTACCATATTATTTATAATTATTATAACAAAATGTGATATGAAATTTTATTATATACTTATATTATTTTTTTAACTATTTTAAAATATATTTATTTATACCTCAAAACTATAAAATTGAAATTATTAATAATAATCTAATATATACCTTTATAAAAATAAACGTATAAACTAAT   
 ><:4/+1+*)+4>BEH=9-,,66IIIIIIIIEDA>>>>A at DDFFIHHHHHITIIIIIHIIHHHHHHIYYYYYTTTYDDDHDDDDDDDIIINNTNHHHHHIYYYYIIIIIINNNNTTIIIIIIIIIIITTNTTTTTYYYTTTTTTYYNNNNNNLLLLLLLLLLLNNNNNNTTTTTTTTTTTTTNNTNNNTTTYYTLLLLLLTTTTTTTTYTTTNNNNNNTTTTTTTNNTTNNTTTTTTTTTTYYTTTTTTTNNNNNNTTTTTTTYYTTTTTTYYYTTTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTYYYYYYYYYYYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTTTYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYTTNNINTTYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYTTTTTOOOIFFFIFIIOICC>>II@>>>>>>C>>>>>>CIBECCCHIIOOOOOOOOTTTIIFDDEIQQA:55839AA>99>@IIIIII>>::;;I;>>CC>>>>>@III<::=>AAA<>>>>I>:>>99:>842225006824855;5>68844//.//00:>::338:99<:/-+*-./0)((((+00+..,++(((+-()(*((((()*)***))3)''')*..+*++((*1++--+*''''((+/)*42.((***)+,+('*'''*((''''((,'%%''''''''(     
AS:i:614        MS:i:50
tal-2388h09    147     tal12  27475   205     1H764M40H       =       
19016   -9223   
ATTAAATCGGTATCGCCAACACAATGAGTATAATCATTGTCAAATATGCGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATATTCATTGTCACATTCACGTTTGTAAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTGTAAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTCATTTTTATGAGTATAATCATTGGAACGTTCATTTTTATGAGTATAATCATTGGTACGTTCATTTTTGTGAGTATAATCATTGGTACGTTAATTTTTGTGAGTATAATCATTGGAACG  
(((0))*,-1-../2((())03---03266300271+*.-0-*''''+*,+/+))*-05330+)..4>7=77273911**((+20+03688633:93036<8;::5:<99379>>::>>>:57:<:7--)))1435::333228>::>II>::>A>>3/.958677AA=AA:>:==IIII8338<>A>>>>IIIIIIIIYYYYYKKYYYMIFFFFEIIIMI::4..8AIIC>9>=EIQQQMCAAAAAACIIIIAICIIIOOYTIIIMOQQMIIIIC>>AAABCCCCCEAI>C>>IQQIIIIIIIIIIKKYYYYYYYYYYYYYYYYYYYYYTIIIIIIYYYYTNINNNTYYYYYYYYYYYYYTTTTTYYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTYYYYYYYYYYYYYYYSSYYYYYYYTTTTTTYYYYYYYYYYYYYYYYYYYYYYYYTTTTTTTTTTTTTTTTTTYYYYYTTTTTTYYYTTNNNNTTYYYYYYYYYTTTTLLTTNNTTTTYTTTTTTYYYYYYYYYYYTTOOKKKLKOOTYYYYYYYYYYYYYYYYTNNNNNNNNNTTTNYNNNNTNNNNTTYYYYYYYYTTNNNNTTYNNNNNITTTTTYYYYYYYYYYTTNNIIIIIDIIIIHTNNNNTTYYYYTNNNIIIIIITTTINIIIINNNNTTTYYYYIHHHDDHHDDIHDDGDFFFTIIINTTYYYYTTTTHHHHCCIIIHIHHHHCAI9:++**1168>ACCIIDDDDDDI>>>>>?NNN  
AS:i:688        MS:i:50

So the read in the first line starts before the start of the query 
region and is not accessible via $pair->get_SeqFeatures although this is 
a valid pair.
Am I doing something wrong, is this the desired behaviour or is it a bug?
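
To make the geometry of the example pair explicit, here is a minimal 
sketch in plain Perl (no Bio::DB::Sam involved) that decodes the two SAM 
FLAG values and derives each mate's reference span from POS and CIGAR, 
following the SAM format conventions. The helper names ref_length, 
ref_end, overlaps and flag_names are made up for this illustration and 
are not part of any library.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# SAM FLAG bits relevant to paired reads (values from the SAM format).
my @FLAG_BITS = (
    [ 0x1,  'paired' ],
    [ 0x2,  'proper_pair' ],
    [ 0x10, 'reverse' ],
    [ 0x20, 'mate_reverse' ],
    [ 0x40, 'first_in_pair' ],
    [ 0x80, 'second_in_pair' ],
);

# Names of all bits set in a FLAG value (hypothetical helper).
sub flag_names {
    my ($flag) = @_;
    return map { $_->[1] } grep { $flag & $_->[0] } @FLAG_BITS;
}

# Number of reference bases covered: only M, D, N, = and X consume reference.
sub ref_length {
    my ($cigar) = @_;
    my $len = 0;
    while ($cigar =~ /(\d+)([MIDNSHP=X])/g) {
        my ($n, $op) = ($1, $2);    # copy before the next match resets $1/$2
        $len += $n if $op =~ /[MDN=X]/;
    }
    return $len;
}

# Last reference position covered by an alignment starting at $pos.
sub ref_end {
    my ($pos, $cigar) = @_;
    return $pos + ref_length($cigar) - 1;
}

# True if the closed intervals [s1,e1] and [s2,e2] share at least one base.
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 <= $e2 && $e1 >= $s2) ? 1 : 0;
}

# Query region and the two records of tal-2388h09 from the samtools output.
my ($q_start, $q_end) = (22479, 29232);
my @reads = (
    { flag => 99,  pos => 19016,
      cigar => '36H9M1D14M1D664M1D16M1D21M1D28M1D15M1D10M1D12M1D7M1D8M1D5M' },
    { flag => 147, pos => 27475, cigar => '1H764M40H' },
);

for my $r (@reads) {
    my $end = ref_end($r->{pos}, $r->{cigar});
    printf "flag %d (%s): %d..%d %s the query region\n",
        $r->{flag}, join(',', flag_names($r->{flag})),
        $r->{pos}, $end,
        overlaps($r->{pos}, $end, $q_start, $q_end) ? 'overlaps' : 'lies outside';
}
```

By this calculation the first mate covers 19016..19835 and lies entirely 
upstream of the query region, while the second covers 27475..28238 and 
overlaps it. That is consistent with only overlapping mates being 
returned as subfeatures, though whether that is the intended behaviour 
is a question for the module's maintainers.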

Thanks for your help!


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From hlapp at drycafe.net  Thu Jan  7 16:55:00 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Thu, 7 Jan 2010 11:55:00 -0500
Subject: [Bioperl-l] Data missing into Annotation object using
	Bio::SeqIO (Genbank)
In-Reply-To: <29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
References: <4B28EB44.3080006@pasteur.fr>
	<29CB0088-99C1-417E-BB3B-56FE7EC135F9@illinois.edu>
Message-ID: <240F198A-83FA-4304-ACA8-80A702A68D8C@drycafe.net>

I don't know to what extent this was followed up on further and I  
guess it's too long ago to be of much help, but if it hasn't been  
mentioned before I wanted to point out  
Bio::SeqFeature::AnnotationAdaptor which integrates tag/value  
annotation and Bio::Annotation annotation into one  
AnnotationCollection, so it doesn't matter whether something is  
attached as a tag or as an annotation object.
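
As a rough illustration of the adaptor, the sketch below builds a 
feature with plain tag/value pairs and reads them back through 
Bio::SeqFeature::AnnotationAdaptor. It assumes a BioPerl 1.6.x install 
and the adaptor's default behaviour of wrapping tag values as 
Bio::Annotation::SimpleValue objects; the locus tag and product values 
are invented for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqFeature::Generic;
use Bio::SeqFeature::AnnotationAdaptor;

# A CDS feature carrying its data as simple tag/value pairs,
# as the GenBank parser produces by default in 1.6.x.
my $feat = Bio::SeqFeature::Generic->new(
    -primary_tag => 'CDS',
    -start       => 1,
    -end         => 300,
);
$feat->add_tag_value('locus_tag', 'EX_0001');               # invented value
$feat->add_tag_value('product',   'hypothetical protein');  # invented value

# The adaptor presents tags and annotation objects as one collection.
my $anncoll = Bio::SeqFeature::AnnotationAdaptor->new(-feature => $feat);

for my $key (qw(locus_tag gene product)) {
    # Assumes each returned object has a value() method (SimpleValue).
    my @values = map { $_->value } $anncoll->get_Annotations($key);
    print "$key: ", (@values ? join(', ', @values) : '(none)'), "\n";
}
```

With this approach, code written against get_Annotations() should not 
need to care whether the parser stored the data as tags or as 
annotation objects.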

	-hilmar

On Dec 16, 2009, at 10:09 AM, Chris Fields wrote:

> Emmanuel,
>
> The previous behavior in the 1.5.x series was to store feature tags  
> as Bio::Annotation.  The problem had been the way this was  
> implemented was considered unsatisfactory for various reasons, so we  
> reverted back to using simple tag-value pairs as the default.  You  
> can get at the data this way (from the Feature/Annotation HOWTO):
>
> for my $feat_object ($seq_object->get_SeqFeatures) {
>    print "primary tag: ", $feat_object->primary_tag, "\n";
>    for my $tag ($feat_object->get_all_tags) {
>        print "  tag: ", $tag, "\n";
>        for my $value ($feat_object->get_tag_values($tag)) {
>            print "    value: ", $value, "\n";
>        }
>    }
> }
>
> You can also convert all the tag-value data into a  
> Bio::Annotation::Collection using the  
> Bio::SeqFeature::AnnotationAdaptor, but this is completely optional.
>
> chris
>
> On Dec 16, 2009, at 8:14 AM, Emmanuel Quevillon wrote:
>
>> Hi,
>>
>> I've wrote a small Genbank parser few months ago before BioPerl  
>> release 1.6.0.
>> I tried to use my code once again but now the output of my parser  
>> is empty.
>> It looks like Annotation from seqfeatures is not filled anymore.
>>
>> Here is the code I used previously:
>>
>> while(my $seq = $streamer->next_seq()){
>>
>>   #We only want to retrieve CDS features...
>>   foreach my $feat (grep { $_->primary_tag() eq 'CDS' } $seq- 
>> >get_SeqFeatures()){
>>       print $ofh join("#",
>>                       $feat->annotation()- 
>> >get_Annotations('locus_tag'),    # Acc num
>>                       $feat->annotation()->get_Annotations('gene')
>>                         ? $feat->annotation()- 
>> >get_Annotations('gene')      # Gene name
>>                         : $feat->annotation()- 
>> >get_Annotations('locus_tag'),
>>                       $feat->annotation()- 
>> >get_Annotations('product'),      # Description
>>                      ),"\n";
>>   }
>> }
>>
>> $feat is a Bio::SeqFeature::Generic object
>>
>> If I print Dumper($feat->annotation()) here is the output :
>>
>> $VAR1 = bless( {
>>                '_typemap' => bless( {
>>                                       '_type' => {
>>                                                    'comment' =>  
>> 'Bio::Annotation::Comment',
>>                                                    'reference' =>  
>> 'Bio::Annotation::Reference',
>>                                                    'dblink' =>  
>> 'Bio::Annotation::DBLink'
>>                                                  }
>>                                     },  
>> 'Bio::Annotation::TypeManager' ),
>>                '_annotation' => {}
>>              }, 'Bio::Annotation::Collection' );
>>
>> Have some changes been made into the way annotation object is  
>> populated?
>>
>> Thanks for any clue and sorry if my question look stupid
>>
>> Regards
>>
>> Emmanuel
>>
>> -- 
>> -------------------------
>> Emmanuel Quevillon
>> Biological Software and Databases Group
>> Institut Pasteur
>> +33 1 44 38 95 98
>> tuco at_ pasteur dot fr
>> -------------------------

-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From rtbio.2009 at gmail.com  Fri Jan  8 15:00:21 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 8 Jan 2010 16:00:21 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>

Hello all,

I am trying to run a remote BLAST search using BioPerl. My input is a
Trypanosoma brucei sequence in FASTA format. When I submit it to BLAST with
the step
$r=$factory->submit_blast($input)
nothing is returned, as I confirmed by debugging the code. The input
sequence is not BLASTed even though I set all the parameters. The code is
pasted below.

Please help me solve this problem. It is very urgent.

Regards
Roopa.

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;


use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds. Roopa";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";


open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
 This page will automatically reload in 30 seconds Roopa <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

#$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1;
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= 'Trypanosoma Brucei';

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

            # open(OUTFILE,'>',$debugfile);
             #  print OUTFILE @params;
             # close(OUTFILE);


 my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter

 $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => 'Trypanosoma Brucei' );


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   # The program stops here: submit_blast() does not return any value and
   # the while loop below is never entered. Please help me in this regard.
   my $r = $factory->submit_blast($input);
                open(OUTFILE,'>',$debugfile);
                print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
               print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
               print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
               print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
               print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);

}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

    print OUTFILE substr ($in{'Inputseq'}, $i, 1);

    if ( ($i+1)%10==0){
        print OUTFILE " ";
    }
    if ( ($i+1)%60==0){
        print OUTFILE "<br>\n";
    }
}


print OUTFILE "</font> <p>";

$z=@compseqs;

for($k=1;$k<$z;$k++) {
    print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
Sequence: <br>";

    for ($i=0; $i<length ($compseqs[$k]); $i++) {

        print OUTFILE substr ($compseqs[$k], $i, 1);

        if ( ($i+1)%10==0){
            print OUTFILE " ";
        }
        if ( ($i+1)%60==0){
            print OUTFILE "<br>\n";
        }
    }
    print OUTFILE "<p></font>";
}

print OUTFILE "<p>
Window: <br>$in{'Windowsize'}
<p>
<p>
Threshold: <br>$in{'Threshold'}
<p>";
my $j=0;

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

    if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
        if ($out[$i]->{similar}<=$in{'Threshold'}){
            $j=$in{'Windowsize'};
        }
        $height=$out[$i]->{similar}*5;
    }

    if ($j>0) {
        print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
height=\"5\">";
        $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
1)."</font>";
        $j--;
    }
    else {
        print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
height=\"5\">";
        $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
1)."</font>";
    }

    if ( ($i+1)%10==0){
        $outstring .= " ";
    }
    if ( ($i+1)%60==0){
        $outstring .= "<br>\n";

    }
    if ( ($i+1)%800==0){
        print OUTFILE "<br><br>\n";

    }
}

print OUTFILE "<br><br><font face=\"Courier, monospace font
set\">$outstring</font>";

#foreach (@out) {
#print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
#if ($_->{similar}<=$in{'Threshold'}){

#    }
#}

print OUTFILE "</BODY>\n</HTML>\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}


From maj at fortinbras.us  Fri Jan  8 15:36:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 8 Jan 2010 10:36:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
Message-ID: <F19004692A4A4350856B23DF25E09074@NewLife>

Hi Roopa--

I got your code to work with the following changes:

+# the input should be a valid FASTA file...
 ...
 open(NUC,'>',$nuc);
+print NUC ">seq (need a name line for valid fasta)\n";
 print NUC $inpu1, "\n";
 close(NUC);
...

+# you can set these header parms in the call itself...
- my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
+ my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
+     -ENTREZ_QUERY => 'Trypanosoma Brucei[ORGN]');

  #change a paramter
+# commented this out...
+# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
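
The FASTA point can be handled a little more generally than by always 
printing a fixed name line. The sketch below is one possible way to do 
it in plain Perl: it only adds a ">" line when the submitted text does 
not already start with one. The helper name ensure_fasta and the default 
name "query" are assumptions for this example, not part of BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return $text as FASTA: add a name line only if one is missing.
# ensure_fasta is a hypothetical helper, not a BioPerl function.
sub ensure_fasta {
    my ($text, $name) = @_;
    $name = 'query' unless defined $name && length $name;

    $text =~ s/\r\n?/\n/g;    # normalise line endings from web forms
    $text =~ s/^\s+//;        # drop leading whitespace and blank lines
    $text = ">$name\n$text" unless $text =~ /^>/;
    $text .= "\n" unless $text =~ /\n\z/;
    return $text;
}

# Raw sequence without a header gets one; existing headers are kept.
print ensure_fasta("ACGTACGTTTGA");
print ensure_fasta(">my_seq\nACGTACGTTTGA\n");
```

In the script above this would replace the two print statements, e.g. 
print NUC ensure_fasta($inpu1); so that form input pasted with or 
without a header line both end up as a file Bio::SeqIO can read.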

MAJ
----- Original Message ----- 
From: "Roopa Raghuveer" <rtbio.2009 at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Friday, January 08, 2010 10:00 AM
Subject: [Bioperl-l] Regarding blast in Bioperl


> Hello all,
>
> I was trying Remote blast using Bioperl. My input data is a Trypanosoma
> brucei sequence in Fasta format. When I was trying to submit to BLAST using
> the step
> $r=$factory->submit_blast($input)
> It was not returning anything which I checked by debugging the code. It is
> not blasting my input sequence even though I mentioned all the parameters.I
> would paste the code below.
>
> Please help me in solving put this problem. It is very urgent.
>
> Regards
> Roopa.
>
> #!/usr/bin/perl
>
> #path for extra camel module
> use lib "/srv/www/htdocs/rain/RNAi/";
> use Roopablast;
>
>
> use Bio::SearchIO;
> use Bio::Search::Result::BlastResult;
> use Bio::Perl;
> use Bio::Tools::Run::RemoteBlast;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
>
> $serverpath = "/srv/www/htdocs/rain/RNAi";
> $serverurl = "http://141.84.66.66/rain/RNAi";
> $outfile = $serverpath."/rnairesult_".time().".html";
> $nuc = $serverpath."/nuc".time().".txt";
> $debugfile = $serverpath."/debug_".time().".txt";
> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>
> my $outstring ="";
>
> &parse_form;
>
> print "Content-type: text/html\n\n";
> print "<HTML>\n";
> print "<head><title>RNAi Result</title>";
> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
> URL=$serverurl/rnairesult_".time().".html\"> \n";
> print "</head>\n";
> print "<body>\n";
> print " Your results will appear <a
> href=$serverurl/rnairesult_".time().".html>here</a><br>";
> print " Please be patient, runtime can be up to 5 minutes<br>";
> print " This page will automatically reload in 30 seconds. Roopa";
> print "</BODY>\n";
> print "</HTML>\n";
>
> defined(my $pid = fork) or die "Can't fork: $!";
> exit if $pid;
> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>
>
>
> open(OUTFILE, '>',$outfile);
>
> print OUTFILE "<HTML>\n
> <head><title>RNAi Result</title>
> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
> URL=$serverurl//rnairesult_".time().".html\"> \n
> <meta http-equiv=\"expires\" content=\"0\">
> </head>\n
> <body>\n
>  Your results will appear <a
> href=$serverurl/rnairesult_".time().".html>here</a><br>
>  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
> This page will automatically reload in 30 seconds Roopa <br>
> </BODY>\n
> </HTML>\n";
>
> close(OUTFILE);
>
>
> @compseqs = blastcode($in{'Inputseq'});
>
> $in{'Inputseq'} =~ s/>.*$//m;
> $in{'Inputseq'} =~ s/[^TAGC]//gim;
> $in{'Inputseq'} =~ tr/actg/ACTG/;
>
> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
> $in{'Threshold'});
>
>
> sub blastcode
> {
>
> $inpu1= $_[0];
>
> #$organ= $_[1];
>
> open(NUC,'>',$nuc);
> print NUC $inpu1;
> close(NUC);
>
> my $prog = 'blastn';
> my $db   = 'refseq_rna';
> my $e_val= '1e-10';
> my $organism= 'Trypanosoma Brucei';
>
> $gb = new Bio::DB::GenBank;
>
> my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO',
>         '-Organism'   => $organism );
>
>            # open(OUTFILE,'>',$debugfile);
>             #  print OUTFILE @params;
>             # close(OUTFILE);
>
>
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>
>  #change a paramter
>
> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
> Brucei[ORGN]';
>
> #change a paramter
> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>
>  my $v = 1;
>  #$v is just to turn on and off the messages
>
> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => 'Trypanosoma Brucei' );
>
>
> while (my $input = $str->next_seq())
> {
>   #Blast a sequence against a database:
>    #Alternatively, you could  pass in a file with many
>    #sequences rather than loop through sequence one at a time
>    #Remove the loop starting 'while (my $input = $str->next_seq())'
>    #and swap the two lines below for an example of that.
>
>             open(OUTFILE,'>',$debugfile);
>               print OUTFILE $input;
>              close(OUTFILE);
>
>
>   my $r = $factory->submit_blast($input);    #The program stops here it
> does not return any value and it does not enter the While loop,Please help
> me in this regard.#
>                open(OUTFILE,'>',$debugfile);
>                print OUTFILE $r;
>                close(OUTFILE);
>
>
>   print STDERR "waiting...." if($v>0);
>
>  while ( my @rids = $factory->each_rid ) {
>      open(OUTFILE,'>',$debugfile);
>               print OUTFILE "while entered";
>              close(OUTFILE);
>     foreach my $rid ( @rids ) {
>
>               open(OUTFILE,'>',$debugfile);
>               print OUTFILE "foreach entered";
>              close(OUTFILE);
>
>        my $rc = $factory->retrieve_blast($rid);
>
>        if( !ref($rc) )
>        {
>        if( $rc < 0 )
>        {
>        $factory->remove_rid($rid);
>        }
>         open(OUTFILE,'>',$debugfile);
>               print OUTFILE "if entered";
>              close(OUTFILE);
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>        }
>       else {
>              open(OUTFILE,'>',$debugfile);
>               print OUTFILE "else entered";
>              close(OUTFILE);
>
>          my $result = $rc->next_result();
>         #save the output
>        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>
>          open(BLASTDEBUGFILE,'>',$blastdebugfile);
>          print BLASTDEBUGFILE $result->next_hit();
>          close(BLASTDEBUGFILE);
>
>        my $filename =
> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>
>         # open(DEBUGFILE,'>',$debugfile);
>         # open(new,'>',$filename);
>         # @arra=<new>;
>         # print DEBUGFILE @arra;
>         # close(DEBUGFILE);
>         # close(new);
>
>         $factory->save_output($filename);
>
>       # open(BLASTDEBUGFILE,'>',$debugfile);
>       # print BLASTDEBUGFILE  "Hello $rid";
>       # close(BLASTDEBUGFILE);
>
>       $factory->remove_rid($rid);
>
>       open(BLASTDEBUGFILE,'>',$blastdebugfile);
>       print BLASTDEBUGFILE  $organism;
>        close(BLASTDEBUGFILE);
>
>    # open(OUTFILE,'>',$outfile);
>    # print OUTFILE "Test2 $result->database_name()";
>    # close(OUTFILE);
>
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
>
>   while ( my $hit = $result->next_hit ) {
>
>            next unless ( $v > 0);
>
>          #     open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "$hit in while hits";
>            #  close(OUTFILE);
>
>       my $sequ = $gb->get_Seq_by_version($hit->name);
>           my $dna = $sequ->seq();        # get the sequence as a string
>                  push(@seqs,$dna);
>          }
>        }
>      }
>    }
>  }
>
>  #open(OUTFILE,'>',$debugfile);
>  #print OUTFILE $seqs[0];
>  #close(OUTFILE);
>
> return(@seqs);
>
> }
>
> open(OUTFILE, '>',$outfile) || die ;
>
> print OUTFILE "<HTML>\n
> <head><title>RNAi Result</title>
> <meta http-equiv=\"expires\" content=\"0\"></head>\n
> <body>\n
> <p><font face=\"Courier, monospace font set\">
> Inputsequence: <br>";
>
> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>
>    print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>
>    if ( ($i+1)%10==0){
>        print OUTFILE " ";
>    }
>    if ( ($i+1)%60==0){
>        print OUTFILE "<br>\n";
>    }
> }
>
>
>
> print OUTFILE "</font> <p>";
>
> $z=@compseqs;
>
> for($k=1;$k<$z;$k++) {
>    print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
> Sequence: <br>";
>
>    for ($i=0; $i<length ($compseqs[$k]); $i++) {
>
>        print OUTFILE substr ($compseqs[$k], $i, 1);
>
>        if ( ($i+1)%10==0){
>            print OUTFILE " ";
>        }
>        if ( ($i+1)%60==0){
>            print OUTFILE "<br>\n";
>        }
>    }
>    print OUTFILE "<p></font>";
> }
>
> print OUTFILE "<p>
> Window: <br>$in{'Windowsize'}
> <p>
> <p>
> Threshold: <br>$in{'Threshold'}
> <p>";
> my $j=0;
>
> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>
>    if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>        if ($out[$i]->{similar}<=$in{'Threshold'}){
>            $j=$in{'Windowsize'};
>        }
>        $height=$out[$i]->{similar}*5;
>    }
>
>    if ($j>0) {
>        print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
> height=\"5\">";
>        $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
> 1)."</font>";
>        $j--;
>    }
>    else {
>        print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
> height=\"5\">";
>        $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
> 1)."</font>";
>    }
>
>    if ( ($i+1)%10==0){
>        $outstring .= " ";
>    }
>    if ( ($i+1)%60==0){
>        $outstring .= "<br>\n";
>
>    }
>    if ( ($i+1)%800==0){
>        print OUTFILE "<br><br>\n";
>
>    }
> }
>
> print OUTFILE "<br><br><font face=\"Courier, monospace font
> set\">$outstring</font>";
>
> #foreach (@out) {
> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
> #if ($_->{similar}<=$in{'Threshold'}){
>
> #    }
> #}
>
> print OUTFILE "</BODY>\n</HTML>\n";
>
> close OUTFILE;
>
> #nameprint();
>
> sub parse_form {
>    local ($buffer, @pairs, $pair, $name, $value);
>    # Read in text
>    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>    if ($ENV{'REQUEST_METHOD'} eq "POST")
>    {
>        read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>    }
>    else
>    {
>        $buffer = $ENV{'QUERY_STRING'};
>    }
>    @pairs = split(/&/, $buffer);
>    foreach $pair (@pairs)
>    {
>        ($name, $value) = split(/=/, $pair);
>        $value =~ tr/+/ /;
>        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>        $in{$name} = $value;
>    }
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> 


From julian.onions at gmail.com  Fri Jan  8 16:53:50 2010
From: julian.onions at gmail.com (Julian Onions)
Date: Fri, 8 Jan 2010 16:53:50 +0000
Subject: [Bioperl-l] Cladogram construction
Message-ID: <cbeabfd41001080853m50c75779q4155cd02af17670a@mail.gmail.com>

Does anyone have sample code for building cladograms from Pars-style input
(Pars is one of the PHYLIP tools), or from any other format?
I've got something sort of working, but I get no weights on the tree:
everything appears as nan. I'd also like to set one of the species as the
outgroup. This is the closest sample I've found so far.


#!/usr/bin/perl -w
use strict;
use Bio::AlignIO;
use Bio::Tree::DistanceFactory;
use Bio::Align::ProteinStatistics;
use Bio::TreeIO;
use Bio::Tree::Draw::Cladogram;
my $alnfile = shift @ARGV || die "need a file to run";

my $input= Bio::AlignIO->new(-format => 'fasta',
    -file    => $alnfile);

if( my $aln = $input->next_aln ) {
 my $dfactory = Bio::Tree::DistanceFactory->new(-method => 'NJ');
 my $stats = Bio::Align::ProteinStatistics->new;
 my $distmat = $stats->distance(-align => $aln,
         -method => 'Kimura');
 my $treeout = Bio::TreeIO->new(-format => 'newick');
 my $tree = $dfactory->make_tree($distmat);
 $treeout->write_tree($tree);
  my $obj1 = Bio::Tree::Draw::Cladogram->new(-tree    => $tree,
                                             -compact => 0);
  $obj1->print(-file => "tree.eps");
} else {
 die "could not find any alignments in the file $alnfile";
}


Pars input looks like
3 4
Robin   101
Blackbird 100
Sparrow 100


Thanks,
Julian.


From rtbio.2009 at gmail.com  Sat Jan  9 16:57:09 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Sat, 9 Jan 2010 17:57:09 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <F19004692A4A4350856B23DF25E09074@NewLife>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<F19004692A4A4350856B23DF25E09074@NewLife>
Message-ID: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>

Hello all,

Thanks a lot for your reply, Mark. It worked with Trypanosoma brucei hard-coded
as the organism, but when I tried to take the organism parameter from the
user it did not work, i.e., I was unable to get the target sequences.
Please help me in this regard. My code is:

#!/usr/bin/perl

#path for extra camel module
use lib "/srv/www/htdocs/rain/RNAi/";
use Roopablast;


use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds. Roopa";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";

open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
 This page will automatically reload in 30 seconds Roopa <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1,"\n";
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
        '-Organism'   => $organism );

             open(OUTFILE,'>',$debugfile);
             print OUTFILE $inpu1;
              close(OUTFILE);


my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
'$organ[ORGN]');

 #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a paramter

 #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a paramter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => $organ );


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             #open(OUTFILE,'>',$debugfile);
              # print OUTFILE $input;
              #close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);

   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
   #   open(OUTFILE,'>',$debugfile);
    #           print OUTFILE "while entered";
     #         close(OUTFILE);
     foreach my $rid ( @rids ) {

      #         open(OUTFILE,'>',$debugfile);
       #        print OUTFILE "foreach entered";
        #      close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
         #      print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          #    open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "else entered";
            #  close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename =
$serverpath."/blastdata_".time().$result->query_name()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);
  # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v > 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);

       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq();        # get the sequence as a string
                  push(@seqs,$dna);
          }
        }
      }
    }
  }

  #open(OUTFILE,'>',$debugfile);
  #print OUTFILE $seqs[0];
  #close(OUTFILE);

return(@seqs);

}

Regards,
Roopa.


On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> Hi Roopa--
>
> I got your code to work with the following changes:
>
> +# the input should be a valid FASTA file...
> ...
> open(NUC,'>',$nuc);
> +print NUC ">seq (need a name line for valid fasta)\n";
> print NUC $inpu1, "\n";
> close(NUC);
> ...
>
> +# you can set these header parms in the call itself...
> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
> 'Trypanosoma Brucei[ORGN]');
>
>  #change a paramter
> +# commented this out...
> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
> Brucei[ORGN]';
>
> MAJ
> ----- Original Message ----- From: "Roopa Raghuveer" <rtbio.2009 at gmail.com
> >
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 08, 2010 10:00 AM
> Subject: [Bioperl-l] Regarding blast in Bioperl
>
>
>  Hello all,
>>
>> I was trying Remote blast using Bioperl. My input data is a Trypanosoma
>> brucei sequence in Fasta format. When I was trying to submit to BLAST
>> using
>> the step
>> $r=$factory->submit_blast($input)
>> It was not returning anything which I checked by debugging the code. It is
>> not blasting my input sequence even though I mentioned all the
>> parameters.I
>> would paste the code below.
>>
>> Please help me in solving this problem. It is very urgent.
>>
>> Regards
>> Roopa.
>>
>> #!/usr/bin/perl
>>
>> #path for extra camel module
>> use lib "/srv/www/htdocs/rain/RNAi/";
>> use Roopablast;
>>
>>
>> use Bio::SearchIO;
>> use Bio::Search::Result::BlastResult;
>> use Bio::Perl;
>> use Bio::Tools::Run::RemoteBlast;
>> use Bio::Seq;
>> use Bio::SeqIO;
>> use Bio::DB::GenBank;
>>
>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>> $serverurl = "http://141.84.66.66/rain/RNAi";
>> $outfile = $serverpath."/rnairesult_".time().".html";
>> $nuc = $serverpath."/nuc".time().".txt";
>> $debugfile = $serverpath."/debug_".time().".txt";
>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>
>> my $outstring ="";
>>
>> &parse_form;
>>
>> print "Content-type: text/html\n\n";
>> print "<HTML>\n";
>> print "<head><title>RNAi Result</title>";
>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>> print "</head>\n";
>> print "<body>\n";
>> print " Your results will appear <a
>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>> print " Please be patient, runtime can be up to 5 minutes<br>";
>> print " This page will automatically reload in 30 seconds. Roopa";
>> print "</BODY>\n";
>> print "</HTML>\n";
>>
>> defined(my $pid = fork) or die "Can't fork: $!";
>> exit if $pid;
>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>
>>
>>
>> open(OUTFILE, '>',$outfile);
>>
>> print OUTFILE "<HTML>\n
>> <head><title>RNAi Result</title>
>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>> URL=$serverurl//rnairesult_".time().".html\"> \n
>> <meta http-equiv=\"expires\" content=\"0\">
>> </head>\n
>> <body>\n
>>  Your results will appear <a
>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>  Please be patient, runtime can be up to 5 minutes wait wait
>> wait......<br>
>> This page will automatically reload in 30 seconds Roopa <br>
>> </BODY>\n
>> </HTML>\n";
>>
>> close(OUTFILE);
>>
>>
>> @compseqs = blastcode($in{'Inputseq'});
>>
>> $in{'Inputseq'} =~ s/>.*$//m;
>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>
>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>> $in{'Threshold'});
>>
>>
>> sub blastcode
>> {
>>
>> $inpu1= $_[0];
>>
>> #$organ= $_[1];
>>
>> open(NUC,'>',$nuc);
>> print NUC $inpu1;
>> close(NUC);
>>
>> my $prog = 'blastn';
>> my $db   = 'refseq_rna';
>> my $e_val= '1e-10';
>> my $organism= 'Trypanosoma Brucei';
>>
>> $gb = new Bio::DB::GenBank;
>>
>> my @params = ( '-prog' => $prog,
>>        '-data' => $db,
>>        '-expect' => $e_val,
>>        '-readmethod' => 'SearchIO',
>>        '-Organism'   => $organism );
>>
>>           # open(OUTFILE,'>',$debugfile);
>>            #  print OUTFILE @params;
>>            # close(OUTFILE);
>>
>>
>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>
>>  #change a paramter
>>
>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>> Brucei[ORGN]';
>>
>> #change a paramter
>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>
>>  my $v = 1;
>>  #$v is just to turn on and off the messages
>>
>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>> '-organism' => 'Trypanosoma Brucei' );
>>
>>
>> while (my $input = $str->next_seq())
>> {
>>  #Blast a sequence against a database:
>>   #Alternatively, you could  pass in a file with many
>>   #sequences rather than loop through sequence one at a time
>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>   #and swap the two lines below for an example of that.
>>
>>            open(OUTFILE,'>',$debugfile);
>>              print OUTFILE $input;
>>             close(OUTFILE);
>>
>>
>>  my $r = $factory->submit_blast($input);    #The program stops here it
>> does not return any value and it does not enter the While loop,Please help
>> me in this regard.#
>>               open(OUTFILE,'>',$debugfile);
>>               print OUTFILE $r;
>>               close(OUTFILE);
>>
>>
>>  print STDERR "waiting...." if($v>0);
>>
>>  while ( my @rids = $factory->each_rid ) {
>>     open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "while entered";
>>             close(OUTFILE);
>>    foreach my $rid ( @rids ) {
>>
>>              open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "foreach entered";
>>             close(OUTFILE);
>>
>>       my $rc = $factory->retrieve_blast($rid);
>>
>>       if( !ref($rc) )
>>       {
>>       if( $rc < 0 )
>>       {
>>       $factory->remove_rid($rid);
>>       }
>>        open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "if entered";
>>             close(OUTFILE);
>>        print STDERR "." if ( $v > 0 );
>>        sleep 5;
>>       }
>>      else {
>>             open(OUTFILE,'>',$debugfile);
>>              print OUTFILE "else entered";
>>             close(OUTFILE);
>>
>>         my $result = $rc->next_result();
>>        #save the output
>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>
>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>         print BLASTDEBUGFILE $result->next_hit();
>>         close(BLASTDEBUGFILE);
>>
>>       my $filename =
>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>
>>        # open(DEBUGFILE,'>',$debugfile);
>>        # open(new,'>',$filename);
>>        # @arra=<new>;
>>        # print DEBUGFILE @arra;
>>        # close(DEBUGFILE);
>>        # close(new);
>>
>>        $factory->save_output($filename);
>>
>>      # open(BLASTDEBUGFILE,'>',$debugfile);
>>      # print BLASTDEBUGFILE  "Hello $rid";
>>      # close(BLASTDEBUGFILE);
>>
>>      $factory->remove_rid($rid);
>>
>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>      print BLASTDEBUGFILE  $organism;
>>       close(BLASTDEBUGFILE);
>>
>>   # open(OUTFILE,'>',$outfile);
>>   # print OUTFILE "Test2 $result->database_name()";
>>   # close(OUTFILE);
>>
>> #$hit = $result->next_hit;
>> #open(new,'>',$debugfile);
>> #print $hit;
>> #close(new);
>>
>>  while ( my $hit = $result->next_hit ) {
>>
>>           next unless ( $v > 0);
>>
>>         #     open(OUTFILE,'>',$debugfile);
>>          #    print OUTFILE "$hit in while hits";
>>           #  close(OUTFILE);
>>
>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>          my $dna = $sequ->seq();        # get the sequence as a string
>>                 push(@seqs,$dna);
>>         }
>>       }
>>     }
>>   }
>>  }
>>
>>  #open(OUTFILE,'>',$debugfile);
>>  #print OUTFILE $seqs[0];
>>  #close(OUTFILE);
>>
>> return(@seqs);
>>
>> }
>>
>> open(OUTFILE, '>',$outfile) || die ;
>>
>> print OUTFILE "<HTML>\n
>> <head><title>RNAi Result</title>
>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>> <body>\n
>> <p><font face=\"Courier, monospace font set\">
>> Inputsequence: <br>";
>>
>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>
>>   print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>
>>   if ( ($i+1)%10==0){
>>       print OUTFILE " ";
>>   }
>>   if ( ($i+1)%60==0){
>>       print OUTFILE "<br>\n";
>>   }
>> }
>>
>>
>>
>> print OUTFILE "</font> <p>";
>>
>> $z=@compseqs;
>>
>> for($k=1;$k<$z;$k++) {
>>   print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
>> Sequence: <br>";
>>
>>   for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>
>>       print OUTFILE substr ($compseqs[$k], $i, 1);
>>
>>       if ( ($i+1)%10==0){
>>           print OUTFILE " ";
>>       }
>>       if ( ($i+1)%60==0){
>>           print OUTFILE "<br>\n";
>>       }
>>   }
>>   print OUTFILE "<p></font>";
>> }
>>
>> print OUTFILE "<p>
>> Window: <br>$in{'Windowsize'}
>> <p>
>> <p>
>> Threshold: <br>$in{'Threshold'}
>> <p>";
>> my $j=0;
>>
>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>
>>   if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>       if ($out[$i]->{similar}<=$in{'Threshold'}){
>>           $j=$in{'Windowsize'};
>>       }
>>       $height=$out[$i]->{similar}*5;
>>   }
>>
>>   if ($j>0) {
>>       print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>> height=\"5\">";
>>       $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
>> 1)."</font>";
>>       $j--;
>>   }
>>   else {
>>       print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>> height=\"5\">";
>>       $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
>> 1)."</font>";
>>   }
>>
>>   if ( ($i+1)%10==0){
>>       $outstring .= " ";
>>   }
>>   if ( ($i+1)%60==0){
>>       $outstring .= "<br>\n";
>>
>>   }
>>   if ( ($i+1)%800==0){
>>       print OUTFILE "<br><br>\n";
>>
>>   }
>> }
>>
>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>> set\">$outstring</font>";
>>
>> #foreach (@out) {
>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
>> #if ($_->{similar}<=$in{'Threshold'}){
>>
>> #    }
>> #}
>>
>> print OUTFILE "</BODY>\n</HTML>\n";
>>
>> close OUTFILE;
>>
>> #nameprint();
>>
>> sub parse_form {
>>   local ($buffer, @pairs, $pair, $name, $value);
>>   # Read in text
>>   $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>   if ($ENV{'REQUEST_METHOD'} eq "POST")
>>   {
>>       read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>   }
>>   else
>>   {
>>       $buffer = $ENV{'QUERY_STRING'};
>>   }
>>   @pairs = split(/&/, $buffer);
>>   foreach $pair (@pairs)
>>   {
>>       ($name, $value) = split(/=/, $pair);
>>       $value =~ tr/+/ /;
>>       $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>       $in{$name} = $value;
>>   }
>> }
>>
>>
>>
>


From maj at fortinbras.us  Sat Jan  9 18:05:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sat, 9 Jan 2010 13:05:41 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com><F19004692A4A4350856B23DF25E09074@NewLife>
	<c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
Message-ID: <4C2E8133F916495B876628EF3E8FCBB2@NewLife>

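I see it immediately (from making the same bug many times): single quotes
do not interpolate, so the query is sent as the literal text $organ[ORGN].
Plain double quotes are not enough either, because "$organ[ORGN]" is parsed
as an element of the array @organ. Concatenate instead:

 my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
- '$organ[ORGN]');
+ $organ . '[ORGN]');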

MAJ
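
For anyone hitting the same thing, here is a minimal standalone sketch of the
quoting difference. It runs without BioPerl. The helper name
entrez_organism_query is hypothetical (it is not part of RemoteBlast), and
the organism string is only sample input.

```perl
use strict;
use warnings;

# Hypothetical helper, not a BioPerl function: builds the Entrez organism
# filter by concatenation, so "[ORGN]" can never be read as an array
# subscript on the variable that precedes it.
sub entrez_organism_query {
    my ($organism) = @_;
    return $organism . '[ORGN]';
}

my $organ = 'Trypanosoma brucei';

# Single quotes: nothing is interpolated, the variable name is sent verbatim.
my $literal = '$organ[ORGN]';

# Concatenation: the value of $organ followed by the literal tag.
my $query = entrez_organism_query($organ);

print "$literal\n";   # prints: $organ[ORGN]
print "$query\n";     # prints: Trypanosoma brucei[ORGN]
```

The resulting string is what you would hand to -ENTREZ_QUERY. Escaping the
bracket inside double quotes ("$organ\[ORGN]") should work as well;
concatenation just leaves nothing for the parser to guess at.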

----- Original Message ----- 
From: "Roopa Raghuveer" <rtbio.2009 at gmail.com>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Saturday, January 09, 2010 11:57 AM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl


> Hello all,
> 
> Thanks alot for your reply Mark. It was working for Trypanosoma brucei as
> the organism parameter,but when I tried to use the Organism parameter from
> the user,it was not working i.e., I was unable to get the target sequences.
> Please help me in this regard. My code is
> 
> #!/usr/bin/perl
> 
> #path for extra camel module
> use lib "/srv/www/htdocs/rain/RNAi/";
> use Roopablast;
> 
> 
> use Bio::SearchIO;
> use Bio::Search::Result::BlastResult;
> use Bio::Perl;
> use Bio::Tools::Run::RemoteBlast;
> use Bio::Seq;
> use Bio::SeqIO;
> use Bio::DB::GenBank;
> 
> $serverpath = "/srv/www/htdocs/rain/RNAi";
> $serverurl = "http://141.84.66.66/rain/RNAi";
> $outfile = $serverpath."/rnairesult_".time().".html";
> $nuc = $serverpath."/nuc".time().".txt";
> $debugfile = $serverpath."/debug_".time().".txt";
> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
> my $outstring ="";
> 
> &parse_form;
> 
> print "Content-type: text/html\n\n";
> print "<HTML>\n";
> print "<head><title>RNAi Result</title>";
> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
> URL=$serverurl/rnairesult_".time().".html\"> \n";
> print "</head>\n";
> print "<body>\n";
> print " Your results will appear <a
> href=$serverurl/rnairesult_".time().".html>here</a><br>";
> print " Please be patient, runtime can be up to 5 minutes<br>";
> print " This page will automatically reload in 30 seconds. Roopa";
> print "</BODY>\n";
> print "</HTML>\n";
> 
> defined(my $pid = fork) or die "Can't fork: $!";
> exit if $pid;
> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
> 
> open(OUTFILE, '>',$outfile);
> 
> print OUTFILE "<HTML>\n
> <head><title>RNAi Result</title>
> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
> URL=$serverurl//rnairesult_".time().".html\"> \n
> <meta http-equiv=\"expires\" content=\"0\">
> </head>\n
> <body>\n
>  Your results will appear <a
> href=$serverurl/rnairesult_".time().".html>here</a><br>
>  Please be patient, runtime can be up to 5 minutes wait wait wait......<br>
> This page will automatically reload in 30 seconds Roopa <br>
> </BODY>\n
> </HTML>\n";
> 
> close(OUTFILE);
> 
> 
> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
> 
> $in{'Inputseq'} =~ s/>.*$//m;
> $in{'Inputseq'} =~ s/[^TAGC]//gim;
> $in{'Inputseq'} =~ tr/actg/ACTG/;
> 
> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
> $in{'Threshold'});
> 
> 
> sub blastcode
> {
> 
> $inpu1= $_[0];
> 
> $organ= $_[1];
> 
> open(NUC,'>',$nuc);
> print NUC $inpu1,"\n";
> close(NUC);
> 
> my $prog = 'blastn';
> my $db   = 'refseq_rna';
> my $e_val= '1e-10';
> my $organism= $organ;
> 
> $gb = new Bio::DB::GenBank;
> 
> my @params = ( '-prog' => $prog,
>         '-data' => $db,
>         '-expect' => $e_val,
>         '-readmethod' => 'SearchIO',
>        '-Organism'   => $organism );
> 
>             open(OUTFILE,'>',$debugfile);
>             print OUTFILE $inpu1;
>              close(OUTFILE);
> 
> 
> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
> '$organ[ORGN]');
> 
> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
> 
>  #change a paramter
> 
> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
> Brucei[ORGN]';
> 
> #change a paramter
> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
> 
>  my $v = 1;
>  #$v is just to turn on and off the messages
> 
> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => $organ );
> 
> 
> while (my $input = $str->next_seq())
> {
>   #Blast a sequence against a database:
>    #Alternatively, you could  pass in a file with many
>    #sequences rather than loop through sequence one at a time
>    #Remove the loop starting 'while (my $input = $str->next_seq())'
>    #and swap the two lines below for an example of that.
> 
>             #open(OUTFILE,'>',$debugfile);
>              # print OUTFILE $input;
>              #close(OUTFILE);
> 
> 
>   my $r = $factory->submit_blast($input);
> 
>                open(OUTFILE,'>',$debugfile);
>             #   print OUTFILE $r;
>                close(OUTFILE);
> 
>   print STDERR "waiting...." if($v>0);
> 
>  while ( my @rids = $factory->each_rid ) {
>   #   open(OUTFILE,'>',$debugfile);
>    #           print OUTFILE "while entered";
>     #         close(OUTFILE);
>     foreach my $rid ( @rids ) {
> 
>      #         open(OUTFILE,'>',$debugfile);
>       #        print OUTFILE "foreach entered";
>        #      close(OUTFILE);
> 
>        my $rc = $factory->retrieve_blast($rid);
> 
>        if( !ref($rc) )
>        {
>        if( $rc < 0 )
>        {
>        $factory->remove_rid($rid);
>        }
>         open(OUTFILE,'>',$debugfile);
>         #      print OUTFILE "if entered";
>              close(OUTFILE);
>         print STDERR "." if ( $v > 0 );
>         sleep 5;
>        }
>       else {
>          #    open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "else entered";
>            #  close(OUTFILE);
> 
>          my $result = $rc->next_result();
>         #save the output
>        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
>          open(BLASTDEBUGFILE,'>',$blastdebugfile);
>          print BLASTDEBUGFILE $result->next_hit();
>          close(BLASTDEBUGFILE);
> 
>        my $filename =
> $serverpath."/blastdata_".time().$result->query_name()."\.out";
> 
>         # open(DEBUGFILE,'>',$debugfile);
>         # open(new,'>',$filename);
>         # @arra=<new>;
>         # print DEBUGFILE @arra;
>         # close(DEBUGFILE);
>         # close(new);
> 
>         $factory->save_output($filename);
>  # open(BLASTDEBUGFILE,'>',$debugfile);
>       # print BLASTDEBUGFILE  "Hello $rid";
>       # close(BLASTDEBUGFILE);
> 
>       $factory->remove_rid($rid);
> 
>       open(BLASTDEBUGFILE,'>',$blastdebugfile);
>       print BLASTDEBUGFILE  $organism;
>        close(BLASTDEBUGFILE);
> 
>    # open(OUTFILE,'>',$outfile);
>    # print OUTFILE "Test2 $result->database_name()";
>    # close(OUTFILE);
> 
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
> 
>   while ( my $hit = $result->next_hit ) {
> 
>            next unless ( $v > 0);
> 
>          #     open(OUTFILE,'>',$debugfile);
>           #    print OUTFILE "$hit in while hits";
>            #  close(OUTFILE);
> 
>       my $sequ = $gb->get_Seq_by_version($hit->name);
>           my $dna = $sequ->seq();        # get the sequence as a string
>                  push(@seqs,$dna);
>          }
>        }
>      }
>    }
>  }
> 
>  #open(OUTFILE,'>',$debugfile);
>  #print OUTFILE $seqs[0];
>  #close(OUTFILE);
> 
> return(@seqs);
> 
> }
> 
> Regards,
> Roopa.
> 
> 
> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> 
>> Hi Roopa--
>>
>> I got your code to work with the following changes:
>>
>> +# the input should be a valid FASTA file...
>> ...
>> open(NUC,'>',$nuc);
>> +print NUC ">seq (need a name line for valid fasta)\n";
>> print NUC $inpu1, "\n";
>> close(NUC);
>> ...
>>
>> +# you can set these header parms in the call itself...
>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
>> 'Trypanosoma Brucei[ORGN]');
>>
>>  #change a paramter
>> +# commented this out...
>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>> Brucei[ORGN]';
>>
>> MAJ
>> ----- Original Message ----- From: "Roopa Raghuveer" <rtbio.2009 at gmail.com
>> >
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 08, 2010 10:00 AM
>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>
>>
>>  Hello all,
>>>
>>> I was trying Remote blast using Bioperl. My input data is a Trypanosoma
>>> brucei sequence in Fasta format. When I was trying to submit to BLAST
>>> using
>>> the step
>>> $r=$factory->submit_blast($input)
>>> It was not returning anything which I checked by debugging the code. It is
>>> not blasting my input sequence even though I mentioned all the
>>> parameters.I
>>> would paste the code below.
>>>
>>> Please help me in solving this problem. It is very urgent.
>>>
>>> Regards
>>> Roopa.
>>>
>>> #!/usr/bin/perl
>>>
>>> #path for extra camel module
>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>> use Roopablast;
>>>
>>>
>>> use Bio::SearchIO;
>>> use Bio::Search::Result::BlastResult;
>>> use Bio::Perl;
>>> use Bio::Tools::Run::RemoteBlast;
>>> use Bio::Seq;
>>> use Bio::SeqIO;
>>> use Bio::DB::GenBank;
>>>
>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>> $nuc = $serverpath."/nuc".time().".txt";
>>> $debugfile = $serverpath."/debug_".time().".txt";
>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>
>>> my $outstring ="";
>>>
>>> &parse_form;
>>>
>>> print "Content-type: text/html\n\n";
>>> print "<HTML>\n";
>>> print "<head><title>RNAi Result</title>";
>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>> print "</head>\n";
>>> print "<body>\n";
>>> print " Your results will appear <a
>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>> print " This page will automatically reload in 30 seconds. Roopa";
>>> print "</BODY>\n";
>>> print "</HTML>\n";
>>>
>>> defined(my $pid = fork) or die "Can't fork: $!";
>>> exit if $pid;
>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>
>>>
>>>
>>> open(OUTFILE, '>',$outfile);
>>>
>>> print OUTFILE "<HTML>\n
>>> <head><title>RNAi Result</title>
>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>> <meta http-equiv=\"expires\" content=\"0\">
>>> </head>\n
>>> <body>\n
>>>  Your results will appear <a
>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>> wait......<br>
>>> This page will automatically reload in 30 seconds Roopa <br>
>>> </BODY>\n
>>> </HTML>\n";
>>>
>>> close(OUTFILE);
>>>
>>>
>>> @compseqs = blastcode($in{'Inputseq'});
>>>
>>> $in{'Inputseq'} =~ s/>.*$//m;
>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>
>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>> $in{'Threshold'});
>>>
>>>
>>> sub blastcode
>>> {
>>>
>>> $inpu1= $_[0];
>>>
>>> #$organ= $_[1];
>>>
>>> open(NUC,'>',$nuc);
>>> print NUC $inpu1;
>>> close(NUC);
>>>
>>> my $prog = 'blastn';
>>> my $db   = 'refseq_rna';
>>> my $e_val= '1e-10';
>>> my $organism= 'Trypanosoma Brucei';
>>>
>>> $gb = new Bio::DB::GenBank;
>>>
>>> my @params = ( '-prog' => $prog,
>>>        '-data' => $db,
>>>        '-expect' => $e_val,
>>>        '-readmethod' => 'SearchIO',
>>>        '-Organism'   => $organism );
>>>
>>>           # open(OUTFILE,'>',$debugfile);
>>>            #  print OUTFILE @params;
>>>            # close(OUTFILE);
>>>
>>>
>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>
>>>  #change a parameter
>>>
>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>
>>> #change a parameter
>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>>
>>>  my $v = 1;
>>>  #$v is just to turn on and off the messages
>>>
>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>> '-organism' => 'Trypanosoma Brucei' );
>>>
>>>
>>> while (my $input = $str->next_seq())
>>> {
>>>  #Blast a sequence against a database:
>>>   #Alternatively, you could  pass in a file with many
>>>   #sequences rather than loop through sequence one at a time
>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>   #and swap the two lines below for an example of that.
>>>
>>>            open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE $input;
>>>             close(OUTFILE);
>>>
>>>
>>>  my $r = $factory->submit_blast($input);
>>>  # The program stops here: it does not return any value and it does not
>>>  # enter the while loop. Please help me in this regard.
>>>               open(OUTFILE,'>',$debugfile);
>>>               print OUTFILE $r;
>>>               close(OUTFILE);
>>>
>>>
>>>  print STDERR "waiting...." if($v>0);
>>>
>>>  while ( my @rids = $factory->each_rid ) {
>>>     open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE "while entered";
>>>             close(OUTFILE);
>>>    foreach my $rid ( @rids ) {
>>>
>>>              open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE "foreach entered";
>>>             close(OUTFILE);
>>>
>>>       my $rc = $factory->retrieve_blast($rid);
>>>
>>>       if( !ref($rc) )
>>>       {
>>>       if( $rc < 0 )
>>>       {
>>>       $factory->remove_rid($rid);
>>>       }
>>>        open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE "if entered";
>>>             close(OUTFILE);
>>>        print STDERR "." if ( $v > 0 );
>>>        sleep 5;
>>>       }
>>>      else {
>>>             open(OUTFILE,'>',$debugfile);
>>>              print OUTFILE "else entered";
>>>             close(OUTFILE);
>>>
>>>         my $result = $rc->next_result();
>>>        #save the output
>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>
>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>         print BLASTDEBUGFILE $result->next_hit();
>>>         close(BLASTDEBUGFILE);
>>>
>>>       my $filename =
>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>
>>>        # open(DEBUGFILE,'>',$debugfile);
>>>        # open(new,'>',$filename);
>>>        # @arra=<new>;
>>>        # print DEBUGFILE @arra;
>>>        # close(DEBUGFILE);
>>>        # close(new);
>>>
>>>        $factory->save_output($filename);
>>>
>>>      # open(BLASTDEBUGFILE,'>',$debugfile);
>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>      # close(BLASTDEBUGFILE);
>>>
>>>      $factory->remove_rid($rid);
>>>
>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>      print BLASTDEBUGFILE  $organism;
>>>       close(BLASTDEBUGFILE);
>>>
>>>   # open(OUTFILE,'>',$outfile);
>>>   # print OUTFILE "Test2 $result->database_name()";
>>>   # close(OUTFILE);
>>>
>>> #$hit = $result->next_hit;
>>> #open(new,'>',$debugfile);
>>> #print $hit;
>>> #close(new);
>>>
>>>  while ( my $hit = $result->next_hit ) {
>>>
>>>           next unless ( $v > 0);
>>>
>>>         #     open(OUTFILE,'>',$debugfile);
>>>          #    print OUTFILE "$hit in while hits";
>>>           #  close(OUTFILE);
>>>
>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>                 push(@seqs,$dna);
>>>         }
>>>       }
>>>     }
>>>   }
>>>  }
>>>
>>>  #open(OUTFILE,'>',$debugfile);
>>>  #print OUTFILE $seqs[0];
>>>  #close(OUTFILE);
>>>
>>> return(@seqs);
>>>
>>> }
>>>
>>> open(OUTFILE, '>',$outfile) || die ;
>>>
>>> print OUTFILE "<HTML>\n
>>> <head><title>RNAi Result</title>
>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>> <body>\n
>>> <p><font face=\"Courier, monospace font set\">
>>> Inputsequence: <br>";
>>>
>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>
>>>   print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>
>>>   if ( ($i+1)%10==0){
>>>       print OUTFILE " ";
>>>   }
>>>   if ( ($i+1)%60==0){
>>>       print OUTFILE "<br>\n";
>>>   }
>>> }
>>>
>>>
>>>
>>> print OUTFILE "</font> <p>";
>>>
>>> $z=@compseqs;
>>>
>>> for($k=1;$k<$z;$k++) {
>>>   print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
>>> Sequence: <br>";
>>>
>>>   for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>
>>>       print OUTFILE substr ($compseqs[$k], $i, 1);
>>>
>>>       if ( ($i+1)%10==0){
>>>           print OUTFILE " ";
>>>       }
>>>       if ( ($i+1)%60==0){
>>>           print OUTFILE "<br>\n";
>>>       }
>>>   }
>>>   print OUTFILE "<p></font>";
>>> }
>>>
>>> print OUTFILE "<p>
>>> Window: <br>$in{'Windowsize'}
>>> <p>
>>> <p>
>>> Threshold: <br>$in{'Threshold'}
>>> <p>";
>>> my $j=0;
>>>
>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>
>>>   if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>       if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>           $j=$in{'Windowsize'};
>>>       }
>>>       $height=$out[$i]->{similar}*5;
>>>   }
>>>
>>>   if ($j>0) {
>>>       print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>> height=\"5\">";
>>>       $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'}, $i,
>>> 1)."</font>";
>>>       $j--;
>>>   }
>>>   else {
>>>       print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>> height=\"5\">";
>>>       $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'}, $i,
>>> 1)."</font>";
>>>   }
>>>
>>>   if ( ($i+1)%10==0){
>>>       $outstring .= " ";
>>>   }
>>>   if ( ($i+1)%60==0){
>>>       $outstring .= "<br>\n";
>>>
>>>   }
>>>   if ( ($i+1)%800==0){
>>>       print OUTFILE "<br><br>\n";
>>>
>>>   }
>>> }
>>>
>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>> set\">$outstring</font>";
>>>
>>> #foreach (@out) {
>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>
>>> #    }
>>> #}
>>>
>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>
>>> close OUTFILE;
>>>
>>> #nameprint();
>>>
>>> sub parse_form {
>>>   local ($buffer, @pairs, $pair, $name, $value);
>>>   # Read in text
>>>   $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>   if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>   {
>>>       read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>   }
>>>   else
>>>   {
>>>       $buffer = $ENV{'QUERY_STRING'};
>>>   }
>>>   @pairs = split(/&/, $buffer);
>>>   foreach $pair (@pairs)
>>>   {
>>>       ($name, $value) = split(/=/, $pair);
>>>       $value =~ tr/+/ /;
>>>       $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>       $in{$name} = $value;
>>>   }
>>> }
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>>
>>>
>>


From robert.bradbury at gmail.com  Sat Jan  9 19:52:53 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 14:52:53 -0500
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<F19004692A4A4350856B23DF25E09074@NewLife>
	<c7cac1601001090857p68a2c1c3x65de9979f47b3a5d@mail.gmail.com>
Message-ID: <deaa866a1001091152u4e85b1eboc99feb52a5b45b5@mail.gmail.com>

Roopa,

Mark is correct: you have to be very careful about single vs. double quotes
in Perl. As I currently understand it, double-quoted strings are interpolated
(variables and escape sequences are expanded), while single-quoted strings
are taken literally.
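
A minimal sketch of that difference, assuming a made-up $organism variable
(the variable names here are illustrative and are not taken from Roopa's
script). It also shows one way to build an Entrez-style query string without
the "[ORGN]" part being mistaken for an array subscript:

```perl
use strict;
use warnings;

my $organism = 'Trypanosoma brucei';

# Single quotes: the text is kept exactly as written, $organism is NOT expanded.
my $literal = '$organism is not expanded';

# Double quotes: $organism is replaced by its value.
my $interpolated = "Organism: $organism";

# Concatenation avoids "$organism[ORGN]" being parsed as an element of
# an array @organism inside a double-quoted string.
my $entrez_query = $organism . '[ORGN]';

print "$literal\n";
print "$interpolated\n";
print "$entrez_query\n";
```

Building the query by concatenation is only one option; escaping the bracket
inside a double-quoted string would also work.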

I tried to run your script (with fixes) but without the supporting files it
appears to be impossible.

What I am curious about is what it is trying to do. I was particularly
intrigued by some apparent efforts to parse BLAST results into
color-enhanced HTML, and without thinking about the code in detail it seems
easier to simply ask: what are you trying to do?  I find "classical" BLAST
results particularly tedious, and I long for BLAST results that display
concise information the way the NCBI HomoloGene cross-species comparisons do.
Unfortunately NCBI has deemed their methods (I have asked them) "too complex
to disclose" (for a person comfortable dealing with assembly language, or
even gate-level electronics, "too complex" is a very relative concept).

One has the option of using NCBI, with a limited number of species but good
display methodologies, or Ensembl, with many more species but less desirable
display methodologies (a phylogenetic tree derived from cross-species
comparisons).  For the WRN protein, which may play a key role in aging
(through the activity of its exonuclease domain mutating DNA sequences and
inducing microdeletions and microinsertions), this gets important because it
appears that the *C. elegans* genome is missing the exonuclease domain (so
it may be useless from the perspective of studying aging), and the other 4
nematode species which have been sequenced aren't even in the NCBI or
Ensembl databases.  Needless to say, if we manage in the near future, given
the drop in sequencing costs, to sequence the nematodes which are
freeze/thaw tolerant (freezing and thawing induces DSBs that have to be
repaired), those genomes are unlikely to be in the NCBI/Ensembl databases
either.  So there is a requirement for the user to develop the ability to
mix and match public and obscure databases in creative ways to provide
easy-to-interpret information.

Robert Bradbury


From robert.bradbury at gmail.com  Sat Jan  9 20:27:54 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sat, 9 Jan 2010 15:27:54 -0500
Subject: [Bioperl-l] Ensembl problems
Message-ID: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>

I am trying to get the examples provided by EMBL/Ensembl to work and am
encountering problems.

For example, about 1/3 of the way through the Compara API tutorial [1] there
is what is supposed to be a completely functional script.  It does not
work.  This is in contrast to some of the earlier simple scripts (listing
the species in Ensembl, etc.), which do work on my machine, so I believe I
have all the libraries installed correctly.

It is very poor form to document scripts which do not function on a properly
set up system.

I have modified my invocation of the script slightly:
  Align.pl --set_of_species \
"Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus
scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"

which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on
an undefined value at ./Align.pl line 132." (Align.pl is my slightly
modified version of the Compara tutorial code.)
As these are slightly modified Perl scripts from the documentation, the line
numbers may vary.

I can print out the genome_dbs, which gives me a list of genome names (hash
tables), so that part looks fine; the problem appears to lie in the call made
in the Align.pl script.  The error occurs in spite of the fact that just
before that call I dumped "genome_dbs" and got back some 25 hash tables
(as expected).  I believe this occurs whether one is comparing "human:mouse"
or the more complex species set I have outlined above.
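
For what it is worth, Perl's "Can't call method ... on an undefined value"
message means the variable the method was called on held undef at that point,
i.e. whatever was supposed to hand back the adaptor object returned nothing.
A minimal sketch of guarding against that follows. FakeAdaptor and
get_adaptor_or_undef() are hypothetical stand-ins invented for this example
and are not part of the Ensembl API; only the method name is taken from the
error message above.

```perl
use strict;
use warnings;

# Hypothetical stand-in class so the sketch runs without Ensembl installed.
package FakeAdaptor;
sub new { return bless {}, shift }
sub fetch_by_method_link_type_GenomeDBs { return 'mlss' }

package main;

my %adaptors = ( MethodLinkSpeciesSet => FakeAdaptor->new );

# Hypothetical lookup: returns undef for an unknown adaptor type,
# much as a misconfigured lookup would.
sub get_adaptor_or_undef {
    my ($type) = @_;
    return $adaptors{$type};
}

sub fetch_mlss {
    my ($type) = @_;
    my $adaptor = get_adaptor_or_undef($type);

    # Fail early with a readable message instead of the generic Perl error.
    die "No adaptor found for '$type'; check registry/database settings\n"
        unless defined $adaptor;

    return $adaptor->fetch_by_method_link_type_GenomeDBs();
}

my $ok  = fetch_mlss('MethodLinkSpeciesSet');
my $err = eval { fetch_mlss('NoSuchAdaptor'); 1 } ? '' : $@;

print "ok: $ok\n";
print "error: $err";
```

Checking the adaptor right after it is fetched at least narrows the failure
down to the lookup step rather than the later method call.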


Has anyone else attempted to run the code documented in the Ensembl API
tutorial?
Any suggestions as to what direction to go in would be appreciated (when
one is trying to copy code out of a tutorial and it fails, it's kind of hard
to know where to go).

There do appear to be some problems in specifying a Compara
version/database, and there don't appear to be many resources informing
one of what is currently available.

Robert


1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html


From ak at ebi.ac.uk  Sat Jan  9 22:01:21 2010
From: ak at ebi.ac.uk (Andreas Kähäri)
Date: Sat, 9 Jan 2010 22:01:21 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
Message-ID: <20100109220121.GA9521@quux.windows.ebi.ac.uk>

On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
> I am trying to get the examples provided by EMBL/Ensembl to work and am
> encountering problems.

Hi Robert,

The ensembl-dev list is the appropriate forum for this type of question,
as it has nothing to do with BioPerl.

There is also the Ensembl helpdesk.  If you send your problem to
<helpdesk at ensembl.org> I'm sure that it will be picked up by the
appropriate people (I'm afraid I do not myself know enough about the Compara
API to be able to diagnose this problem straight away).

Be sure to submit a minimal script that still exhibits the problem, and
information about what version of the APIs you're using (we will assume
that you're not mixing a newer version of the API with older databases or
vice versa).

We are generally very happy to have bugs in documentation or code
pointed out to us, and will correct errors as we are made aware of them.


Kind regards,
Andreas

> For example, about 1/3 of the way through the Compara API tutorial [1] there
> is what is supposed to be a completely functional script.  It does not
> work.  This is in contrast to some of the earlier simple scripts (listing
> the species in  Ensmbl etc.) which do work on my machine, so I have all the
> libraries do dah installed correctly).
> 
> Very poor form to document scripts which do not function on a properly setup
> system.
> 
> I have modified my invocation of the script slightly:
>   Align.pl --set_of_species \
> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus
> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"
> 
> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on
> an undefined value at ./Align.pl line 132." (Align.pl is my slightly
> modified example of the Compara Tutoraial code.)
> As these are slightly modified perl scripts from the documantation, the line
> numbers may be variable.
> 
> I can print out the genome_dbs, and it gives me a list of genome names (hash
> tables) though it appears that is problematic in the Align.pl script.
> in spite of the fact that just previously to that call I dumped "genome_dbs"
> and got back some 25 hash tables (expected).  I believe this occurs whether
> one is comparing "human:mouse" or the more complex species set I have
> outlined above.
> 
> 
> 
> Has anyone else attempted to run the code documented in the Ensembl API
> Tutorial?
> Any suggestions as to what direction to go in would be appreciated -- when
> one is trying to copy code out of a tutorial and it fails its kind of hard
> to know where to go.)
> 
> There do appear to be some problems in the specifications of a Compara
> version/database and there don't appear to be a lot of resources informing
> one of what resources are currently available.
> 
> Robert
> 
> 
> 1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html

-- 
Andreas Kähäri, Ensembl Software Developer
European Bioinformatics Institute (EMBL-EBI)
Wellcome Trust Genome Campus, Hinxton
Cambridge CB10 1SD, United Kingdom


From cjfields at illinois.edu  Sat Jan  9 22:01:19 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Jan 2010 16:01:19 -0600
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
Message-ID: <743C998D-BBB5-4832-BA25-24D7D7288F78@illinois.edu>

Robert,

Ensembl errors probably should be redirected to the ensembl mail list.  I can't speak to the problems with it (they appear specific to the Ensembl tool set).

chris

On Jan 9, 2010, at 2:27 PM, Robert Bradbury wrote:

> I am trying to get the examples provided by EMBL/Ensembl to work and am
> encountering problems.
> 
> For example, about 1/3 of the way through the Compara API tutorial [1] there
> is what is supposed to be a completely functional script.  It does not
> work.  This is in contrast to some of the earlier simple scripts (listing
> the species in  Ensmbl etc.) which do work on my machine, so I have all the
> libraries do dah installed correctly).
> 
> Very poor form to document scripts which do not function on a properly setup
> system.
> 
> I have modified my invocation of the script slightly:
>  Align.pl --set_of_species \
> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis familiaris:Sus
> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"
> 
> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs" on
> an undefined value at ./Align.pl line 132." (Align.pl is my slightly
> modified example of the Compara Tutoraial code.)
> As these are slightly modified perl scripts from the documantation, the line
> numbers may be variable.
> 
> I can print out the genome_dbs, and it gives me a list of genome names (hash
> tables) though it appears that is problematic in the Align.pl script.
> in spite of the fact that just previously to that call I dumped "genome_dbs"
> and got back some 25 hash tables (expected).  I believe this occurs whether
> one is comparing "human:mouse" or the more complex species set I have
> outlined above.
> 
> 
> 
> Has anyone else attempted to run the code documented in the Ensembl API
> Tutorial?
> Any suggestions as to what direction to go in would be appreciated -- when
> one is trying to copy code out of a tutorial and it fails its kind of hard
> to know where to go.)
> 
> There do appear to be some problems in the specifications of a Compara
> version/database and there don't appear to be a lot of resources informing
> one of what resources are currently available.
> 
> Robert
> 
> 
> 1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html


From robert.bradbury at gmail.com  Sun Jan 10 19:47:00 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Sun, 10 Jan 2010 14:47:00 -0500
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <20100109220121.GA9521@quux.windows.ebi.ac.uk>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
	<20100109220121.GA9521@quux.windows.ebi.ac.uk>
Message-ID: <deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>

As it turns out, the example from the file I cited (the Compara API
tutorial) does work.  The code that I started with may have been from
a "MS-WORD" document distributed with the documentation (which could
quite well be out-of-date).

But even the corrected code does not work for various uncommon
comparisons between species (which they may not have archived in
Ensembl).  I also don't yet understand enough about the functions to
know whether they are comparing the same regions from the same
chromosomes that just happen to be identical, or whether they are
comparing the same region with a homologous region on a different
chromosome (i.e. conserved genes).  I'm going to have to dig into this
some more to figure out what is going on.

Thanks for the pointers, I'll refer future questions to the Ensembl
list/help-desk.

However, if anyone knows Ensembl very well: the database already has
some of these interspecies comparisons in it.  They are accessed
when one does a phylogeny tree for specific genes (and generally for
highly conserved genes you will get a tree that includes nearly all 50
species in the database).  As I don't think they are computed
on-the-fly, the information must be precomputed and stored someplace
in the database.  I would very much like to know how to access this
information.

Thanks,
Robert


On 1/9/10, Andreas Kähäri <ak at ebi.ac.uk> wrote:
> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
>> I am trying to get the examples provided by EMBL/Ensembl to work and am
>> encountering problems.
>
> Hi Robert,
>
> The ensembl-dev list is the appropriate forum for this type of questions
> as it has nothing to do with bioperl.
>
> There is also the Ensembl helpdesk.  If you send your problem to
> <helpdesk at ensembl.org> I'm sure that it will be picked up by the
> appropriate people (I do myself not know enough about the Compara API to
> be able to diagnose this problem straight away I'm afraid).
>
> Be sure to submit a minimal script that still exhibit the problem, and
> information about what version of the APIs you're using (we will assume
> that you're not mixing newer version of the API with older databases or
> vice versa).
>
> We are generally very happy to have bugs in documentation or code
> pointed out to us, and will correct errors as we are made aware of them.
>
>
> Kind regards,
> Andreas
>
>> For example, about 1/3 of the way through the Compara API tutorial [1]
>> there
>> is what is supposed to be a completely functional script.  It does not
>> work.  This is in contrast to some of the earlier simple scripts (listing
>> the species in  Ensmbl etc.) which do work on my machine, so I have all
>> the
>> libraries do dah installed correctly).
>>
>> Very poor form to document scripts which do not function on a properly
>> setup
>> system.
>>
>> I have modified my invocation of the script slightly:
>>   Align.pl --set_of_species \
>> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
>> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
>> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis
>> familiaris:Sus
>> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
>> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
>> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"
>>
>> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs"
>> on
>> an undefined value at ./Align.pl line 132." (Align.pl is my slightly
>> modified example of the Compara Tutoraial code.)
>> As these are slightly modified perl scripts from the documantation, the
>> line
>> numbers may be variable.
>>
>> I can print out the genome_dbs, and it gives me a list of genome names
>> (hash
>> tables) though it appears that is problematic in the Align.pl script.
>> in spite of the fact that just previously to that call I dumped
>> "genome_dbs"
>> and got back some 25 hash tables (expected).  I believe this occurs
>> whether
>> one is comparing "human:mouse" or the more complex species set I have
>> outlined above.
>>
>>
>>
>> Has anyone else attempted to run the code documented in the Ensembl API
>> Tutorial?
>> Any suggestions as to what direction to go in would be appreciated -- when
>> one is trying to copy code out of a tutorial and it fails its kind of hard
>> to know where to go.)
>>
>> There do appear to be some problems in the specifications of a Compara
>> version/database and there don't appear to be a lot of resources informing
>> one of what resources are currently available.
>>
>> Robert
>>
>>
>> 1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html
>
> --
> Andreas Kähäri, Ensembl Software Developer
> European Bioinformatics Institute (EMBL-EBI)
> Wellcome Trust Genome Campus, Hinxton
> Cambridge CB10 1SD, United Kingdom
>


From Russell.Smithies at agresearch.co.nz  Sun Jan 10 20:34:39 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 09:34:39 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>

An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups.
In that same dir, taxdump.tar.gz contains a file called names.dmp, which lists taxids with their names (and synonyms).

If it were me, I'd load gi_taxid_nucl and names.dmp into hashes so I could do this:

   my $taxid  = $gi_taxid_nucl{$accession};
   my $org_name = $names{$taxid};
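
A rough sketch of how those two hashes might be built, reading from tiny
in-memory samples rather than the real files. The sample GI numbers are made
up for illustration, and the sub names load_gi_taxid() and load_names() are
invented here. It assumes the usual dump layouts: gi_taxid_nucl.dmp as two
tab-separated columns (GI, taxid), and names.dmp with fields separated by
tab-pipe-tab, where the fourth field is the name class. Note that the
gi_taxid files are keyed by GI number, so accessions would need mapping to
GIs first.

```perl
use strict;
use warnings;

# Tiny in-memory samples standing in for the real NCBI dump files
# (GI numbers 111 and 222 are illustrative, not real records).
my $gi_dump    = "111\t9606\n222\t10090\n";
my $names_dump = join '',
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n",
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n",
    "10090\t|\tMus musculus\t|\t\t|\tscientific name\t|\n";

# Build GI -> taxid from two tab-separated columns.
sub load_gi_taxid {
    my ($fh) = @_;
    my %map;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $gi, $taxid ) = split /\t/, $line;
        $map{$gi} = $taxid;
    }
    return \%map;
}

# Build taxid -> scientific name, skipping synonyms and common names.
sub load_names {
    my ($fh) = @_;
    my %names;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/\t\|$//;    # strip the trailing field terminator
        my ( $taxid, $name, undef, $class ) = split /\t\|\t/, $line;
        $names{$taxid} = $name
            if defined $class && $class eq 'scientific name';
    }
    return \%names;
}

open my $gi_fh, '<', \$gi_dump or die "Cannot open GI sample: $!";
my $gi_taxid = load_gi_taxid($gi_fh);
close $gi_fh;

open my $names_fh, '<', \$names_dump or die "Cannot open names sample: $!";
my $names = load_names($names_fh);
close $names_fh;

my $taxid    = $gi_taxid->{222};
my $org_name = $names->{$taxid};
print "GI 222 -> taxid $taxid -> $org_name\n";
```

With the full dump files it would probably be wise to keep only the GIs of
interest in the hash, since the complete GI table is very large.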

--Russell


> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> Sent: Saturday, 26 December 2009 4:52 p.m.
> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Bhakti,
> The following example (using EUtilities) may serve your purpose:
> 
> use Bio::DB::EUtilities;
> 
> my (%taxa, @taxa);
> my (%names, %idmap);
> 
> # these are protein ids; nuc ids will work by changing
> # -dbfrom => 'nucleotide' (probably)
> 
> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> 
> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>                                        -db => 'taxonomy',
>                                        -dbfrom => 'protein',
>                                        -correspondence => 1,
>                                        -id => \@ids);
> 
> # iterate through the LinkSet objects
> while (my $ds = $factory->next_LinkSet) {
>     $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> }
> 
> @taxa = @taxa{@ids};
> 
> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>         -db    => 'taxonomy',
>         -id    => \@taxa );
> 
> while (local $_ = $factory->next_DocSum) {
>     $names{($_->get_contents_by_name('TaxId'))[0]} =
> ($_->get_contents_by_name('ScientificName'))[0];
> }
> 
> foreach (@ids) {
>     $idmap{$_} = $names{$taxa{$_}};
> }
> 
> # %idmap is
> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> #    68536103 => 'Corynebacterium jeikeium K411'
> #    730439 => 'Bacillus caldolyticus'
> #    89318838 => undef    (this record has been removed from the db)
> 
> 1;
> 
> You probably will need to break up your 30000 into chunks
> (say, 1000-3000 each), and do the above on each chunk with a
> 
> sleep 3;
> 
> or so separating the queries.
> MAJ
> ----- Original Message -----
> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Friday, December 25, 2009 9:46 PM
> Subject: [Bioperl-l] how to retrieve organism name from accession number?
> 
> 
> > Hi,
> >
> > Does anyone know how to retrieve the "Source" or the "Species name"
> given
> > the accession number using Bioperl.   I have these 30,000 accession
> numbers
> > for which I need to get the source organisms.  Any kind of help will be
> > appreciated.
> >
> > Thanks
> >
> > BD
> 


From cjfields at illinois.edu  Sun Jan 10 20:49:40 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 14:49:40 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
Message-ID: <F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>

One could also use Bio::DB::Taxonomy, which indexes the same files or (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the details).
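
For anyone taking the flat-file route, here is a minimal sketch of the two lookups in plain Perl. It assumes the gi_taxid dump is two tab-separated columns (GI, then taxid) and that names.dmp separates fields with "\t|\t"; note that the first file is keyed by GI number, not by accession. The names load_gi_taxid and load_names are made up for illustration, and the taxids in the sample rows are placeholders rather than real database content.

```perl
use strict;
use warnings;

# Build gi => taxid from lines laid out as "gi<TAB>taxid".
# $wanted (optional hash ref of GIs) keeps memory down on the full dump.
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        next if $wanted && !$wanted->{$gi};
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Build taxid => scientific name from lines laid out like names.dmp:
# tax_id, name, unique name, name class, each followed by "\t|".
sub load_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;    # drop the trailing field terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $name_for{$taxid} = $name;
    }
    return \%name_for;
}

# Placeholder rows in the assumed layout (taxids 111 and 222 are invented).
my $gi_dump    = "1621261\t111\n730439\t222\n";
my $names_dump = join '',
    "111\t|\tMycobacterium tuberculosis H37Rv\t|\t\t|\tscientific name\t|\n",
    "111\t|\tsome other label\t|\t\t|\tsynonym\t|\n",
    "222\t|\tBacillus caldolyticus\t|\t\t|\tscientific name\t|\n";

open my $gi_fh, '<', \$gi_dump or die "cannot read GI sample: $!";
my $taxid_for = load_gi_taxid($gi_fh, { 1621261 => 1 });

open my $names_fh, '<', \$names_dump or die "cannot read names sample: $!";
my $name_for = load_names($names_fh);

for my $gi (sort keys %$taxid_for) {
    my $taxid = $taxid_for->{$gi};
    my $name  = exists $name_for->{$taxid} ? $name_for->{$taxid} : '(unknown)';
    print "$gi => $name\n";
}
```

With the real files you would open the unpacked dumps instead of the in-memory strings; passing the set of GIs you care about avoids holding the whole mapping in memory.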

chris

On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:

> An alternate non-BioPerly way (that may be faster given NCBI's flakiness lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and do lookups. 
> In that same dir, taxdump.tar.gz contains a file called names.dmp which lists taxids and descriptions (and synonyms)
> 
> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I could do this:
> 
>   my $taxid  = $gi_taxid_nucl{$accession};
>   my $org_name = $names{$taxid};
> 
> --Russell
> 
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>> Sent: Saturday, 26 December 2009 4:52 p.m.
>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>> number?
>> 
>> Bhakti,
>> The following example (using EUtilities) may serve your purpose:
>> 
>> use Bio::DB::EUtilities;
>> 
>> my (%taxa, @taxa);
>> my (%names, %idmap);
>> 
>> # these are protein ids; nuc ids will (probably) work by changing
>> # -dbfrom => 'nucleotide'
>> 
>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>> 
>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>                                       -db => 'taxonomy',
>>                                       -dbfrom => 'protein',
>>                                       -correspondence => 1,
>>                                       -id => \@ids);
>> 
>> # iterate through the LinkSet objects
>> while (my $ds = $factory->next_LinkSet) {
>>    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>> }
>> 
>> @taxa = @taxa{@ids};
>> 
>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>        -db    => 'taxonomy',
>>        -id    => \@taxa );
>> 
>> while (local $_ = $factory->next_DocSum) {
>>    $names{($_->get_contents_by_name('TaxId'))[0]} =
>> ($_->get_contents_by_name('ScientificName'))[0];
>> }
>> 
>> foreach (@ids) {
>>    $idmap{$_} = $names{$taxa{$_}};
>> }
>> 
>> # %idmap is
>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>> #    68536103 => 'Corynebacterium jeikeium K411'
>> #    730439 => 'Bacillus caldolyticus'
>> #    89318838 => undef    (this record has been removed from the db)
>> 
>> 1;
>> 
>> You probably will need to break up your 30000 into chunks
>> (say, 1000-3000 each), and do the above on each chunk with a
>> 
>> sleep 3;
>> 
>> or so separating the queries.
>> MAJ
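
The chunking advice above can be sketched in plain Perl as follows. This is only an illustration: chunk_ids and process_in_chunks are hypothetical helper names, the ID list is a stand-in, and the handler is where the elink/esummary calls from the example would go. The pause is set to 0 here so the sketch runs instantly; something like the suggested 3 seconds would be used against NCBI.

```perl
use strict;
use warnings;

# Split a list of IDs into array refs of at most $size elements each.
sub chunk_ids {
    my ($ids, $size) = @_;
    die "chunk size must be a positive number\n" unless $size && $size > 0;
    my @queue = @$ids;    # copy, so the caller's list is left untouched
    my @chunks;
    push @chunks, [ splice @queue, 0, $size ] while @queue;
    return @chunks;
}

# Call $handler on each chunk, sleeping $pause seconds between chunks.
sub process_in_chunks {
    my ($ids, $size, $pause, $handler) = @_;
    my @chunks = chunk_ids($ids, $size);
    for my $i (0 .. $#chunks) {
        $handler->($chunks[$i]);
        sleep $pause if $pause && $i < $#chunks;    # no pause after the last chunk
    }
    return scalar @chunks;
}

my @ids  = (1 .. 30_000);    # stand-in for the real ID list
my $seen = 0;
my $n = process_in_chunks(\@ids, 1000, 0, sub {
    my ($chunk) = @_;
    $seen += @$chunk;        # a real handler would query NCBI with @$chunk here
});
print "$n chunks, $seen ids\n";
```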


From Russell.Smithies at agresearch.co.nz  Sun Jan 10 21:05:06 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Mon, 11 Jan 2010 10:05:06 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>

I've started to go off eUtils recently (not BioPerl's fault) as I've often found that with large queries, chunks of the resulting data are missing.
For example, before Xmas I was creating species-specific databases by using eUtils to get a list of GI numbers back for a taxid, then retrieving the FASTA sequences in chunks of 500.
Very regularly, in the middle of the FASTA there would be a "Resource Unavailable" message, e.g.:
  >test_sequence_1
  TACGATCATCGCTResource UnavailableTACGACTCTGCT
  >test_sequence_2
  TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT

Often this wasn't detected until formatdb complained about invalid characters.
Inquiries to NCBI as to why this was happening and what to do about it returned unhelpful answers ("do each sequence manually through the web interface", or "use eUtils").
As we have a nice fast network connection, I now prefer to download very large gzip files (i.e. all of refseq) and extract what I need.

I can't help but think that NCBI could solve a lot of problems if they gzipped the output from eUtils queries - it's something I've requested regularly for the last 5 years or so!!
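
A cheap guard against the corruption described above is to scan the downloaded FASTA before it reaches formatdb. The sketch below is an assumption-laden illustration, not anything NCBI or BioPerl provides: find_bad_fasta_lines is a made-up name, and the check only accepts IUPAC nucleotide codes, '-' and '*', so protein data would need a different character class.

```perl
use strict;
use warnings;

# Return [line_number, record_id, line] for every sequence line containing
# anything other than IUPAC nucleotide codes, gap characters or '*'.
sub find_bad_fasta_lines {
    my ($fh) = @_;
    my @bad;
    my $id = '';
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;                  # tolerate DOS line endings
        if ($line =~ /^>(\S*)/) {          # header line: remember the record id
            $id = $1;
            next;
        }
        next if $line eq '';
        push @bad, [ $., $id, $line ]
            if $line =~ /[^ACGTURYSWKMBDHVNacgturyswkmbdhvn*-]/;
    }
    return @bad;
}

# The example from this message, used as in-memory test input.
my $fasta = <<'END';
>test_sequence_1
TACGATCATCGCTResource UnavailableTACGACTCTGCT
>test_sequence_2
TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
END

open my $fh, '<', \$fasta or die "cannot read sample: $!";
for my $hit (find_bad_fasta_lines($fh)) {
    my ($line_no, $id, $text) = @$hit;
    print "line $line_no ($id): $text\n";
}
```

Any chunk that reports a hit can then be fetched again rather than discovered later as a formatdb failure.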

--Russell




From avilella at gmail.com  Sun Jan 10 21:05:13 2010
From: avilella at gmail.com (Albert Vilella)
Date: Sun, 10 Jan 2010 21:05:13 +0000
Subject: [Bioperl-l] Ensembl problems
In-Reply-To: <deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>
References: <deaa866a1001091227x1052f9bcmf9933898aefd82a@mail.gmail.com>
	<20100109220121.GA9521@quux.windows.ebi.ac.uk>
	<deaa866a1001101147k1c221134n6949ed64566c9a96@mail.gmail.com>
Message-ID: <358f4d651001101305q1b75cfe3q558a245ab1ab1238@mail.gmail.com>

> However, if anyone knows Ensembl very well, the database has in it
> some of these interspecies comparisons already.  They are accessed
> when one does a phylogeny tree for specific genes (and generally for
> a highly conserved gene you will get a tree that includes nearly all 50
> species in the database).  As I don't think they are computed
> on-the-fly, the information must be precomputed and stored someplace
> in the database.  I would very much like to know how to access this
> information.

Yes, they are. You can access the data programmatically by installing
the ensembl and ensembl-compara Perl APIs.
There are a few example scripts for the GeneTrees:

ensembl-compara/scripts/examples/homology*.pl

Cheers,

Albert.

> Thanks,
> Robert
>
>
>
>
> On 1/9/10, Andreas Kähäri <ak at ebi.ac.uk> wrote:
>> On Sat, Jan 09, 2010 at 03:27:54PM -0500, Robert Bradbury wrote:
>>> I am trying to get the examples provided by EMBL/Ensembl to work and am
>>> encountering problems.
>>
>> Hi Robert,
>>
>> The ensembl-dev list is the appropriate forum for this type of questions
>> as it has nothing to do with bioperl.
>>
>> There is also the Ensembl helpdesk.  If you send your problem to
>> <helpdesk at ensembl.org> I'm sure that it will be picked up by the
>> appropriate people (I do myself not know enough about the Compara API to
>> be able to diagnose this problem straight away I'm afraid).
>>
>> Be sure to submit a minimal script that still exhibit the problem, and
>> information about what version of the APIs you're using (we will assume
>> that you're not mixing newer version of the API with older databases or
>> vice versa).
>>
>> We are generally very happy to have bugs in documentation or code
>> pointed out to us, and will correct errors as we are made aware of them.
>>
>>
>> Kind regards,
>> Andreas
>>
>>> For example, about 1/3 of the way through the Compara API tutorial [1]
>>> there is what is supposed to be a completely functional script.  It does
>>> not work.  This is in contrast to some of the earlier simple scripts
>>> (listing the species in Ensembl etc.), which do work on my machine (so I
>>> have all the libraries etc. installed correctly).
>>>
>>> Very poor form to document scripts which do not function on a properly
>>> setup
>>> system.
>>>
>>> I have modified my invocation of the script slightly:
>>>   Align.pl --set_of_species \
>>> "Homo sapiens:Pan troglodytes:Gorilla gorilla:Macaca mulatta:Otolemur
>>> garnettii:Pongo pygmaeus:Equus caballus:Bos taurus:Loxodonta
>>> africana:Pteropus vampyrus:Myotis lucifugus:Felis catus:Canis
>>> familiaris:Sus
>>> scrofa:Rattus norvegicus:Mus musculus:Gallus gallus:Xenopus
>>> tropicalis:Takifugu rubripes:Tetraodon nigroviridis:Danio rerio:Tupaia
>>> belangeri:Caenorhabditis elegans:Saccharomyces cerevisiae"
>>>
>>> which results in "Can't call method "fetch_by_method_link_type_GenomeDBs"
>>> on an undefined value at ./Align.pl line 132." (Align.pl is my slightly
>>> modified example of the Compara Tutorial code.)
>>> As these are slightly modified Perl scripts from the documentation, the
>>> line numbers may be variable.
>>>
>>> I can print out the genome_dbs, and it gives me a list of genome names
>>> (hash tables), though it appears that is problematic in the Align.pl
>>> script, in spite of the fact that just previously to that call I dumped
>>> "genome_dbs" and got back some 25 hash tables (expected).  I believe this
>>> occurs whether one is comparing "human:mouse" or the more complex species
>>> set I have outlined above.
>>>
>>>
>>>
>>> Has anyone else attempted to run the code documented in the Ensembl API
>>> Tutorial?
>>> Any suggestions as to what direction to go in would be appreciated (when
>>> one is trying to copy code out of a tutorial and it fails, it's kind of
>>> hard to know where to go).
>>>
>>> There do appear to be some problems in the specifications of a Compara
>>> version/database and there don't appear to be a lot of resources informing
>>> one of what resources are currently available.
>>>
>>> Robert
>>>
>>>
>>> 1. http://pre.ensembl.org/info/docs/api/compara/compara_tutorial.html
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>> --
>> Andreas Kähäri, Ensembl Software Developer
>> European Bioinformatics Institute (EMBL-EBI)
>> Wellcome Trust Genome Campus, Hinxton
>> Cambridge CB10 1SD, United Kingdom
>>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From alessandra.bilardi at gmail.com  Sun Jan 10 23:21:12 2010
From: alessandra.bilardi at gmail.com (Alessandra)
Date: Mon, 11 Jan 2010 00:21:12 +0100
Subject: [Bioperl-l] GBrowse.org project
In-Reply-To: <e0996aca1001101515q8121c87o9b90310691fcd640@mail.gmail.com>
References: <e0996aca1001101515q8121c87o9b90310691fcd640@mail.gmail.com>
Message-ID: <e0996aca1001101521p30b46829p93ee75dd797829b1@mail.gmail.com>

 Hi all,

   I'm Alessandra and I run GBrowse.org.
GBrowse.org is a resource for using and setting up GBrowse genome
browsers. The site provides one location where biologists and
bioinformaticians can find:

  1. Genome browser web sites for any organism that has them. If a
species has a genome browser anywhere on the web, then we aim to link
to it.
  2. Links to sequence and annotation files that are available online.
  3. Links to genome browser configuration files, when available
  4. An FTP site containing genome annotation and configuration files
for each annotated genome that does not have its own web site.

GBrowse.org emphasizes the GBrowse genome browser in its organization,
but also links to sites that use other browser packages such as UCSC,
Ensembl, and JBrowse.

Also, we are currently conducting a survey seeking input on future
project direction. Please take a few minutes now to provide your
feedback.

   Survey link: http://gbrowse.org/survey/index.php?sid=64264&lang=en
   GBrowse.org introduction link:
http://gmod.org/wiki/August_2009_GMOD_Meeting#GBrowse.org

   Thank you for your help,

   Alessandra Bilardi.
   http://gbrowse.org/
   CRIBI Genomics, University of Padua
   http://genomics.cribi.unipd.it/


From cjfields at illinois.edu  Mon Jan 11 03:04:13 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 10 Jan 2010 21:04:13 -0600
Subject: [Bioperl-l] GMOD BioPerl Meeting
Message-ID: <7D72ECC2-E856-4C09-B67A-62AFFB59B377@illinois.edu>

Just a quick reminder that we're having a BioPerl satellite meeting after the PAG Conference (just prior to the GMOD Meeting).  The meeting is this Wednesday, Jan. 13, starting at 11:30am, at the Best Western Seven Seas in San Diego.  I will update the relevant BioPerl and GMOD pages with more details as they become available.  At the moment, we will be meeting in the hotel lobby prior to starting the meeting and possible hackathon.  

http://www.bioperl.org/wiki/GMOD_2010_Meeting
http://gmod.org/wiki/January_2010_GMOD_Meeting#Satellite_Meetings

Thanks!

chris


From bernd.jagla at pasteur.fr  Mon Jan 11 10:11:16 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:11:16 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
Message-ID: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>

Hi,

 
First off, I am not sure if this is supposed to be addressed to the Bioperl
or Gbrowse mailing list, so apologies if this is the wrong list and please
let me know.

 
I am writing a program in Java that needs to access genome annotation data.
Since I am using GBrowse already I was thinking that I could combine both
approaches, eventually making life easier for me. I am mainly interested in
getting a gene/feature name for a given position. The position is stored in
the feature table, and by linking typelist, locationlist, (maybe
sequence), and feature I can get all the information I need. Unfortunately
it seems that the feature name is stored in the object BLOB of the feature
table.

That is a bit suspicious to me, because I don't understand how searching for
a name in GBrowse can be so fast if the name is not indexed in MySQL.

So my question is: how do I parse the Bio::DB::SeqFeature object in Java
correctly to get the name of the feature, and possibly also any further
information?

Any suggestions are greatly appreciated. Maybe there is a better solution
than parsing Perl code with Java?

 
Thanks a lot,

 
Bernd


From biopython at maubp.freeserve.co.uk  Mon Jan 11 10:48:52 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 10:48:52 +0000
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
Message-ID: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>

On Mon, Jan 11, 2010 at 10:11 AM, Bernd Jagla <bernd.jagla at pasteur.fr> wrote:
> Hi,
>
> First off, I am not sure if this is supposed to be addressed to the Bioperl
> or Gbrowse mailing list, so apologies if this is the wrong list and please
> let me know.
>
> I am writing a program in Java that needs to access genome annotation data.
> Since I am using Gbrowse already I was thinking that I could combine both
> approaches making life eventually easier for me. I am mainly interested in
> getting a gene/feature name for a given position. The position is stored in
> the feature table and through linking typelist, locationlist, (maybe
> sequence), and feature I can get all the information I need. Unfortunately
> it seems that the feature name is stored in the object BLOB of the feature
> table.

How are you storing the data in Gbrowse? There are several back ends,
and this will make a big difference for accessing the raw data.

One option would be to use Gbrowse with BioSQL as the backend.
You can then use BioJava (or BioPerl, or BioPython, etc) to access the
database. The only downside is Gbrowse isn't working 100% on top
of BioSQL right now (I'd like to see this fixed, but I don't know Perl).
There is an open bug on this [ gmod-Bugs-2168597 ].

Peter


From bernd.jagla at pasteur.fr  Mon Jan 11 10:53:20 2010
From: bernd.jagla at pasteur.fr (Bernd Jagla)
Date: Mon, 11 Jan 2010 11:53:20 +0100
Subject: [Bioperl-l] Bio:DB:SeqFeature - MySql - Java
In-Reply-To: <320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
References: <6D85585C10F94E25898249D2D7CAC0D7@zillumina>
	<320fb6e01001110248t628f0837qa5e057fd53b58eac@mail.gmail.com>
Message-ID: <9056164A8A744A77B6CD1E8E4E20B104@zillumina>

I am using bp_seqfeature_load.pl to load my features. That uses
Bio::DB::SeqFeature::Store with MySQL as a backend... That's all I
understood...

B



From awitney at sgul.ac.uk  Mon Jan 11 12:21:07 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 12:21:07 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
Message-ID: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>

Hi,

I am writing a script to automate the running of Phylip Pars. In the process I have to create a Bio::AlignIO object from a set of data that I have in a hash.

I could write the hash data into a phylip file and then load the Bio::AlignIO from that file, but I wondered if I could skip the writing and then reading of a temporary file?

thanks for any help

adam


From roy.chaudhuri at gmail.com  Mon Jan 11 13:54:25 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:54:25 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2A51.9040602@gmail.com>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com>
Message-ID: <4B4B2D91.70906@gmail.com>

Actually, I guess some sample code would be more helpful:

use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Each LocatableSeq is one gapped row; -end counts only the non-gap residues.
my $seq1 = Bio::LocatableSeq->new(-id => 'one',   -seq => 'AT-CG',
                                  -start => 1, -end => 4);
my $seq2 = Bio::LocatableSeq->new(-id => 'two',   -seq => 'A--CG',
                                  -start => 1, -end => 3);
my $seq3 = Bio::LocatableSeq->new(-id => 'three', -seq => 'ATTCG',
                                  -start => 1, -end => 5);
my $aln = Bio::SimpleAlign->new(-seqs => [$seq1, $seq2, $seq3]);

# With no -file or -fh argument, AlignIO writes to STDOUT.
Bio::AlignIO->new(-format => 'phylip')->write_aln($aln);

Cheers,
Roy.
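
Since the original question started from a hash, here is a hedged sketch of the same idea driven by one. It assumes the hash maps sequence ids to already-aligned, equal-length strings with '-' as the gap character; %aligned and hash_to_alignment are names invented for this example. It uses add_seq rather than the -seqs constructor argument, and writes the phylip text into a Perl string through an in-memory filehandle so no temporary file is needed.

```perl
use strict;
use warnings;
use Bio::LocatableSeq;
use Bio::SimpleAlign;
use Bio::AlignIO;

# Hypothetical input: id => gapped sequence, all rows the same length.
my %aligned = (
    one   => 'AT-CG',
    two   => 'A--CG',
    three => 'ATTCG',
);

# Turn the hash into a Bio::SimpleAlign, one LocatableSeq per entry.
sub hash_to_alignment {
    my ($rows) = @_;
    my $aln = Bio::SimpleAlign->new();
    for my $id (sort keys %$rows) {
        my $gapped = $rows->{$id};
        (my $ungapped = $gapped) =~ s/-//g;    # -end is the ungapped length
        $aln->add_seq(
            Bio::LocatableSeq->new(
                -id    => $id,
                -seq   => $gapped,
                -start => 1,
                -end   => length $ungapped,
            )
        );
    }
    return $aln;
}

my $aln = hash_to_alignment(\%aligned);

# Write phylip output into $phylip instead of a file on disk.
my $phylip = '';
open my $fh, '>', \$phylip or die "cannot open in-memory handle: $!";
my $out = Bio::AlignIO->new(-fh => $fh, -format => 'phylip');
$out->write_aln($aln);
$out->close;

print $phylip;
```

A row consisting only of gaps would give an end coordinate of 0, so real data may need such rows filtered out first.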


On 11/01/2010 13:40, Roy Chaudhuri wrote:
> Hi Adam,
>
> I'm guessing you actually want to create a Bio::SimpleAlign object
> (representing an alignment), rather than a Bio::AlignIO object (which is
> just for reading/writing alignment files). Bio::SimpleAlign has a
> documented new method that allows you to construct an alignment from
> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but
> include gaps and start/end coordinates to describe their relationship to
> other sequences in the alignment.
>
> Roy.
>
> On 11/01/2010 12:21, Adam Witney wrote:
>> Hi,
>>
>> I am writing a script to automate the running of Phylip Pars. In the
>> process i have to create a Bio::AlignIO object from a set of data
>> that i have in a hash.
>>
>> I could write the hash data into a phylip file and then load the
>> Bio::AlignIO from that file, but i wondered if i could skip the
>> writing and then reading of a temporary file ?
>>
>> thanks for any help
>>
>> adam
>


From roy.chaudhuri at gmail.com  Mon Jan 11 13:40:33 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Mon, 11 Jan 2010 13:40:33 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
Message-ID: <4B4B2A51.9040602@gmail.com>

Hi Adam,

I'm guessing you actually want to create a Bio::SimpleAlign object 
(representing an alignment), rather than a Bio::AlignIO object (which is 
just for reading/writing alignment files). Bio::SimpleAlign has a 
documented new method that allows you to construct an alignment from 
Bio::LocatableSeq objects, which are similar to Bio::Seq objects but 
include gaps and start/end coordinates to describe their relationship to 
other sequences in the alignment.

Roy.

On 11/01/2010 12:21, Adam Witney wrote:
> Hi,
>
> I am writing a script to automate the running of Phylip Pars. In the
> process i have to create a Bio::AlignIO object from a set of data
> that i have in a hash.
>
> I could write the hash data into a phylip file and then load the
> Bio::AlignIO from that file, but i wondered if i could skip the
> writing and then reading of a temporary file ?
>
> thanks for any help
>
> adam


From biopython at maubp.freeserve.co.uk  Mon Jan 11 14:16:45 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 14:16:45 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
Message-ID: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>

Hi,

I'm running bioperl-live from SVN, just updated to revision 16648.

$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
1.0069

I am trying to get Bio::SeqIO to convert a multiple record EMBL
file into GenBank format, piping the data via stdin/stdout using
the following trivial Perl script:

#!/usr/bin/env perl
use Bio::SeqIO;
my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
my $out = Bio::SeqIO->new(-format => 'genbank');
while (my $seq = $in->next_seq) { $out->write_seq($seq) };

This only seems to find the first EMBL record in my example
files. For example, this simple file has just two contig records:
http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl

This is just the first two records taken from a much larger EMBL file
rel_con_hum_01_r102.dat downloaded and uncompressed from:
ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

Trying both these examples as input, BioPerl just gives a single
GenBank record as output (the first EMBL entry in the input).

Is this a BioPerl bug, or am I missing something?

Peter


From maj at fortinbras.us  Mon Jan 11 15:04:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 11 Jan 2010 10:04:00 -0500
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>

Hi Peter, 
I found the issue-- there are no SQ lines in the data, and 
having them is a key stop condition in the parser (line 438 embl.pm).
We evidently need to be more liberal in what we accept, even as we 
are strict in what we emit. Could you make a bug report?
thanks for the heads-up--
MAJ


From biopython at maubp.freeserve.co.uk  Mon Jan 11 15:17:37 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 15:17:37 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
Message-ID: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>

On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
> them is a key stop condition in the parser (line 438 embl.pm).
> We evidently need to be more liberal in what we accept, even as we are
> strict in what we emit. Could you make a bug report?
> thanks for the heads-up--
> MAJ

Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982

These are EMBL contig records, so they don't have SQ lines,
but instead CO lines.
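
To illustrate the "liberal in what we accept" point, here is a minimal sketch of splitting an EMBL stream on the `//` terminator alone, so that a record is found whether it carries SQ or CO lines. This is not the logic in Bio::SeqIO::embl; the helper name `split_embl_records` and the two demo entries are invented for illustration, and only core Perl is used.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Split EMBL-format text into records using only the '//' terminator line,
# so entries with CO lines (and no SQ line) are still found.
sub split_embl_records {
    my ($text) = @_;
    my @records;
    my $current = '';
    for my $line (split /^/m, $text) {      # keep each line's newline
        $current .= $line;
        if ($line =~ m{^//\s*$}) {          # end-of-record marker
            push @records, $current;
            $current = '';
        }
    }
    # Keep a trailing, unterminated record rather than silently dropping it.
    push @records, $current if $current =~ /\S/;
    return @records;
}

# Invented demo data: one contig-style entry (CO) and one ordinary entry (SQ).
my $demo = <<'EMBL';
ID   DEMO1; SV 1; linear; genomic DNA; CON; HUM; 100 BP.
CO   join(AAAA01000001.1:1..60,gap(10),AAAA01000002.1:1..30)
//
ID   DEMO2; SV 1; linear; genomic DNA; STD; HUM; 4 BP.
SQ   Sequence 4 BP; 1 A; 1 C; 1 G; 1 T; 0 other;
     acgt                                                                      4
//
EMBL

my @records = split_embl_records($demo);
printf "%d records\n", scalar @records;
for my $rec (@records) {
    my ($id) = $rec =~ /^ID\s+(\w+)/m;
    my $kind = $rec =~ /^CO\s/m ? 'CO (contig)'
             : $rec =~ /^SQ\s/m ? 'SQ (sequence)'
             :                    'neither';
    print "$id: $kind\n";
}
```

A real parser still has to decide what to do with the CO content once the record boundary is found; the sketch only shows that the boundary itself need not depend on SQ.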

Peter


From cjfields at illinois.edu  Mon Jan 11 15:24:24 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 09:24:24 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<D5C1944EC4E1439AAEA13E378B8FAF7A@NewLife>
	<320fb6e01001110717g93f11ccn13c1010cefeb3a5b@mail.gmail.com>
Message-ID: <CDB3F40D-0298-410B-9814-3D9721380EBA@illinois.edu>


On Jan 11, 2010, at 9:17 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:04 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>> 
>> Hi Peter, I found the issue-- there are no SQ lines in the data, and having
>> them is a key stop condition in the parser (line 438 embl.pm).
>> We evidently need to be more liberal in what we accept, even as we are
>> strict in what we emit. Could you make a bug report?
>> thanks for the heads-up--
>> MAJ
> 
> Done: http://bugzilla.open-bio.org/show_bug.cgi?id=2982
> 
> These are EMBL contig records, so they don't have SQ lines,
> but instead CO lines.
> 
> Peter

Peter, 

Just curious, but have you tried the experimental EMBL parser 'embldriver'?  I don't think it's bound to the same strictures, but I may be mistaken.

chris


From cjfields at illinois.edu  Mon Jan 11 15:23:00 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 09:23:00 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <0D0D9DB5-56FA-414E-8D1D-3FE18198F7EC@illinois.edu>

Just saw that Mark responded, so if possible please submit a bug.  We may be doing a mini-hackathon this Wednesday, so we can probably tackle it in the process (possibly along with a few other pressing issues).

chris

On Jan 11, 2010, at 8:16 AM, Peter wrote:

> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz
> 
> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter


From biopython at maubp.freeserve.co.uk  Mon Jan 11 15:55:26 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 11 Jan 2010 15:55:26 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <C771056E.6204%hrh@fmi.ch>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
Message-ID: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>

On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>
> These entries form the CON data class, see:
> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
> and they don't contain any sequence information.

I know - GenBank files have a similar system with CONTIG
lines instead of sequences. I was expecting BioPerl to be
able to convert these EMBL files with CO lines into GenBank
files with CONTIG lines.

> If you take the 'expanded' entries from
> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
> your script will work.

That's a useful tip - thanks.

Peter


From hrh at fmi.ch  Mon Jan 11 15:42:22 2010
From: hrh at fmi.ch (Hotz, Hans-Rudolf)
Date: Mon, 11 Jan 2010 16:42:22 +0100
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
Message-ID: <C771056E.6204%hrh@fmi.ch>


On 1/11/10 3:16 PM, "Peter" <biopython at maubp.freeserve.co.uk> wrote:

> Hi,
> 
> I'm running bioperl-live from SVN, just updated to revision 16648.
> 
> $ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.0069
> 
> I am trying to get Bio::SeqIO to convert a multiple record EMBL
> file into GenBank format, piping the data via stdin/stdout using
> the following trivial Perl script:
> 
> #!/usr/bin/env perl
> use Bio::SeqIO;
> my $in  = Bio::SeqIO->new(-fh => \*STDIN, -format => 'embl');
> my $out = Bio::SeqIO->new(-format => 'genbank');
> while (my $seq = $in->next_seq) { $out->write_seq($seq) };
> 
> This only seems to find the first EMBL record in my example
> files. For example, this simple file has just two contig records:
> http://biopython.open-bio.org/SRC/biopython/Tests/EMBL/Human_contigs.embl
> 
> This is just the first two records taken from a much larger EMBL file
> rel_con_hum_01_r102.dat downloaded and uncompressed from:
> ftp://ftp.ebi.ac.uk/pub/databases/embl/release/rel_con_hum_01_r102.dat.gz

These entries form the CON data class, see:
http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
and they don't contain any sequence information.

If you take the 'expanded' entries from
ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
your script will work.


Hans


> Trying both these examples as input, BioPerl just gives a single
> GenBank record as output (the first EMBL entry in the input).
> 
> Is this a BioPerl bug, or am I missing something?
> 
> Peter


From awitney at sgul.ac.uk  Mon Jan 11 16:27:15 2010
From: awitney at sgul.ac.uk (Adam Witney)
Date: Mon, 11 Jan 2010 16:27:15 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <4B4B2D91.70906@gmail.com>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
Message-ID: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>


Ah excellent, thanks Roy. I was indeed thinking about it the wrong way.

In the process of writing this I have created a

Bio::Tools::Run::Phylo::Phylip::Pars class

which is essentially just a modified copy of ProtPars. I have also fixed a few typos and possible bugs in

Bio/Tools/Run/Phylo/Phylip/Base.pm
Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm
Bio/AlignIO/phylip.pm
Bio/Tools/Run/Alignment/Clustalw.pm

I am of course happy to send these back to the project... how would I best do this?

Cheers

adam


On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote:

> Actually, I guess some sample code would be more helpful:
> 
> use Bio::LocatableSeq;
> use Bio::SimpleAlign;
> use Bio::AlignIO;
> my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4);
> my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3);
> my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5);
> my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);
> 
> Cheers,
> Roy.
> 
> 
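
For data that starts out in a hash, as in the original question, the constructor arguments that Roy's sample passes by hand could be derived along these lines. This is only a sketch: the `%aligned` hash layout and the helper `locatable_args` are assumptions, `-` and `.` are assumed to be the only gap characters, and the BioPerl calls are left as comments so the snippet runs with core Perl alone.

```perl
use strict;
use warnings;

# Hypothetical input: sequence id => gapped sequence string.
my %aligned = (
    one   => 'AT-CG',
    two   => 'A--CG',
    three => 'ATTCG',
);

# Build the named arguments for one Bio::LocatableSeq. With -start fixed at 1,
# -end is the number of residues left after removing gap characters.
sub locatable_args {
    my ($id, $gapped) = @_;
    (my $ungapped = $gapped) =~ s/[-.]//g;
    return (-id => $id, -seq => $gapped, -start => 1, -end => length($ungapped));
}

for my $id (sort keys %aligned) {
    my %args = locatable_args($id, $aligned{$id});
    printf "%s: start=%d end=%d\n", $id, $args{-start}, $args{-end};
    # With BioPerl installed, each argument list could be passed on:
    #   push @seqs, Bio::LocatableSeq->new(%args);
    # and the alignment built as in the sample above:
    #   my $aln = Bio::SimpleAlign->new(-seqs => \@seqs);
}
```

The computed ends (4, 3 and 5) agree with the values hard-coded in the sample.
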
> On 11/01/2010 13:40, Roy Chaudhuri wrote:
>> Hi Adam,
>> 
>> I'm guessing you actually want to create a Bio::SimpleAlign object
>> (representing an alignment), rather than a Bio::AlignIO object (which is
>> just for reading/writing alignment files). Bio::SimpleAlign has a
>> documented new method that allows you to construct an alignment from
>> Bio::LocatableSeq objects, which are similar to Bio::Seq objects but
>> include gaps and start/end coordinates to describe their relationship to
>> other sequences in the alignment.
>> 
>> Roy.
>> 
>> On 11/01/2010 12:21, Adam Witney wrote:
>>> Hi,
>>> 
>>> I am writing a script to automate the running of Phylip Pars. In the
>>> process i have to create a Bio::AlignIO object from a set of data
>>> that i have in a hash.
>>> 
>>> I could write the hash data into a phylip file and then load the
>>> Bio::AlignIO from that file, but i wondered if i could skip the
>>> writing and then reading of a temporary file ?
>>> 
>>> thanks for any help
>>> 
>>> adam
>> 
> 


From Russell.Smithies at agresearch.co.nz  Tue Jan 12 03:41:02 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 12 Jan 2010 16:41:02 +1300
Subject: [Bioperl-l] BioPerl version?
In-Reply-To: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>

Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ?

--Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Tue Jan 12 03:59:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 11 Jan 2010 21:59:44 -0600
Subject: [Bioperl-l] BioPerl version?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
	<18DF7D20DFEC044098A1062202F5FFF32C619262C8@exchsth.agresearch.co.nz>
Message-ID: <795BD926-4AE9-4478-AAD5-E36558350745@illinois.edu>

Not dumb, but a frequently asked one: that's a FAQ question ;>

http://www.bioperl.org/wiki/FAQ#How_can_I_tell_what_version_of_BioPerl_is_installed.3F

perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
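
The same check can be made from inside a script. The sketch below wraps it in a hypothetical helper, `bioperl_version`, that returns undef instead of dying when BioPerl is not on @INC; it reads the same package variable as the one-liner.

```perl
use strict;
use warnings;

# Return the installed BioPerl version string, or undef if BioPerl
# cannot be loaded from the current @INC.
sub bioperl_version {
    my $loaded = eval { require Bio::Root::Version; 1 };
    return unless $loaded;
    no warnings 'once';    # the variable is only mentioned once in this file
    return $Bio::Root::Version::VERSION;
}

my $version = bioperl_version();
print defined $version
    ? "BioPerl version: $version\n"
    : "BioPerl does not appear to be installed\n";
```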

chris

On Jan 11, 2010, at 9:41 PM, Smithies, Russell wrote:

> Probably a dumb question but how do I find the version of an existing BioPerl installation without resorting to reading thru Bio/Root/Version.pm ?
> 
> --Russell


From cjfields at illinois.edu  Tue Jan 12 16:02:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 10:02:02 -0600
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
	<320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
Message-ID: <ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>

On Jan 11, 2010, at 9:55 AM, Peter wrote:

> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>> 
>> These entries form the CON data class, see:
>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>> and they don't contain any sequence information.
> 
> I know - GenBank files have a similar system with CONTIG
> lines instead of sequences. I was expecting BioPerl to be
> able to convert these EMBL files with CO lines into GenBank
> files with CONTIG lines.

IIRC the contig information for GenBank is stored in annotation.  We can try to ensure the data is carried over to EMBL properly.

>> If you take the 'expanded' entries from
>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>> your script will work.
> 
> That's a useful tip - thanks.
> 
> Peter

NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence).

chris


From biopython at maubp.freeserve.co.uk  Tue Jan 12 16:19:32 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 16:19:32 +0000
Subject: [Bioperl-l] BioPerl 1.6 and parsing multiple EMBL records
In-Reply-To: <ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>
References: <320fb6e01001110616j6ddcfd7bpe5b5852cee133798@mail.gmail.com>
	<C771056E.6204%hrh@fmi.ch>
	<320fb6e01001110755u50fd4255i1f6dae40a608a562@mail.gmail.com>
	<ECE66D72-737E-467B-9799-72CC78B17DAF@illinois.edu>
Message-ID: <320fb6e01001120819u50e73fa8k9bde8aa1abdf942d@mail.gmail.com>

On Tue, Jan 12, 2010 at 4:02 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Jan 11, 2010, at 9:55 AM, Peter wrote:
>
>> On Mon, Jan 11, 2010 at 3:42 PM, Hotz, Hans-Rudolf <hrh at fmi.ch> wrote:
>>>
>>> These entries form the CON data class, see:
>>> http://www.ebi.ac.uk/embl/Documentation/User_manual/usrman.html#3_4_14
>>> and they don't contain any sequence information.
>>
>> I know - GenBank files have a similar system with CONTIG
>> lines instead of sequences. I was expecting BioPerl to be
>> able to convert these EMBL files with CO lines into GenBank
>> files with CONTIG lines.
>
> IIRC the contig information for GenBank is stored in annotation.
> We can try to ensure the data is carried over to EMBL properly.

For contig records (where there is no sequence) I think we just
need to map the GenBank CONTIG lines to the EMBL CO lines,
and vice versa. At least, that's what Biopython now does (trunk
code, not yet released).
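
As a rough illustration of that mapping (not what Biopython or BioPerl actually do), the location expression can be gathered from the CO lines and re-emitted under the GenBank CONTIG keyword. The function names and the demo record are invented, and the sketch assumes the join(...) expression can be copied unchanged and does not wrap long lines the way a real writer would.

```perl
use strict;
use warnings;

# Collect the location expression spread over the CO lines of one EMBL record.
sub contig_location_from_embl {
    my ($record) = @_;
    my $location = '';
    for my $line (split /\n/, $record) {
        next unless $line =~ /^CO\s+(\S.*?)\s*$/;
        $location .= $1;
    }
    return $location;
}

# Emit the same expression under the GenBank CONTIG keyword, with the keyword
# padded to 12 columns. No line wrapping is attempted.
sub genbank_contig_line {
    my ($location) = @_;
    return sprintf "%-12s%s\n", 'CONTIG', $location;
}

# Invented demo record with the expression split over two CO lines.
my $embl = <<'EMBL';
ID   DEMO1; SV 1; linear; genomic DNA; CON; HUM; 100 BP.
CO   join(AAAA01000001.1:1..60,gap(10),
CO   AAAA01000002.1:1..30)
//
EMBL

my $location = contig_location_from_embl($embl);
print genbank_contig_line($location);
```

Going the other way would mean stripping the CONTIG keyword and prefixing each output line with the CO line code.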

>>> If you take the 'expanded' entries from
>>> ftp://ftp.ebi.ac.uk/pub/databases/embl/expanded_con/release/rel_con_hum_01_r102.dat.gz
>>> your script will work.
>>
>> That's a useful tip - thanks.
>>
>> Peter
>
> NCBI's eutil option 'gbwithparts' is similar (always retrieves the sequence).

Indeed. This is a useful work around for when a parser couldn't
cope with the contig version of a GenBank file for some reason, e.g.
http://bugzilla.open-bio.org/show_bug.cgi?id=2745

Peter


From maj at fortinbras.us  Tue Jan 12 17:33:30 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 12:33:30 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
Message-ID: <231A8D9473704E7697F7A486A0CDA86A@NewLife>

Hi All--

The beta of Bio::DB::SoapEUtilities is now available in the
bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
service. The system is fully WSDL based, and all eutils are
available. The best thing (IMHO) are the result adaptors, which
provide conversion and iteration of SOAP results into BioPerl
objects. Take a look:

 use Bio::DB::EUtilities;
 my $fac = Bio::DB::EUtilities->new(); # step 1
 my $seqio = $fac->esearch(
       -db => 'nucleotide', 
       -term => 'HIV1 and CCR5 and Brazil'
    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
 # yes, it's already done the efetch under the hood...
 while ( my $seq = $seqio->next_seq ) { # step 4
  # do something with $seq, a Bio::Seq object...
 }

or this:

 my $links = $fac->elink( -db => 'protein', 
                          -dbfrom => 'nucleotide',
                          -id => \@nucids )->run( -auto_adapt => 1 );
 
 # maybe more than one associated id...
 my @prot_0 = $links->id_map( $nucids[0] );
   
 while ( my $ls = $links->next_linkset ) {
    @ids = $ls->ids;
    @submitted_ids = $ls->submitted_ids;
    # etc.
 }

and much, much more. See

http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service

and of course, the POD, for all the details, including 
download/installation. Tests in bioperl-run/t.

cheers, 
MAJ

-- No new dependencies were added or animals mistreated 
-- during the making of these modules.


From sheldon.mckay at gmail.com  Tue Jan 12 18:02:53 2010
From: sheldon.mckay at gmail.com (Sheldon McKay)
Date: Tue, 12 Jan 2010 10:02:53 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
Message-ID: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>

Hi all,

I keep timing out trying to do an svn checkout of bioperl-live from
code.open-bio.org.  Any suggestions?

Thanks,
Sheldon

----
Sheldon McKay, PhD
Lead, iPlant Tree of Life Engagement Team;
Research Investigator
Cold Spring Harbor Laboratory
http://mckay.cshl.edu
Google Voice:  (203) 701-9204


On Tue, Nov 3, 2009 at 9:09 AM, Aaron Mackey <amackey at virginia.edu> wrote:
> [ajm6q at lc4 bioperl-live]$ svn update
> svn: Decompression of svndiff data failed
>
>
> I'll admit to not having svn updated in a while; a clean, anonymous svn co
> failed with the same message:
>
> [...]
> A    bioperl-live/Bio/Structure/StructureI.pm
> A    bioperl-live/Bio/Structure/IO
> svn: Decompression of svndiff data failed
>
> -Aaron
>
> P.S. I used this command: svn co svn://code.open-bio.org/bioperl/bioperl-live/trunk bioperl-live
>


From biopython at maubp.freeserve.co.uk  Tue Jan 12 18:12:46 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Jan 2010 18:12:46 +0000
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
Message-ID: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>

On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
> Hi all,
>
> I keep timing out trying to do an svn checkout of bioperl-live from
> code.open-bio.org.  Any suggestions?
>
> Thanks,
> Sheldon

The OBF team know about this (it's being discussed on root-l),
hopefully they'll have it fixed before too long.

Peter


From cjfields at illinois.edu  Tue Jan 12 18:18:45 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 12:18:45 -0600
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
	<320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
Message-ID: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>

On Jan 12, 2010, at 12:12 PM, Peter wrote:

> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
>> Hi all,
>> 
>> I keep timing out trying to do an svn checkout of bioperl-live from
>> code.open-bio.org.  Any suggestions?
>> 
>> Thanks,
>> Sheldon
> 
> The OBF team know about this (its being discussed on root-l),
> hopefully they'll have it fixed before too long.
> 
> Peter

We probably need to set up some automatic syncing of our read-only code.google.com repo as a backup.  Jason had originally set that up, hopefully he'll respond.

chris


From jason at bioperl.org  Tue Jan 12 18:27:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 12 Jan 2010 10:27:55 -0800
Subject: [Bioperl-l] code.open-bio.org timing out?
In-Reply-To: <8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
References: <bdd134571001121002x5ce156e2udb322af1be0a36d4@mail.gmail.com>
	<320fb6e01001121012r68166814o764df62c0a5a6224@mail.gmail.com>
	<8E18DCA9-5C72-4201-A213-BF53A6AAAAD2@illinois.edu>
Message-ID: <C9DDBB08-DB88-4596-AED3-B3FD89893C55@bioperl.org>

Hi - I had set up the Google Code sync, but then came the unfortunate
realization that the revision numbers are shared between the wiki and
the code SVN (all one repo), so when I added a wiki page on the site I
screwed up the numbering and it wasn't possible to sync anymore (that
I could figure out) without resetting it, and I haven't gone back to
that. Sorry - I wasn't sure if we had figured out what we wanted to do
for repositories, so I sort of stopped worrying about it.


-jason
On Jan 12, 2010, at 10:18 AM, Chris Fields wrote:

> On Jan 12, 2010, at 12:12 PM, Peter wrote:
>
>> On Tue, Jan 12, 2010 at 6:02 PM, Sheldon McKay <sheldon.mckay at gmail.com> wrote:
>>> Hi all,
>>>
>>> I keep timing out trying to do an svn checkout of bioperl-live from
>>> code.open-bio.org.  Any suggestions?
>>>
>>> Thanks,
>>> Sheldon
>>
>> The OBF team know about this (its being discussed on root-l),
>> hopefully they'll have it fixed before too long.
>>
>> Peter
>
> We probably need to set up some automatic syncing of our read-only  
> code.google.com repo as a backup.  Jason had originally set that up,  
> hopefully he'll respond.
>
> chris

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From virajj at gmail.com  Wed Jan  6 18:20:39 2010
From: virajj at gmail.com (Vijayaraj Nagarajan)
Date: Wed, 6 Jan 2010 13:20:39 -0500
Subject: [Bioperl-l] targetp request
Message-ID: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>

Hi,

I am trying to use TargetP in BioPerl.
The documentation at the BioPerl site is a bit confusing to me...

I would appreciate it if you could give a very small example of how to use
"Bio::Tools::TargetP" to predict the localization of a protein sequence that
I have stored as a string.

Thanks,
Vijay


From cjfields at illinois.edu  Tue Jan 12 23:36:53 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 17:36:53 -0600
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
Message-ID: <D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>

Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API conflict with the current EUtilities tools.

chris

On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:

> Hi All--
> 
> The beta of Bio::DB::SoapEUtilities is now available in the
> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
> service. The system is fully WSDL based, and all eutils are
> available. The best thing (IMHO) are the result adaptors, which
> provide conversion and iteration of SOAP results into BioPerl
> objects. Schau, mal:
> 
> use Bio::DB::EUtilities;
> my $fac = Bio::DB::EUtilities->new(); # step 1
> my $seqio = $fac->esearch(
>       -db => 'nucleotide', 
>       -term => 'HIV1 and CCR5 and Brazil'
>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
> # yes, it's already done the efetch under the hood...
> while ( my $seq = $seqio->next_seq ) { # step 4
>  # do something with $seq, a Bio::Seq object...
> }
> 
> or this:
> 
> my $links = $fac->elink( -db => 'protein', 
>                          -dbfrom => 'nucleotide',
>                          -id => \@nucids )->run( -auto_adapt => 1 );
> 
> # maybe more than one associated id...
> my @prot_0 = $links->id_map( $nucids[0] );
> 
> while ( my $ls = $links->next_linkset ) {
>    @ids = $ls->ids;
>    @submitted_ids = $ls->submitted_ids;
>    # etc.
> }
> 
> and much, much more. See
> 
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
> 
> and of course, the POD, for all the details, including 
> download/installation. Tests in bioperl-run/t.
> 
> cheers, 
> MAJ
> 
> -- No new dependencies were added or animals mistreated 
> -- during the making of these modules.
> 


From cjfields at illinois.edu  Wed Jan 13 00:22:10 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 12 Jan 2010 18:22:10 -0600
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
	<D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
	<5AD210CB0C444A57881BBDD34DE99149@NewLife>
Message-ID: <B536964F-8F2F-4E07-9FD3-B7D0A945253E@illinois.edu>

Okay, just making sure (I was getting a bit paranoid).  Great work on the SOAP interface, BTW!

chris

On Jan 12, 2010, at 6:08 PM, Mark A. Jensen wrote:

> Um, yeah.
> ----- Original Message ----- From: "Chris Fields" <cjfields at illinois.edu>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "BioPerl List" <bioperl-l at bioperl.org>
> Sent: Tuesday, January 12, 2010 6:36 PM
> Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web service
> 
> 
> Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API conflict with the current EUtilities tools.
> 
> chris
> 
> On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:
> 
>> Hi All--
>> 
>> The beta of Bio::DB::SoapEUtilities is now available in the
>> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
>> service. The system is fully WSDL based, and all eutils are
>> available. The best thing (IMHO) are the result adaptors, which
>> provide conversion and iteration of SOAP results into BioPerl
>> objects. Schau, mal:
>> 
>> use Bio::DB::EUtilities;
>> my $fac = Bio::DB::EUtilities->new(); # step 1
>> my $seqio = $fac->esearch(
>>      -db => 'nucleotide',
>>      -term => 'HIV1 and CCR5 and Brazil'
>>   )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
>> # yes, it's already done the efetch under the hood...
>> while ( my $seq = $seqio->next_seq ) { # step 4
>> # do something with $seq, a Bio::Seq object...
>> }
>> 
>> or this:
>> 
>> my $links = $fac->elink( -db => 'protein',
>>                         -dbfrom => 'nucleotide',
>>                         -id => \@nucids )->run( -auto_adapt => 1 );
>> 
>> # maybe more than one associated id...
>> my @prot_0 = $links->id_map( $nucids[0] );
>> 
>> while ( my $ls = $links->next_linkset ) {
>>   @ids = $ls->ids;
>>   @submitted_ids = $ls->submitted_ids;
>>   # etc.
>> }
>> 
>> and much, much more. See
>> 
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>> 
>> and of course, the POD, for all the details, including
>> download/installation. Tests in bioperl-run/t.
>> 
>> cheers,
>> MAJ
>> 
>> -- No new dependencies were added or animals mistreated
>> -- during the making of these modules.
>> 
> 
> 


From maj at fortinbras.us  Wed Jan 13 00:08:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 19:08:12 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web
	service
In-Reply-To: <D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife>
	<D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
Message-ID: <5AD210CB0C444A57881BBDD34DE99149@NewLife>

Um, yeah.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Tuesday, January 12, 2010 6:36 PM
Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web 
service


Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's 
Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API 
conflict with the current EUtilities tools.

chris

On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:

> Hi All--
>
> The beta of Bio::DB::SoapEUtilities is now available in the
> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
> service. The system is fully WSDL based, and all eutils are
> available. The best thing (IMHO) are the result adaptors, which
> provide conversion and iteration of SOAP results into BioPerl
> objects. Schau, mal:
>
> use Bio::DB::EUtilities;
> my $fac = Bio::DB::EUtilities->new(); # step 1
> my $seqio = $fac->esearch(
>       -db => 'nucleotide',
>       -term => 'HIV1 and CCR5 and Brazil'
>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
> # yes, it's already done the efetch under the hood...
> while ( my $seq = $seqio->next_seq ) { # step 4
>  # do something with $seq, a Bio::Seq object...
> }
>
> or this:
>
> my $links = $fac->elink( -db => 'protein',
>                          -dbfrom => 'nucleotide',
>                          -id => \@nucids )->run( -auto_adapt => 1 );
>
> # maybe more than one associated id...
> my @prot_0 = $links->id_map( $nucids[0] );
>
> while ( my $ls = $links->next_linkset ) {
>    @ids = $ls->ids;
>    @submitted_ids = $ls->submitted_ids;
>    # etc.
> }
>
> and much, much more. See
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>
> and of course, the POD, for all the details, including
> download/installation. Tests in bioperl-run/t.
>
> cheers,
> MAJ
>
> -- No new dependencies were added or animals mistreated
> -- during the making of these modules.
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Wed Jan 13 01:09:28 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Tue, 12 Jan 2010 20:09:28 -0500
Subject: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP
	webservice
In-Reply-To: <5AD210CB0C444A57881BBDD34DE99149@NewLife>
References: <231A8D9473704E7697F7A486A0CDA86A@NewLife><D0ECBBE3-9492-457F-9478-8B28AF5CC61E@illinois.edu>
	<5AD210CB0C444A57881BBDD34DE99149@NewLife>
Message-ID: <A5829F72FD6F469D9CBCC94FC69C068F@NewLife>

corrected:

use Bio::DB::SoapEUtilities;
my $fac = Bio::DB::SoapEUtilities->new(); # step 1
my $seqio = $fac->esearch(
       -db => 'nucleotide',
       -term => 'HIV1 and CCR5 and Brazil'
    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
# yes, it's already done the efetch under the hood...
while ( my $seq = $seqio->next_seq ) { # step 4
    # do something with $seq, a Bio::Seq object...
}

----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: "BioPerl List" <bioperl-l at bioperl.org>
Sent: Tuesday, January 12, 2010 7:08 PM
Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP 
webservice


> Um, yeah.
> ----- Original Message ----- 
> From: "Chris Fields" <cjfields at illinois.edu>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: "BioPerl List" <bioperl-l at bioperl.org>
> Sent: Tuesday, January 12, 2010 6:36 PM
> Subject: Re: [Bioperl-l] Bio::DB::SoapEUtilities : access to Entrez SOAP web 
> service
>
>
> Um, just to be clear, this isn't Bio::DB::EUtilities, right (it's 
> Bio::DB::SoapEUtilities)?  Otherwise this would be a serious namespace and API 
> conflict with the current EUtilities tools.
>
> chris
>
> On Jan 12, 2010, at 11:33 AM, Mark A. Jensen wrote:
>
>> Hi All--
>>
>> The beta of Bio::DB::SoapEUtilities is now available in the
>> bioperl-run trunk: one-stop shopping for the NCBI Entrez SOAP web
>> service. The system is fully WSDL based, and all eutils are
>> available. The best thing (IMHO) are the result adaptors, which
>> provide conversion and iteration of SOAP results into BioPerl
>> objects. Schau, mal:
>>
>> use Bio::DB::EUtilities;
>> my $fac = Bio::DB::EUtilities->new(); # step 1
>> my $seqio = $fac->esearch(
>>       -db => 'nucleotide',
>>       -term => 'HIV1 and CCR5 and Brazil'
>>    )->run(-auto_adapt => 1, -rettype => 'fasta'); # step 2, 3
>> # yes, it's already done the efetch under the hood...
>> while ( my $seq = $seqio->next_seq ) { # step 4
>>  # do something with $seq, a Bio::Seq object...
>> }
>>
>> or this:
>>
>> my $links = $fac->elink( -db => 'protein',
>>                          -dbfrom => 'nucleotide',
>>                          -id => \@nucids )->run( -auto_adapt => 1 );
>>
>> # maybe more than one associated id...
>> my @prot_0 = $links->id_map( $nucids[0] );
>>
>> while ( my $ls = $links->next_linkset ) {
>>    @ids = $ls->ids;
>>    @submitted_ids = $ls->submitted_ids;
>>    # etc.
>> }
>>
>> and much, much more. See
>>
>> http://www.bioperl.org/wiki/HOWTO:EUtilities_Web_Service
>>
>> and of course, the POD, for all the details, including
>> download/installation. Tests in bioperl-run/t.
>>
>> cheers,
>> MAJ
>>
>> -- No new dependencies were added or animals mistreated
>> -- during the making of these modules.
>>


From tuco at pasteur.fr  Wed Jan 13 10:24:34 2010
From: tuco at pasteur.fr (Emmanuel Quevillon)
Date: Wed, 13 Jan 2010 11:24:34 +0100
Subject: [Bioperl-l] targetp request
In-Reply-To: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
Message-ID: <4B4D9F62.5010306@pasteur.fr>

On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> Hi,
>
> I am trying to use targetP in bioperl.
> the documentation at the bioperl site is a bit confusing to me...
>
> I would appreciate if you could give a very small example, as to how to use
> "Bio::Tools::TargetP" to predict the localization of a protein sequence that
> i have stored as a string.
>
> Thanks,
> Vijay

Dear Vijay,

Bio::Tools::TargetP is not intended to run targetp on a sequence but to
read and parse results from a targetp run.

 From the Pod doc :

DESCRIPTION
        TargetP modules will provides parsed informations about protein
        localization.  It reads in a targetp output file.  It parses the
        results, and returns a Bio::SeqFeature::Generic object for each
        sequences found to have a subcellular localization


So to analyze your sequence, you'll first need to run targetp on your
sequence file to create a targetp result file. Then use the
Bio::Tools::TargetP module to parse this result file and extract only
the information you need, as shown in the SYNOPSIS of the module's Pod
documentation.

HTH

Regards

Emmanuel


From roy.chaudhuri at gmail.com  Wed Jan 13 12:52:58 2010
From: roy.chaudhuri at gmail.com (Roy Chaudhuri)
Date: Wed, 13 Jan 2010 12:52:58 +0000
Subject: [Bioperl-l] create Bio::AlignIO object from hash
In-Reply-To: <D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
References: <B8829F91-C130-447A-9449-55A4768C98CF@sgul.ac.uk>
	<4B4B2A51.9040602@gmail.com> <4B4B2D91.70906@gmail.com>
	<D35A61D6-1015-4AAD-97A8-C8764485377F@sgul.ac.uk>
Message-ID: <4B4DC22A.8080701@gmail.com>

Upload them to Bugzilla as patches, and one of the devs will review your 
changes and incorporate them into bioperl-live:
http://www.bioperl.org/wiki/HOWTO:SubmitPatch

Roy.

On 11/01/2010 16:27, Adam Witney wrote:
>
> Ah excellent, thanks Roy. I was indeed thinking about it the wrong
> way.
>
> In the process of writing this i have created a
>
> Bio::Tools::Run::Phylo::Phylip::Pars class
>
> which is essentially just a modified copy of ProtPars. I have also
> fixed a few typos and possible bugs in
>
> Bio/Tools/Run/Phylo/Phylip/Base.pm
> Bio/Tools/Run/Phylo/Phylip/PhylipConf.pm Bio/AlignIO/phylip.pm
> Bio/Tools/Run/Alignment/Clustalw.pm
>
> I am of course happy to send these back in to the project... how
> would i best do this?
>
> Cheers
>
> adam
>
>
> On 11 Jan 2010, at 13:54, Roy Chaudhuri wrote:
>
>> Actually, I guess some sample code would be more helpful:
>>
>> use Bio::LocatableSeq;
>> use Bio::SimpleAlign;
>> use Bio::AlignIO;
>> my $seq1=Bio::LocatableSeq->new(-id=>'one', -seq=>'AT-CG', -start=>1, -end=>4);
>> my $seq2=Bio::LocatableSeq->new(-id=>'two', -seq=>'A--CG', -start=>1, -end=>3);
>> my $seq3=Bio::LocatableSeq->new(-id=>'three', -seq=>'ATTCG', -start=>1, -end=>5);
>> my $aln=Bio::SimpleAlign->new(-seqs=>[$seq1,$seq2,$seq3]);
>> Bio::AlignIO->new(-format=>'phylip')->write_aln($aln);
>>
>> Cheers, Roy.
>>
>>
>> On 11/01/2010 13:40, Roy Chaudhuri wrote:
>>> Hi Adam,
>>>
>>> I'm guessing you actually want to create a Bio::SimpleAlign
>>> object (representing an alignment), rather than a Bio::AlignIO
>>> object (which is just for reading/writing alignment files).
>>> Bio::SimpleAlign has a documented new method that allows you to
>>> construct an alignment from Bio::LocatableSeq objects, which are
>>> similar to Bio::Seq objects but include gaps and start/end
>>> coordinates to describe their relationship to other sequences in
>>> the alignment.
>>>
>>> Roy.
>>>
>>> On 11/01/2010 12:21, Adam Witney wrote:
>>>> Hi,
>>>>
>>>> I am writing a script to automate the running of Phylip Pars.
>>>> In the process i have to create a Bio::AlignIO object from a
>>>> set of data that i have in a hash.
>>>>
>>>> I could write the hash data into a phylip file and then load
>>>> the Bio::AlignIO from that file, but i wondered if i could skip
>>>> the writing and then reading of a temporary file ?
>>>>
>>>> thanks for any help
>>>>
>>>> adam
>>>
>>
>


From marcelo011982 at gmail.com  Wed Jan 13 18:12:04 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Wed, 13 Jan 2010 16:12:04 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
Message-ID: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>

Hi..
I have a simple Blast result, such as blastn.
Is there a script in Bioperl to transform such a result to Clustalw
format (.aln)?

Thanx for any help.


From Kevin.M.Brown at asu.edu  Wed Jan 13 18:01:42 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 13 Jan 2010 11:01:42 -0700
Subject: [Bioperl-l] targetp request
In-Reply-To: <4B4D9F62.5010306@pasteur.fr>
References: <581b44531001061020i68b2e80ic15f3bcc830204a@mail.gmail.com>
	<4B4D9F62.5010306@pasteur.fr>
Message-ID: <1A4207F8295607498283FE9E93B775B4067C133E@EX02.asurite.ad.asu.edu>

Sounds like this module might be in the wrong place then. Sounds more
like a SeqIO or AlignIO module, heheh. Also looks like the docs might
need to be cleaned up a bit for english readability (at least that
initial sentence).

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Emmanuel Quevillon
> Sent: Wednesday, January 13, 2010 3:25 AM
> To: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] targetp request
> 
> On 1/6/10 7:20 PM, Vijayaraj Nagarajan wrote:
> > Hi,
> >
> > I am trying to use targetP in bioperl.
> > the documentation at the bioperl site is a bit confusing to me...
> >
> > I would appreciate if you could give a very small example, 
> as to how to use
> > "Bio::Tools::TargetP" to predict the localization of a 
> protein sequence that
> > i have stored as a string.
> >
> > Thanks,
> > Vijay
> 
> Dear Vijay,
> 
> Bio::Tools::TargetP is not intended to run targetp on a sequence but to
> read and parse results from a targetp run.
> 
>  From the Pod doc :
> 
> DESCRIPTION
>         TargetP modules will provides parsed informations about protein
>         localization.  It reads in a targetp output file.  It parses the
>         results, and returns a Bio::SeqFeature::Generic object for each
>         sequences found to have a subcellular localization
> 
> 
> So to analyze your sequence, you'll first need to run targetp on your
> sequence file to create a targetp result file. Then use the
> Bio::Tools::TargetP module to parse this result file and extract only
> the information you need, as shown in the SYNOPSIS of the module's Pod
> documentation.
> 
> HTH
> 
> Regards
> 
> Emmanuel


From maj at fortinbras.us  Wed Jan 13 18:44:36 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 13 Jan 2010 13:44:36 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
Message-ID: <C85EC8A05E884B328AFDAA055341E9E2@NewLife>

Marcelo-
Yes-- look at the code snip at
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
combined with the snip at 
http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
(using -format => 'clustalw')
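
Putting those two snips together gives roughly the following. This is
only a sketch: the file names are placeholders, each HSP is written as
its own pairwise alignment, and the modules are loaded inside the sub
purely so that the helper compiles on a machine without BioPerl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Write every HSP of a BLAST report as a ClustalW-format alignment.
# Returns the number of alignments written.
sub blast_to_clustalw {
    my ($report, $aln_file) = @_;

    # Loaded at call time (see note above); 'use' at the top is the
    # more usual style in a real script.
    require Bio::SearchIO;
    require Bio::AlignIO;

    my $in  = Bio::SearchIO->new(-format => 'blast',    -file => $report);
    my $out = Bio::AlignIO->new(-format => 'clustalw', -file => ">$aln_file");

    my $written = 0;
    while (my $result = $in->next_result) {
        while (my $hit = $result->next_hit) {
            while (my $hsp = $hit->next_hsp) {
                # get_aln returns the HSP as a Bio::SimpleAlign object
                $out->write_aln($hsp->get_aln);
                $written++;
            }
        }
    }
    return $written;
}

# Usage: perl blast2aln.pl report.blastn hsps.aln
if (@ARGV == 2) {
    my $n = blast_to_clustalw(@ARGV);
    print "wrote $n alignment(s) to $ARGV[1]\n";
}
```

A BLAST report holds pairwise HSPs rather than one multiple alignment,
so the output file contains one small alignment per HSP.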
cheers MAJ
----- Original Message ----- 
From: "Marcelo Iwata" <marcelo011982 at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, January 13, 2010 1:12 PM
Subject: [Bioperl-l] Blast to Clustalw Format


> Hi..
> I have a simple Blast result, such as blastn.
> Is there a script in Bioperl to transform such a result to Clustalw
> format (.aln)?
> 
> Thanx for any help.


From dan.kortschak at adelaide.edu.au  Thu Jan 14 04:26:46 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 14:56:46 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
Message-ID: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>

Hi All,

I'm having a stupid problem that for some reason I just can't figure
out. I'm putting together a B:A:IO:bowtie module to wrap around the
B:A:IO:sam module so bowtie output can be used as an assembly start
point.

For some reason that is escaping me I can't create tempfiles!

What should be the relevant code in the module:

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );


and the line (there are a couple of others that are likely to fail in the
same way, but I've not got that far)

my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
$self->tempdir(), -suffix => '.sam' );

Which dies with:
Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.

Relevant environment vars:
  DB<10> x @ISA 
0  'Bio::Root::Root'
1  'Bio::Root::IO'
2  'Bio::Assembly::IO'

DB<11> x $self
0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
   '_no_head' => undef
   '_no_sq' => undef
   '_root_verbose' => 0


Can someone suggest what I'm missing?

cheers
Dan


From maj at fortinbras.us  Thu Jan 14 05:11:01 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:11:01 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <84196F01FF584C64A79B89FECE2DD86F@NewLife>

Hey Dan-- what does your constructor look like? I wonder if something's getting 
lost in new() and _initialize() chaining spaghetti- MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Wednesday, January 13, 2010 11:26 PM
Subject: [Bioperl-l] not able to use Bio::Root::IO method


> Hi All,
>
> I'm having a stupid problem that for some reason I just can't figure
> out. I'm putting together a B:A:IO:bowtie module to wrap around the
> B:A:IO:sam module so bowtie output can be used as an assembly start
> point.
>
> For some reason that is escaping me I can't create tempfiles!
>
> What should be the relevant code in the module:
>
> package Bio::Assembly::IO::bowtie;
> use strict;
> use warnings;
>
> # Object preamble - inherits from Bio::Root::Root
>
> use Bio::SeqIO;
> use Bio::Tools::Run::Samtools;
> use Bio::Assembly::IO;
> use Carp;
> use Bio::Root::Root;
> use Bio::Root::IO;
> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>
>
> and the line (there are a couple of others that are like to fail in the
> same way, but I've not got that far)
>
> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
> $self->tempdir(), -suffix => '.sam' );
>
> Which dies with:
> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.
>
> Relevant environment vars:
>  DB<10> x @ISA
> 0  'Bio::Root::Root'
> 1  'Bio::Root::IO'
> 2  'Bio::Assembly::IO'
>
> DB<11> x $self
> 0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
>   '_no_head' => undef
>   '_no_sq' => undef
>   '_root_verbose' => 0
>
>
>
> Can someone suggest what I'm missing?
>
> cheers
> Dan
>


From dan.kortschak at adelaide.edu.au  Thu Jan 14 05:35:35 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:35 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <84196F01FF584C64A79B89FECE2DD86F@NewLife>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
Message-ID: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>

Thanks Mark, I'm not sure about that, since @ISA still includes
Bio::Root::IO at the point of the call, but it might be.

cheers
Dan

Here is the entirety of the code (it's reasonably short):

package Bio::Assembly::IO::bowtie;
use strict;
use warnings;

# Object preamble - inherits from Bio::Root::Root

use Bio::SeqIO;
use Bio::Tools::Run::Samtools;
use Bio::Assembly::IO;
use Carp;
use Bio::Root::Root;
use Bio::Root::IO;
use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );

our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
our $PG = "\@PG\tID:Bowtie\n";	# SAM header fields are TAG:VALUE

our $HAVE_IO_UNCOMPRESS;
BEGIN {
# check requirements
    unless ( eval "require Bio::Tools::Run::Bowtie;") {
	Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
    }
    unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
	Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
    }
}

sub new {
	my $class = shift;
	my @args = @_;
	my $self = $class->SUPER::new(@args);
	my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
	$file =~ s/^<//;
	$self->{'_no_head'} = $no_head;
	$self->{'_no_sq'} = $no_sq;
	# get the sequence so samtools can work with it
	my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
	my $refdb = $inspector->run($index);
	my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));
	my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );
	return $sam;
}

sub _bowtie_to_sam {
	my ($self, $file, $refdb) = @_;

	$self->throw("'$file' does not exist or is not readable.")
		unless ( -e $file && -r $file );
	my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
	$self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;

	my %SQ;
	my $mapq = 255;
	my $in_pair;
	my @mate_line;
	my $mlen;

	if ($file =~ m/\.gz[^.]*$/) {
		unless ($HAVE_IO_UNCOMPRESS) {
			croak( "IO::Uncompress::Gunzip not available, can't expand '$file'" );
		}
		my ($tfh, $tf) = $self->io->tempfile;
		# decompress the named input file; $_ is not set at this point
		my $z = IO::Uncompress::Gunzip->new($file);
		while (<$z>) { print $tfh $_ }
		close $tfh;
		$file = $tf;
	}

        open(my $fh, $file) or
		$self->throw("Can not open '$file' for reading: $!");
            
	# create temp file for working
	my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );
	
	while (<$fh>) {	# read the bowtie output one line at a time into $_
		chomp;
		my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
		$SQ{$rname} = 1;
		
		my $paired_f =  ($qname =~ m#/[12]#) ? 0x03 : 0;
		my $strand_f = ($strand eq '-') ? 0x10 : 0;
		my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
		my $first_f =  ($qname =~ m#/1#) ? 0x40 : 0;
		my $second_f = ($qname =~ m#/2#) ? 0x80 : 0;
		my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;

		$pos++;
		my $len = length $seq;
		die unless $len == length $qual;
		my $cigar = $len.'M';
		my @detail = split(',',$details);
		my $dist = 'NM:i:'.scalar @detail;
		
		my @mismatch;
		my $last_pos = 0;
		for (@detail) {
			m/(\d+):(\w)>\w/;
			my $err = ($1-$last_pos);
			$last_pos = $1+1;
			push @mismatch,($err,$2);
		}
		push @mismatch, $len-$last_pos;
		@mismatch = reverse @mismatch if $strand eq '-';
		my $mismatch = join('',('MD:Z:',@mismatch));

		if ($paired_f) {
			my $mrnm = '=';
			if ($in_pair) {
				my $mpos = $mate_line[3];
				$mate_line[7] = $pos;
				my $isize = $mpos-$pos-$len;
				$mate_line[8] = -$isize;
				print $sam_tmp_h join("\t",@mate_line),"\n";
				print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
				$in_pair = 0;
			} else {
				$mlen = $len;
				@mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
				$in_pair = 1;
			}
		} else {
			my $mrnm = '*';
			my $mpos = 0;
			my $isize = 0;
			print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
		}
	}

	close($fh);
	$sam_tmp_h->close;
	
	return $sam_tmp_f if $self->{'_no_head'};

	my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );

	# print header
	print $samh $HD;
	
	# print sequence dictionary
	unless ($self->{'_no_sq'}) {
		my $db  = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
		while ( my $seq = $db->next_seq() ) {
			$SQ{$seq->id} = $seq->length if $SQ{$seq->id};
		}
	
		map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
	}
	
	# print program
	print $samh $PG;
	
	open($sam_tmp_h, $sam_tmp_f) or
		$self->throw("Can not open '$sam_tmp_f' for reading: $!");

	print $samh $_ while (<$sam_tmp_h>);	# copy the alignment lines after the header
	
	close($sam_tmp_h);
	$samh->close;
	
	return $samf;
}

sub _make_bam {
	my ($self, $file) = @_;
	
	$self->throw("'$file' does not exist or is not readable")
		unless ( -e $file && -r $file );

	# make a sorted bam file from a sam file input
	my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
	my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
	$_->close for ($bamh, $srth);
	
	my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
						   -sam_input => 1,
						   -bam_output => 1 );

	$samt->run( -bam => $file, -out => $bamf );

	$samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );

	$samt->run( -bam => $bamf, -pfx => $srtf);

	return $srtf.'.bam'
}

1;
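
For anyone following the mismatch handling in _bowtie_to_sam above,
here is the same arithmetic pulled out into a standalone sub. It is a
sketch under assumptions: the name mismatch_tags is made up for
illustration, it covers forward-strand reads only, and it takes the
mismatch field to be a comma-separated list of offset:ref>read items,
which is what the regex in the posted code expects.

```perl
use strict;
use warnings;

# Turn a bowtie mismatch descriptor such as "5:A>C,10:G>T" into the SAM
# NM and MD tags for a forward-strand read of length $len.
# (Hypothetical helper; not part of BioPerl.)
sub mismatch_tags {
    my ($len, $details) = @_;
    my @detail = (defined $details && length $details)
               ? split(/,/, $details)
               : ();

    my @md;
    my $next = 0;    # first read offset not yet accounted for
    for my $d (@detail) {
        my ($offset, $ref_base) = $d =~ /^(\d+):(\w)>\w$/
            or die "unrecognised mismatch descriptor '$d'";
        # length of the run of matches, then the reference base
        push @md, $offset - $next, $ref_base;
        $next = $offset + 1;
    }
    push @md, $len - $next;    # matches after the last mismatch

    return ('NM:i:' . scalar(@detail), 'MD:Z:' . join('', @md));
}

my ($nm, $md) = mismatch_tags(20, '5:A>C,10:G>T');
print "$nm\t$md\n";    # NM:i:2	MD:Z:5A4G9
```

Adjacent mismatches come out with a 0 between them (for example 5A0C13),
which is the form the MD tag requires.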


On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
> Hey Dan-- what does your constructor look like? I wonder if
> something's getting 
> lost in new() and _initialize() chaining spaghetti- MAJ
> 


From dan.kortschak at adelaide.edu.au  Thu Jan 14 05:35:48 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 16:05:48 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
	<B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
Message-ID: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>

I've had a bit of a play with that, but no luck.

Dan

On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
> I've found that rearranging the items in the 'use base' array can
> sometimes 
> recover
> lost methods. I don't know enough of the arcana to know why it works. 
> (Sometimes,
> java starts looking pretty good from here...)
> 
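
On the 'use base' ordering quoted above: by default Perl searches @ISA
depth-first, left to right, so the order only matters when more than
one parent (or ancestor) defines the same method name. The toy below
shows this; the Demo::* packages are invented for illustration and are
not BioPerl classes.

```perl
use strict;
use warnings;
use mro;    # core since Perl 5.10; gives mro::get_linear_isa

package Demo::Left;
sub new    { return bless {}, shift }
sub whoami { return 'left' }

package Demo::Right;
sub whoami { return 'right' }
sub extra  { return 'only in right' }

package Demo::LR;
our @ISA = ('Demo::Left', 'Demo::Right');    # same effect as use base qw(...)

package Demo::RL;
our @ISA = ('Demo::Right', 'Demo::Left');

package main;

# The search order Perl actually uses for Demo::LR
print join(' -> ', @{ mro::get_linear_isa('Demo::LR') }), "\n";

# A method defined in both parents: the first parent listed wins
print 'LR: ', Demo::LR->whoami, "\n";
print 'RL: ', Demo::RL->whoami, "\n";

# A method defined in only one parent is found in either order
print 'extra via LR: ', Demo::LR->extra, "\n";

# A method defined nowhere stays missing however @ISA is arranged
print 'io() found? ', (Demo::LR->can('io') ? 'yes' : 'no'), "\n";
```

So reordering can change which of two competing methods is called, but
it cannot produce a method that none of the classes defines.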


From maj at fortinbras.us  Thu Jan 14 05:38:00 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:38:00 -0500
Subject: [Bioperl-l] Fw:  not able to use Bio::Root::IO method
Message-ID: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>

up to list
----- Original Message ----- 
From: "Mark A. Jensen" <maj at fortinbras.us>
To: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
Sent: Thursday, January 14, 2010 12:36 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method


> Aha-- check out the pod for Bio::Root::IO:
> 
> "This module provides methods that will usually be needed for any sort
> of file- or stream-related input/output, e.g., keeping track of a file
> handle, transient printing and reading from the file handle, a close
> method, automatically closing the handle on garbage collection, etc.
> 
> To use this for your own code you will either want to inherit from
> this module, or instantiate an object for every file or stream you are
> dealing with. In the first case this module will most likely not be
> the first class off which your class inherits; therefore you need to
> call _initialize_io() with the named parameters in order to set file
> handle, open file, etc automatically."
> 
> I think you're wanting a call to $self->_initialize_io(). (There is no io() 
> method explicitly defined in any of the base classes.)
> MAJ
> ----- Original Message ----- 
> From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, January 13, 2010 11:26 PM
> Subject: [Bioperl-l] not able to use Bio::Root::IO method
> 
> 
>> Hi All,
>> 
>> I'm having a stupid problem that for some reason I just can't figure
>> out. I'm putting together a B:A:IO:bowtie module to wrap around the
>> B:A:IO:sam module so bowtie output can be used as an assembly start
>> point.
>> 
>> For some reason that is escaping me I can't create tempfiles!
>> 
>> What should be the relevant code in the module:
>> 
>> package Bio::Assembly::IO::bowtie;
>> use strict;
>> use warnings;
>> 
>> # Object preamble - inherits from Bio::Root::Root
>> 
>> use Bio::SeqIO;
>> use Bio::Tools::Run::Samtools;
>> use Bio::Assembly::IO;
>> use Carp;
>> use Bio::Root::Root;
>> use Bio::Root::IO;
>> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>> 
>> 
>> and the line (there are a couple of others that are like to fail in the
>> same way, but I've not got that far)
>> 
>> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
>> $self->tempdir(), -suffix => '.sam' );
>> 
>> Which dies with:
>> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
>> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.
>> 
>> Relevant environment vars:
>>  DB<10> x @ISA 
>> 0  'Bio::Root::Root'
>> 1  'Bio::Root::IO'
>> 2  'Bio::Assembly::IO'
>> 
>> DB<11> x $self
>> 0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
>>   '_no_head' => undef
>>   '_no_sq' => undef
>>   '_root_verbose' => 0
>> 
>> 
>> 
>> Can someone suggest what I'm missing?
>> 
>> cheers
>> Dan
>> 


From maj at fortinbras.us  Thu Jan 14 05:50:11 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 00:50:11 -0500
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263446261.8016.50.camel@zoidberg.mbs.adelaide.edu.au>
	<B2799EEEE37B43F5AC5D308D5F8A765F@NewLife>
	<1263447348.8016.59.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <82BFF47099684EF496DB3875D39DCA14@NewLife>

For the benefit of the list, I categorically deny ever making the 
statement about java below....
MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 14, 2010 12:35 AM
Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method


> I've had a bit of a play with that, but no luck.
> 
> Dan
> 
> On Thu, 2010-01-14 at 00:26 -0500, Mark A. Jensen wrote:
>> I've found that rearranging the items in the 'use base' array can
>> sometimes 
>> recover
>> lost methods. I don't know enough of the arcana to know why it works. 
>> (Sometimes,
>> java starts looking pretty good from here...)
>> 
> 
>


From cjfields at illinois.edu  Thu Jan 14 07:23:41 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:23:41 -0600
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>

You can remove separate 'use' directives if they are declared with 'use base' (they will be imported then).  Also, Bio::Root::IO inherits Bio::Root::Root, and Bio::Assembly::IO should inherit from Bio::Root::IO, so the only base module you should need is Bio::Assembly::IO.  It's possible having all three is confusing the interpreter.
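
To make the inheritance point concrete, here is a miniature of that
chain. It is a sketch only: the Demo::* packages are stand-ins that
imitate the shape of Bio::Root::Root, Bio::Root::IO and
Bio::Assembly::IO, and the tempfile method here is a simplified
imitation built on File::Temp. The idea carried over is that a method
provided by an ancestor is called on $self directly, with no
intermediate io() accessor.

```perl
use strict;
use warnings;
use File::Temp ();

package Demo::Root;                  # stands in for Bio::Root::Root
sub new { my $class = shift; return bless {}, $class }

package Demo::IO;                    # stands in for Bio::Root::IO
our @ISA = ('Demo::Root');
sub tempfile {
    my ($self, %args) = @_;
    # Returns (filehandle, filename), as File::Temp::tempfile does
    return File::Temp::tempfile(
        SUFFIX => defined $args{-suffix} ? $args{-suffix} : '',
        UNLINK => 1,
    );
}

package Demo::AssemblyIO;            # stands in for Bio::Assembly::IO
our @ISA = ('Demo::IO');

package Demo::AssemblyIO::bowtie;    # a single parent is enough
our @ISA = ('Demo::AssemblyIO');

package main;

my $obj = Demo::AssemblyIO::bowtie->new;

# Inherited through the chain and called directly on the object
my ($fh, $name) = $obj->tempfile(-suffix => '.sam');
print {$fh} "\@HD\tVN:1.0\tSO:unsorted\n";
close $fh or die "close failed: $!";
print "wrote header to $name\n";
```

In the real module the equivalent call would presumably be
$self->tempfile( -dir => ..., -suffix => '.sam' ), assuming the
installed Bio::Root::IO provides tempfile as its Pod describes.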

chris

On Jan 13, 2010, at 11:35 PM, Dan Kortschak wrote:

> Thanks Mark, I'm not sure about that since @ISA still includes
> Bio::Root:IO when it's at the call, but it might be.
> 
> cheers
> Dan
> 
> Here is the entirety of the code (it reasonably short):
> 
> package Bio::Assembly::IO::bowtie;
> use strict;
> use warnings;
> 
> # Object preamble - inherits from Bio::Root::Root
> 
> use Bio::SeqIO;
> use Bio::Tools::Run::Samtools;
> use Bio::Assembly::IO;
> use Carp;
> use Bio::Root::Root;
> use Bio::Root::IO;
> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
> 
> our $HD = "\@HD\tVN:1.0\tSO:unsorted\n";
> our $PG = "\@PG\tID:Bowtie\n";	# SAM header fields are TAG:VALUE
> 
> our $HAVE_IO_UNCOMPRESS;
> BEGIN {
> # check requirements
>    unless ( eval "require Bio::Tools::Run::Bowtie;") {
> 	Bio::Root::Root->throw("Bio::Tools::Run::Bowtie is not available - cannot extract refdb from index.");
>    }
>    unless ( eval "require IO::Uncompress::Gunzip; \$HAVE_IO_UNCOMPRESS = 1") {
> 	Bio::Root::Root->warn("IO::Uncompress::Gunzip is not available; you'll have to do your decompression by hand.");
>    }
> }
> 
> sub new {
> 	my $class = shift;
> 	my @args = @_;
> 	my $self = $class->SUPER::new(@args);
> 	my ($file, $index, $no_head, $no_sq) = $self->_rearrange([qw(FILE INDEX NO_HEAD NO_SQ)], @args);
> 	$file =~ s/^<//;
> 	$self->{'_no_head'} = $no_head;
> 	$self->{'_no_sq'} = $no_sq;
> 	# get the sequence so samtools can work with it
> 	my $inspector = Bio::Tools::Run::Bowtie->new( -command => 'inspect' );
> 	my $refdb = $inspector->run($index);
> 	my $bam_file = $self->_make_bam($self->_bowtie_to_sam($file, $refdb));
> 	my $sam = Bio::Assembly::IO->new( -file => "<$bam_file", -refdb => $refdb , -format => 'sam' );
> 	return $sam;
> }
> 
> sub _bowtie_to_sam {
> 	my ($self, $file, $refdb) = @_;
> 
> 	$self->throw("'$file' does not exist or is not readable.")
> 		unless ( -e $file && -r $file );
> 	my $guesser = Bio::Tools::GuessSeqFormat->new(-file=>$file);
> 	$self->throw("'$file' is not a bowtie formatted file.") unless $guesser->guess =~ m/^bowtie$/;
> 
> 	my %SQ;
> 	my $mapq = 255;
> 	my $in_pair;
> 	my @mate_line;
> 	my $mlen;
> 
> 	if ($file =~ m/\.gz[^.]*$/) {
> 		unless ($HAVE_IO_UNCOMPRESS) {
> 			croak( "IO::Uncompress::Gunzip not available, can't expand '$file'" );
> 		}
> 		my ($tfh, $tf) = $self->io->tempfile;
> 		# decompress the named input file; $_ is not set at this point
> 		my $z = IO::Uncompress::Gunzip->new($file);
> 		while (<$z>) { print $tfh $_ }
> 		close $tfh;
> 		$file = $tf;
> 	}
> 
>        open(my $fh, $file) or
> 		$self->throw("Can not open '$file' for reading: $!");
> 
> 	# create temp file for working
> 	my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );
> 	
> 	while (<$fh>) {
> 		chomp;
> 		my ($qname,$strand,$rname,$pos,$seq,$qual,$m,$details)=split("\t",$_);
> 		$SQ{$rname} = 1;
> 		
> 		my $paired_f =  ($qname =~ m#/[12]#) ? 0x03 : 0;
> 		my $strand_f = ($strand eq '-') ? 0x10 : 0;
> 		my $op_strand_f = ($strand eq '+' && $paired_f) ? 0x20 : 0;
> 		my $first_f =  ($qname =~ m#/1#) ? 0x40 : 0;
> 		my $second_f = ($qname =~ m#/2#) ? 0x80 : 0;
> 		my $flag = $paired_f | $strand_f | $op_strand_f | $first_f | $second_f;
> 
> 		$pos++;
> 		my $len = length $seq;
> 		die unless $len == length $qual;
> 		my $cigar = $len.'M';
> 		my @detail = split(',',$details);
> 		my $dist = 'NM:i:'.scalar @detail;
> 		
> 		my @mismatch;
> 		my $last_pos = 0;
> 		for (@detail) {
> 			m/(\d+):(\w)>\w/;
> 			my $err = ($1-$last_pos);
> 			$last_pos = $1+1;
> 			push @mismatch,($err,$2);
> 		}
> 		push @mismatch, $len-$last_pos;
> 		@mismatch = reverse @mismatch if $strand eq '-';
> 		my $mismatch = join('',('MD:Z:',@mismatch));
> 
> 		if ($paired_f) {
> 			my $mrnm = '=';
> 			if ($in_pair) {
> 				my $mpos = $mate_line[3];
> 				$mate_line[7] = $pos;
> 				my $isize = $mpos-$pos-$len;
> 				$mate_line[8] = -$isize;
> 				print $sam_tmp_h join("\t",@mate_line),"\n";
> 				print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
> 				$in_pair = 0;
> 			} else {
> 				$mlen = $len;
> 				@mate_line = ($qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, undef, undef, $seq, $qual, $mismatch, $dist);
> 				$in_pair = 1;
> 			}
> 		} else {
> 			my $mrnm = '*';
> 			my $mpos = 0;
> 			my $isize = 0;
> 			print $sam_tmp_h join("\t",$qname, $flag, $rname, $pos, $mapq, $cigar, $mrnm, $mpos, $isize, $seq, $qual, $mismatch, $dist),"\n";
> 		}
> 	}
> 
> 	close($fh);
> 	$sam_tmp_h->close;
> 	
> 	return $sam_tmp_f if $self->{'_no_head'};
> 
> 	my ($samh, $samf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.sam' );
> 
> 	# print header
> 	print $samh $HD;
> 	
> 	# print sequence dictionary
> 	unless ($self->{'_no_sq'}) {
> 		my $db  = Bio::SeqIO->new( -file => $refdb, -format => 'fasta' );
> 		while ( my $seq = $db->next_seq() ) {
> 			$SQ{$seq->id} = $seq->length if $SQ{$seq->id};
> 		}
> 	
> 		map { print $samh join("\t", ('@SQ', "SN:$_", "LN:$SQ{$_}")), "\n" } keys %SQ;
> 	}
> 	
> 	# print program
> 	print $samh $PG;
> 	
> 	open($sam_tmp_h, '<', $sam_tmp_f) or
> 		$self->throw("Can not open '$sam_tmp_f' for reading: $!");
> 
> 	print $samh $_ while (<$sam_tmp_h>);
> 	
> 	close($sam_tmp_h);
> 	$samh->close;
> 	
> 	return $samf;
> }
> 
> sub _make_bam {
> 	my ($self, $file) = @_;
> 	
> 	$self->throw("'$file' does not exist or is not readable")
> 		unless ( -e $file && -r $file );
> 
> 	# make a sorted bam file from a sam file input
> 	my ($bamh, $bamf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.bam' );
> 	my ($srth, $srtf) = $self->io->tempfile( -dir => $self->tempdir(), -suffix => '.srt' );
> 	$_->close for ($bamh, $srth);
> 	
> 	my $samt = Bio::Tools::Run::Samtools->new( -command => 'view',
> 						   -sam_input => 1,
> 						   -bam_output => 1 );
> 
> 	$samt->run( -bam => $file, -out => $bamf );
> 
> 	$samt = Bio::Tools::Run::Samtools->new( -command => 'sort' );
> 
> 	$samt->run( -bam => $bamf, -pfx => $srtf);
> 
> 	return $srtf.'.bam'
> }
> 
> 1;
> 
> 
> On Thu, 2010-01-14 at 00:11 -0500, Mark A. Jensen wrote:
>> Hey Dan-- what does your constructor look like? I wonder if
>> something's getting 
>> lost in new() and _initialize() chaining spaghetti- MAJ
>> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Thu Jan 14 07:25:05 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 01:25:05 -0600
Subject: [Bioperl-l] Fw:  not able to use Bio::Root::IO method
In-Reply-To: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
References: <59E0A4EAF5934DC6BDCA7D8E98DB085F@NewLife>
Message-ID: <1DB926E1-9C6F-4B96-8D7E-28317DD7DE42@illinois.edu>

Yes, that's true.  The io() call is a Bio::Tools::Run::WrapperBase thing; there, io() returns a Bio::Root::IO instance.

chris

On Jan 13, 2010, at 11:38 PM, Mark A. Jensen wrote:

> up to list
> ----- Original Message ----- From: "Mark A. Jensen" <maj at fortinbras.us>
> To: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
> Sent: Thursday, January 14, 2010 12:36 AM
> Subject: Re: [Bioperl-l] not able to use Bio::Root::IO method
> 
> 
>> Aha-- check out the pod for Bio::Root::IO:
>> "This module provides methods that will usually be needed for any sort
>> of file- or stream-related input/output, e.g., keeping track of a file
>> handle, transient printing and reading from the file handle, a close
>> method, automatically closing the handle on garbage collection, etc.
>> To use this for your own code you will either want to inherit from
>> this module, or instantiate an object for every file or stream you are
>> dealing with. In the first case this module will most likely not be
>> the first class off which your class inherits; therefore you need to
>> call _initialize_io() with the named parameters in order to set file
>> handle, open file, etc automatically."
>> I think you're wanting a call to $self->_initialize_io(). (There is no io() method explicitly defined in any of the base classes.)
>> MAJ
>> ----- Original Message ----- From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Wednesday, January 13, 2010 11:26 PM
>> Subject: [Bioperl-l] not able to use Bio::Root::IO method
>>> Hi All,
>>> I'm having a stupid problem that for some reason I just can't figure
>>> out. I'm putting together a B:A:IO:bowtie module to wrap around the
>>> B:A:IO:sam module so bowtie output can be used as an assembly start
>>> point.
>>> For some reason that is escaping me I can't create tempfiles!
>>> What should be the relevant code in the module:
>>> package Bio::Assembly::IO::bowtie;
>>> use strict;
>>> use warnings;
>>> # Object preamble - inherits from Bio::Root::Root
>>> use Bio::SeqIO;
>>> use Bio::Tools::Run::Samtools;
>>> use Bio::Assembly::IO;
>>> use Carp;
>>> use Bio::Root::Root;
>>> use Bio::Root::IO;
>>> use base qw( Bio::Root::Root Bio::Root::IO Bio::Assembly::IO );
>>> and the line (there are a couple of others that are likely to fail in the
>>> same way, but I've not got that far)
>>> my ($sam_tmp_h, $sam_tmp_f) = $self->io->tempfile( -dir =>
>>> $self->tempdir(), -suffix => '.sam' );
>>> Which dies with:
>>> Can't locate object method "io" via package "Bio::Assembly::IO::bowtie"
>>> at /usr/local/share/perl/5.10.0/Bio/Assembly/IO/bowtie.pm line 175.
>>> Relevant environment vars:
>>> DB<10> x @ISA
>>> 0  'Bio::Root::Root'
>>> 1  'Bio::Root::IO'
>>> 2  'Bio::Assembly::IO'
>>> DB<11> x $self
>>> 0  Bio::Assembly::IO::bowtie=HASH(0x2d226d8)
>>>  '_no_head' => undef
>>>  '_no_sq' => undef
>>>  '_root_verbose' => 0
>>> Can someone suggest what I'm missing?
>>> cheers
>>> Dan


From dan.kortschak at adelaide.edu.au  Thu Jan 14 07:59:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 14 Jan 2010 18:29:20 +1030
Subject: [Bioperl-l] not able to use Bio::Root::IO method
In-Reply-To: <B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>
References: <1263443206.8016.47.camel@zoidberg.mbs.adelaide.edu.au>
	<84196F01FF584C64A79B89FECE2DD86F@NewLife>
	<1263447335.8016.57.camel@zoidberg.mbs.adelaide.edu.au>
	<B981B792-3A93-4C7E-84A5-75BAC59E5B60@illinois.edu>
Message-ID: <1263455960.4630.3.camel@epistle>

Thanks Chris,

I've done that, and since the inheritance is direct (rather than being a
constructed attribute in the object hash) the calls are $obj->temp*
rather than the $obj->io->temp* that I was using.

It works now, and the code is much clearer after getting rid of many of
the declarations.

cheers
Dan

On Thu, 2010-01-14 at 01:23 -0600, Chris Fields wrote:
> You can remove the separate 'use' directives for modules that are
> declared with 'use base' (it loads them for you).  Also, Bio::Root::IO
> inherits from Bio::Root::Root, and Bio::Assembly::IO should inherit from
> Bio::Root::IO, so the only base module you should need is
> Bio::Assembly::IO.  It's possible that having all three is confusing the
> interpreter.
> 
> chris
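
The difference between the two call styles discussed in this thread
($obj->tempfile versus $obj->io->tempfile) can be sketched with core Perl
alone. This is only an illustration, not BioPerl code: Demo::IO is a
hypothetical stand-in for Bio::Root::IO, and Demo::Inherits and Demo::Wraps
are made-up names for a class that inherits the I/O methods and a class
that keeps an I/O object behind an io() accessor.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Bio::Root::IO: a class that hands out temp files.
package Demo::IO;
use File::Temp ();
use File::Spec ();

sub new { my ($class) = @_; return bless {}, $class }

# Returns (filehandle, filename); the file is removed at program exit.
sub tempfile {
    my ($self, %args) = @_;
    return File::Temp::tempfile(
        DIR    => File::Spec->tmpdir,
        SUFFIX => $args{-suffix} // '',
        UNLINK => 1,
    );
}

# Style 1: inherit. tempfile() is a method of the object itself.
package Demo::Inherits;
use base 'Demo::IO';

# Style 2: compose. The object holds a Demo::IO instance behind io().
package Demo::Wraps;
sub new { my ($class) = @_; return bless { _io => Demo::IO->new }, $class }
sub io  { return $_[0]->{_io} }

package main;

my ($fh1, $file1) = Demo::Inherits->new->tempfile( -suffix => '.sam' );
my ($fh2, $file2) = Demo::Wraps->new->io->tempfile( -suffix => '.sam' );

print "inherited: $file1\n";
print "composed:  $file2\n";
print "Demo::Inherits has io(): ",
    ( Demo::Inherits->can('io') ? 'yes' : 'no' ), "\n";
```

The last line prints "no": a class that only inherits the I/O methods has
no io() accessor, which is the same kind of failure as the "Can't locate
object method" error quoted earlier in the thread.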


From marcelo011982 at gmail.com  Thu Jan 14 13:44:25 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:44:25 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <C85EC8A05E884B328AFDAA055341E9E2@NewLife>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
	<C85EC8A05E884B328AFDAA055341E9E2@NewLife>
Message-ID: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>

Thanks Mark.
I think most of you already know this,
but I'll post it for new users:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while ( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while ( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      $aln = $hsp->get_aln;
      $alnIO->write_aln($aln);

    }
  }
}


On Wed, Jan 13, 2010 at 4:44 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> Marcelo-
> Yes-- look at the code snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_SearchIO
> combined with the snip at
> http://www.bioperl.org/wiki/HOWTO:SearchIO#Using_the_methods
> (using -format => 'clustalw')
> cheers MAJ
> ----- Original Message ----- From: "Marcelo Iwata" <
> marcelo011982 at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Wednesday, January 13, 2010 1:12 PM
> Subject: [Bioperl-l] Blast to Clustalw Format
>
>
>  Hi..
>> I have a simple BLAST result, such as blastn.
>> Is there a script in BioPerl to transform such a result to Clustalw
>> format (.aln)?
>>
>> Thanks for any help.


From marcelo011982 at gmail.com  Thu Jan 14 13:46:21 2010
From: marcelo011982 at gmail.com (Marcelo Iwata)
Date: Thu, 14 Jan 2010 11:46:21 -0200
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com>
	<C85EC8A05E884B328AFDAA055341E9E2@NewLife>
	<1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
Message-ID: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>

Sorry , the correct code is:


#!/usr/bin/perl -w

use strict;
use Bio::SearchIO;
use Bio::AlignIO;

my $in = Bio::SearchIO->new(-format => 'blast',
                            -file   => '../../fontes/exemplos/blat/teste2/output.blast');
my $aln;
my $alnIO;
$alnIO = Bio::AlignIO->new(-format =>"clustalw", -file => ">hsp.aln");
while ( my $result = $in->next_result ) {
  ## $result is a Bio::Search::Result::ResultI compliant object
  while ( my $hit = $result->next_hit ) {
    ## $hit is a Bio::Search::Hit::HitI compliant object
    while ( my $hsp = $hit->next_hsp ) {
      ## $hsp is a Bio::Search::HSP::HSPI compliant object
      $aln = $hsp->get_aln;
      $alnIO->write_aln($aln);

    }
  }
}




From maj at fortinbras.us  Thu Jan 14 13:54:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 08:54:31 -0500
Subject: [Bioperl-l] Blast to Clustalw Format
In-Reply-To: <1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
References: <1c9f28971001131012w65378e9bw12d892e874028056@mail.gmail.com><C85EC8A05E884B328AFDAA055341E9E2@NewLife><1c9f28971001140544v715fc9acue40f836a31529307@mail.gmail.com>
	<1c9f28971001140546r275a0b67pec68de5ab8bff015@mail.gmail.com>
Message-ID: <1B8891488AA746F49BCAAB531FBE4D0B@NewLife>

Thanks Marcelo-- code snips always appreciated! MAJ


From sidd.basu at gmail.com  Thu Jan 14 19:15:04 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 13:15:04 -0600
Subject: [Bioperl-l] reading blast report
Message-ID: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>

Hi,
I have a script that reads a tblastn report (13000 records) and loads it into
a Chado database (using the Bio::Chado::Schema module); however, the machine
runs out of memory. Leaving the database loading aside, I am trying to figure
out whether reading the report with the SearchIO module could itself consume
a lot of memory. So, when I am reading a BLAST file and getting the result
object ....

while (my $result = $searchio->next_result)

* Does the SearchIO object load a huge chunk of the file into memory, or
  does it read only part of the report on each iteration?

* Would indexing the BLAST report and then reading from the index be much
  faster, and why? And is there any way I could iterate through each
  record in the index? Would that be helpful?

-siddhartha


From jason at bioperl.org  Thu Jan 14 19:53:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 11:53:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
Message-ID: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>

What aspects of the report are you loading?  You might consider the
blast report as tab-delimited (-m 8 format) if you are only interested
in start/end positions and scores of alignments; it is a simpler,
reduced dataset that gives the parser a lower memory footprint.

SearchIO's default is -format => 'blast'.  You can try -format =>
'blast_pull' instead, which lazily parses the report to create objects
and will reduce memory consumption.
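
For the tab-delimited route, a minimal sketch in core Perl (no BioPerl
needed) could look like the following. It assumes the standard 12-column
layout of BLAST -m 8 output; the hash key names, the parse_m8_line helper
and the two sample rows are made up for illustration. Only one line is
held in memory at a time.

```perl
use strict;
use warnings;

# Column order of NCBI BLAST tabular output (-m 8); key names are my own.
my @COLS = qw(query subject pct_identity aln_length mismatches gap_opens
              q_start q_end s_start s_end evalue bit_score);

# Parse one tab-delimited line into a hash ref. Returns undef for blank
# lines, '#' comment lines (as written by -m 9) and malformed lines.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*$/ || $line =~ /^#/;
    my @f = split /\t/, $line;
    return unless @f == @COLS;
    my %hit;
    @hit{@COLS} = @f;
    return \%hit;
}

# Made-up example rows standing in for a real report file.
my $report = join "\n",
    join( "\t", qw(q1 chr1 98.50 200 3 0 1 200 1000 1599 1e-50 250) ),
    join( "\t", qw(q2 chr2 75.00 120 30 0 5 124 5360 5001 2e-10 60.1) ),
    '';

# Read line by line, exactly as one would from a file on disk.
open my $fh, '<', \$report or die "Cannot open in-memory report: $!";
while ( my $line = <$fh> ) {
    my $hit = parse_m8_line($line) or next;
    print join( "\t", @{$hit}{qw(query subject s_start s_end bit_score)} ), "\n";
}
close $fh;
```

With a real report, the in-memory string would be replaced by the path to
the -m 8 output file; each parsed hash can be handed to the database
loader and then discarded.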

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From sidd.basu at gmail.com  Thu Jan 14 20:15:45 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 14:15:45 -0600
Subject: [Bioperl-l]  Re: reading blast report
In-Reply-To: <83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
Message-ID: <4b4f7b74.5744f10a.7087.4813@mx.google.com>

On Thu, 14 Jan 2010, Jason Stajich wrote:

> What aspects of the report are you loading?  You might consider the blast
> report as tab-delimited (-m 8 format) if you are only interested in
> start/end positions and scores of alignments, which is a simpler and reduced
> dataset that has a lower memory footprint in the parser.

I think this would be the better approach, as I am mostly interested in
start/end/score data only.

>
> SearchIO's default is -format => 'blast'; you can try -format =>
> 'blast_pull' instead, which lazily parses to create objects and will reduce
> memory consumption.

That's another good option, though. But just out of curiosity: does the
regular blast parser load the entire file into memory, considering that
the output consists of multiple Results concatenated together into a
single file? Could anybody clarify?

thanks, 
-siddhartha




From jason at bioperl.org  Thu Jan 14 21:28:29 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 13:28:29 -0800
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
Message-ID: <CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>


On Jan 14, 2010, at 12:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
>
>> What aspects of the report are you loading?  You might consider the
>> blast report as tab-delimited (-m 8 format) if you are only interested
>> in start/end positions and scores of alignments, which is a simpler and
>> reduced dataset that has a lower memory footprint in the parser.
>
> I think this would be the better approach, as I am mostly interested in
> start/end/score data only.
>
>>
>> SearchIO's default is -format => 'blast'; you can try -format =>
>> 'blast_pull' instead, which lazily parses to create objects and will
>> reduce memory consumption.
>
> That's another good option, though. But just out of curiosity: does the
> regular blast parser load the entire file into memory, considering that
> the output consists of multiple Results concatenated together into a
> single file? Could anybody clarify?
>
> thanks,
> -siddhartha

With the standard (non-pull) approach, each result (one result per query)
is parsed in full, and all of its hits and HSPs are brought into memory.
SearchIO iterates at the level of the result; that is why you call
next_result, which parses one result at a time.


--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From sidd.basu at gmail.com  Thu Jan 14 21:40:42 2010
From: sidd.basu at gmail.com (Siddhartha Basu)
Date: Thu, 14 Jan 2010 15:40:42 -0600
Subject: [Bioperl-l]  Re: reading blast report
In-Reply-To: <CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
	<CC9082A4-1D93-49E1-916D-2C61FBD29FA5@bioperl.org>
Message-ID: <4b4f8f5d.5644f10a.2be2.47dc@mx.google.com>

Thanks, Jason, for the clarification.



From SMarkel at accelrys.com  Thu Jan 14 22:58:06 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Thu, 14 Jan 2010 14:58:06 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>

We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
from our customers.  Due to network irregularities (not sure what else
to call it) users see the getting of remote BLAST results as somewhat
random.  When results come back the hits are fine, but sometimes no
information comes back at all.  Retrying helps.

In looking at RemoteBlast.pm there are four "return -1" cases.

* $status eq 'ERROR'      (return on line 614)
* $line =~ /ERROR/I       (return on line 628)
* !$got_content           (return on line 648)
* !$response->is_success  (return on line 655)

In the case of no content we'd like to retry remote BLAST.  We're happy
to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
module, but we only want to retry in that case, not the other three.

What would happen if that third "return -1" changed to a different
return value?
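
To make the question concrete, here is a rough sketch of the wrapper-side
retry we have in mind, written in core Perl so it runs without BioPerl. It
assumes the no-content case gets its own return code: NO_CONTENT (-2) is a
hypothetical value, not something RemoteBlast.pm returns today, and
fetch_with_retry and the stubbed fetch routine are made-up names standing
in for our wrapper code.

```perl
use strict;
use warnings;

# Hypothetical status codes; RemoteBlast.pm currently returns -1 in all
# four failure cases listed above.
use constant {
    FAILED     => -1,    # errors that should not be retried
    NO_CONTENT => -2,    # assumed new value for the "!$got_content" case
};

# True only for the plain numeric NO_CONTENT code, never for result objects.
sub is_no_content {
    my ($rc) = @_;
    return defined($rc) && !ref($rc) && $rc == NO_CONTENT;
}

# Call $fetch up to $max_tries times, retrying only on NO_CONTENT.
# Returns the last value that $fetch produced.
sub fetch_with_retry {
    my ( $fetch, $max_tries, $wait_seconds ) = @_;
    my $rc;
    for my $try ( 1 .. $max_tries ) {
        $rc = $fetch->();
        return $rc unless is_no_content($rc);
        sleep $wait_seconds if $wait_seconds && $try < $max_tries;
    }
    return $rc;
}

# Stub standing in for the remote call: two empty replies, then a result.
my @replies = ( NO_CONTENT, NO_CONTENT, +{ hits => 3 } );
my $calls   = 0;
my $result  = fetch_with_retry( sub { $calls++; return shift @replies }, 5, 0 );

printf "got %s after %d call(s)\n",
    ( ref($result) ? 'a result' : "code $result" ), $calls;
```

With the stub above this prints "got a result after 3 call(s)". A FAILED
reply is returned immediately without a retry, which is the behaviour we
want for the other three cases.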

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


From nickjd at gmail.com  Wed Jan 13 13:18:12 2010
From: nickjd at gmail.com (NickJD)
Date: Wed, 13 Jan 2010 05:18:12 -0800 (PST)
Subject: [Bioperl-l] Parsing PSI-BLAST results with SearchIO
Message-ID: <65554589-081b-4297-ab68-9ddfbd3d9944@c34g2000yqn.googlegroups.com>

I am trying to parse PSI-BLAST results using SearchIO and some very
basic code just to read the number of hits, number of hsps, etc.  I
have done 10 rounds on 1 input sequence and parsed it but it seems to
treat each round as a separate result, so round/iteration is always 1
and new_hits is always the total list, not the hits that are new to
that round.  Does anyone have any experience of this?

Thanks,

Nick


From dsidote at waksman.rutgers.edu  Wed Jan 13 15:08:48 2010
From: dsidote at waksman.rutgers.edu (David J Sidote)
Date: Wed, 13 Jan 2010 10:08:48 -0500
Subject: [Bioperl-l] Bioinformatician position - Waksman Institute
Message-ID: <4b42af671001130708i703ecce0u47348484321714f@mail.gmail.com>

Bioinformatician - Research Assistant Professor


The Waksman Institute of Microbiology located on the New Brunswick campus of
Rutgers University is seeking a highly motivated and talented bioinformatics
scientist for a Research Assistant Professor appointment.  The successful
candidate will analyze genome, transcriptome, and epigenome data generated
on the Life Sciences 454, Illumina, and AB SOLiD high-throughput sequencing
platforms. Excellent communication and teamwork skills are essential as the
successful candidate will work closely with individual research groups to
develop software to facilitate the visualization, quantification, and
interpretation of the data. The successful candidate will be expected to
contribute to the publication of scientific literature and to present at
seminars and conferences.


Qualifications:


-       PhD in molecular biology, genetics, bioinformatics, systems biology
or other related fields; candidates with a PhD in physics, mathematics, or
computer science with some working knowledge of biology and experience are
encouraged to apply.

-       Demonstrated scientific track record

-       Highly proficient in perl, python, or ruby programming, linux/unix
scripting, and SQL.

-       Experience with R is desirable but not required

-       Experience with high-throughput sequencing, microarrays, or other
high-throughput biological platforms

-       Excellent communication and organizational skills


How to Apply:


Please send a cover letter stating your current research interests, why you
are interested in this position, and how your skill set complements this
position along with a curriculum vitae, and the names and contact
information of three references to hr at waksman.rutgers.edu. Please include
"Bioinformatics Assistant Research Professor" in the subject line. Rutgers
is an equal opportunity employer.


For more information about this position please contact:

Dr. David Sidote (dsidote at waksman.rutgers.edu)


From albezg at gmail.com  Thu Jan 14 01:57:27 2010
From: albezg at gmail.com (albezg)
Date: Wed, 13 Jan 2010 20:57:27 -0500
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment with
 negative PDB ranges
In-Reply-To: <49C405F0.5050100@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com>
Message-ID: <4B4E7A07.7070805@gmail.com>

Hi all,

I have a problem using AlignIO to read the Pfam database:
ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
The database is in STOCKHOLM 1.0 format. AlignIO can read the alignment 
OK until the alignment PF00331.13. There it crashes with the following 
message:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: '1-344' is not an integer.

STACK: Error::throw
STACK: Bio::Root::Root::throw 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
STACK: Bio::Range::end 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
STACK: Bio::Annotation::Target::new 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
STACK: Bio::AlignIO::stockholm::next_aln 
/home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
STACK: /home/albezg/scripts/pfam2fasta.pl:22
-----------------------------------------------------------

It appears this is caused by this entry:
#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;

I don't care about residues in PDB, so I have just removed minus signs 
from the ranges. This seems to have fixed the crashing.
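
The workaround above can be sketched as a small filter. This is only an
illustration, assuming the PDB residue numbers really are irrelevant to you:
it changes the coordinate (-1 becomes 1) rather than preserving it. The
helper name strip_negative_pdb_range is made up for this example and is not
part of BioPerl.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop the minus signs from the
# residue range of a "#=GS <seq> DR PDB; <id> <chain>; <start>-<end>;" line.
# Every other line is returned unchanged.
sub strip_negative_pdb_range {
    my ($line) = @_;
    if ($line =~ /^#=GS\s+\S+\s+DR\s+PDB;/) {
        # Match the trailing "; <start>-<end>;" field, with optional signs.
        $line =~ s/;\s*-?(\d+)\s*-\s*-?(\d+)\s*;\s*$/; $1-$2;/;
    }
    return $line;
}

my $entry = '#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;';
print strip_negative_pdb_range($entry), "\n";
# prints: #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; 1-344;
```

Running every line of the seed file through such a filter before handing it
to AlignIO leaves the alignment data itself untouched.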

Is it a known problem? Is there a solution for it?

Thanks,
Alexandr


On 03/20/2009 05:09 PM, albezg wrote:
>
> I'm trying to change FASTA header(display_id) for a sequence in an
> alignment(SimpleAlign).
>
> There are no issues when I print it, however when I use AlignIO to write
> the alignment to a FASTA file, it does not work. Is this behavior intended?
>
> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>
> The error:
> ------------- EXCEPTION -------------
> MSG: No sequence with name [1/1-11]
> STACK Bio::SimpleAlign::displayname
> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
> STACK Bio::AlignIO::fasta::write_aln
> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
> STACK toplevel ./demo.pl:14
> -------------------------------------
>
> Alexandr


From mitch_skinner at berkeley.edu  Thu Jan 14 22:10:53 2010
From: mitch_skinner at berkeley.edu (Mitch Skinner)
Date: Thu, 14 Jan 2010 14:10:53 -0800
Subject: [Bioperl-l] filter_by_location in Bio::DB::SeqFeature::Store::memory
Message-ID: <4B4F966D.3030300@berkeley.edu>

Hi,

Some people haven't been getting all of the features in their GFF3 into 
JBrowse, and a nice test case that James Casbon posted to the list 
helped me track it down.

Here's an example of the behavior I was seeing with BioPerl 1.6.1 (using 
Devel::REPL):

==============
$ use Bio::DB::SeqFeature::Store

$ my $db = Bio::DB::SeqFeature::Store->new(-adaptor=>"memory", 
-dsn=>"casbon.gff3")
$Bio_DB_SeqFeature_Store_memory1 = 
Bio::DB::SeqFeature::Store::memory=HASH(0xa27ceec);

$ $db->features(-seq_id=>"CYP2C8")
$ARRAY1 = [
             Feature:src(41),
             region(CYP2C8),
             Feature:src(37),
             Feature:src(39),
             Feature:src(42),
             Feature:src(40),
             Feature:src(38)
           ];
==============

I expected to also see the features with IDs 43 and 44 (the gff3 file is 
attached).

I think there's a problem in the filter_by_location method.  If start 
and end parameters aren't passed to the method, it sets default start 
and end values that lead it to examine all of the bins in its index.  
But the end value that it creates is at the beginning of the last bin, 
and I think it should be at the end of the last bin instead.  The 
attached patch changes it to be at the end of the last bin.
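
To show why a default end at the start of the last bin can lose features,
here is a toy model. It is not the code from
Bio::DB::SeqFeature::Store::memory or from the attached patch; the bin
width, the function names, and the overlap test are all assumptions made
for illustration.

```perl
use strict;
use warnings;

use constant BIN_SIZE => 1000;    # hypothetical bin width, for illustration only

# Default search end when the caller passes no end coordinate.
# $at_bin_end = 0 mimics the reported behavior (start of the last bin);
# $at_bin_end = 1 mimics the proposed fix (end of the last bin).
sub default_end {
    my ($last_bin, $at_bin_end) = @_;
    return $at_bin_end
        ? ($last_bin + 1) * BIN_SIZE - 1
        : $last_bin * BIN_SIZE;
}

# True when feature [$fstart, $fend] overlaps search range [$start, $end].
sub overlaps {
    my ($fstart, $fend, $start, $end) = @_;
    return ($fstart <= $end && $fend >= $start) ? 1 : 0;
}

# A feature lying wholly inside the last bin (bin 2 covers 2000..2999 here).
my ($fstart, $fend) = (2100, 2200);
my $last_bin = int($fend / BIN_SIZE);

for my $at_bin_end (0, 1) {
    my $end = default_end($last_bin, $at_bin_end);
    printf "default end %d: feature %s\n", $end,
        overlaps($fstart, $fend, 0, $end) ? 'found' : 'missed';
}
```

In this model any feature that starts after the first position of the last
bin falls outside the default range, which would match the symptom of a few
features going missing.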

Regards,
Mitch
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: casbon.gff3
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100114/8494aaa7/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: bdsfsm-filter_by_location.patch
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100114/8494aaa7/attachment-0001.ksh>

From jason at bioperl.org  Fri Jan 15 00:20:43 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 14 Jan 2010 16:20:43 -0800
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <4B4E7A07.7070805@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
Message-ID: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>

Seems like improper data really -- "-1" is an improper coordinate as  
far as the parser is concerned. You may want to tell Pfam that there  
is a possible error in the dumper, since that was the only record that  
had this problem?

-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From maj at fortinbras.us  Fri Jan 15 02:00:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 14 Jan 2010 21:00:31 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
Message-ID: <CD613D33411040F8921DE3098FD6DF41@NewLife>

How about returning 1, 2, 4 for the non-zero cases, with some
error constants set for convenience? MAJ
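
A rough sketch of how distinct codes plus constants might look from the
caller's side. The constant names, the assignment of 1/2/4 to particular
failures, and the retry_on_no_content helper are all assumptions for this
example; none of them exist in RemoteBlast.pm, and a code ref stands in for
the actual retrieval call.

```perl
use strict;
use warnings;

# Hypothetical constants following the 1/2/4 suggestion; which failure gets
# which value is an assumption made for this sketch.
use constant {
    ERR_BLAST_STATUS => 1,    # server reported ERROR
    ERR_NO_CONTENT   => 2,    # response arrived but was empty
    ERR_HTTP_FAILURE => 4,    # request itself failed
};

# Call $fetch (a code ref standing in for the remote retrieval) up to
# $max_tries times, retrying only when it reports ERR_NO_CONTENT.
# $fetch returns either a reference (the result) or one of the codes above.
sub retry_on_no_content {
    my ($fetch, $max_tries) = @_;
    my $outcome;
    for my $try (1 .. $max_tries) {
        $outcome = $fetch->();
        last if ref $outcome;                  # got a result
        last if $outcome != ERR_NO_CONTENT;    # other errors: give up at once
    }
    return $outcome;
}

# Stub that comes back empty twice, then succeeds.
my $calls = 0;
my $flaky = sub { return ++$calls < 3 ? ERR_NO_CONTENT : +{ hits => 5 } };

my $outcome = retry_on_no_content($flaky, 5);
print "calls: $calls, hits: $outcome->{hits}\n";
```

Powers of two leave room to combine codes as bit flags later, which may be
what the 1, 2, 4 numbering was hinting at.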


From cjfields at illinois.edu  Fri Jan 15 00:42:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 14 Jan 2010 18:42:31 -0600
Subject: [Bioperl-l] reading blast report
In-Reply-To: <4b4f7b74.5744f10a.7087.4813@mx.google.com>
References: <4b4f6d3b.5544f10a.40b1.4126@mx.google.com>
	<83075C66-90CC-4250-890E-FE42F5FEB019@bioperl.org>
	<4b4f7b74.5744f10a.7087.4813@mx.google.com>
Message-ID: <0B76CCA7-C37C-4E24-BBDF-C8FD805DBBF2@illinois.edu>


On Jan 14, 2010, at 2:15 PM, Siddhartha Basu wrote:

> On Thu, 14 Jan 2010, Jason Stajich wrote:
> 
>> What aspects of the report are you loading?  You might consider the blast 
>> report as tab-delimited (-m 8 format) if you only are interested in 
>> start/end positions and scores of alignments which is a simpler and reduced 
>> dataset that has lower memory footprint by the parser.
> 
> I think this would be a better approach; I am mostly interested in
> start/end/score data only.
> 
>> SearchIO (default) -format => blast - you can try the BLAST -format => 
>> blast_pull instead which lazy parses to create objects and will reduce 
>> memory consumption.
> 
> It's another good option though. But just out of curiosity, does the
> regular blast parser load the entire file into memory, considering the
> output consists of multiple Results concatenated together into a
> single file? Could anybody clarify?

Yes, the original SearchIO parsers all load the data into objects.  This was based on the presumption that one wouldn't want very large BLAST reports, but that assumption probably doesn't hold today.  The pull parser is one answer to that, in that it pulls the data only upon request (creating objects on the fly), so it should be more amenable to parsing very large BLAST reports.
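
For the tab-delimited (-m 8) route mentioned in the quoted text, a minimal
line-at-a-time reader might look like the following. It uses plain Perl
rather than SearchIO, keeps only one hit in memory at a time, and the field
names and the parse_m8_line helper are labels chosen for this sketch.

```perl
use strict;
use warnings;

# Column order of BLAST tabular (-m 8) output.
my @FIELDS = qw(query subject pct_identity aln_length mismatches gap_opens
                q_start q_end s_start s_end evalue bit_score);

# Turn one tab-delimited line into a hash ref; returns undef for comments,
# blank lines, or lines with an unexpected number of columns.
sub parse_m8_line {
    my ($line) = @_;
    chomp $line;
    return undef if $line =~ /^#/ || $line !~ /\S/;
    my @cols = split /\t/, $line;
    return undef unless @cols == @FIELDS;
    my %hit;
    @hit{@FIELDS} = @cols;
    return \%hit;
}

# Made-up example line; a real script would open the report file instead.
my $report = join("\t", qw(q1 s1 98.50 200 3 0 1 200 11 210 1e-100 370)) . "\n";
open my $fh, '<', \$report or die "cannot open in-memory report: $!";
while (my $line = <$fh>) {
    my $hit = parse_m8_line($line) or next;
    print "$hit->{query} vs $hit->{subject}: ",
          "$hit->{q_start}-$hit->{q_end}, bit score $hit->{bit_score}\n";
}
close $fh;
```

Because each line is handled and discarded in turn, memory use should stay
flat no matter how many results are concatenated in the file.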

> thanks, 
> -siddhartha
> 
>> -jason

chris


From cjfields at illinois.edu  Fri Jan 15 06:33:50 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 00:33:50 -0600
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
Message-ID: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>

Scott,

I think this is fine (to change the third condition and retry with a specific code).  The other possibility is to simply throw different exceptions under each of these circumstances, which can be caught via eval to allow a retry under only certain conditions (no content, for instance).
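
The exception route could be sketched like this. The hash-ref exceptions
with a 'type' key and the fetch_with_retry helper are invented for
illustration; how RemoteBlast would actually classify its exceptions is an
open question, and a code ref stands in for the retrieval call.

```perl
use strict;
use warnings;

# Run $fetch (a code ref standing in for the remote retrieval), retrying only
# when it dies with a hash ref whose 'type' is 'no_content'. The hash-ref
# exceptions are a stand-in invented for this sketch.
sub fetch_with_retry {
    my ($fetch, $max_tries) = @_;
    for my $try (1 .. $max_tries) {
        my $result;
        my $ok = eval { $result = $fetch->(); 1 };
        return $result if $ok;
        my $err = $@;
        my $retryable = ref($err) eq 'HASH'
            && ($err->{type} || '') eq 'no_content';
        die $err unless $retryable;    # any other failure propagates at once
    }
    die +{ type => 'no_content', message => "still empty after $max_tries tries" };
}

# Stub that fails with 'no_content' twice, then returns a result.
my $attempts = 0;
my $flaky = sub {
    die +{ type => 'no_content' } if ++$attempts < 3;
    return 'report text';
};

my $report = fetch_with_retry($flaky, 5);
print "got '$report' after $attempts attempts\n";
```

Compared with numeric return codes, this keeps the success path free of
code checks, at the cost of requiring callers to wrap the call in eval.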

One interesting bit: I think (though I'm not sure) the new BLAST+ allows remote BLAST queries from command line, similar to the legacy blastcl3.  Mark just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.

chris

PS - BTW, nice to finally meet you at GMOD!



From cjfields1 at gmail.com  Fri Jan 15 06:35:35 2010
From: cjfields1 at gmail.com (Christopher Fields)
Date: Fri, 15 Jan 2010 00:35:35 -0600
Subject: [Bioperl-l] filter_by_location in
	Bio::DB::SeqFeature::Store::memory
In-Reply-To: <4B4F966D.3030300@berkeley.edu>
References: <4B4F966D.3030300@berkeley.edu>
Message-ID: <992796AC-B85B-4555-88A1-36000C0A2002@gmail.com>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20100115/b772ee67/attachment-0004.html>

From David.Messina at sbc.su.se  Fri Jan 15 15:17:14 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 16:17:14 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
Message-ID: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>

Hi everybody,

I'm having a little trouble with names in Bio::Species objects.

According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method:

my $my_species_obj = Bio::Species->new();
$my_species_obj->species('Homo sapiens');

print $my_species_obj->species;     # 'Homo sapiens'


That works fine if I create the Bio::Species object myself.

But if I try to get that string back out from a Bio::Species object created by SeqIO from a genbank file, I get just 'sapiens' back:

my $io = Bio::SeqIO->new('-format' => 'genbank',
                         '-file'   => 'hoxa2.gb');
my $seq_obj = $io->next_seq;
my $io_species_obj = $seq_obj->species;

print $io_species_obj->species;     # 'sapiens'


I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom phylum order, etc). So the genus is stored separately.

Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object:

print $my_species_obj->binomial;    # 'Homosapiens'
print $io_species_obj->binomial;    # 'Homo sapiens'


I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way?

If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite.
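
For illustration only, the behavior I'd expect from binomial in both cases
can be written as a standalone helper. This is not the patch mentioned
above and not Bio::Species code; binomial_name is a hypothetical function
that just shows the intended joining rule.

```perl
use strict;
use warnings;

# Hypothetical helper, not Bio::Species code: build "Genus species" from
# separately stored parts, tolerating a species value that already holds the
# full name and a genus that is missing.
sub binomial_name {
    my ($genus, $species) = @_;
    return $species if !defined $genus || $genus eq '';
    return $species if $species =~ /^\Q$genus\E\s+\S/;   # already qualified
    return "$genus $species";
}

print binomial_name('Homo', 'sapiens'), "\n";        # Homo sapiens
print binomial_name(undef,  'Homo sapiens'), "\n";   # Homo sapiens
print binomial_name('Homo', 'Homo sapiens'), "\n";   # Homo sapiens
```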


Thanks,
Dave


From maj at fortinbras.us  Fri Jan 15 15:31:16 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 10:31:16 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>

I'm not that familiar with Bio::Species either, but this looks
like conflicting semantics between Bio::Species and Bio::SeqIO.
Bio::SeqIO sets the species accessor to the 'species' element of
the lineage array, I believe.
FWIW, I'd prefer "binomial" = "genus" . "species"
MAJ


From maj at fortinbras.us  Fri Jan 15 15:24:06 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 10:24:06 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
	<E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
Message-ID: <F1C8FA379C5746FB8987C1D41905C3F3@NewLife>

True-- blast+ allows remote dbs. I just committed a patch that makes
this easy in StandAloneBlastPlus: specify '-remote => 1' in the
factory, and downstream command calls will take care of it-
MAJ

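# Example: run blastn against the remote 'wgs' database via BLAST+
use strict;
use warnings;
use Bio::Tools::Run::StandAloneBlastPlus;
use Bio::Seq;

# Placeholder: set this to the directory holding your BLAST+ executables
my $where_it_is = '/path/to/blast+/bin';
$ENV{BLASTPLUSDIR} = $where_it_is;

# -remote => 1 makes downstream command calls use the remote database
my $fac = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => 'wgs',
    -remote  => 1,
);
my $result = $fac->blastn(
    -query => Bio::Seq->new(
        -seq => 'ggcaacaaacctggtaaagaagacggcaacaagcctggtaaagaagatggcaacaagcct',
        -id  => 'proteinA',
    ),
);

1;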



From SMarkel at accelrys.com  Fri Jan 15 15:40:31 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 07:40:31 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net>
	<E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>

Chris,

It was nice meeting you and Scott C., too.  And seeing Jason again.

If you and Mark

> How about returning 1, 2, 4 for the non-zero cases, with some
> error constants set for convenience? MAJ

are okay with adding more return values, that works best for us in
Pipeline Pilot.

I'll add a Bugzilla entry.

Scott




From cjfields at illinois.edu  Fri Jan 15 16:00:21 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Jan 2010 10:00:21 -0600
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
	<C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
Message-ID: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>

> FWIW, I'd prefer "binomial" = "genus" . "species"


That's the way Bio::Species is supposed to work, at least since it was refactored by Sendu.  But just a note: Bio::Species is considered deprecated (scheduled for the 1.7 release, IIRC) in favor of Bio::Taxon, for many very good reasons.  First and foremost among these is the fact that we cannot consistently parse out the genus/species/strain/variant/etc. for every organism in GenBank without knowing its full lineage, which means including some taxonomic information.  And even then it's highly problematic.

We've had several heated discussions on list about how to handle this in a somewhat backwards-compatible way, and the main solution was to forego compatibility and eventually deprecate Bio::Species in favor of Bio::Taxon, a class that doesn't make the same assumptions.  Bio::Species, in the interim, is-a Bio::Taxon.  You'll note that a minimal Bio::DB::Taxonomy instance is constructed from the classification scheme in some instances, but with a proper DB link one could connect to Entrez Taxonomy or a local flat-file index DB and grab the info.  Bio::Taxon (correct me if I'm wrong on this Sendu, if you're out there) eschews various methods (species, etc.) for simpler, consistent ones based on Taxonomy, and doesn't force us to handle every exception to getting the genus/species out of a name.  That is left up to the user, at their peril.

For either one, if you are reproducing the fully qualified name, you probably should use something like node_name() for consistency.  Bio::Species also has scientific_name().  With a true Bio::Taxon one would need to check that this is performed on the species node.

chris

On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote:

> I'm not that familiar with Bio::Species either, but this looks
> like conflicting semantics betwen Bio::Species and Bio::SeqIO.
> Bio::SeqIO sets the species accessor to the 'species' element of
> the lineage array, I believe.
> FWIW, I'd prefer "binomial" = "genus" . "species"
> MAJ
> ----- Original Message ----- From: "Dave Messina" <David.Messina at sbc.su.se>
> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
> Sent: Friday, January 15, 2010 10:17 AM
> Subject: [Bioperl-l] getting/setting species names with Bio::Species
> 
> 
>> Hi everybody,
>> 
>> I'm having a little trouble with names in Bio::Species objects.
>> 
>> According to the Bio::Species documentation, if I have a species name as a string, like "Homo sapiens", I can get and set that using the species method:
>> 
>> my $my_species_obj = Bio::Species->new();
>> $my_species_obj->species('Homo sapiens');
>> 
>> print $my_species_obj->species;     # 'Homo sapiens'
>> 
>> 
>> That works fine if I create the Bio::Species object myself.
>> 
>> But if I try to get that string back out from a Bio::Species object created by SeqIO from a genbank file, I get just 'sapiens' back:
>> 
>> my $io = Bio::SeqIO->new('-format' => 'genbank',
>>                        '-file'   => 'hoxa2.gb');
>> my $seq_obj = $io->next_seq;
>> my $io_species_obj = $seq_obj->species;
>> 
>> print $io_species_obj->species;     # 'sapiens'
>> 
>> 
>> I think that happens because genbank records have more taxonomic info about the species name, like the genus (and in fact the whole taxonomic categorization: kingdom, phylum, order, etc.). So the genus is stored separately.
>> 
>> Poking around a bit more in Bio::Species, I turned up the method 'binomial', which appears to do the right thing, returning genus and species in both cases. Except, as you can see, the space is stripped out for my species-name-is-just-a-string object:
>> 
>> print $my_species_obj->binomial;    # 'Homosapiens'
>> print $io_species_obj->binomial;    # 'Homo sapiens'
>> 
>> 
>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I using it correctly above, or is there a better way?
>> 
>> If not, this kinda looks like a bug to me. I've got a patch which works and passes the BioPerl test suite.
>> 
>> 
>> Thanks,
>> Dave
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From SMarkel at accelrys.com  Fri Jan 15 16:10:34 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Fri, 15 Jan 2010 08:10:34 -0800
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <FE85CD2526044E8797D5A1A248AF6866@NewLife>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net><E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
	<5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
	<FE85CD2526044E8797D5A1A248AF6866@NewLife>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B30A7@EXCH1-COLO.accelrys.net>

Mark,

Thank you.

Scott


-----Original Message-----
From: Mark A. Jensen [mailto:maj at fortinbras.us] 
Sent: Friday, 15 January 2010 8:10 AM
To: Scott Markel; Chris Fields
Cc: Bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes

can do Scott-- cheers MAJ
----- Original Message ----- 
From: "Scott Markel" <SMarkel at accelrys.com>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: <Bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 10:40 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes


> Chris,
>
> It was nice meeting you and Scott C., too.  And seeing Jason again.
>
> If you and Mark
>
>> How about returning 1, 2, 4 for the non-zero cases, with some
>> error constants set for convenience? MAJ
>
> are okay with adding more return values, that works best for us in
> Pipeline Pilot.
>
> I'll add a Bugzilla entry.
>
> Scott
>
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thursday, 14 January 2010 10:34 PM
> To: Scott Markel
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
>
> Scott,
>
> I think this is fine (to change the third condition and retry with a specific 
> code).  The other possibility is to simply throw different exceptions under 
> each of these circumstances, which can be caught via eval to allow a retry 
> under only certain conditions (no content, for instance).
>
> One interesting bit: I think (though I'm not sure) the new BLAST+ allows 
> remote BLAST queries from command line, similar to the legacy blastcl3.  Mark 
> just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.
>
> chris
>
> PS - BTW, nice to finally meet you at GMOD!
>
> On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:
>
>> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
>> from our customers.  Due to network irregularities (not sure what else
>> to call it) users see the getting of remote BLAST results as somewhat
>> random.  When results come back the hits are fine, but sometimes no
>> information comes back at all.  Retrying helps.
>>
>> In looking at RemoteBlast.pm there are four "return -1" cases.
>>
>> * $status eq 'ERROR'      (return on line 614)
>> * $line =~ /ERROR/i       (return on line 628)
>> * !$got_content           (return on line 648)
>> * !$response->is_success  (return on line 655)
>>
>> In the case of no content we'd like to retry remote BLAST.  We're happy
>> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
>> module, but we only want to retry in that case, not the other three.
>>
>> What would happen if that third "return -1" changed to a different
>> return value?
>>
>> Scott
>>
>> Scott Markel, Ph.D.
>> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
>> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
>> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
>> San Diego, CA 92121                 fax:    +1 858 799 5222
>> USA                                 web:    http://www.accelrys.com
>>
>> http://www.linkedin.com/in/smarkel
>> Vice President, Board of Directors:
>>    International Society for Computational Biology
>> Chair: ISCB Publications Committee
>> Associate Editor: PLoS Computational Biology
>> Editorial Board: Briefings in Bioinformatics
>>


From maj at fortinbras.us  Fri Jan 15 16:09:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:09:38 -0500
Subject: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
References: <5ACBA19439E77B43A06F4CAB897EC977019B3049@EXCH1-COLO.accelrys.net><E0332194-8F95-44C7-80A9-F2EF20CF2037@illinois.edu>
	<5ACBA19439E77B43A06F4CAB897EC977019B3096@EXCH1-COLO.accelrys.net>
Message-ID: <FE85CD2526044E8797D5A1A248AF6866@NewLife>

can do Scott-- cheers MAJ
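
A minimal sketch of what distinct return codes with named constants might look like from the caller's side. The constant names, the mapping of 1/2/4 to particular cases, the use of 0 for success, and fetch_with_retry() are all assumptions for illustration, not what RemoteBlast.pm does today:

```perl
use strict;
use warnings;

# Hypothetical constants; names and meanings are assumptions.
use constant {
    RC_OK         => 0,    # stand-in for "got results"
    RC_ERROR      => 1,    # server reported an error
    RC_NO_CONTENT => 2,    # response arrived but carried no content
    RC_HTTP_FAIL  => 4,    # the HTTP request itself failed
};

# Call $fetch repeatedly, but only while it reports "no content";
# any other code is returned immediately. Gives up after $max_tries.
sub fetch_with_retry {
    my ($fetch, $max_tries) = @_;
    my $rc = RC_NO_CONTENT;
    for (1 .. $max_tries) {
        $rc = $fetch->();
        last if $rc != RC_NO_CONTENT;
    }
    return $rc;
}

# Stand-in for the real retrieval call: empty twice, then success.
my @outcomes = (RC_NO_CONTENT, RC_NO_CONTENT, RC_OK);
my $calls    = 0;
my $rc = fetch_with_retry(sub { $calls++; return shift @outcomes }, 5);
print "rc=$rc after $calls calls\n";
```

The point of separate codes is that the wrapper can retry on the no-content case alone and pass the other failures straight through.
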
----- Original Message ----- 
From: "Scott Markel" <SMarkel at accelrys.com>
To: "Chris Fields" <cjfields at illinois.edu>
Cc: <Bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 10:40 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes


> Chris,
>
> It was nice meeting you and Scott C., too.  And seeing Jason again.
>
> If you and Mark
>
>> How about returning 1, 2, 4 for the non-zero cases, with some
>> error constants set for convenience? MAJ
>
> are okay with adding more return values, that works best for us in
> Pipeline Pilot.
>
> I'll add a Bugzilla entry.
>
> Scott
>
>
> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Thursday, 14 January 2010 10:34 PM
> To: Scott Markel
> Cc: Bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bio::Tools::Run::RemoteBlast return codes
>
> Scott,
>
> I think this is fine (to change the third condition and retry with a specific 
> code).  The other possibility is to simply throw different exceptions under 
> each of these circumstances, which can be caught via eval to allow a retry 
> under only certain conditions (no content, for instance).
>
> One interesting bit: I think (though I'm not sure) the new BLAST+ allows 
> remote BLAST queries from command line, similar to the legacy blastcl3.  Mark 
> just wrote up a BLAST+ wrapper, so it might be worth testing that theory out.
>
> chris
>
> PS - BTW, nice to finally meet you at GMOD!
>
> On Jan 14, 2010, at 4:58 PM, Scott Markel wrote:
>
>> We've been looking at Bio::Tools::Run::RemoteBlast after some feedback
>> from our customers.  Due to network irregularities (not sure what else
>> to call it) users see the getting of remote BLAST results as somewhat
>> random.  When results come back the hits are fine, but sometimes no
>> information comes back at all.  Retrying helps.
>>
>> In looking at RemoteBlast.pm there are four "return -1" cases.
>>
>> * $status eq 'ERROR'      (return on line 614)
>> * $line =~ /ERROR/i       (return on line 628)
>> * !$got_content           (return on line 648)
>> * !$response->is_success  (return on line 655)
>>
>> In the case of no content we'd like to retry remote BLAST.  We're happy
>> to do that part in our Pipeline Pilot Perl code wrapper for the BioPerl
>> module, but we only want to retry in that case, not the other three.
>>
>> What would happen if that third "return -1" changed to a different
>> return value?
>>
>> Scott
>>
>> Scott Markel, Ph.D.
>> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
>> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
>> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
>> San Diego, CA 92121                 fax:    +1 858 799 5222
>> USA                                 web:    http://www.accelrys.com
>>
>> http://www.linkedin.com/in/smarkel
>> Vice President, Board of Directors:
>>    International Society for Computational Biology
>> Chair: ISCB Publications Committee
>> Associate Editor: PLoS Computational Biology
>> Editorial Board: Briefings in Bioinformatics
>>


From maj at fortinbras.us  Fri Jan 15 16:10:02 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 15 Jan 2010 11:10:02 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se><C6C1B4D2BDDF435B9D351965BADA2A34@NewLife>
	<16F8A316-FAB3-4D5E-975A-05CE14578982@illinois.edu>
Message-ID: <C4C0A0697FCE4CFD897AD58FA7FD58AA@NewLife>

excellent summary--thanks!!
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 11:00 AM
Subject: Re: [Bioperl-l] getting/setting species names with Bio::Species


>> FWIW, I'd prefer "binomial" = "genus" . "species"
>
>
> That's the way Bio::Species is supposed to work, at least when it was 
> refactored by Sendu.  But just a note: Bio::Species was considered deprecated 
> (scheduled for the 1.7 release IIRC) for many very good reasons in favor of 
> Bio::Taxon.  First and foremost among these is the fact we cannot consistently 
> parse out the genus/species/strain/variant/etc for every organism in GenBank 
> w/o knowing its full lineage, which means including some taxonomic 
> information.  And even then it's highly problematic.
>
> We've had several heated discussions on list about how to handle this in a 
> somewhat backwards-compatible way, and the main solution was to forego 
> compatibility issues altogether and eventually deprecate Bio::Species 
> altogether in favor of Bio::Taxon, a class that doesn't make the same 
> assumptions.  Bio::Species, in the interim, is-a Bio::Taxon.  You'll note that 
> a minimal Bio::DB::Taxonomy instance is constructed from the classification 
> scheme in some instances, but if one had a proper DB link one could link to 
> Entrez Taxonomy or a local flat file indexes DB and grab the info.  Bio::Taxon 
> (correct me if I'm wrong on this Sendu, if you're out there) eschews various 
> methods (species, etc) for simpler consistent ones based on Taxonomy, and 
> doesn't force us to handle every exception to getting the genus/species out of 
> a name.  That is left up to the user, at their peril.
>
> For either one, if you are reproducing the fully qualified name, you probably 
> should use something like node_name() for consistency.  Bio::Species also has 
> scientific_name().  With a true Bio::Taxon one would need to check this is 
> performed on the species node.
>
> chris
>
> On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote:
>
>> I'm not that familiar with Bio::Species either, but this looks
>> like conflicting semantics between Bio::Species and Bio::SeqIO.
>> Bio::SeqIO sets the species accessor to the 'species' element of
>> the lineage array, I believe.
>> FWIW, I'd prefer "binomial" = "genus" . "species"
>> MAJ
>> ----- Original Message ----- From: "Dave Messina" <David.Messina at sbc.su.se>
>> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 15, 2010 10:17 AM
>> Subject: [Bioperl-l] getting/setting species names with Bio::Species
>>
>>
>>> Hi everybody,
>>>
>>> I'm having a little trouble with names in Bio::Species objects.
>>>
>>> According to the Bio::Species documentation, if I have a species name as a 
>>> string, like "Homo sapiens", I can get and set that using the species 
>>> method:
>>>
>>> my $my_species_obj = Bio::Species->new();
>>> $my_species_obj->species('Homo sapiens');
>>>
>>> print $my_species_obj->species;     # 'Homo sapiens'
>>>
>>>
>>> That works fine if I create the Bio::Species object myself.
>>>
>>> But if I try to get that string back out from a Bio::Species object created 
>>> by SeqIO from a genbank file, I get just 'sapiens' back:
>>>
>>> my $io = Bio::SeqIO->new('-format' => 'genbank',
>>>                        '-file'   => 'hoxa2.gb');
>>> my $seq_obj = $io->next_seq;
>>> my $io_species_obj = $seq_obj->species;
>>>
>>> print $io_species_obj->species;     # 'sapiens'
>>>
>>>
>>> I think that happens because genbank records have more taxonomic info about 
>>> the species name, like the genus (and in fact the whole taxonomic 
>>> categorization: kingdom phylum order, etc). So the genus is stored 
>>> separately.
>>>
>>> Poking around a bit more in Bio::Species, I turned up the method 'binomial', 
>>> which appears to do the right thing, returning genus and species in both 
>>> cases. Except, as you can see, the space is stripped out for my 
>>> species-name-is-just-a-string object:
>>>
>>> print $my_species_obj->binomial;    # 'Homosapiens'
>>> print $io_species_obj->binomial;    # 'Homo sapiens'
>>>
>>>
>>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I 
>>> using it correctly above, or is there a better way?
>>>
>>> If not, this kinda looks like a bug to me. I've got a patch which works and 
>>> passes the BioPerl test suite.
>>>
>>>
>>> Thanks,
>>> Dave
>>>


From hlapp at drycafe.net  Fri Jan 15 17:04:43 2010
From: hlapp at drycafe.net (Hilmar Lapp)
Date: Fri, 15 Jan 2010 12:04:43 -0500
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
Message-ID: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>


On Jan 15, 2010, at 10:17 AM, Dave Messina wrote:

> According to the Bio::Species documentation, if I have a species  
> name as a string, like "Homo sapiens", I can get and set that using  
> the species method:
>
> my $my_species_obj = Bio::Species->new();
> $my_species_obj->species('Homo sapiens');


If that's really what the documentation says, it's wrong. It is the  
binomial() method that does this (as getter and setter).

	-hilmar
-- 
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
===========================================================


From David.Messina at sbc.su.se  Fri Jan 15 18:37:17 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Fri, 15 Jan 2010 19:37:17 +0100
Subject: [Bioperl-l] getting/setting species names with Bio::Species
In-Reply-To: <2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
References: <86045871-98A8-46A3-9ECC-699C32CFE1A1@sbc.su.se>
	<2114E010-7819-4E74-9A92-C6DE6496ED0E@drycafe.net>
Message-ID: <24798E45-CF24-47D9-AB39-E66C35A5FA8B@sbc.su.se>

Thanks guys.

Well, looks like I ignored the deprecation warnings at my own peril. :)

I'll reimplement my code using Bio::Taxon directly instead. I made a little test using the node_name() method as Chris suggested, and it seems to do the trick nicely.


> If that's really what the documentation says, it's wrong.

I'm afraid so. In the POD
>  Title   : species
>  Usage   : $self->species( $species );
>            $species = $self->species();
>  Function: Get or set the scientific species name.
>  Example : $self->species('Homo sapiens');
>  Returns : Scientific species name as string
>  Args    : Scientific species name as string

and the HOWTO 
http://www.bioperl.org/wiki/HOWTO:Feature-Annotation#The_Species_Object
> # legible and long
> my $species_object = $seq_object->species;
> my $species_string = $species_object->species;
> 
> # Perlish
> my $species_string = $seq_object->species->species;
> # either way, $species_string is "Homo sapiens"


Unless there's objection, I'll fix both of those.


> It is the binomial() method that does this (as getter and setter).

Great, thanks for the clarification, Hilmar.


From bhakti.dwivedi at gmail.com  Sun Jan 17 16:02:47 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 11:02:47 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
Message-ID: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>

Hi

Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
&& hit1 -> query1)  from a blast table report?

Thanks

BD


From cjfields at illinois.edu  Sun Jan 17 17:45:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 11:45:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <4FC546A8-079F-4A17-AB96-D4A0060904D6@illinois.edu>

It's probably not best to use BioPerl directly for this.  Have you tried OrthoMCL, or InParanoid? 
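
That said, for a quick look the bookkeeping itself is small. A minimal plain-Perl sketch, assuming 12-column tabular BLAST output (query in column 1, subject in column 2, bit score in column 12) and taking "best" to mean highest bit score; best_hits() and reciprocal_best_hits() are hypothetical helpers, not BioPerl functions:

```perl
use strict;
use warnings;

# Best (highest bit score) subject per query from tabular BLAST lines.
sub best_hits {
    my @lines = @_;
    my (%best, %score);
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ || $line !~ /\S/;
        my ($query, $hit, $bits) = (split /\t/, $line)[0, 1, 11];
        next if $query eq $hit;    # skip self-hits
        if (!exists $score{$query} || $bits > $score{$query}) {
            $best{$query}  = $hit;
            $score{$query} = $bits;
        }
    }
    return \%best;
}

# Pairs [a, b] where b is a's best hit and a is b's best hit.
sub reciprocal_best_hits {
    my ($a_to_b, $b_to_a) = @_;
    my @pairs;
    for my $qa (sort keys %$a_to_b) {
        my $hb   = $a_to_b->{$qa};
        my $back = $b_to_a->{$hb};
        push @pairs, [ $qa, $hb ] if defined $back && $back eq $qa;
    }
    return @pairs;
}

# Toy rows: query, subject, nine filler columns, bit score.
sub row { my ($q, $s, $bits) = @_; return join "\t", $q, $s, (0) x 9, $bits }

my $a_to_b = best_hits(row(qw(a1 b1 200)), row(qw(a1 b2 150)), row(qw(a2 b2 90)));
my $b_to_a = best_hits(row(qw(b1 a1 190)), row(qw(b2 a1 120)), row(qw(b2 a2 100)));

print "$_->[0] <-> $_->[1]\n" for reciprocal_best_hits($a_to_b, $b_to_a);
```

Ties, e-value cutoffs and in-paralogs are ignored in this sketch, which is a large part of why a dedicated tool is the better choice for real data.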

chris

On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote:

> Hi
> 
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?
> 
> Thanks
> 
> BD


From maj at fortinbras.us  Sun Jan 17 21:03:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 17 Jan 2010 16:03:24 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <B602C24552CF42C58F80F3883198121C@NewLife>

re Chris's answer, check out this archived post:
http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
cheers MAJ
----- Original Message ----- 
From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
To: <bioperl-l at lists.open-bio.org>
Sent: Sunday, January 17, 2010 11:02 AM
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?


> Hi
> 
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?
> 
> Thanks
> 
> BD


From bhakti.dwivedi at gmail.com  Sun Jan 17 21:10:03 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sun, 17 Jan 2010 16:10:03 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <B602C24552CF42C58F80F3883198121C@NewLife>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<B602C24552CF42C58F80F3883198121C@NewLife>
Message-ID: <b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>

Thank you!


On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:

> re Chris's answer, check out this archived post:
> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
> cheers MAJ
> ----- Original Message -----
> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> To: <bioperl-l at lists.open-bio.org>
> Sent: Sunday, January 17, 2010 11:02 AM
> Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
>
>
>> Hi
>>
>> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
>> && hit1 -> query1)  from a blast table report?
>>
>> Thanks
>>
>> BD


From cjfields at illinois.edu  Sun Jan 17 22:00:02 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 17 Jan 2010 16:00:02 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<B602C24552CF42C58F80F3883198121C@NewLife>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
Message-ID: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>

OrthoMCL has updated to v2 and no longer uses BioPerl, just plain perl.  Database is available here:

http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi

Package (you'll need a few other things to get it working):

http://orthomcl.org/common/downloads/software/

chris

On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote:

> Thank you!
> 
> 
> On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> 
>> re Chris's answer, check out this archived post:
>> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
>> cheers MAJ
>> ----- Original Message ----- From: "Bhakti Dwivedi" <
>> bhakti.dwivedi at gmail.com>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Sunday, January 17, 2010 11:02 AM
>> Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
>> 
>> 
>> Hi
>>> 
>>> Is there a Bio-perl module to parse the reciprocal best hits (query1->
>>> hit1
>>> && hit1 -> query1)  from a blast table report?
>>> 
>>> Thanks
>>> 
>>> BD


From tristan.lefebure at gmail.com  Sun Jan 17 23:12:56 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 18:12:56 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
	<392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
Message-ID: <201001171812.56238.tristan.lefebure@gmail.com>

The transition to orthoMCL v2 being a bit painful (you need 
a MySQL database), I recently switched directly to MCL and 
the accompanying mclblastline and co programs. Modular, 
simple and very fast. Following some simulations, it gives 
better results with incomplete genomes than orthoMCL v1.x 
...

http://micans.org/mcl/

--Tristan

On Sunday 17 January 2010 17:00:02 Chris Fields wrote:
> OrthoMCL has updated to v2 and no longer uses BioPerl,
>  just plain perl.  Database is available here:
> 
> http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi
> 
> Package (you'll need a few other things to get it
>  working):
> 
> http://orthomcl.org/common/downloads/software/
> 
> chris
> 
> On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote:
> > Thank you!
> >
> > On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
> >> re Chris's answer, check out this archived post:
> >> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
> >> cheers MAJ
> >> ----- Original Message ----- From: "Bhakti Dwivedi" <
> >> bhakti.dwivedi at gmail.com>
> >> To: <bioperl-l at lists.open-bio.org>
> >> Sent: Sunday, January 17, 2010 11:02 AM
> >> Subject: [Bioperl-l] Reciprocal best hits using
> >> Bioperl?
> >>
> >>
> >> Hi
> >>
> >>> Is there a Bio-perl module to parse the reciprocal
> >>> best hits (query1-> hit1
> >>> && hit1 -> query1)  from a blast table report?
> >>>
> >>> Thanks
> >>>
> >>> BD


From jason at bioperl.org  Sun Jan 17 23:59:05 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 17 Jan 2010 15:59:05 -0800
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <201001171812.56238.tristan.lefebure@gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<b643abd21001171310h63fcf290wd5a5e741dd5ecc92@mail.gmail.com>
	<392263B8-10EC-4361-82A8-0ED8E9FC7627@illinois.edu>
	<201001171812.56238.tristan.lefebure@gmail.com>
Message-ID: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>

yes - but mcl alone is something slightly different in that it doesn't  
correct for inparalogs, but for incomplete genomes this is probably  
okay.

orthomcl2 does fix the major memory-hog problem and the inefficiencies  
in the parsing of the previous version by relying on the db for the  
indexing and lookup of the reciprocal hits.

-jason
On Jan 17, 2010, at 3:12 PM, Tristan Lefebure wrote:

> The transition to orthoMCL v2 being a bit painful (you need
> a MySQL database), I recently switched directly to MCL and
> the accompanying mclblastline and co programs. Modular,
> simple and very fast. Following some simulations, It gives
> better results with incomplete genomes than orthoMCL v1.x
> ...
>
> http://micans.org/mcl/
>
> --Tristan
>
> On Sunday 17 January 2010 17:00:02 Chris Fields wrote:
>> OrthoMCL has updated to v2 and no longer uses BioPerl,
>> just plain perl.  Database is available here:
>>
>> http://orthomcl.org/cgi-bin/OrthoMclWeb.cgi
>>
>> Package (you'll need a few other things to get it
>> working):
>>
>> http://orthomcl.org/common/downloads/software/
>>
>> chris
>>
>> On Jan 17, 2010, at 3:10 PM, Bhakti Dwivedi wrote:
>>> Thank you!
>>>
>>> On Sun, Jan 17, 2010 at 4:03 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>>> re Chris's answer, check out this archived post:
>>>> http://bioperl.org/pipermail/bioperl-l/2008-March/027357.html
>>>> cheers MAJ
>>>> ----- Original Message ----- From: "Bhakti Dwivedi" <
>>>> bhakti.dwivedi at gmail.com>
>>>> To: <bioperl-l at lists.open-bio.org>
>>>> Sent: Sunday, January 17, 2010 11:02 AM
>>>> Subject: [Bioperl-l] Reciprocal best hits using
>>>> Bioperl?
>>>>
>>>>
>>>> Hi
>>>>
>>>>> Is there a Bio-perl module to parse the reciprocal
>>>>> best hits (query1-> hit1
>>>>> && hit1 -> query1)  from a blast table report?
>>>>>
>>>>> Thanks
>>>>>
>>>>> BD

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From tristan.lefebure at gmail.com  Mon Jan 18 01:36:38 2010
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Sun, 17 Jan 2010 20:36:38 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
Message-ID: <201001172036.39032.tristan.lefebure@gmail.com>

On Sunday 17 January 2010 18:59:05 Jason Stajich wrote:
> yes - but mcl alone is something slightly different in
>  that it doesn't   correct for inparalogs, but for
>  incomplete genomes this is probably okay.

Interestingly, my experience with not-too-divergent 
bacterial genomes (same genus) does not support the 
normalization used in orthoMCL (which, as far as I 
understand, is a standardization of the -log10(evalue) per 
taxon pair, including a taxon paired with itself). MCL, which 
does not do any normalization (just -log10(evalue)), gives 
about the same number of false negatives (i.e. missed 
orthologs), but a lot fewer false positives (false orthologs). 
In other words, you get many fake singletons. I don't know 
exactly if the problem lies in the normalization process or 
in the fact that orthoMCL v1.x uses a very old version of 
MCL. What I do know is that many false positives are made up of 
short or incomplete proteins that are very common in draft 
genomes and automatic annotations... Things might be 
completely different with more divergent and globally longer 
proteins. Testing orthoMCL v2 on the same data set would 
probably give the answer.
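
To make the comparison concrete, a rough plain-Perl sketch of the two weightings described above: the raw -log10(evalue), and that value divided by the average weight for its taxon pair. This is only one reading of the normalization, not orthoMCL's actual code; the helper names, the cap of 200 for an e-value of 0, and the toy data are assumptions:

```perl
use strict;
use warnings;

# -log10(evalue); an e-value of 0 gets an arbitrary cap (assumption).
sub neg_log10 {
    my ($evalue) = @_;
    return $evalue > 0 ? -log($evalue) / log(10) : 200;
}

# Each edge is [taxon1, taxon2, evalue]. Returns one hash per edge with
# the raw weight and the weight divided by the mean for its taxon pair.
sub weigh_edges {
    my @edges = @_;
    my (%sum, %n);
    for my $e (@edges) {
        my $pair = join '|', sort @{$e}[0, 1];
        $sum{$pair} += neg_log10($e->[2]);
        $n{$pair}++;
    }
    my @out;
    for my $e (@edges) {
        my $pair = join '|', sort @{$e}[0, 1];
        my $raw  = neg_log10($e->[2]);
        my $mean = $sum{$pair} / $n{$pair};
        push @out, { raw => $raw, normalized => $mean ? $raw / $mean : 0 };
    }
    return @out;
}

# Toy data: two hits between taxa A and B, one hit within taxon A.
my @edges = (
    [ 'A', 'B', 1e-10 ],
    [ 'B', 'A', 1e-30 ],
    [ 'A', 'A', 1e-50 ],
);

printf "raw=%.1f normalized=%.2f\n", $_->{raw}, $_->{normalized}
    for weigh_edges(@edges);
```

With only the raw weights, a strong within-taxon hit dominates; after dividing by the per-pair mean, every taxon pair is rescaled around 1, which is the step whose effect is in question here.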

--Tristan


From robert.bradbury at gmail.com  Mon Jan 18 10:20:33 2010
From: robert.bradbury at gmail.com (Robert Bradbury)
Date: Mon, 18 Jan 2010 05:20:33 -0500
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <201001172036.39032.tristan.lefebure@gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
Message-ID: <deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>

My comment might be that the problem with OrthoMCL is that it covers
primarily lower organisms.  The problem with Ensembl (and some other
databases) is that it covers primarily higher organisms (though they do
include Drosophila, C. elegans and yeast).

The problem arises when one wants to cross those boundaries.  For
example, the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
many of the mitochondrial (ETC) proteins, the rRNAs and
tRNAs, and the fundamental biochemistry (EC) proteins are homologous
all the way from the most ancient bacteria through H. sapiens.  The
only way to play in the mixed arena of prokaryotes and eukaryotes
involving fundamental vectors in evolution is to either construct one's
own databases (which presumably means getting involved with MySQL, and
probably spending some $$$ on hardware) or to develop some BioPerl
modules that can do the SpeciesX vs. SpeciesY comparisons on demand
using some part of the cloud.  This problem isn't going to get smaller;
it's only going to get larger, now that the cost of sequencing
(pseudo-resequencing) a vertebrate genome is starting to come in under
$10,000 and people are starting to seriously talk about 10,000
vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
people are going to undertake very soon.
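
To put a number on that product, here is a back-of-the-envelope sketch.
Reading "10,000 x 10,000 x 20,000" as "every gene of every genome
searched against every genome" is an assumption, as is the count of
unordered genome pairs shown next to it.

```perl
use strict;
use warnings;

# Gene-versus-genome searches if every genome is compared with every genome.
sub total_searches {
    my ($genomes, $genes_per_genome) = @_;
    return $genomes * $genomes * $genes_per_genome;
}

# Number of distinct genome pairs, leaving out self-comparisons.
sub unordered_pairs {
    my ($genomes) = @_;
    return $genomes * ($genomes - 1) / 2;
}

my ($genomes, $genes) = (10_000, 20_000);
printf "gene-vs-genome searches: %.1e\n", total_searches($genomes, $genes);
printf "distinct genome pairs:   %d\n",   unordered_pairs($genomes);
```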

Robert




From ghhu at sibs.ac.cn  Mon Jan 18 02:34:23 2010
From: ghhu at sibs.ac.cn (Guohong Hu)
Date: Mon, 18 Jan 2010 10:34:23 +0800
Subject: [Bioperl-l] Bioperl 1.6
Message-ID: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>

Hi there,

 
I was trying to install BioPerl on Windows using ppm, following the
instructions at
"http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
the repositories and searched for BioPerl packages. The latest version
available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
install it, a number of prerequisite modules were installed too,
including bioperl 1.4. Then an error message showed up during installation:

"ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
BioPerl has already installed a file that package bioperl wants to install."

It looks to me as if BioPerl 1.6.1 had installed a file that bioperl 1.4
wanted to install again. I don't know why bioperl 1.4 was one of the
prerequisites for 1.6.1. If I install only 1.4, it installs without
errors. But I need a newer version, because some modules (like
Bio::Tools::HMM) are not included in 1.4.

I saw on the internet that somebody had the same problem when trying to
install BioPerl 1.5, but I didn't find the solution.

Does anybody have a clue about this? Thank you for your time.

 
GH

 
From cjfields at illinois.edu  Mon Jan 18 15:30:20 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 09:30:20 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
Message-ID: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>

Guohong, 

1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed first.  Make sure the repos are set according to the Windows installation instructions on the BioPerl wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

IIRC the actual order of the PPM repository can be critical (PPM pulls based on highest version, first repo, but sometimes it gets confused).  Just curious but where is the v 1.4 PPM located?  If it is local to our PPM repo I can physically remove it to prevent this from happening.

chris



From cjfields at illinois.edu  Mon Jan 18 16:12:08 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 10:12:08 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
	<deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
Message-ID: <B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>

(my small rant on this)

On Jan 18, 2010, at 4:20 AM, Robert Bradbury wrote:

> My comment might be that the problem with OrthoMCL is that it covers
> primarily lower organisms.  The problem with Ensembl (and some other
> databases) is that it covers primarily higher organisms (though they do
> include Drosophila, C. elegans and yeast).

OrthoMCL v2 handles both lower and higher organisms; I've used it for both, with decent success.  Most other ortholog tools do as well (if I'm not mistaken, Ensembl also uses MCL under the hood, unless that's changed).  I don't believe one should be completely bound to one toolset, particularly in this case (there are lots of nice ortholog clustering tools using various means of comparison out there), but I do think OrthoMCL is very good as an initial pass.  If anything, I would like a set of (possibly bioperl-based, definitely DB-based) modules that can deal with this information.

The more imperative issue in my opinion is that one is prisoner to the gene models for those specific organisms of interest, and this may vary widely depending on the source of those gene models (Ensembl, UCSC, NCBI, EBI, centralized MODs like FlyBase, etc).  For instance, if gene models are poorly curated or rarely updated, the comparisons may be significantly flawed.  Some of these issues may also be (somewhat) alleviated once more transcriptome data is available that helps clear up gene model ambiguities, but that won't be true for all organisms, at least initially.

Note this isn't meant as a slam on any specific DBs or MODs in general; the problem is born of the fact that there isn't a single, centralized, trusted, consistently updated source for this data, specifically something that will handle moderated third-party annotation.  That's a very difficult problem to solve effectively.  Some of these very issues cropped up at the GMOD conference, and there appears to be consensus that a real attempt is needed to address this.

I don't know, maybe it's just unicorns and rainbows.  Personally I do think the situation will improve, as there seems to be great demand for it, but it requires time, resources, manpower, money, cat herding, etc.

> The problem arises when one wants to cross those boundaries.  For
> example, the 5-10 antioxidant proteins, the ~150 DNA repair proteins,
> many of the mitochondrial (ETC) proteins, the rRNAs and
> tRNAs, and the fundamental biochemistry (EC) proteins are homologous
> all the way from the most ancient bacteria through H. sapiens.  The
> only way to play in the mixed arena of prokaryotes and eukaryotes
> involving fundamental vectors in evolution is to either construct one's
> own databases (which presumably means getting involved with MySQL, and
> probably spending some $$$ on hardware) or to develop some BioPerl
> modules that can do the SpeciesX vs. SpeciesY comparisons on demand
> using some part of the cloud.  This problem isn't going to get smaller;
> it's only going to get larger, now that the cost of sequencing
> (pseudo-resequencing) a vertebrate genome is starting to come in under
> $10,000 and people are starting to seriously talk about 10,000
> vertebrate genomes.  10,000 x 10,000 x 20,000 (genes) isn't something
> people are going to undertake very soon.
> 
> Robert

They're already undertaking it now using a broad range of organisms, in and out of the cloud.  In most cases one can amend a prior recip. comparative analysis with new data fairly easily, if one takes care to do so early on (i.e. set up the BLAST databases with a specified defined size for comparative stats between separate analyses).  OrthoMCL v2 describes a procedure to do this, and I believe others have similar methodology.  
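
The point about fixing the database size can be illustrated with the standard Karlin-Altschul relation E = m * n * 2^(-S'), where S' is the bit score, m the query length and n the database length. The sketch below uses made-up numbers and ignores the edge-effect length corrections BLAST applies, so treat it as an illustration rather than a reimplementation. In BLAST itself the fixed size is normally set with an option such as -z (legacy blastall) or -dbsize (BLAST+); check the documentation for the version in use.

```perl
use strict;
use warnings;

# Expected number of chance hits for a given bit score:
#   E = query_length * database_length * 2**(-bits)
sub evalue {
    my ($bits, $query_len, $db_len) = @_;
    return $query_len * $db_len * 2**(-$bits);
}

# Same alignment (same bit score, same query), two database sizes.
my ($bits, $query_len) = (60, 300);
my $e_before = evalue($bits, $query_len, 1e6);    # original database
my $e_after  = evalue($bits, $query_len, 5e6);    # after adding genomes

printf "e-value with 1e6 residues: %.2e\n", $e_before;
printf "e-value with 5e6 residues: %.2e\n", $e_after;
printf "ratio: %.1f\n", $e_after / $e_before;
```

Because the e-value scales linearly with n, e-values from runs against databases of different sizes are not directly comparable unless n is pinned to the same value in every run.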

I could also see possible ways one can further optimize this, for instance in cases where two very closely-related organisms are compared, where translated seqs are 100% identical, etc.  IIRC, the OrthoMCL DB site already has a way to upload custom sets of protein data for mapping to (already pre-run) clusters.  Just the fact that the tools are open source, semi-automated, and generically applicable to data of personal interest is a great boon.  Not sure I see the downside of that, and I'm pretty confident the scalability issues will be addressed in some way.

chris


From maj at fortinbras.us  Mon Jan 18 16:33:12 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 11:33:12 -0500
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
Message-ID: <6093E45F17B543438AC02E6C626439E1@NewLife>

this issue's come up before, see this thread
http://lists.open-bio.org/pipermail/bioperl-l/2009-October/031400.html
MAJ


From cjfields at illinois.edu  Mon Jan 18 17:18:34 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 11:18:34 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <6093E45F17B543438AC02E6C626439E1@NewLife>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<6093E45F17B543438AC02E6C626439E1@NewLife>
Message-ID: <E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>

Mark,

Odd issue, maybe it's a dependency like Bio::ASN1::EntrezGene that's causing this?  Regardless, it's problematic for me to test this out directly, at least for the next few days.  Maybe someone could try it?

Also, there is the Strawberry Perl alternative, which uses CPAN (I think ActiveState also supports this).
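
Whichever installer ends up being used, a quick way to see what actually got installed can be sketched as below. It uses only core Perl; the two module names are just examples (Bio::Root::Version is, as far as can be told, where recent BioPerl releases keep the distribution version, and Bio::Tools::HMM is the module mentioned in the original question).

```perl
use strict;
use warnings;

# Return the version of an installed module, 'unknown' if it loads but
# declares no version, or nothing if it cannot be loaded at all.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return unless eval { require $file; 1 };
    my $version = eval { $module->VERSION };
    return defined $version ? "$version" : 'unknown';
}

for my $module (qw(Bio::Root::Version Bio::Tools::HMM)) {
    my $version = module_version($module);
    printf "%-20s %s\n", $module,
        defined $version ? "installed ($version)" : 'not installed';
}
```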

chris



From clarsen at vecna.com  Mon Jan 18 17:42:13 2010
From: clarsen at vecna.com (Chris Larsen)
Date: Mon, 18 Jan 2010 12:42:13 -0500
Subject: [Bioperl-l] Reciprocal best blast hits using BioPerl?
In-Reply-To: <B7BD7C2F-4A70-49B5-9074-7EBAF5094AE9@vecna.com>
References: <B0218AEF-3CEB-4E06-B8DF-7B302D024797@vecna.com>
	<B7BD7C2F-4A70-49B5-9074-7EBAF5094AE9@vecna.com>
Message-ID: <ED172CDA-A8C3-4488-9648-1FBA7036BAD6@vecna.com>

Bhakti, (and Chris, Mark)--

Yes there is some perl available to parse reciprocal best blast hits.

Mark's referenced / archived post was mine, we were looking to do what  
you wanted. Here we proceed with the thread.

We ended up implementing OrthoMCL 1.4 as Chris F pointed to, and then  
made a simple perl parser that would take the raw OrthoMCL output, do  
splits, and spit out a delimited table of all the orthologs in a  
group, for say Mycobacterium Genus, so you could stuff it into DBLoader.

The link to the script, SOP, and method is at:
http://www.biohealthbase.org/brcDocs/documents/BHB_ORTHOLOG_SOP.pdf

Giving e.g.:

Francisella 1 110321310
Francisella 1 110321361
Francisella 1 56707275
Francisella 1 56707366
Francisella 1 56707462

Five members of Ortholog Group 1, with just their gi number.  And you  
can see the results of that parsing, supported by a database, being  
used to load BioHealthbase with all the reciprocal best blast hits  
plus other OrthoMCL parsing, for mycobacterial PolA at:

http://www.biohealthbase.org/brc/details.do?locus=MAV_3155&decorator=mycobacterium

See? Pretty? We were just interested in making ortholog groups on the
basis of paralog-conscious reciprocal BLAST hits. Like you. This
package and doc I've made do what you want, I think, as long as you
stay in prokaryotes. But careful: garbage in, garbage out. We
started with clean Genuses. (. o O Genii?). You'll get more junky HUGE
and TINY ortholog groups if you put in different Orders of microbes.
It's taxon-sensitive. OrthoMCL author David Roos is great at it, though,
and designed it with higher unicellular eukaryotes in mind too; comb the
docs for that. Sorry, I was doing bacterial work at the time and can't
guide you if that's what you want. If you end up installing OrthoMCL 1.4,
you can pipe the output to this method and get usable output.
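
For readers without the SOP at hand, the kind of split-and-print parsing
described above can be sketched like this. It is not the script from the
SOP: the input line format ("ORTHOMCL<n>(<x> genes,<y> taxa): id(taxon)
id(taxon) ...") is an assumption about the raw OrthoMCL 1.4 group file,
and the taxon labels in the example are invented.

```perl
use strict;
use warnings;

# Turn one line of OrthoMCL-style group output into rows of
# (label, group number, member id). Lines that do not look like a
# group line yield no rows.
sub group_rows {
    my ($line, $label) = @_;
    my ($group, $members) = $line =~ /^ORTHOMCL(\d+)\([^)]*\):\s*(.*)$/
        or return;
    my @rows;
    for my $member (split ' ', $members) {
        (my $id = $member) =~ s/\([^)]*\)$//;    # strip trailing "(taxon)"
        push @rows, [ $label, $group, $id ];
    }
    return @rows;
}

# Invented example line; real input would be read from the group file.
my @lines = (
    "ORTHOMCL1(3 genes,2 taxa):\t 110321310(Ftul1) 110321361(Ftul1) 56707275(Ftul2)",
);
for my $line (@lines) {
    print join("\t", @$_), "\n" for group_rows($line, 'Francisella');
}
```

Each output row is tab-delimited (label, group, id), which is the shape
of the table shown earlier and loads easily into a database.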

Hope it works for you.

Cheers,

Chris L

-- 

Christopher Larsen, Ph.D.
Sr. Scientist / Grants Manager
Vecna Technologies
6404 Ivy Lane #500
Greenbelt, MD 20770
Phone: (240) 965-4525
Fax: (240) 547-6133
240-737-4525


From maj at fortinbras.us  Mon Jan 18 19:37:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 14:37:43 -0500
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<6093E45F17B543438AC02E6C626439E1@NewLife>
	<E6A0E774-8559-412A-BFE9-13C45DE4EF18@illinois.edu>
Message-ID: <61F331117B7C4E2282684FA240B9710F@NewLife>

I will play around with it-- in the meantime, Guohong, please look at the 
following
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
where there is a workaround for this issue, using the ppm-shell--
cheers,
Mark


From jason at bioperl.org  Mon Jan 18 20:24:33 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Jan 2010 12:24:33 -0800
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
	<201001171812.56238.tristan.lefebure@gmail.com>
	<9151FF3A-B51D-4E03-AAA0-4463A6A57F10@bioperl.org>
	<201001172036.39032.tristan.lefebure@gmail.com>
	<deaa866a1001180220p25099fn210ffb61697677a0@mail.gmail.com>
	<B691F69C-0D8F-4AC1-9CC0-79B3A9DAA45D@illinois.edu>
Message-ID: <68DF70A5-63A6-428D-A7F1-7B3D01528375@bioperl.org>




I think the approach that Paul Thomas's group at SRI
(http://www.ai.sri.com/esb/) is taking is really what you'd want to
focus on if you are only interested in a particular set of gene
families rather than de novo clustering. That or the PhyloFacts
approach (http://phylogenomics.berkeley.edu/phylofacts/). That is
where HMMs are more appropriate: you focus on your initial seed set
of protein families and build HMMs for those families, with some
automated clustering initially to get better resolution. Once you
start throwing in multiple 10^6 proteins, the unsupervised clustering
approach may not be able to give as accurate or timely results, but it
can be a good initial filtering step depending on how much initial
knowledge you are starting with. Using HMMs won't be as
computationally expensive either, if you are compute-limited.

TreeFam (http://www.treefam.org/) also provides curated phylogenies of
gene families that span the opisthokonts, in that a few fungi are
sprinkled in.

Also, things like http://boinc.bio.wzw.tum.de/boincsimap/ provide ways
to use distributed computing to calculate the matrix of similarities
among proteins if you are interested in the exhaustive approach.

-jason


--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From jay at jays.net  Mon Jan 18 23:36:20 2010
From: jay at jays.net (Jay Hannah)
Date: Mon, 18 Jan 2010 17:36:20 -0600
Subject: [Bioperl-l] Reciprocal best hits using Bioperl?
In-Reply-To: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
References: <b643abd21001170802h7a0581ud805a32991dfe9ea@mail.gmail.com>
Message-ID: <9AA13F94-3336-4CC1-89C4-249D0EB7C857@jays.net>

On Jan 17, 2010, at 10:02 AM, Bhakti Dwivedi wrote:
> Is there a Bio-perl module to parse the reciprocal best hits (query1-> hit1
> && hit1 -> query1)  from a blast table report?

If all the advice and resources in this thread have not dissuaded you from writing your own, you could glance at cross_blast() here as reference:

   https://clabsvn.ist.unomaha.edu/anonsvn/user/jhannah/UNO/seqlab/seqlab/tutorial.pod

About the (abandoned) project:

   http://clab.ist.unomaha.edu/CLAB/index.php/SeqLab_%28Perl%29

I wrote that in 2006 for clustering a few hundred proteins based on custom criteria.
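
For anyone who does want to roll their own, the query1 -> hit1 && hit1 -> query1 test can be sketched in plain Perl. This is only a minimal illustration, not the cross_blast() code from the tutorial above. It assumes two tabular BLAST reports (12 tab-separated columns with the bit score last, as produced by blastall -m 8), one per search direction; the sub names and the toy data are made up for the example.

```perl
use strict;
use warnings;

# Keep the highest bit-score hit for each query.
# Assumes 12 tab-separated columns: query, subject, ..., bit score.
# On a tie the first hit seen wins.
sub best_hits {
    my @lines = @_;
    my (%best, %score);
    for my $line (@lines) {
        chomp(my $row = $line);
        next if $row =~ /^\s*(?:#|$)/;      # skip comments and blank lines
        my @f = split /\t/, $row;
        next unless @f >= 12;               # skip malformed rows
        my ($query, $hit, $bits) = @f[0, 1, 11];
        if (!exists $score{$query} || $bits > $score{$query}) {
            $score{$query} = $bits;
            $best{$query}  = $hit;
        }
    }
    return \%best;
}

# A pair is reciprocal when each sequence is the other's best hit.
sub reciprocal_best_hits {
    my ($fwd, $rev) = @_;
    my @pairs;
    for my $query (sort keys %$fwd) {
        my $hit = $fwd->{$query};
        push @pairs, [$query, $hit]
            if defined $rev->{$hit} && $rev->{$hit} eq $query;
    }
    return @pairs;
}

# Toy data (hypothetical identifiers): a3 hits b1, but b1 prefers a1.
my @a_vs_b = map { join "\t", @$_ } (
    [qw(a1 b1 98.0 100 2  0 1 100 1 100 1e-50 180)],
    [qw(a2 b2 90.0 100 10 0 1 100 1 100 1e-40 150)],
    [qw(a3 b1 80.0 100 20 0 1 100 1 100 1e-30 120)],
);
my @b_vs_a = map { join "\t", @$_ } (
    [qw(b1 a1 98.0 100 2  0 1 100 1 100 1e-50 180)],
    [qw(b2 a1 60.0 100 40 0 1 100 1 100 1e-10 60)],
    [qw(b2 a2 90.0 100 10 0 1 100 1 100 1e-40 150)],
);

for my $pair (reciprocal_best_hits(best_hits(@a_vs_b), best_hits(@b_vs_a))) {
    print "$pair->[0] <-> $pair->[1]\n";
}
```

Ranking by bit score is one choice among several; e-value or a custom criterion would slot into the same comparison.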

Cheers,

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From jay at jays.net  Tue Jan 19 00:22:48 2010
From: jay at jays.net (Jay Hannah)
Date: Mon, 18 Jan 2010 18:22:48 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
Message-ID: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>

I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.

   http://github.com/jhannah/bio-broodcomb

It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. 

The first two functions I stuck in the framework:

Find subsequences (Bio::BroodComb::SubSeq):

   use Bio::BroodComb;
   my $bc = Bio::BroodComb->new();
   $bc->load_large_seq(file => "large_seq.fasta");
   $bc->load_small_seq(file => "small_seq.fasta");
   $bc->find_subseqs();
   print $bc->subseq_report1;

In-silico PCR (Bio::BroodComb::PCR):

  use Bio::BroodComb;
  my $bc = Bio::BroodComb->new();
  $bc->load_large_seq(file => "large_seq.fasta");
  $bc->add_primerset(
     description    => "U5/R",   # however you want it reported
     forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
     reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
  );
  $bc->find_pcr_hits();
  $bc->find_pcr_products();
  print $bc->pcr_report1;

I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.

Suggestions, contributions welcome.   :)

   http://github.com/jhannah/bio-broodcomb

Jay Hannah
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From ocornejo at gmail.com  Tue Jan 19 00:46:10 2010
From: ocornejo at gmail.com (Omar Cornejo)
Date: Mon, 18 Jan 2010 16:46:10 -0800 (PST)
Subject: [Bioperl-l] installing bioperl for mac
Message-ID: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>

Dear People,
  I have tried to install Bioperl in my new Mac Book, which carries
the latest perl distribution (5.10.0) and for some reason I can't
(using fink) make it recognize this version of perl.
  I have tried:
fink install bioperl-pm510
fink install bioperl-pm5100

but neither one works.  Is it fine installing bioperl for perl v 5.9?

thank you,
Omar Cornejo


From jason at bioperl.org  Tue Jan 19 01:04:31 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 18 Jan 2010 17:04:31 -0800
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <4B5502D9.2010706@gmail.com>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
	<4B5502D9.2010706@gmail.com>
Message-ID: <F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>

Alexandr -

Thanks for getting back to us - I am guessing the parser needs to
recognize negative coordinates around about line 370 in
Bio/AlignIO/Handler/GenericAlignHandler.pm which assumes a split on '-'
will be sufficient.

Can you post it as a bug to bugzilla along with attaching a record and  
script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/
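
To illustrate why a bare split on '-' trips over "-1-344", here is a minimal sketch of a range parser that allows a sign on either coordinate. It is not the actual handler code and not a proposed patch; the sub name parse_signed_range is hypothetical and it only handles the bare range string, without the trailing semicolon of the #=GS line.

```perl
use strict;
use warnings;

# Split a "start-end" range where either coordinate may be negative.
# A plain split on '-' yields an empty first field for "-1-344", so the
# separator is matched explicitly between two optionally signed integers.
sub parse_signed_range {
    my ($range) = @_;
    if ($range =~ /^\s*(-?\d+)\s*-\s*(-?\d+)\s*$/) {
        return ($1, $2);
    }
    return;    # empty list when the string is not a recognisable range
}

for my $range ('1-344', '-1-344', '-5--2') {
    my ($start, $end) = parse_signed_range($range);
    print "$range => start $start, end $end\n";
}
```

The same pattern could be used for a test case once a failing record is attached to the bug.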

-jason
On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote:

> I have contacted Pfam, and I have been told that The PDB file actually
> does include a reference to residue "-1":
>
> DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>
> DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>
>
> Since negative numbers are allowed in PDB, the data should probably be
> considered valid.
>
> There are quite a few records like this, so this is not an isolated  
> issue.
>
> Alexandr
>
> On 1/14/2010 7:20 PM, Jason Stajich wrote:
>> Seems like improper data really -- "-1" is an improper coordinate  
>> as far
>> as the parser is concerned. You may want to tell Pfam that there is
>> possible error in the dumper since that was the only record that had
>> this problem?
>>
>> -jason
>> On Jan 13, 2010, at 5:57 PM, albezg wrote:
>>
>>> Hi all,
>>>
>>> I have a problem using AlignIO to read Pfam database:
>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>>> alignment OK until the alignment PF00331.13. There it crashes with  
>>> the
>>> following message:
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: '1-344' is not an integer.
>>>
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>>> STACK: Bio::Range::end
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>>> STACK: Bio::Annotation::Target::new
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>>
>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>>
>>> STACK: Bio::AlignIO::stockholm::next_aln
>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>>> -----------------------------------------------------------
>>>
>>> It appears this is caused by this entry:
>>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>>
>>> I don't care about residues in PDB, so I have just removed minus  
>>> signs
>>> from the ranges. This seems to have fixed the crashing.
>>>
>>> Is it a known problem? Is there a solution for it?
>>>
>>> Thanks,
>>> Alexandr
>>>
>>>
>>> On 03/20/2009 05:09 PM, albezg wrote:
>>>>
>>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>>> alignment(SimpleAlign).
>>>>
>>>> There are no issues when I print it, however when I use AlignIO  
>>>> to write
>>>> the alignment to a FASTA file, it does not work. Is this behavior
>>>> intended?
>>>>
>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>>
>>>> The error:
>>>> ------------- EXCEPTION -------------
>>>> MSG: No sequence with name [1/1-11]
>>>> STACK Bio::SimpleAlign::displayname
>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>>> STACK Bio::AlignIO::fasta::write_aln
>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>>> STACK toplevel ./demo.pl:14
>>>> -------------------------------------
>>>>
>>>> Alexandr
>>>
>>
>> -- 
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>>
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From cjfields at illinois.edu  Tue Jan 19 02:19:30 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 20:19:30 -0600
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
	with negative PDB ranges
In-Reply-To: <F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
	<4B5502D9.2010706@gmail.com>
	<F2003A46-3B84-4F83-AD84-72445CE3D57E@bioperl.org>
Message-ID: <46FD172A-69C0-436C-A005-AC38668C3347@illinois.edu>

Alexandr,

Posting the bug report would be great, should be an easy enough fix.

chris

On Jan 18, 2010, at 7:04 PM, Jason Stajich wrote:

> Alexandr -
> 
> Thanks for getting back to us - I am guessing the parser needs to recognize negative coordinates around about line 370 in Bio/AlignIO/Handler/GenericAlignHandler.pm which assumes a split on '-' will be sufficient.
> 
> Can you post it as a bug to bugzilla along with attaching a record and script that replicates the problem so a test can be written for this. http://bugzilla.open-bio.org/
> 
> -jason
> On Jan 18, 2010, at 4:54 PM, Alexandr Bezginov wrote:
> 
>> I have contacted Pfam, and I have been told that The PDB file actually
>> does include a reference to residue "-1":
>> 
>> DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>> 
>> DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611
>> 
>> 
>> Since negative numbers are allowed in PDB, the data should probably be
>> considered valid.
>> 
>> There are quite a few records like this, so this is not an isolated issue.
>> 
>> Alexandr
>> 
>> On 1/14/2010 7:20 PM, Jason Stajich wrote:
>>> Seems like improper data really -- "-1" is an improper coordinate as far
>>> as the parser is concerned. You may want to tell Pfam that there is
>>> possible error in the dumper since that was the only record that had
>>> this problem?
>>> 
>>> -jason
>>> On Jan 13, 2010, at 5:57 PM, albezg wrote:
>>> 
>>>> Hi all,
>>>> 
>>>> I have a problem using AlignIO to read Pfam database:
>>>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>>>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>>>> alignment OK until the alignment PF00331.13. There it crashes with the
>>>> following message:
>>>> 
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: '1-344' is not an integer.
>>>> 
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>>>> STACK: Bio::Range::end
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>>>> STACK: Bio::Annotation::Target::new
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>>> 
>>>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>>> 
>>>> STACK: Bio::AlignIO::stockholm::next_aln
>>>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>>>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>>>> -----------------------------------------------------------
>>>> 
>>>> It appears this is caused by this entry:
>>>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>>> 
>>>> I don't care about residues in PDB, so I have just removed minus signs
>>>> from the ranges. This seems to have fixed the crashing.
>>>> 
>>>> Is it a known problem? Is there a solution for it?
>>>> 
>>>> Thanks,
>>>> Alexandr
>>>> 
>>>> 
>>>> On 03/20/2009 05:09 PM, albezg wrote:
>>>>> 
>>>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>>>> alignment(SimpleAlign).
>>>>> 
>>>>> There are no issues when I print it, however when I use AlignIO to write
>>>>> the alignment to a FASTA file, it does not work. Is this behavior
>>>>> intended?
>>>>> 
>>>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>>> 
>>>>> The error:
>>>>> ------------- EXCEPTION -------------
>>>>> MSG: No sequence with name [1/1-11]
>>>>> STACK Bio::SimpleAlign::displayname
>>>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>>>> STACK Bio::AlignIO::fasta::write_aln
>>>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>>>> STACK toplevel ./demo.pl:14
>>>>> -------------------------------------
>>>>> 
>>>>> Alexandr
>>>> 
>>> 
>>> -- 
>>> Jason Stajich
>>> jason.stajich at gmail.com
>>> jason at bioperl.org
>>> http://fungalgenomes.org/
>>> 
>> 
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 


From cjfields at illinois.edu  Tue Jan 19 02:20:31 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 20:20:31 -0600
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
Message-ID: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>

On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:

> Dear People,
>  I have tried to install Bioperl in my new Mac Book, which carries
> the latest perl distribution (5.10.0) and for some reason I can't
> (using fink) make it recognize this version or perl.
>  I have tried:
> fink install bioperl-pm510
> fink install bioperl-pm5100
> 
> but neither one works.  Is it fine installing bioperl for perl v 5.9?
> 
> thank you,
> Omar Cornejo

fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
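
Once the CPAN install finishes, a quick way to confirm that your perl 5.10 can actually see the modules is to try loading one. A minimal sketch; the sub name module_version is made up for the example, and Bio::Root::Version and Bio::SeqIO are just two modules I am assuming are part of your install.

```perl
use strict;
use warnings;

# Try to load a module by name. Returns undef if it cannot be loaded,
# otherwise its $VERSION (or 'unknown' if the module sets none).
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;     # Bio::SeqIO -> Bio/SeqIO.pm
    return undef unless eval { require $file; 1 };
    my $version = $module->VERSION;             # UNIVERSAL::VERSION
    return defined $version ? $version : 'unknown';
}

for my $module (qw(Bio::Root::Version Bio::SeqIO)) {
    my $version = module_version($module);
    print defined $version
        ? "$module loaded (version $version)\n"
        : "$module not found in \@INC\n";
}
```

If the modules are reported as not found, check that the perl on your PATH is the same one the cpan shell installed into.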

chris


From dan.kortschak at adelaide.edu.au  Tue Jan 19 02:47:47 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 19 Jan 2010 13:17:47 +1030
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie
 now available BETA
Message-ID: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>

Hi All,

A wrapper and output parser for bowtie 'ultrafast, memory-efficient
short read aligner' are now available in the bioperl-live and
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
bioperl-run/trunk at 16726). Bowtie details are available here:

http://bowtie-bio.sourceforge.net/index.shtml

The modules can return a Bio::Assembly::Scaffold object (operating via
MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk,
which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
uses large amounts of memory - the test suite works for me with >=2GB
but not with 1GB due to this. (Is there a disk file system based tool
for this for large projects?)

Bowtie (>0.12.0) can align in colour space, but this is not currently
supported by the wrapper though it should not be difficult to add. If
someone can point me to a small set of colour space reads and a
reference sequence I will be able to use these for testing.

Thanks to the core devs for helping me with many of my problems in
putting this together.

Dan


From maj at fortinbras.us  Tue Jan 19 03:31:36 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 18 Jan 2010 22:31:36 -0500
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
	Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <D26A5B3DAFDA4068863C7735BAF7894B@NewLife>

Excellent Dan! Thanks for all this work-- MAJ
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 18, 2010 9:47 PM
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now 
available BETA


> Hi All,
>
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
>
> http://bowtie-bio.sourceforge.net/index.shtml
>
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
>
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
>
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
>
> Dan
>
> 


From cjfields at illinois.edu  Tue Jan 19 03:36:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 21:36:12 -0600
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
	Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <CD36CE88-DC05-4A17-86A7-17A85C14F67A@illinois.edu>

On Jan 18, 2010, at 8:47 PM, Dan Kortschak wrote:

> Hi All,
> 
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
> 
> http://bowtie-bio.sourceforge.net/index.shtml
> 
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
> 
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
> 
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
> 
> Dan

And (on behalf of the core devs) thank you for putting this together!

chris


From scott at scottcain.net  Tue Jan 19 03:41:43 2010
From: scott at scottcain.net (Scott Cain)
Date: Mon, 18 Jan 2010 22:41:43 -0500
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
Message-ID: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>

But make sure you have the developers tools installed before the first
time you run the cpan shell; it will make your life easier.

Scott


On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
>
>> Dear People,
>>  I have tried to install Bioperl in my new Mac Book, which carries
>> the latest perl distribution (5.10.0) and for some reason I can't
>> (using fink) make it recognize this version or perl.
>>  I have tried:
>> fink install bioperl-pm510
>> fink install bioperl-pm5100
>>
>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
>>
>> thank you,
>> Omar Cornejo
>
> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
>
> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>
> chris
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


From cjfields at illinois.edu  Tue Jan 19 04:04:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 22:04:57 -0600
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <009801c8b957$2af4f8d0$80deea70$@ac.cn>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
	<009801c8b957$2af4f8d0$80deea70$@ac.cn>
Message-ID: <79D53148-1FDA-4025-99A6-77A7F124E6BD@illinois.edu>

Hmm, the trouchelle repo is the only one that had a working DB_File for perl 5.10 (not sure but I think 5.8.9 was fine).  Probably worth contacting them about this to see if they can drop the (way out-of-date) 1.4 distribution.

chris

On May 18, 2008, at 9:22 PM, Guohong Hu wrote:

> Thank for you all. The problem is solved. The bioperl 1.4 version is from
> the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I
> added all the repo according to the bioperl wiki instruction, somehow 1.4
> became a prerequisite for 1.6. But Chris's question reminded me, so I
> removed Trouchelle repo, and the installation proceeded without errors. I
> suggested we put a note in the wiki link since it looks like an odd issue
> not just for me.
> 
> Best,
> Guohong
> 
> 
> 
> _________________________________________
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: January 18, 2010 23:30
> To: Guohong Hu
> Cc: bioperl-l at lists.open-bio.org
> Subject: Re: [Bioperl-l] Bioperl 1.6
> 
> Guohong, 
> 
> 1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed
> first.  Make sure the repos are set according to the Windows installation
> instructions on the BioPerl wiki:
> 
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
> 
> IIRC the actual order of the PPM repository can be critical (PPM pulls based
> on highest version, first repo, but sometimes it gets confused).  Just
> curious but where is the v 1.4 PPM located?  If it is local to our PPM repo
> I can physically remove it to prevent this from happening.
> 
> chris
> 
> On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:
> 
>> Hi there,
>> 
>> 
>> 
>> I was trying to install BioPerl in windows using ppm, by following the
>> instruction in
>> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
>> the repositories, and did the search of Bioperl packages. The latest version
>> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
>> install it, a number of prerequisite modules were being installed too, which
>> include Bioperl 1.4. Then an error message showed up during installation:
>> 
>> 
>> 
>> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
>> BioPerl has already installed a file that package bioperl wants to install."
>> 
>> 
>> 
>> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
>> wanted to install again. I don't know why bioperl 1.4 was one of the
>> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
>> errors. But I need a newer version, because some modules (like
>> Bio::Tools::HMM) is not included in 1.4.
>> 
>> 
>> 
>> I saw on internet that somebody had the same problem when he was trying to
>> install BioPerl 1.5, but I didn't find the solution.
>> 
>> 
>> 
>> Anybody has a clue on that? Thank you for your time.
>> 
>> 
>> 
>> GH
>> 
>> 
>> 
> 
> 


From ocornejo at gmail.com  Tue Jan 19 04:18:00 2010
From: ocornejo at gmail.com (Omar Eduardo Cornejo Ordaz)
Date: Mon, 18 Jan 2010 23:18:00 -0500
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
	<4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
	<5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>
Message-ID: <ddd346a41001182018o5952415fx7930d85a9430453@mail.gmail.com>

I see.
  thank you Scott and Chris.
  I had already installed the latest version of the Xcode Developer Tools.
  I will go the cpan way then.

have a nice one,
Omar

On Mon, Jan 18, 2010 at 10:58 PM, Chris Fields <cjfields at illinois.edu> wrote:

> Yes, definitely!
>
> -c
>
> On Jan 18, 2010, at 9:41 PM, Scott Cain wrote:
>
> > But make sure you have the developers tools installed before the first
> > time you run the cpan shell; it will make your life easier.
> >
> > Scott
> >
> >
> > On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
> >> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
> >>
> >>> Dear People,
> >>>  I have tried to install Bioperl in my new Mac Book, which carries
> >>> the latest perl distribution (5.10.0) and for some reason I can't
> >>> (using fink) make it recognize this version or perl.
> >>>  I have tried:
> >>> fink install bioperl-pm510
> >>> fink install bioperl-pm5100
> >>>
> >>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
> >>>
> >>> thank you,
> >>> Omar Cornejo
> >>
> >> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
> >>
> >> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
> >>
> >> chris
> >>
> >
> >
> >
> > --
> > ------------------------------------------------------------------------
> > Scott Cain, Ph. D.                                   scott at scottcain dot net
> > GMOD Coordinator (http://gmod.org/)                     216-392-3087
> > Ontario Institute for Cancer Research
> >
>
>


From cjfields at illinois.edu  Tue Jan 19 03:58:36 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 18 Jan 2010 21:58:36 -0600
Subject: [Bioperl-l] installing bioperl for mac
In-Reply-To: <4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
References: <780113ef-a931-4c2e-bdfe-6a67706e2d0e@p16g2000vbe.googlegroups.com>
	<2C159519-B13D-4ABA-BC7B-A21AB5EE0C37@illinois.edu>
	<4536f7701001181941v7ba47d7du340d18f02f84bb17@mail.gmail.com>
Message-ID: <5C767287-A133-4DB5-8708-AD1BF48A4E7E@illinois.edu>

Yes, definitely!

-c

On Jan 18, 2010, at 9:41 PM, Scott Cain wrote:

> But make sure you have the developers tools installed before the first
> time you run the cpan shell; it will make your life easier.
> 
> Scott
> 
> 
> On Mon, Jan 18, 2010 at 9:20 PM, Chris Fields <cjfields at illinois.edu> wrote:
>> On Jan 18, 2010, at 6:46 PM, Omar Cornejo wrote:
>> 
>>> Dear People,
>>>  I have tried to install Bioperl in my new Mac Book, which carries
>>> the latest perl distribution (5.10.0) and for some reason I can't
>>> (using fink) make it recognize this version or perl.
>>>  I have tried:
>>> fink install bioperl-pm510
>>> fink install bioperl-pm5100
>>> 
>>> but neither one works.  Is it fine installing bioperl for perl v 5.9?
>>> 
>>> thank you,
>>> Omar Cornejo
>> 
>> fink doesn't have a package for perl 5.10.  You can install it using CPAN, however (it's pure perl), or use other UNIX-y options.  See the UNIX installation instructions on the wiki:
>> 
>> http://www.bioperl.org/wiki/Installing_Bioperl_for_Unix
>> 
>> chris
>> 
> 
> 
> 
> -- 
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/)                     216-392-3087
> Ontario Institute for Cancer Research
> 


From albezg at gmail.com  Tue Jan 19 00:54:49 2010
From: albezg at gmail.com (Alexandr Bezginov)
Date: Mon, 18 Jan 2010 19:54:49 -0500
Subject: [Bioperl-l] AlignIO crashes when reading stockholm alignment
 with negative PDB ranges
In-Reply-To: <94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
References: <49C2B97B.7070304@gmail.com>
	<AF3C122E7BE347C38CA270CAD36FEF0B@NewLife>
	<C6A636FB-CEE2-4A78-9E53-F66999CF3F1D@illinois.edu>
	<49C405F0.5050100@gmail.com> <4B4E7A07.7070805@gmail.com>
	<94913CE4-22AA-407C-9B27-7572A680C9F7@bioperl.org>
Message-ID: <4B5502D9.2010706@gmail.com>

I have contacted Pfam, and I have been told that the PDB file actually
does include a reference to residue "-1":

DBREF  1E5N A   -1   347  UNP    P14768   XYNA_PSEFL     264    611

DBREF  1E5N B   -1   347  UNP    P14768   XYNA_PSEFL     264    611


Since negative numbers are allowed in PDB, the data should probably be
considered valid.

There are quite a few records like this, so this is not an isolated issue.

Alexandr

On 1/14/2010 7:20 PM, Jason Stajich wrote:
> Seems like improper data really -- "-1" is an improper coordinate as far
> as the parser is concerned. You may want to tell Pfam that there is
> possible error in the dumper since that was the only record that had
> this problem?
> 
> -jason
> On Jan 13, 2010, at 5:57 PM, albezg wrote:
> 
>> Hi all,
>>
>> I have a problem using AlignIO to read Pfam database:
>> ftp://ftp.sanger.ac.uk/pub/databases/Pfam/current_release/Pfam-A.seed.gz
>> The database is in STOCKHOLM 1.0 format. AlignIO can read the
>> alignment OK until the alignment PF00331.13. There it crashes with the
>> following message:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: '1-344' is not an integer.
>>
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Root/Root.pm:368
>> STACK: Bio::Range::end
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Range.pm:228
>> STACK: Bio::Annotation::Target::new
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/Annotation/Target.pm:82
>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::_stockholm_target
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:293
>>
>> STACK: Bio::AlignIO::Handler::GenericAlignHandler::data_handler
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/Handler/GenericAlignHandler.pm:73
>>
>> STACK: Bio::AlignIO::stockholm::next_aln
>> /home/albezg/lib/perl5/site_perl/5.10.0/Bio/AlignIO/stockholm.pm:471
>> STACK: /home/albezg/scripts/pfam2fasta.pl:22
>> -----------------------------------------------------------
>>
>> It appears this is caused by this entry:
>> #=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;
>>
>> I don't care about residues in PDB, so I have just removed minus signs
>> from the ranges. This seems to have fixed the crashing.
>>
>> Is it a known problem? Is there a solution for it?
>>
>> Thanks,
>> Alexandr
>>
>>
>> On 03/20/2009 05:09 PM, albezg wrote:
>>>
>>> I'm trying to change FASTA header(display_id) for a sequence in an
>>> alignment(SimpleAlign).
>>>
>>> There are no issues when I print it, however when I use AlignIO to write
>>> the alignment to a FASTA file, it does not work. Is this behavior
>>> intended?
>>>
>>> Demo code: http://github.com/jhannah/sandbox/tree/master/Bio_AlignIO_bug
>>>
>>> The error:
>>> ------------- EXCEPTION -------------
>>> MSG: No sequence with name [1/1-11]
>>> STACK Bio::SimpleAlign::displayname
>>> /scratch/BioSoftware/bioperl-live/Bio/SimpleAlign.pm:2659
>>> STACK Bio::AlignIO::fasta::write_aln
>>> /scratch/BioSoftware/bioperl-live/Bio/AlignIO/fasta.pm:200
>>> STACK toplevel ./demo.pl:14
>>> -------------------------------------
>>>
>>> Alexandr
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> -- 
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> 
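
A sketch of the workaround described above (dropping the minus signs from the
PDB range on "#=GS ... DR PDB;" lines before the file reaches AlignIO). The
helper name strip_negative_pdb_range and the regex are assumptions made for
illustration, not part of BioPerl. The rewrite alters the PDB coordinates, so
it only suits cases where those residues do not matter.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a PDB range such as "-1-344" to "1-344" on
# Stockholm "#=GS <seq> DR PDB; ...;" lines. All other lines pass through.
sub strip_negative_pdb_range {
    my ($line) = @_;
    if ($line =~ /^#=GS\s+\S+\s+DR\s+PDB;/) {
        # optional leading minus, start, separator, optional minus, end
        $line =~ s/;\s*-?(\d+)--?(\d+);/; $1-$2;/;
    }
    return $line;
}

my $bad = '#=GS XYNA_PSEFL/263-608    DR PDB; 1e5n B; -1-344;' . "\n";
print strip_negative_pdb_range($bad);
```

Used as a filter, each line of the seed file would be passed through the
helper and written to a cleaned copy that AlignIO then reads.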


From ghhu at sibs.ac.cn  Tue Jan 19 02:22:19 2010
From: ghhu at sibs.ac.cn (Guohong Hu)
Date: Tue, 19 Jan 2010 02:22:19 -0000
Subject: [Bioperl-l] Bioperl 1.6
In-Reply-To: <EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
References: <004f01ca97e6$bee06650$3ca132f0$@ac.cn>
	<EB78FEC6-F762-4C9C-AAC6-875EEBE94898@illinois.edu>
Message-ID: <009801c8b957$2af4f8d0$80deea70$@ac.cn>

Thank you all. The problem is solved. The bioperl 1.4 version is from
the Trouchelle repo, but 1.6 is in the Bioperl Regular Releases repo. When I
added all the repos according to the bioperl wiki instructions, somehow 1.4
became a prerequisite for 1.6. But Chris's question reminded me, so I
removed the Trouchelle repo, and the installation proceeded without errors. I
suggest we put a note on the wiki page, since it looks like an odd issue
that may not affect just me.

Best,
Guohong


_________________________________________
From: Chris Fields [mailto:cjfields at illinois.edu]
Sent: 18 January 2010 23:30
To: Guohong Hu
Cc: bioperl-l at lists.open-bio.org
Subject: Re: [Bioperl-l] Bioperl 1.6

Guohong, 

1.6.1 PPM doesn't (at least, shouldn't) require BioPerl 1.4 to be installed
first.  Make sure the repos are set according to the Windows installation
instructions on the BioPerl wiki:

http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

IIRC the actual order of the PPM repository can be critical (PPM pulls based
on highest version, first repo, but sometimes it gets confused).  Just
curious but where is the v 1.4 PPM located?  If it is local to our PPM repo
I can physically remove it to prevent this from happening.

chris

On Jan 17, 2010, at 8:34 PM, Guohong Hu wrote:

> Hi there,
> 
> 
> 
> I was trying to install BioPerl in windows using ppm, by following the
> instruction in
> "http://bioperl.open-bio.org/wiki/Installing_Bioperl_on_Windows". I set up
> the repositories, and did the search of Bioperl packages. The latest version
> available is BioPerl 1.6.1 on http://bioperl.org/DIST. When I tried to
> install it, a number of prerequisite modules were being installed too, which
> include Bioperl 1.4. Then an error message showed up during installation:
>
> "ERROR: File conflict for 'C:/Perl/html/bin/bp_aacomp.html'.The package
> BioPerl has already installed a file that package bioperl wants to install."
>
> It looks to me that BioPerl 1.6.1 had installed a file that bioperl 1.4
> wanted to install again. I don't know why bioperl 1.4 was one of the
> prerequisites for 1.6.1. If I just install 1.4, it will be installed without
> errors. But I need a newer version, because some modules (like
> Bio::Tools::HMM) is not included in 1.4.
> 
> 
> 
> I saw on internet that somebody had the same problem when he was trying to
> install BioPerl 1.5, but I didn't find the solution.
> 
> 
> 
> Anybody has a clue on that? Thank you for your time.
> 
> 
> 
> GH
> 
> 
> 


From jw12 at sanger.ac.uk  Tue Jan 19 10:41:12 2010
From: jw12 at sanger.ac.uk (Jonathan Warren)
Date: Tue, 19 Jan 2010 10:41:12 +0000
Subject: [Bioperl-l] DAS Workshop Registrations now Open (workshop date 7-9
	April 2010)
Message-ID: <9EDF4E46-15F8-434E-B557-2DE5906C4182@sanger.ac.uk>

If you don't know about DAS and wish to know how to distribute your  
latest biological annotation to the world then the upcoming DAS  
workshop may be for you.
If you know about DAS and are maybe a DAS client developer then the  
upcoming DAS workshop is for you (as you will need to know about the  
upcoming DAS 1.6 Specification and how it may affect your software).

For information on the workshop and registration please go to:

http://www.ebi.ac.uk/training/handson/DAS_070410.html


Jonathan Warren
Senior Developer and DAS coordinator
jw12 at sanger.ac.uk


-- 
 The Wellcome Trust Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is 215 Euston Road, London, NW1 2BE. 


From SMarkel at accelrys.com  Tue Jan 19 18:00:22 2010
From: SMarkel at accelrys.com (Scott Markel)
Date: Tue, 19 Jan 2010 10:00:22 -0800
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
 Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>

Dan,

Life Tech has sample data for E. coli at

http://solidsoftwaretools.com/gf/project/ecoli2x50/

and

http://solidsoftwaretools.com/gf/project/dh10bfrag/.

Reference sequences are included.

Scott

Scott Markel, Ph.D.
Principal Bioinformatics Architect  email:  smarkel at accelrys.com
Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
San Diego, CA 92121                 fax:    +1 858 799 5222
USA                                 web:    http://www.accelrys.com

http://www.linkedin.com/in/smarkel
Vice President, Board of Directors:
    International Society for Computational Biology
Chair: ISCB Publications Committee
Associate Editor: PLoS Computational Biology
Editorial Board: Briefings in Bioinformatics


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
Sent: Monday, 18 January 2010 6:48 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA

Hi All,

A wrapper and output parser for bowtie 'ultrafast, memory-efficient
short read aligner' are now available in the bioperl-live and
bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
bioperl-run/trunk at 16726). Bowtie details are available here:

http://bowtie-bio.sourceforge.net/index.shtml

The modules can return a Bio::Assembly::Scaffold object (operating via
the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
uses large amounts of memory - the test suite works for me with >=2GB
but not with 1GB due to this. (Is there a disk file system based tool
for this for large projects?)

Bowtie (>0.12.0) can align in colour space, but this is not currently
supported by the wrapper though it should not be difficult to add. If
someone can point me to a small set of colour space reads and a
reference sequence I will be able to use these for testing.

Thanks to the core devs for helping me with many of my problems in
putting this together.

Dan



From dan.kortschak at adelaide.edu.au  Tue Jan 19 21:18:20 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 07:48:20 +1030
Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and
 Bio::Assembly::IO::bowtie now available BETA
In-Reply-To: <5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>
References: <1263869267.12881.33.camel@zoidberg.mbs.adelaide.edu.au>
	<5ACBA19439E77B43A06F4CAB897EC977019B31FE@EXCH1-COLO.accelrys.net>
Message-ID: <1263935900.4813.0.camel@epistle>

Great.

Thanks, Scott.

Dan

On Tue, 2010-01-19 at 10:00 -0800, Scott Markel wrote:
> Dan,
> 
> Life Tech has sample data for E. coli at
> 
> http://solidsoftwaretools.com/gf/project/ecoli2x50/
> 
> and
> 
> http://solidsoftwaretools.com/gf/project/dh10bfrag/.
> 
> Reference sequences are included.
> 
> Scott
> 
> Scott Markel, Ph.D.
> Principal Bioinformatics Architect  email:  smarkel at accelrys.com
> Accelrys (Pipeline Pilot R&D)       mobile: +1 858 205 3653
> 10188 Telesis Court, Suite 100      voice:  +1 858 799 5603
> San Diego, CA 92121                 fax:    +1 858 799 5222
> USA                                 web:    http://www.accelrys.com
> 
> http://www.linkedin.com/in/smarkel
> Vice President, Board of Directors:
>     International Society for Computational Biology
> Chair: ISCB Publications Committee
> Associate Editor: PLoS Computational Biology
> Editorial Board: Briefings in Bioinformatics
> 
> 
> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Dan Kortschak
> Sent: Monday, 18 January 2010 6:48 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Bio::Tools::Run::Bowtie and Bio::Assembly::IO::bowtie now available BETA
> 
> Hi All,
> 
> A wrapper and output parser for bowtie 'ultrafast, memory-efficient
> short read aligner' are now available in the bioperl-live and
> bioperl-run subversion repositories (bioperl-live/trunk at 16727 and
> bioperl-run/trunk at 16726). Bowtie details are available here:
> 
> http://bowtie-bio.sourceforge.net/index.shtml
> 
> The modules can return a Bio::Assembly::Scaffold object (operating via
> the MAJ's Bio::Assembly::IO::sam module in bioperl-live/trunk
> which requires lstein's Bio::DB::Sam, from CPAN). Note that Bio::DB::Sam
> uses large amounts of memory - the test suite works for me with >=2GB
> but not with 1GB due to this. (Is there a disk file system based tool
> for this for large projects?)
> 
> Bowtie (>0.12.0) can align in colour space, but this is not currently
> supported by the wrapper though it should not be difficult to add. If
> someone can point me to a small set of colour space reads and a
> reference sequence I will be able to use these for testing.
> 
> Thanks to the core devs for helping me with many of my problems in
> putting this together.
> 
> Dan
> 
> 


From dan.kortschak at adelaide.edu.au  Wed Jan 20 05:32:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 16:02:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
Message-ID: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>

Hi Chris (or others),

I've been looking at ways to do large assemblies (really rnaseq/readseq
comparisons for coverage) with maq/bowtie output and it's clear that for
the size of project that I'm working on the space complexity is too
nasty with Bio::DB::Sam. So I thought Bio::DB::GFF might be the way to
go.

I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF

This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
read through the docs, and it's not entirely clear (I'm hoping I've
interpreted it the right way): does this return features such that
overlapping features come back as a single feature while
non-overlapping features come back separately? If this is the
case, it would satisfy my requirements perfectly.

thanks for your time
Dan
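
For reference, the aggregation asked about above (overlapping reads collapsed
into one region, non-overlapping reads kept apart) can be sketched in plain
Perl. This is only an illustration of the desired result, not a statement
about what features(-merge=>1) actually does. The sub name merge_intervals and
the sample coordinates are made up, and intervals are assumed to be closed
[start, end] pairs on a single reference sequence.

```perl
use strict;
use warnings;

# Merge overlapping closed intervals. Input: list of [start, end] array refs.
# Output: new list of merged [start, end] pairs, sorted by start.
sub merge_intervals {
    my @sorted = sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @_;
    my @merged;
    for my $iv (@sorted) {
        if (@merged && $iv->[0] <= $merged[-1][1]) {
            # Overlaps the current region: extend its end if needed.
            $merged[-1][1] = $iv->[1] if $iv->[1] > $merged[-1][1];
        }
        else {
            # Starts a new region; copy so the caller's data is untouched.
            push @merged, [ @$iv ];
        }
    }
    return @merged;
}

# Made-up read coordinates: three overlapping reads and one isolated read.
my @reads = ([100, 135], [120, 155], [150, 185], [400, 435]);
print join(' ', map { join('-', @$_) } merge_intervals(@reads)), "\n";
```

Sorting first keeps the pass linear after the sort, but it holds every
interval in memory, which is the same constraint that motivated the question.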


From jason at bioperl.org  Wed Jan 20 06:35:24 2010
From: jason at bioperl.org (Jason Stajich)
Date: Tue, 19 Jan 2010 22:35:24 -0800
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>

Are you looking at the bowtie features file or the SAM?
-jason
On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote:

> Hi Chris (or others),
>
> I've been looking at ways to do large assemblies (really rnaseq/ 
> readseq
> comparisons for coverage) with maq/bowtie output and it's clear that  
> for
> the size of project that I'm working on the space complexity is too
> nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to
> go.
>
> I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF ->  
> B:DB:GFF
>
> This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
> read through the docs, and it's not entirely clear (I'm hoping I've
> interpreted it the right way), but does this result in the return of
> features such that overlapping features are returned as a single  
> feature
> while non-overlapping features come back separately. If this is the
> case, it would satisfy my requirements perfectly.
>
> thanks for your time
> Dan
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/


From dan.kortschak at adelaide.edu.au  Wed Jan 20 07:19:05 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Wed, 20 Jan 2010 17:49:05 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
	<C541CE0A-0B4D-4708-A6FA-8D9049B96483@bioperl.org>
Message-ID: <1263971945.4582.2.camel@epistle>

It doesn't really matter, they are largely inter-convertible. The
problem is not really the upstream processing, but the aggregation of
reads into read-assigned regions (unless I've misunderstood your
question).

Dan

On Tue, 2010-01-19 at 22:35 -0800, Jason Stajich wrote:
> Are you looking at the bowtie features file or the SAM?
> -jason
> On Jan 19, 2010, at 9:32 PM, Dan Kortschak wrote:
> 
> > Hi Chris (or others),
> >
> > I've been looking at ways to do large assemblies (really rnaseq/ 
> > readseq
> > comparisons for coverage) with maq/bowtie output and it's clear that  
> > for
> > the size of project that I'm working on the space complexity is too
> > nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to
> > go.
> >
> > I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF ->  
> > B:DB:GFF
> >
> > This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
> > read through the docs, and it's not entirely clear (I'm hoping I've
> > interpreted it the right way), but does this result in the return of
> > features such that overlapping features are returned as a single  
> > feature
> > while non-overlapping features come back separately. If this is the
> > case, it would satisfy my requirements perfectly.
> >
> > thanks for your time
> > Dan
> >
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/

-- 
Dan Kortschak <dan.kortschak at adelaide.edu.au>


From ajmackey at gmail.com  Wed Jan 20 12:59:38 2010
From: ajmackey at gmail.com (Aaron Mackey)
Date: Wed, 20 Jan 2010 07:59:38 -0500
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
Message-ID: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>

I would advise using BEDtools or the R IRanges package for this kind of
aggregation/merging work, rather than trying to reinvent this particular
wheel.

-Aaron

On Wed, Jan 20, 2010 at 12:32 AM, Dan Kortschak <
dan.kortschak at adelaide.edu.au> wrote:

> Hi Chris (or others),
>
> I've been looking at ways to do large assemblies (really rnaseq/readseq
> comparisons for coverage) with maq/bowtie output and it's clear that for
> the size of project that I'm working on the space complexity is too
> nasty with Bio::DB::Sam. So I thought Bio::DB:GFF might be the way to
> go.
>
> I was thinking: B:T:R:Bowtie ~> B:SeqFeat:Generic -> B:T:GFF -> B:DB:GFF
>
> This depends on the behaviour of B:DB:GFF->features(-merge=>1). I've
> read through the docs, and it's not entirely clear (I'm hoping I've
> interpreted it the right way), but does this result in the return of
> features such that overlapping features are returned as a single feature
> while non-overlapping features come back separately. If this is the
> case, it would satisfy my requirements perfectly.
>
> thanks for your time
> Dan
>
>


From dan.kortschak at adelaide.edu.au  Wed Jan 20 21:16:39 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Thu, 21 Jan 2010 07:46:39 +1030
Subject: [Bioperl-l] using Bio::DB::GFF for aggregation
In-Reply-To: <24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>
References: <1263965525.12780.24.camel@zoidberg.mbs.adelaide.edu.au>
	<24c96eca1001200459g36cc5610pe12a10fea8b59a4c@mail.gmail.com>
Message-ID: <1264022199.4688.29.camel@epistle>

Thanks for that, I'll look into those. BEDtools looks like what I want.

cheers
Dan

On Wed, 2010-01-20 at 07:59 -0500, Aaron Mackey wrote:
> I would advise using BEDtools or the R IRanges package for this kind
> of aggregation/merging work, rather than trying to reinvent this
> particular wheel.
> 
> -Aaron


From biopython at maubp.freeserve.co.uk  Thu Jan 21 12:33:53 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 21 Jan 2010 12:33:53 +0000
Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML in
	BioSQL
Message-ID: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>

Hi all,

This is cross posted to try and ensure relevant people see it.
I suggest we continue the discussion on the BioSQL list
(for how to serialise structured annotation to BioSQL), and/or
the OpenBio list (for things like file format naming conventions).

I am hoping we (Bio*) can be consistent in how we parse and load
into BioSQL the SwissProt DE lines (known as "swiss" format in
both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
equivalent UniProt XML tags (which we are tentatively going to
call the "uniprot" format in Biopython's SeqIO - comments?).

Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
files and load them into BioSQL. Biopython currently treats the DE
comment lines as a long string, as BioPerl used to:

http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html

I understand that BioPerl now turns the SwissProt DE lines into a
TagTree, and for storing this in BioSQL this gets serialised as XML.
I would like Biopython to handle this the same way (although rather
than a Perl TagTree, we'd use a Python structure of course), and
would appreciate clarification of what exactly was implemented
(e.g. which bit of the BioPerl source code should we look at,
and could you show a worked example?).

Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or
Open-Bio lists yet) has started work on parsing UniProt XML
files for Biopython. Here the DE comment lines are already
provided broken up with XML markup. Hopefully their nested
structure matches what BioPerl was doing with the SwissProt
DE lines.

Regards,

Peter


From cjfields at illinois.edu  Thu Jan 21 13:34:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 07:34:12 -0600
Subject: [Bioperl-l] [Open-bio-l] SwissProt DE lines and UniProt XML /
	TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <A6F5F623-2750-4BB0-91F7-5A87BABE367B@illinois.edu>

Peter,

The relevant code is in Bio::Annotation::TagTree in bioperl-live, which is a decorator for Data::Stag:

http://search.cpan.org/~cmungall/Data-Stag-0.11/Data/Stag.pm

This is where the text output is derived from.  It's a bit of a heavyweight solution to the problem, but it's capable of round-tripping the DE data and parses out the data in a way that's approachable.  We could probably abstract out the serialization backend there and allow a pure bioperl solution (or the current solution) as a fallback. 

If the plain-text DE info is represented in a hierarchy already in UniProt XML, we should probably conform as closely as possible to that (using a standard format like XML, JSON, etc.).  

chris
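
To make the idea of a DE hierarchy concrete, here is a rough plain-Perl sketch
that groups DE subfields (Full, Short, ...) under their category (RecName,
AltName, ...). It is not the Bio::Annotation::TagTree or Data::Stag
implementation. The sub name parse_de_lines, the returned structure, and the
sample lines are assumptions for illustration, and sections such as Includes:
and Contains: are not handled.

```perl
use strict;
use warnings;

# Hypothetical parser: returns an array ref of
#   { category => 'RecName', fields => [ [ 'Full', '...' ], ... ] }
sub parse_de_lines {
    my @lines = @_;
    my @names;
    for my $line (@lines) {
        (my $text = $line) =~ s/^DE\s+//;
        # A leading "Word:" opens a new category; continuation lines lack it.
        if ($text =~ s/^(\w+):\s*//) {
            push @names, { category => $1, fields => [] };
        }
        next unless @names;
        # Collect key=value; pairs under the most recent category.
        while ($text =~ /(\w+)=([^;]+);/g) {
            push @{ $names[-1]{fields} }, [ $1, $2 ];
        }
    }
    return \@names;
}

# Made-up DE lines in the structured SwissProt style.
my $tree = parse_de_lines(
    'DE   RecName: Full=Example protein 1;',
    'DE            Short=EP-1;',
    'DE   AltName: Full=Example factor;',
);
for my $name (@$tree) {
    print "$name->{category}\n";
    print "  $_->[0] = $_->[1]\n" for @{ $name->{fields} };
}
```

A nested structure like this could be serialised as XML or JSON, which is
where agreement across the Bio* projects would matter.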

On Jan 21, 2010, at 6:33 AM, Peter wrote:

> Hi all,
> 
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> 
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> 
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
> 
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should be look at,
> and could you show a worked example?).
> 
> Andrea Pierlenoin (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> 
> Regards,
> 
> Peter


From sharmashalu.bio at gmail.com  Thu Jan 21 14:25:44 2010
From: sharmashalu.bio at gmail.com (shalu sharma)
Date: Thu, 21 Jan 2010 09:25:44 -0500
Subject: [Bioperl-l] sequence orientation
Message-ID: <465b5a661001210625j3d84a165u69d8c8d21d2fe7ac@mail.gmail.com>

Hi All,
         This is not a perl/bioperl query, but I thought this would be the
best place to ask.
I have some pyro reads (from CAMERA) and I want to find out their 5' and 3'
ends. Is there any way I can do this?

I would really appreciate if anyone can help me out.

Thanks
Shalu


From rtbio.2009 at gmail.com  Thu Jan 21 18:28:43 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Thu, 21 Jan 2010 19:28:43 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <196889DF87964224ACDB948681BA7F86@NewLife>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<4C2E8133F916495B876628EF3E8FCBB2@NewLife>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
Message-ID: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>

Hello Mark,

This is Roopa again. I have a small problem again. I am working on Remote
blast. The program works well. But the problem is this.  The program
accesses the server and gets the output correctly. I am trying to send the
result sequences into an array and I found that always the first sequence
among the Result sequences is missing. The code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;
   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;
              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was debugging by printing the count of the array and
the sequences themselves. When the BLAST output contained 1 sequence, the
count of the array was 0 and no sequence was printed. Likewise, when the
output contained 3 sequences, the count of the array was 2 and only two
sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.
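
One possible explanation, offered as a sketch rather than a confirmed
diagnosis: the debug line "print BLASTDEBUGFILE $result->next_hit();" calls
the hit iterator once before the while loop, so the loop would start at the
second hit. The ToyResult class below is a hypothetical stand-in for a result
object with a one-way iterator. It is not BioPerl code, but it reproduces the
off-by-one counts described above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a result object with a one-way hit iterator.
package ToyResult;
sub new      { my ($class, @hits) = @_; return bless { hits => [@hits], pos => 0 }, $class }
sub next_hit { my $self = shift; return $self->{hits}[ $self->{pos}++ ] }
sub rewind   { my $self = shift; $self->{pos} = 0; return }

package main;

# Collect all hits, optionally "peeking" at one first as the debug print does.
sub collect_hits {
    my ($result, $peek_first) = @_;
    my $peeked = $peek_first ? $result->next_hit : undef;   # consumes a hit
    my @seen;
    while (defined(my $hit = $result->next_hit)) {
        push @seen, $hit;
    }
    return @seen;
}

my @with_peek    = collect_hits(ToyResult->new(qw(hitA hitB hitC)), 1);
my @without_peek = collect_hits(ToyResult->new(qw(hitA hitB hitC)), 0);
print scalar(@with_peek), " vs ", scalar(@without_peek), "\n";
```

If this is the cause, removing the debug call should restore the first hit.
BioPerl result objects also document a rewind method for resetting the hit
iterator, though whether calling it before the loop helps here is untested.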


On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen <maj at fortinbras.us> wrote:

>  Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
> *To:* Mark A. Jensen <maj at fortinbras.us>
> *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>>  There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from
>> looking for a member of an array called @organ. This would have shown up
>> if 'use strict;' had
>> been in place. Still don't know whether this would work precisely; can you
>> send me the query
>> sequence so I can reproduce your ouput?
>> thanks MAJ
>>
>>  ----- Original Message -----
>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes but still i got the same o/p with sequences
>> from different species.
>>
>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813  0.0
>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622  0.0
>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773  0.0
>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749  0.0
>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551  3e-154
>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551  3e-154
>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542  2e-151
>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538  2e-150
>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538  2e-150
>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196  3e-47
>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190  1e-45
>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181  7e-43
>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179  2e-42
>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178  8e-42
>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170  1e-39
>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   169  4e-39
>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167  1e-38
>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150  1e-33
>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145  5e-32
>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143  2e-31
>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143  2e-31
>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138  7e-30
>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138  7e-30
>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136  2e-29
>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136  2e-29
>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134  9e-29
>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132  3e-28
>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132  3e-28
>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132  3e-28
>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131  1e-27
>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131  1e-27
>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129  4e-27
>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129  4e-27
>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129  4e-27
>> ref|NM_001003470.1|  Danio rerio protein kinase, cAMP-dependen...   129  4e-27
>> ref|XM_001141503.1|  PREDICTED: Pan troglodytes verus protein ...   127  1e-26
>> ref|XM_001145269.1|  PREDICTED: Pan troglodytes protein kinase...   127  1e-26
>> ref|XM_512434.2|  PREDICTED: Pan troglodytes cAMP-dependent pr...   127  1e-26
>> ref|XM_001171457.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127  1e-26
>> ref|XM_001171437.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127  1e-26
>> ref|XM_847420.1|  PREDICTED: Canis familiaris similar to Serin...   127  1e-26
>> ref|NM_207518.1|  Homo sapiens protein kinase, cAMP-dependent,...   127  1e-26
>> ref|NM_002730.3|  Homo sapiens protein kinase, cAMP-dependent,...   127  1e-26
>>
>>
>> Thanks in advance.
>>
>> Roopa.
>>
>> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>>  I understand you. Put in the double quotes and see what happens.
>>>
>>>  ----- Original Message -----
>>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>> *To:* Mark A. Jensen <maj at fortinbras.us>
>>>   *Sent:* Saturday, January 09, 2010 1:40 PM
>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>
>>> Hi Mark,
>>>
>>> Thanks for your reply. It works when I hard-code the organism name
>>> (Trypanosoma brucei) in the code, but my idea is to introduce a variable,
>>> $organ, which takes the organism given by the user, whatever it may be:
>>> Pseudomonas, Drosophila, Trypanosoma, Leishmania, etc. I should get the
>>> sequences related to only that organism.
>>>
>>> I.e., if the user enters Pseudomonas, the $organ parameter of the code
>>> takes Pseudomonas, runs BLAST and returns only those sequences that produce
>>> a significant alignment with Pseudomonas (only). But this is not what
>>> happens.
>>>
>>> Please help me in this regard.
>>>
>>> Thanks in advance
>>> Roopa
>>>
>>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>
>>>>  Hi Roopa-- You may get what you want if you make the change.
>>>> With single quotes, ENTREZ_QUERY is set to the literal string
>>>>
>>>>  $organ[ORGN]
>>>>
>>>> while, with double quotes, the variable value will be substituted,
>>>> and the parameter should be set to
>>>>
>>>>  Trypanosoma brucei[ORGN]
>>>>
>>>> I'm guessing that it worked because the database ignored the strange
>>>> parameter and returned all the matches. Try this, and if it doesn't work
>>>> I'll look harder.
>>>> cheers,
>>>> Mark
>>>>
>>>>  ----- Original Message -----
>>>> *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>>>> *Sent:* Saturday, January 09, 2010 1:24 PM
>>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>>
>>>> hello Mark,
>>>>
>>>> Thanks for your reply. It was working without enclosing $organ[ORGN] in
>>>> double quotation marks, but I would like to get only those sequences
>>>> which are specific to my organism, i.e., I need sequences only from the
>>>> organism that I entered.
>>>>
>>>> When the organism is Trypanosoma brucei, I get even Leishmania and
>>>> other species among the similar sequences. But I want to get only
>>>> Trypanosoma brucei sequences.
>>>>
>>>> Could you please help me out in this regard?
>>>>
>>>> Roopa.
>>>>
>>>> My output
>>>>
>>>> I/P organism: Trypanosoma brucei
>>>>
>>>> O/P:-
>>>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813  0.0
>>>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622  0.0
>>>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773  0.0
>>>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749  0.0
>>>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551  3e-154
>>>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551  3e-154
>>>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542  2e-151
>>>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538  2e-150
>>>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538  2e-150
>>>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196  3e-47
>>>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190  1e-45
>>>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181  7e-43
>>>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179  2e-42
>>>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178  8e-42
>>>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170  1e-39
>>>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   168  4e-39
>>>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167  1e-38
>>>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150  1e-33
>>>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145  5e-32
>>>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143  2e-31
>>>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143  2e-31
>>>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138  7e-30
>>>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138  7e-30
>>>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136  2e-29
>>>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136  2e-29
>>>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134  9e-29
>>>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132  3e-28
>>>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132  3e-28
>>>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132  3e-28
>>>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131  1e-27
>>>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131  1e-27
>>>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129  4e-27
>>>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129  4e-27
>>>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129  4e-27
>>>>
>>>> Roopa.
>>>>
>>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>>
>>>>> I see it immediately (from making same bug many times) :
>>>>>
>>>>>
>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>> =>
>>>>> - '$organ[ORGN]');
>>>>> +"$organ[ORGN]");
>>>>>
>>>>>
>>>>> MAJ
>>>>>
>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>> rtbio.2009 at gmail.com>
>>>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>>>> Cc: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Saturday, January 09, 2010 11:57 AM
>>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>>
>>>>>> Thanks a lot for your reply, Mark. It was working with Trypanosoma brucei
>>>>>> as the organism parameter, but when I tried to take the organism
>>>>>> parameter from the user, it was not working, i.e., I was unable to get
>>>>>> the target sequences.
>>>>>> Please help me in this regard. My code is
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>>
>>>>>> #path for extra camel module
>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>> use Roopablast;
>>>>>>
>>>>>>
>>>>>> use Bio::SearchIO;
>>>>>> use Bio::Search::Result::BlastResult;
>>>>>> use Bio::Perl;
>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::DB::GenBank;
>>>>>>
>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>> my $outstring ="";
>>>>>>
>>>>>> &parse_form;
>>>>>>
>>>>>> print "Content-type: text/html\n\n";
>>>>>> print "<HTML>\n";
>>>>>> print "<head><title>RNAi Result</title>";
>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>> print "</head>\n";
>>>>>> print "<body>\n";
>>>>>> print " Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>> print "</BODY>\n";
>>>>>> print "</HTML>\n";
>>>>>>
>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>> exit if $pid;
>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>
>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>
>>>>>> print OUTFILE "<HTML>\n
>>>>>> <head><title>RNAi Result</title>
>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>> </head>\n
>>>>>> <body>\n
>>>>>>  Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>> wait......<br>
>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>> </BODY>\n
>>>>>> </HTML>\n";
>>>>>>
>>>>>> close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
>>>>>>
>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>
>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>> $in{'Threshold'});
>>>>>>
>>>>>>
>>>>>> sub blastcode
>>>>>> {
>>>>>>
>>>>>> $inpu1= $_[0];
>>>>>>
>>>>>> $organ= $_[1];
>>>>>>
>>>>>> open(NUC,'>',$nuc);
>>>>>> print NUC $inpu1,"\n";
>>>>>> close(NUC);
>>>>>>
>>>>>> my $prog = 'blastn';
>>>>>> my $db   = 'refseq_rna';
>>>>>> my $e_val= '1e-10';
>>>>>> my $organism= $organ;
>>>>>>
>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>
>>>>>> my @params = ( '-prog' => $prog,
>>>>>>        '-data' => $db,
>>>>>>        '-expect' => $e_val,
>>>>>>        '-readmethod' => 'SearchIO',
>>>>>>       '-Organism'   => $organism );
>>>>>>
>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>            print OUTFILE $inpu1;
>>>>>>             close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>>> =>
>>>>>> '$organ[ORGN]');
>>>>>>
>>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>
>>>>>>  #change a paramter
>>>>>>
>>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>>>>> Brucei[ORGN]';
>>>>>>
>>>>>> #change a paramter
>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>> '$input2[ORGN]';
>>>>>>
>>>>>>  my $v = 1;
>>>>>>  #$v is just to turn on and off the messages
>>>>>>
>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>> '-organism' => $organ );
>>>>>>
>>>>>>
>>>>>> while (my $input = $str->next_seq())
>>>>>> {
>>>>>>  #Blast a sequence against a database:
>>>>>>   #Alternatively, you could  pass in a file with many
>>>>>>   #sequences rather than loop through sequence one at a time
>>>>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>   #and swap the two lines below for an example of that.
>>>>>>
>>>>>>            #open(OUTFILE,'>',$debugfile);
>>>>>>             # print OUTFILE $input;
>>>>>>             #close(OUTFILE);
>>>>>>
>>>>>>
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>
>>>>>>               open(OUTFILE,'>',$debugfile);
>>>>>>            #   print OUTFILE $r;
>>>>>>               close(OUTFILE);
>>>>>>
>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>  #   open(OUTFILE,'>',$debugfile);
>>>>>>   #           print OUTFILE "while entered";
>>>>>>    #         close(OUTFILE);
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>
>>>>>>     #         open(OUTFILE,'>',$debugfile);
>>>>>>      #        print OUTFILE "foreach entered";
>>>>>>       #      close(OUTFILE);
>>>>>>
>>>>>>       my $rc = $factory->retrieve_blast($rid);
>>>>>>
>>>>>>       if( !ref($rc) )
>>>>>>       {
>>>>>>       if( $rc < 0 )
>>>>>>       {
>>>>>>       $factory->remove_rid($rid);
>>>>>>       }
>>>>>>        open(OUTFILE,'>',$debugfile);
>>>>>>        #      print OUTFILE "if entered";
>>>>>>             close(OUTFILE);
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>       }
>>>>>>      else {
>>>>>>         #    open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "else entered";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>         my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>         print BLASTDEBUGFILE $result->next_hit();
>>>>>>         close(BLASTDEBUGFILE);
>>>>>>
>>>>>>       my $filename =
>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>
>>>>>>        # open(DEBUGFILE,'>',$debugfile);
>>>>>>        # open(new,'>',$filename);
>>>>>>        # @arra=<new>;
>>>>>>        # print DEBUGFILE @arra;
>>>>>>        # close(DEBUGFILE);
>>>>>>        # close(new);
>>>>>>
>>>>>>        $factory->save_output($filename);
>>>>>>  # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>      # close(BLASTDEBUGFILE);
>>>>>>
>>>>>>      $factory->remove_rid($rid);
>>>>>>
>>>>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>      print BLASTDEBUGFILE  $organism;
>>>>>>       close(BLASTDEBUGFILE);
>>>>>>
>>>>>>   # open(OUTFILE,'>',$outfile);
>>>>>>   # print OUTFILE "Test2 $result->database_name()";
>>>>>>   # close(OUTFILE);
>>>>>>
>>>>>> #$hit = $result->next_hit;
>>>>>> #open(new,'>',$debugfile);
>>>>>> #print $hit;
>>>>>> #close(new);
>>>>>>
>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>
>>>>>>           next unless ( $v > 0);
>>>>>>
>>>>>>         #     open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "$hit in while hits";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>                 push(@seqs,$dna);
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>  }
>>>>>>
>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>  #print OUTFILE $seqs[0];
>>>>>>  #close(OUTFILE);
>>>>>>
>>>>>> return(@seqs);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Roopa.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Roopa--
>>>>>>>
>>>>>>> I got your code to work with the following changes:
>>>>>>>
>>>>>>> +# the input should be a valid FASTA file...
>>>>>>> ...
>>>>>>> open(NUC,'>',$nuc);
>>>>>>> +print NUC ">seq (need a name line for valid fasta)\n";
>>>>>>> print NUC $inpu1, "\n";
>>>>>>> close(NUC);
>>>>>>> ...
>>>>>>>
>>>>>>> +# you can set these header parms in the call itself...
>>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>>>> -ENTREZ_QUERY =>
>>>>>>> 'Trypanosoma Brucei[ORGN]');
>>>>>>>
>>>>>>>  #change a paramter
>>>>>>> +# commented this out...
>>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>> 'Trypanosoma
>>>>>>> Brucei[ORGN]';
>>>>>>>
>>>>>>> MAJ
>>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>>>> rtbio.2009 at gmail.com
>>>>>>> >
>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>> Sent: Friday, January 08, 2010 10:00 AM
>>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>>>>>>
>>>>>>>
>>>>>>>  Hello all,
>>>>>>>
>>>>>>>>
>>>>>>>> I was trying Remote BLAST using BioPerl. My input data is a
>>>>>>>> Trypanosoma brucei sequence in FASTA format. When I tried to submit
>>>>>>>> it to BLAST using the step
>>>>>>>> $r=$factory->submit_blast($input)
>>>>>>>> it did not return anything, which I checked by debugging the code.
>>>>>>>> It is not blasting my input sequence even though I specified all the
>>>>>>>> parameters. I paste the code below.
>>>>>>>>
>>>>>>>> Please help me in sorting out this problem. It is very urgent.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Roopa.
>>>>>>>>
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #path for extra camel module
>>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>>>> use Roopablast;
>>>>>>>>
>>>>>>>>
>>>>>>>> use Bio::SearchIO;
>>>>>>>> use Bio::Search::Result::BlastResult;
>>>>>>>> use Bio::Perl;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use Bio::Seq;
>>>>>>>> use Bio::SeqIO;
>>>>>>>> use Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>> my $outstring ="";
>>>>>>>>
>>>>>>>> &parse_form;
>>>>>>>>
>>>>>>>> print "Content-type: text/html\n\n";
>>>>>>>> print "<HTML>\n";
>>>>>>>> print "<head><title>RNAi Result</title>";
>>>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>>>> print "</head>\n";
>>>>>>>> print "<body>\n";
>>>>>>>> print " Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>>>> print "</BODY>\n";
>>>>>>>> print "</HTML>\n";
>>>>>>>>
>>>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>>>> exit if $pid;
>>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>>>> </head>\n
>>>>>>>> <body>\n
>>>>>>>>  Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>>>> wait......<br>
>>>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>>>> </BODY>\n
>>>>>>>> </HTML>\n";
>>>>>>>>
>>>>>>>> close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> @compseqs = blastcode($in{'Inputseq'});
>>>>>>>>
>>>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>>>
>>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>>>> $in{'Threshold'});
>>>>>>>>
>>>>>>>>
>>>>>>>> sub blastcode
>>>>>>>> {
>>>>>>>>
>>>>>>>> $inpu1= $_[0];
>>>>>>>>
>>>>>>>> #$organ= $_[1];
>>>>>>>>
>>>>>>>> open(NUC,'>',$nuc);
>>>>>>>> print NUC $inpu1;
>>>>>>>> close(NUC);
>>>>>>>>
>>>>>>>> my $prog = 'blastn';
>>>>>>>> my $db   = 'refseq_rna';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> my $organism= 'Trypanosoma Brucei';
>>>>>>>>
>>>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>       '-data' => $db,
>>>>>>>>       '-expect' => $e_val,
>>>>>>>>       '-readmethod' => 'SearchIO',
>>>>>>>>       '-Organism'   => $organism );
>>>>>>>>
>>>>>>>>          # open(OUTFILE,'>',$debugfile);
>>>>>>>>           #  print OUTFILE @params;
>>>>>>>>           # close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>
>>>>>>>>  #change a paramter
>>>>>>>>
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>>>>>>> Brucei[ORGN]';
>>>>>>>>
>>>>>>>> #change a paramter
>>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>>> '$input2[ORGN]';
>>>>>>>>
>>>>>>>>  my $v = 1;
>>>>>>>>  #$v is just to turn on and off the messages
>>>>>>>>
>>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>>>> '-organism' => 'Trypanosoma Brucei' );
>>>>>>>>
>>>>>>>>
>>>>>>>> while (my $input = $str->next_seq())
>>>>>>>> {
>>>>>>>>  #Blast a sequence against a database:
>>>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>>>  #and swap the two lines below for an example of that.
>>>>>>>>
>>>>>>>>           open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE $input;
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>>>  # The program stops here: it does not return any value and it does
>>>>>>>>  # not enter the while loop. Please help me in this regard.
>>>>>>>>              open(OUTFILE,'>',$debugfile);
>>>>>>>>              print OUTFILE $r;
>>>>>>>>              close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>>>
>>>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>>>    open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "while entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>
>>>>>>>>             open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "foreach entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>>>
>>>>>>>>      if( !ref($rc) )
>>>>>>>>      {
>>>>>>>>      if( $rc < 0 )
>>>>>>>>      {
>>>>>>>>      $factory->remove_rid($rid);
>>>>>>>>      }
>>>>>>>>       open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "if entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>       sleep 5;
>>>>>>>>      }
>>>>>>>>     else {
>>>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "else entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>        my $result = $rc->next_result();
>>>>>>>>       #save the output
>>>>>>>>      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>        print BLASTDEBUGFILE $result->next_hit();
>>>>>>>>        close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>      my $filename =
>>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>>>
>>>>>>>>       # open(DEBUGFILE,'>',$debugfile);
>>>>>>>>       # open(new,'>',$filename);
>>>>>>>>       # @arra=<new>;
>>>>>>>>       # print DEBUGFILE @arra;
>>>>>>>>       # close(DEBUGFILE);
>>>>>>>>       # close(new);
>>>>>>>>
>>>>>>>>       $factory->save_output($filename);
>>>>>>>>
>>>>>>>>     # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>>>     # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>>>     # close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>     $factory->remove_rid($rid);
>>>>>>>>
>>>>>>>>     open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>     print BLASTDEBUGFILE  $organism;
>>>>>>>>      close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>  # open(OUTFILE,'>',$outfile);
>>>>>>>>  # print OUTFILE "Test2 $result->database_name()";
>>>>>>>>  # close(OUTFILE);
>>>>>>>>
>>>>>>>> #$hit = $result->next_hit;
>>>>>>>> #open(new,'>',$debugfile);
>>>>>>>> #print $hit;
>>>>>>>> #close(new);
>>>>>>>>
>>>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>>>
>>>>>>>>          next unless ( $v > 0);
>>>>>>>>
>>>>>>>>        #     open(OUTFILE,'>',$debugfile);
>>>>>>>>         #    print OUTFILE "$hit in while hits";
>>>>>>>>          #  close(OUTFILE);
>>>>>>>>
>>>>>>>>     my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>>>         my $dna = $sequ->seq();        # get the sequence as a
>>>>>>>> string
>>>>>>>>                push(@seqs,$dna);
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>>>  #print OUTFILE $seqs[0];
>>>>>>>>  #close(OUTFILE);
>>>>>>>>
>>>>>>>> return(@seqs);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile) || die ;
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>>>>>>> <body>\n
>>>>>>>> <p><font face=\"Courier, monospace font set\">
>>>>>>>> Inputsequence: <br>";
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      print OUTFILE " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      print OUTFILE "<br>\n";
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> print OUTFILE "</font> <p>";
>>>>>>>>
>>>>>>>> $z=@compseqs;
>>>>>>>>
>>>>>>>> for($k=1;$k<$z;$k++) {
>>>>>>>>  print OUTFILE "<font face=\"Courier, monospace font
>>>>>>>> set\"><p>Compare
>>>>>>>> Sequence: <br>";
>>>>>>>>
>>>>>>>>  for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>>>>>>
>>>>>>>>      print OUTFILE substr ($compseqs[$k], $i, 1);
>>>>>>>>
>>>>>>>>      if ( ($i+1)%10==0){
>>>>>>>>          print OUTFILE " ";
>>>>>>>>      }
>>>>>>>>      if ( ($i+1)%60==0){
>>>>>>>>          print OUTFILE "<br>\n";
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  print OUTFILE "<p></font>";
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<p>
>>>>>>>> Window: <br>$in{'Windowsize'}
>>>>>>>> <p>
>>>>>>>> <p>
>>>>>>>> Threshold: <br>$in{'Threshold'}
>>>>>>>> <p>";
>>>>>>>> my $j=0;
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>>>>>>      if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>>>>>>          $j=$in{'Windowsize'};
>>>>>>>>      }
>>>>>>>>      $height=$out[$i]->{similar}*5;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ($j>0) {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>      $j--;
>>>>>>>>  }
>>>>>>>>  else {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      $outstring .= " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      $outstring .= "<br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%800==0){
>>>>>>>>      print OUTFILE "<br><br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>>>>>>> set\">$outstring</font>";
>>>>>>>>
>>>>>>>> #foreach (@out) {
>>>>>>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar}
>>>>>>>> matchs<p>";
>>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>>>>>>
>>>>>>>> #    }
>>>>>>>> #}
>>>>>>>>
>>>>>>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>>>>>>
>>>>>>>> close OUTFILE;
>>>>>>>>
>>>>>>>> #nameprint();
>>>>>>>>
>>>>>>>> sub parse_form {
>>>>>>>>  local ($buffer, @pairs, $pair, $name, $value);
>>>>>>>>  # Read in text
>>>>>>>>  $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>>>>>>  if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>>>>>>  {
>>>>>>>>      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>>>>>>  }
>>>>>>>>  else
>>>>>>>>  {
>>>>>>>>      $buffer = $ENV{'QUERY_STRING'};
>>>>>>>>  }
>>>>>>>>  @pairs = split(/&/, $buffer);
>>>>>>>>  foreach $pair (@pairs)
>>>>>>>>  {
>>>>>>>>      ($name, $value) = split(/=/, $pair);
>>>>>>>>      $value =~ tr/+/ /;
>>>>>>>>      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>>>>>>      $in{$name} = $value;
>>>>>>>>  }
>>>>>>>> }
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>> _______________________________________________
>>>>>> Bioperl-l mailing list
>>>>>> Bioperl-l at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>>>>
>>>>>>
>>>>>>
>>>>
>>>
>>
>
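
A note on the quoting fix discussed in the thread above. In a double-quoted
Perl string, "$organ[ORGN]" is parsed as an element of the array @organ, not
as $organ followed by the literal text [ORGN]. Without "use strict" that
element is normally empty, so the Entrez filter would be silently dropped.
That may be why hits from other species still came back after the switch to
double quotes; this is an inference, the thread does not confirm it. A minimal
sketch of ways to build the intended string, where $organ is the variable from
the quoted script and its value here is only an example:

```perl
use strict;
use warnings;

# Example value; in the quoted script this comes from the CGI form.
my $organ = 'Trypanosoma brucei';

# Single quotes: nothing is interpolated, the text is sent literally.
my $literal = '$organ[ORGN]';

# Double quotes, with the bracket kept apart from the variable name.
my $escaped = "$organ\[ORGN]";     # escape the opening bracket
my $braced  = "${organ}[ORGN]";    # delimit the variable name with braces
my $concat  = $organ . '[ORGN]';   # or sidestep interpolation entirely

print "$literal\n$escaped\n$braced\n$concat\n";
```

Any of the last three values can be passed as -ENTREZ_QUERY. Running the
script with "use strict" also helps here, because the ambiguous form
"$organ[ORGN]" then fails at compile time instead of quietly producing an
empty string.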


From bernd.web at gmail.com  Thu Jan 21 18:37:18 2010
From: bernd.web at gmail.com (Bernd Web)
Date: Thu, 21 Jan 2010 19:37:18 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
	<c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
Message-ID: <716af09c1001211037p59b19a29l1967f1e514469e79@mail.gmail.com>

Hi,

Regarding RemoteBlast, may I add a query?
It seems that Bio::Tools::Run::RemoteBlast is sending each sequence
separately to the NCBI (at least in BP 1.5.2).
This means that for each sequence a RID has to be checked. Is this
indeed the case?
The BLAST URL-API or batch interface supports sending multiple
sequences at once.

Regards,
Bernd

On Thu, Jan 21, 2010 at 7:28 PM, Roopa Raghuveer <rtbio.2009 at gmail.com> wrote:
> Hello Mark,
>
> This is Roopa again. I have a small problem again. I am working on Remote
> blast. The program works well. But the problem is this. The program
> accesses the server and gets the output correctly. I am trying to send the
> result sequences into an array and I found that always the first sequence
> among the Result sequences is missing. The code is
>
>  my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
> '-organism' => "$organ\[ORGN]");
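
One possible cause of the missing first hit, assuming the script still
contains the debug line "print BLASTDEBUGFILE $result->next_hit();" from the
code posted earlier in this thread: next_hit() is an iterator, so that call
consumes the first hit before the main while loop starts. The sketch below
reproduces the effect with a small stand-in iterator. ToyResult is a
hypothetical class written for this illustration, not part of BioPerl.

```perl
use strict;
use warnings;

# ToyResult is a hypothetical stand-in for a BLAST result object:
# next_hit() returns the next hit name, or undef when none are left.
package ToyResult;
sub new {
    my ($class, @hits) = @_;
    return bless { hits => [@hits], pos => 0 }, $class;
}
sub next_hit {
    my $self = shift;
    return $self->{hits}[ $self->{pos}++ ];
}

package main;

my @names = ('hit1', 'hit2', 'hit3');

# Problem pattern: a debug call advances the iterator before the loop.
my $result = ToyResult->new(@names);
my $debug  = $result->next_hit;          # consumes 'hit1'
my @lost;
while ( my $hit = $result->next_hit ) { push @lost, $hit }

# Safer pattern: no call to next_hit outside the loop.
my $result2 = ToyResult->new(@names);
my @all;
while ( my $hit = $result2->next_hit ) { push @all, $hit }

print "with debug call: @lost\n";
print "without it:      @all\n";
```

If this is what happens in the real script, removing the debug call, or moving
the debug output inside the while loop, should bring the first sequence back.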


From cjfields at illinois.edu  Fri Jan 22 04:31:25 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Jan 2010 22:31:25 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
Message-ID: <BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>

Jay,

Did you want to release it to CPAN?  I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

chris

On Jan 18, 2010, at 6:22 PM, Jay Hannah wrote:

> I formalized a little framework so I could stop re-writing little programs that do some things people frequently ask me to do.
> 
>   http://github.com/jhannah/bio-broodcomb
> 
> It stores everything in SQLite so users can write custom reports if they want to. It uses BioPerl and some shiny CPAN beads (DBIx::Class, Moose::Role). Tests included. 
> 
> The first two functions I stuck in the framework:
> 
> Find subsequences (Bio::BroodComb::SubSeq):
> 
>   use Bio::BroodComb;
>   my $bc = Bio::BroodComb->new();
>   $bc->load_large_seq(file => "large_seq.fasta");
>   $bc->load_small_seq(file => "small_seq.fasta");
>   $bc->find_subseqs();
>   print $bc->subseq_report1;
> 
> In-silico PCR (Bio::BroodComb::PCR):
> 
>  use Bio::BroodComb;
>  my $bc = Bio::BroodComb->new();
>  $bc->load_large_seq(file => "large_seq.fasta");
>  $bc->add_primerset(
>     description    => "U5/R",   # however you want it reported
>     forward_primer => 'GCGGGCAGCAATACTGCTTTGTAA',
>     reverse_primer => 'ACCAGCGTTCAGCATATGGAGGAT',
>  );
>  $bc->find_pcr_hits();
>  $bc->find_pcr_products();
>  print $bc->pcr_report1;
> 
> I find this rather handy, so will probably be adding all my applicable future work to it instead of writing stand-alone programs. Not sure if it should be renamed for eventual CPAN / wherever.
> 
> Suggestions, contributions welcome.   :)
> 
>   http://github.com/jhannah/bio-broodcomb
> 
> Jay Hannah
> http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From jason at bioperl.org  Fri Jan 22 06:17:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Thu, 21 Jan 2010 22:17:14 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
Message-ID: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>

I'm considering adding an initialization parameter (and get/set method)
to Bio::AlignIO that would allow setting the alphabet. The alphabet is
then passed on when each Bio::LocatableSeq is created, so that
_guess_alphabet isn't called. This removes the warnings about empty
sequences, because _guess_alphabet won't be called on a sequence if we
have explicitly set the alphabet.

This worked great on my local install and tests pass.  Any objections  
or concerns?

basically it means when you make an AlignIO you can specify the  
alphabet i.e.

my $in = Bio::AlignIO->new(-format   => 'fasta',
                           -alphabet => 'dna',
                           -file     => 'genome.fasaln');

I have some alignments with empty sequences, and I think turning off
the warnings is appropriate where I force the alphabet choice. It
should also give a very modest speedup.
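
The idea can be illustrated with a toy sketch in plain Perl. This is not
BioPerl's _guess_alphabet or the actual patch: guess_alphabet and
alphabet_for are hypothetical stand-ins, and the 0.85 cutoff is an arbitrary
value chosen for the example. It only shows why an explicit alphabet
sidesteps the warning for all-gap sequences.

```perl
use strict;
use warnings;

# Toy stand-in for alphabet guessing; NOT BioPerl's _guess_alphabet.
sub guess_alphabet {
    my ($seq) = @_;
    (my $residues = $seq) =~ s/[-.?]//g;    # drop gap characters
    if ($residues eq '') {
        warn "cannot guess alphabet of an empty sequence\n";
        return;
    }
    # Count nucleotide-like characters without modifying the string.
    my $acgtn = ($residues =~ tr/ACGTUNacgtun//);
    return $acgtn / length($residues) > 0.85 ? 'dna' : 'protein';
}

# Hypothetical helper: an explicit alphabet wins, so the guessing
# routine (and its warning) is never reached.
sub alphabet_for {
    my ($seq, %args) = @_;
    return $args{-alphabet} if defined $args{-alphabet};
    return guess_alphabet($seq);
}

print alphabet_for('ACGT--ACGT'), "\n";                # guessed
print alphabet_for('----', -alphabet => 'dna'), "\n";  # forced, no warning
```

Skipping the guess also skips a scan over every sequence, which is where the
modest speedup would come from.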

-jason
--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From rtbio.2009 at gmail.com  Fri Jan 22 09:54:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Fri, 22 Jan 2010 10:54:32 +0100
Subject: [Bioperl-l] Fwd:  Regarding blast in Bioperl
In-Reply-To: <c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
References: <c7cac1601001080700m319b6aft5157786649a51bc0@mail.gmail.com>
	<c7cac1601001091024m53bd4bd3v4fcc31c51b3e5e1c@mail.gmail.com>
	<9D8A1428463C4D5E9C416521C35E254C@NewLife>
	<c7cac1601001091040q67e5358dy69a0208c461ef24e@mail.gmail.com>
	<D7723023B7DD4D6CAA36535E906DAB7A@NewLife>
	<c7cac1601001091102j2f5c18c5v263397bfd8a90692@mail.gmail.com>
	<D6F7C8EB0814499E8BD3E4F7F8BBFBEE@NewLife>
	<c7cac1601001091541y462cb562oae113b5f5b3e2711@mail.gmail.com>
	<196889DF87964224ACDB948681BA7F86@NewLife>
	<c7cac1601001211028q39df5f1etd91712e55321abb2@mail.gmail.com>
Message-ID: <c7cac1601001220154r4f92651ejb79663898e0b8fc2@mail.gmail.com>

---------- Forwarded message ----------
From: Roopa Raghuveer <rtbio.2009 at gmail.com>
Date: Thu, Jan 21, 2010 at 7:28 PM
Subject: Re: [Bioperl-l] Regarding blast in Bioperl
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: bioperl-l at lists.open-bio.org


Hello Mark,

This is Roopa again. I have another small problem. I am working on remote
BLAST and the program works well, except for one thing. The program
accesses the server and gets the output correctly. I am trying to collect the
result sequences in an array, and I found that the first sequence
among the result sequences is always missing. The code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above, I was trying to debug by getting the count of the array
and the sequences themselves. When the output data contained 1 sequence, the
count of the array was 0 and printing the output sequences gave nothing.
It was the same when the number of output sequences was 3: the count of the
array was 2 and only two sequences were printed.
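
One possible explanation, not confirmed anywhere in this thread, is the debug
line "print BLASTDEBUGFILE $result->next_hit();" in the code above. If
next_hit advances an iterator, that call would use up the first hit before
the while loop starts. The sketch below reproduces the symptom with a mock
object; MockResult and collect are made-up names for illustration and are not
part of BioPerl.

```perl
use strict;
use warnings;

# Minimal stand-in for a result object with a next_hit iterator.
# This is a mock for illustration, not Bio::Search::Result.
package MockResult;
sub new      { my ($class, @hits) = @_; return bless { hits => [@hits] }, $class }
sub next_hit { my ($self) = @_; return shift @{ $self->{hits} } }

package main;

sub collect {
    my ($result, $peek_first) = @_;

    # Mimics the debug print: one call to next_hit before the loop.
    $result->next_hit if $peek_first;

    my @seqs;
    while ( my $hit = $result->next_hit ) {
        push @seqs, $hit;
    }
    return @seqs;
}

my @with_peek    = collect( MockResult->new(qw(hitA hitB hitC)), 1 );
my @without_peek = collect( MockResult->new(qw(hitA hitB hitC)), 0 );
print scalar(@with_peek), " vs ", scalar(@without_peek), "\n";
```

Three hits give a count of 2 and one hit gives a count of 0, which matches
the numbers reported above. If this is the cause, removing the debug call or
printing each hit inside the loop should bring the first sequence back.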

Please help me in sorting out this problem.

Regards,
Roopa.


On Sun, Jan 10, 2010 at 1:03 AM, Mark A. Jensen <maj at fortinbras.us> wrote:

>  Excellent Roopa- it's my pleasure-- MAJ
>
> ----- Original Message -----
>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
> *To:* Mark A. Jensen <maj at fortinbras.us>
>  *Sent:* Saturday, January 09, 2010 6:41 PM
> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>
> Hi Mark,
>
> Thank you very very much. The code is working now. Thanks for the support
> and time you have spent on me.
>
> Thanks in advance
> Roopa.
>
> On Sat, Jan 9, 2010 at 10:56 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>
>>  There is still a bug with the double quotes. Use "$organ\[ORGN]", which
>> prevents perl from
>> looking for a member of an array called @organ. This would have shown up
>> if 'use strict;' had
>> been in place. Still don't know whether this would work precisely; can you
>> send me the query
>> sequence so I can reproduce your output?
>> thanks MAJ
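
The quoting behaviour described in this message can be checked with a short
stand-alone script. It is a sketch in plain Perl with no BioPerl calls; apart
from $organ, the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $organ = 'Trypanosoma brucei';

# Single quotes: nothing is interpolated, the text is used literally.
my $single = '$organ[ORGN]';

# Double quotes with the bracket escaped: $organ is interpolated and
# the literal text [ORGN] follows it.
my $escaped = "$organ\[ORGN]";

# Concatenation avoids the question entirely.
my $joined = $organ . '[ORGN]';

# Unescaped double quotes: perl reads this as an element of an array
# called @organ. It only compiles with strict turned off, and the
# nonexistent element interpolates as an empty string.
my $unescaped = do {
    no strict;
    no warnings;
    "$organ[ORGN]";
};

print "single:    $single\n";
print "escaped:   $escaped\n";
print "joined:    $joined\n";
print "unescaped: '$unescaped'\n";
```

Only the escaped and concatenated forms produce the intended query string,
Trypanosoma brucei[ORGN]. With 'use strict;' in place the unescaped form is
rejected at compile time, which is how the bug would have shown up.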
>>
>>  ----- Original Message -----
>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>> *Sent:* Saturday, January 09, 2010 2:02 PM
>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>
>> Hi Mark,
>>
>> I tried it with double quotes but still i got the same o/p with sequences
>> from different species.
>>
>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813  0.0
>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622  0.0
>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...   773  0.0
>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...   749  0.0
>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...   551  3e-154
>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...   551  3e-154
>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   542  2e-151
>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...   538  2e-150
>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...   538  2e-150
>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...   196  3e-47
>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...   190  1e-45
>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...   181  7e-43
>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...   179  2e-42
>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...   178  8e-42
>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...   170  1e-39
>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...   169  4e-39
>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...   167  1e-38
>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...   150  1e-33
>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...   145  5e-32
>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...   143  2e-31
>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...   143  2e-31
>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...   138  7e-30
>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...   138  7e-30
>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...   136  2e-29
>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...   136  2e-29
>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...   134  9e-29
>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...   132  3e-28
>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...   132  3e-28
>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...   132  3e-28
>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...   131  1e-27
>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...   131  1e-27
>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...   129  4e-27
>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...   129  4e-27
>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...   129  4e-27
>> ref|NM_001003470.1|  Danio rerio protein kinase, cAMP-dependen...   129  4e-27
>> ref|XM_001141503.1|  PREDICTED: Pan troglodytes verus protein ...   127  1e-26
>> ref|XM_001145269.1|  PREDICTED: Pan troglodytes protein kinase...   127  1e-26
>> ref|XM_512434.2|  PREDICTED: Pan troglodytes cAMP-dependent pr...   127  1e-26
>> ref|XM_001171457.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127  1e-26
>> ref|XM_001171437.1|  PREDICTED: Pan troglodytes cAMP-dependent...   127  1e-26
>> ref|XM_847420.1|  PREDICTED: Canis familiaris similar to Serin...   127  1e-26
>> ref|NM_207518.1|  Homo sapiens protein kinase, cAMP-dependent,...   127  1e-26
>> ref|NM_002730.3|  Homo sapiens protein kinase, cAMP-dependent,...   127  1e-26
>>
>>
>> Thanks in advance.
>>
>> Roopa.
>>
>> On Sat, Jan 9, 2010 at 7:46 PM, Mark A. Jensen <maj at fortinbras.us> wrote:
>>
>>>  I understand you. Put in the double quotes and see what happens.
>>>
>>>  ----- Original Message -----
>>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>> *To:* Mark A. Jensen <maj at fortinbras.us>
>>>   *Sent:* Saturday, January 09, 2010 1:40 PM
>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>
>>> Hi Mark,
>>>
>>> Thanks for your reply. It works when I hard-code the organism name
>>> Trypanosoma brucei, but my idea is to introduce a variable $organ that
>>> takes the organism given by the user, whatever it is: Pseudomonas,
>>> Drosophila, Trypanosoma, Leishmania, etc. I should get only the
>>> sequences related to that organism.
>>>
>>> That is, if the user enters Pseudomonas, $organ takes the value
>>> Pseudomonas, BLAST is run, and only those sequences that produce a
>>> significant alignment and come from Pseudomonas are returned. But this is
>>> not what happens.
>>>
>>> Please help me in this regard.
>>>
>>> Thanks in advance
>>> Roopa
>>>
>>> On Sat, Jan 9, 2010 at 7:29 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>
>>>>  Hi Roopa-- You may get what you want if you make the change.
>>>> With single quotes, ENTREZ_QUERY is set to the literal string
>>>>
>>>>  $organ[ORGN]
>>>>
>>>> while, with double quotes, the variable value will be substituted,
>>>> and the parameter should be set to
>>>>
>>>>  Trypanosoma brucei[ORGN]
>>>>
>>>> I'm guessing that it worked because the database ignored the strange
>>>> parameter and returned all the matches. Try this, and if it doesn't
>>>> work I'll look harder.
>>>> cheers,
>>>> Mark
>>>>
>>>>  ----- Original Message -----
>>>>  *From:* Roopa Raghuveer <rtbio.2009 at gmail.com>
>>>>   *To:* Mark A. Jensen <maj at fortinbras.us>
>>>> *Sent:* Saturday, January 09, 2010 1:24 PM
>>>> *Subject:* Re: [Bioperl-l] Regarding blast in Bioperl
>>>>
>>>> hello Mark,
>>>>
>>>> Thanks for your reply. It was working without enclosing $organ[ORGN] in
>>>> double quotes, but I would like to have only those sequences
>>>> that are specific to my organism, i.e., I need sequences only from the
>>>> organism that I entered.
>>>>
>>>> When the organism is Trypanosoma brucei, I also get Leishmania and
>>>> other species among the similar sequences. But I want to get only
>>>> Trypanosoma brucei sequences.
>>>>
>>>> Could you please help me out in this regard?
>>>>
>>>> Roopa.
>>>>
>>>> My output
>>>>
>>>> I/P organism: Trypanosoma brucei
>>>>
>>>> O/P:-
>>>> ref|XM_822292.1|  Trypanosoma brucei TREU927 protein kinase A ...  1813    0.0
>>>> ref|XM_822286.1|  Trypanosoma brucei TREU927 protein kinase A ...  1622    0.0
>>>> ref|XM_816530.1|  Trypanosoma cruzi strain CL Brener protein k...  773    0.0
>>>> ref|XM_816527.1|  Trypanosoma cruzi strain CL Brener protein k...  749    0.0
>>>> ref|XM_838414.1|  Leishmania major strain Friedlin protein kin...  551    3e-154
>>>> ref|XM_838409.1|  Leishmania major strain Friedlin protein kin...  551    3e-154
>>>> ref|XM_001568451.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...  542    2e-151
>>>> ref|XM_001469171.1|  Leishmania infantum protein kinase A cata...  538    2e-150
>>>> ref|XM_001469166.1|  Leishmania infantum protein kinase A cata...  538    2e-150
>>>> ref|XM_001682462.1|  Leishmania major protein kinase A catalyt...  196    3e-47
>>>> ref|XM_804361.1|  Trypanosoma cruzi strain CL Brener protein k...  190    1e-45
>>>> ref|XM_002065851.1|  Drosophila willistoni GK20594 (Dwil\GK205...  181    7e-43
>>>> ref|XM_822694.1|  Trypanosoma brucei TREU927 protein kinase A ...  179    2e-42
>>>> ref|XM_001563990.1|  Leishmania braziliensis MHOM/BR/75/M2904 ...  178    8e-42
>>>> ref|XM_814844.1|  Trypanosoma cruzi strain CL Brener protein k...  170    1e-39
>>>> ref|XM_001763039.1|  Physcomitrella patens subsp. patens predi...  168    4e-39
>>>> ref|XM_001464886.1|  Leishmania infantum JPCM5 protein kinase ...  167    1e-38
>>>> ref|XM_001377302.1|  PREDICTED: Monodelphis domestica similar ...  150    1e-33
>>>> ref|XM_001603485.1|  PREDICTED: Nasonia vitripennis similar to...  145    5e-32
>>>> ref|XM_416852.2|  PREDICTED: Gallus gallus protein kinase, X-l...  143    2e-31
>>>> ref|NM_001016403.2|  Xenopus (Silurana) tropicalis protein kin...  143    2e-31
>>>> ref|XM_002009291.1|  Drosophila mojavensis GI11297 (Dmoj\GI112...  138    7e-30
>>>> ref|NM_016979.1|  Mus musculus protein kinase, X-linked (Prkx)...  138    7e-30
>>>> ref|XM_001495664.2|  PREDICTED: Equus caballus similar to Seri...  136    2e-29
>>>> ref|XM_001111571.1|  PREDICTED: Macaca mulatta cAMP-dependent ...  136    2e-29
>>>> ref|XM_001611655.1|  Babesia bovis protein kinase domain conta...  134    9e-29
>>>> ref|NR_028062.1|  Homo sapiens protein kinase, Y-linked (PRKY)...  132    3e-28
>>>> ref|XM_001517795.1|  PREDICTED: Ornithorhynchus anatinus simil...  132    3e-28
>>>> ref|XM_685338.2|  PREDICTED: Danio rerio similar to Serine/thr...  132    3e-28
>>>> ref|XM_002189865.1|  PREDICTED: Taeniopygia guttata protein ki...  131    1e-27
>>>> ref|XM_001362299.1|  PREDICTED: Monodelphis domestica similar ...  131    1e-27
>>>> ref|NM_001093198.1|  Xenopus laevis protein kinase, cAMP-depen...  129    4e-27
>>>> ref|XM_001461322.1|  Paramecium tetraurelia hypothetical prote...  129    4e-27
>>>> ref|NM_001099869.1|  Xenopus laevis cAMP-dependent protein kin...  129    4e-27
>>>>
>>>> Roopa.
>>>>
>>>> On Sat, Jan 9, 2010 at 7:05 PM, Mark A. Jensen <maj at fortinbras.us>wrote:
>>>>
>>>>> I see it immediately (from making the same bug many times):
>>>>>
>>>>>
>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>> =>
>>>>> - '$organ[ORGN]');
>>>>> +"$organ[ORGN]");
>>>>>
>>>>>
>>>>> MAJ
>>>>>
>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>> rtbio.2009 at gmail.com>
>>>>> To: "Mark A. Jensen" <maj at fortinbras.us>
>>>>> Cc: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Saturday, January 09, 2010 11:57 AM
>>>>> Subject: Re: [Bioperl-l] Regarding blast in Bioperl
>>>>>
>>>>>
>>>>>
>>>>> Hello all,
>>>>>>
>>>>>> Thanks a lot for your reply, Mark. It was working with Trypanosoma
>>>>>> brucei as the organism parameter, but when I tried to take the
>>>>>> organism parameter from the user it was not working, i.e., I was
>>>>>> unable to get the target sequences.
>>>>>> Please help me in this regard. My code is
>>>>>>
>>>>>> #!/usr/bin/perl
>>>>>>
>>>>>> #path for extra camel module
>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>> use Roopablast;
>>>>>>
>>>>>>
>>>>>> use Bio::SearchIO;
>>>>>> use Bio::Search::Result::BlastResult;
>>>>>> use Bio::Perl;
>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>> use Bio::Seq;
>>>>>> use Bio::SeqIO;
>>>>>> use Bio::DB::GenBank;
>>>>>>
>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>> my $outstring ="";
>>>>>>
>>>>>> &parse_form;
>>>>>>
>>>>>> print "Content-type: text/html\n\n";
>>>>>> print "<HTML>\n";
>>>>>> print "<head><title>RNAi Result</title>";
>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>> print "</head>\n";
>>>>>> print "<body>\n";
>>>>>> print " Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>> print "</BODY>\n";
>>>>>> print "</HTML>\n";
>>>>>>
>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>> exit if $pid;
>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>
>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>
>>>>>> print OUTFILE "<HTML>\n
>>>>>> <head><title>RNAi Result</title>
>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>> </head>\n
>>>>>> <body>\n
>>>>>>  Your results will appear <a
>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>> wait......<br>
>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>> </BODY>\n
>>>>>> </HTML>\n";
>>>>>>
>>>>>> close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> @compseqs = blastcode($in{'Inputseq'},$in{'Organism'});
>>>>>>
>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>
>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>> $in{'Threshold'});
>>>>>>
>>>>>>
>>>>>> sub blastcode
>>>>>> {
>>>>>>
>>>>>> $inpu1= $_[0];
>>>>>>
>>>>>> $organ= $_[1];
>>>>>>
>>>>>> open(NUC,'>',$nuc);
>>>>>> print NUC $inpu1,"\n";
>>>>>> close(NUC);
>>>>>>
>>>>>> my $prog = 'blastn';
>>>>>> my $db   = 'refseq_rna';
>>>>>> my $e_val= '1e-10';
>>>>>> my $organism= $organ;
>>>>>>
>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>
>>>>>> my @params = ( '-prog' => $prog,
>>>>>>        '-data' => $db,
>>>>>>        '-expect' => $e_val,
>>>>>>        '-readmethod' => 'SearchIO',
>>>>>>       '-Organism'   => $organism );
>>>>>>
>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>            print OUTFILE $inpu1;
>>>>>>             close(OUTFILE);
>>>>>>
>>>>>>
>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY
>>>>>> =>
>>>>>> '$organ[ORGN]');
>>>>>>
>>>>>> #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>
>>>>>>  #change a parameter
>>>>>>
>>>>>> #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma
>>>>>> Brucei[ORGN]';
>>>>>>
>>>>>> #change a parameter
>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>> '$input2[ORGN]';
>>>>>>
>>>>>>  my $v = 1;
>>>>>>  #$v is just to turn on and off the messages
>>>>>>
>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>> '-organism' => $organ );
>>>>>>
>>>>>>
>>>>>> while (my $input = $str->next_seq())
>>>>>> {
>>>>>>  #Blast a sequence against a database:
>>>>>>   #Alternatively, you could  pass in a file with many
>>>>>>   #sequences rather than loop through sequence one at a time
>>>>>>   #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>   #and swap the two lines below for an example of that.
>>>>>>
>>>>>>            #open(OUTFILE,'>',$debugfile);
>>>>>>             # print OUTFILE $input;
>>>>>>             #close(OUTFILE);
>>>>>>
>>>>>>
>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>
>>>>>>               open(OUTFILE,'>',$debugfile);
>>>>>>            #   print OUTFILE $r;
>>>>>>               close(OUTFILE);
>>>>>>
>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>
>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>  #   open(OUTFILE,'>',$debugfile);
>>>>>>   #           print OUTFILE "while entered";
>>>>>>    #         close(OUTFILE);
>>>>>>    foreach my $rid ( @rids ) {
>>>>>>
>>>>>>     #         open(OUTFILE,'>',$debugfile);
>>>>>>      #        print OUTFILE "foreach entered";
>>>>>>       #      close(OUTFILE);
>>>>>>
>>>>>>       my $rc = $factory->retrieve_blast($rid);
>>>>>>
>>>>>>       if( !ref($rc) )
>>>>>>       {
>>>>>>       if( $rc < 0 )
>>>>>>       {
>>>>>>       $factory->remove_rid($rid);
>>>>>>       }
>>>>>>        open(OUTFILE,'>',$debugfile);
>>>>>>        #      print OUTFILE "if entered";
>>>>>>             close(OUTFILE);
>>>>>>        print STDERR "." if ( $v > 0 );
>>>>>>        sleep 5;
>>>>>>       }
>>>>>>      else {
>>>>>>         #    open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "else entered";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>         my $result = $rc->next_result();
>>>>>>        #save the output
>>>>>>       $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>
>>>>>>         open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>         print BLASTDEBUGFILE $result->next_hit();
>>>>>>         close(BLASTDEBUGFILE);
>>>>>>
>>>>>>       my $filename =
>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>
>>>>>>        # open(DEBUGFILE,'>',$debugfile);
>>>>>>        # open(new,'>',$filename);
>>>>>>        # @arra=<new>;
>>>>>>        # print DEBUGFILE @arra;
>>>>>>        # close(DEBUGFILE);
>>>>>>        # close(new);
>>>>>>
>>>>>>        $factory->save_output($filename);
>>>>>>  # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>      # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>      # close(BLASTDEBUGFILE);
>>>>>>
>>>>>>      $factory->remove_rid($rid);
>>>>>>
>>>>>>      open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>      print BLASTDEBUGFILE  $organism;
>>>>>>       close(BLASTDEBUGFILE);
>>>>>>
>>>>>>   # open(OUTFILE,'>',$outfile);
>>>>>>   # print OUTFILE "Test2 $result->database_name()";
>>>>>>   # close(OUTFILE);
>>>>>>
>>>>>> #$hit = $result->next_hit;
>>>>>> #open(new,'>',$debugfile);
>>>>>> #print $hit;
>>>>>> #close(new);
>>>>>>
>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>
>>>>>>           next unless ( $v > 0);
>>>>>>
>>>>>>         #     open(OUTFILE,'>',$debugfile);
>>>>>>          #    print OUTFILE "$hit in while hits";
>>>>>>           #  close(OUTFILE);
>>>>>>
>>>>>>      my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>          my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>                 push(@seqs,$dna);
>>>>>>         }
>>>>>>       }
>>>>>>     }
>>>>>>   }
>>>>>>  }
>>>>>>
>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>  #print OUTFILE $seqs[0];
>>>>>>  #close(OUTFILE);
>>>>>>
>>>>>> return(@seqs);
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Regards,
>>>>>> Roopa.
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 8, 2010 at 4:36 PM, Mark A. Jensen <maj at fortinbras.us>
>>>>>> wrote:
>>>>>>
>>>>>> Hi Roopa--
>>>>>>>
>>>>>>> I got your code to work with the following changes:
>>>>>>>
>>>>>>> +# the input should be a valid FASTA file...
>>>>>>> ...
>>>>>>> open(NUC,'>',$nuc);
>>>>>>> +print NUC ">seq (need a name line for valid fasta)\n";
>>>>>>> print NUC $inpu1, "\n";
>>>>>>> close(NUC);
>>>>>>> ...
>>>>>>>
>>>>>>> +# you can set these header parms in the call itself...
>>>>>>> - my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>> + my $factory = Bio::Tools::Run::RemoteBlast->new(@params,
>>>>>>> -ENTREZ_QUERY =>
>>>>>>> 'Trypanosoma Brucei[ORGN]');
>>>>>>>
>>>>>>>  #change a parameter
>>>>>>> +# commented this out...
>>>>>>> +# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} =
>>>>>>> 'Trypanosoma
>>>>>>> Brucei[ORGN]';
>>>>>>>
>>>>>>> MAJ
>>>>>>> ----- Original Message ----- From: "Roopa Raghuveer" <
>>>>>>> rtbio.2009 at gmail.com
>>>>>>> >
>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>> Sent: Friday, January 08, 2010 10:00 AM
>>>>>>> Subject: [Bioperl-l] Regarding blast in Bioperl
>>>>>>>
>>>>>>>
>>>>>>>  Hello all,
>>>>>>>
>>>>>>>>
>>>>>>>> I am trying remote BLAST using BioPerl. My input data is a
>>>>>>>> Trypanosoma brucei sequence in FASTA format. When I submit it to
>>>>>>>> BLAST using the step
>>>>>>>> $r=$factory->submit_blast($input)
>>>>>>>> nothing is returned, which I checked by debugging the code. My
>>>>>>>> input sequence is not being BLASTed even though I set all the
>>>>>>>> parameters. I have pasted the code below.
>>>>>>>>
>>>>>>>> Please help me solve this problem. It is very urgent.
>>>>>>>>
>>>>>>>> Regards
>>>>>>>> Roopa.
>>>>>>>>
>>>>>>>> #!/usr/bin/perl
>>>>>>>>
>>>>>>>> #path for extra camel module
>>>>>>>> use lib "/srv/www/htdocs/rain/RNAi/";
>>>>>>>> use Roopablast;
>>>>>>>>
>>>>>>>>
>>>>>>>> use Bio::SearchIO;
>>>>>>>> use Bio::Search::Result::BlastResult;
>>>>>>>> use Bio::Perl;
>>>>>>>> use Bio::Tools::Run::RemoteBlast;
>>>>>>>> use Bio::Seq;
>>>>>>>> use Bio::SeqIO;
>>>>>>>> use Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> $serverpath = "/srv/www/htdocs/rain/RNAi";
>>>>>>>> $serverurl = "http://141.84.66.66/rain/RNAi";
>>>>>>>> $outfile = $serverpath."/rnairesult_".time().".html";
>>>>>>>> $nuc = $serverpath."/nuc".time().".txt";
>>>>>>>> $debugfile = $serverpath."/debug_".time().".txt";
>>>>>>>> $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>> my $outstring ="";
>>>>>>>>
>>>>>>>> &parse_form;
>>>>>>>>
>>>>>>>> print "Content-type: text/html\n\n";
>>>>>>>> print "<HTML>\n";
>>>>>>>> print "<head><title>RNAi Result</title>";
>>>>>>>> print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl/rnairesult_".time().".html\"> \n";
>>>>>>>> print "</head>\n";
>>>>>>>> print "<body>\n";
>>>>>>>> print " Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>";
>>>>>>>> print " Please be patient, runtime can be up to 5 minutes<br>";
>>>>>>>> print " This page will automatically reload in 30 seconds. Roopa";
>>>>>>>> print "</BODY>\n";
>>>>>>>> print "</HTML>\n";
>>>>>>>>
>>>>>>>> defined(my $pid = fork) or die "Can't fork: $!";
>>>>>>>> exit if $pid;
>>>>>>>> open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
>>>>>>>> open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
>>>>>>>> open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile);
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
>>>>>>>> URL=$serverurl//rnairesult_".time().".html\"> \n
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\">
>>>>>>>> </head>\n
>>>>>>>> <body>\n
>>>>>>>>  Your results will appear <a
>>>>>>>> href=$serverurl/rnairesult_".time().".html>here</a><br>
>>>>>>>>  Please be patient, runtime can be up to 5 minutes wait wait
>>>>>>>> wait......<br>
>>>>>>>> This page will automatically reload in 30 seconds Roopa <br>
>>>>>>>> </BODY>\n
>>>>>>>> </HTML>\n";
>>>>>>>>
>>>>>>>> close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> @compseqs = blastcode($in{'Inputseq'});
>>>>>>>>
>>>>>>>> $in{'Inputseq'} =~ s/>.*$//m;
>>>>>>>> $in{'Inputseq'} =~ s/[^TAGC]//gim;
>>>>>>>> $in{'Inputseq'} =~ tr/actg/ACTG/;
>>>>>>>>
>>>>>>>> @out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
>>>>>>>> $in{'Threshold'});
>>>>>>>>
>>>>>>>>
>>>>>>>> sub blastcode
>>>>>>>> {
>>>>>>>>
>>>>>>>> $inpu1= $_[0];
>>>>>>>>
>>>>>>>> #$organ= $_[1];
>>>>>>>>
>>>>>>>> open(NUC,'>',$nuc);
>>>>>>>> print NUC $inpu1;
>>>>>>>> close(NUC);
>>>>>>>>
>>>>>>>> my $prog = 'blastn';
>>>>>>>> my $db   = 'refseq_rna';
>>>>>>>> my $e_val= '1e-10';
>>>>>>>> my $organism= 'Trypanosoma Brucei';
>>>>>>>>
>>>>>>>> $gb = new Bio::DB::GenBank;
>>>>>>>>
>>>>>>>> my @params = ( '-prog' => $prog,
>>>>>>>>       '-data' => $db,
>>>>>>>>       '-expect' => $e_val,
>>>>>>>>       '-readmethod' => 'SearchIO',
>>>>>>>>       '-Organism'   => $organism );
>>>>>>>>
>>>>>>>>          # open(OUTFILE,'>',$debugfile);
>>>>>>>>           #  print OUTFILE @params;
>>>>>>>>           # close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>> my $factory = Bio::Tools::Run::RemoteBlast->new(@params);
>>>>>>>>
>>>>>>>>  #change a parameter
>>>>>>>>
>>>>>>>> $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';
>>>>>>>>
>>>>>>>> #change a parameter
>>>>>>>> # $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';
>>>>>>>>
>>>>>>>>  my $v = 1;
>>>>>>>>  #$v is just to turn on and off the messages
>>>>>>>>
>>>>>>>> my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
>>>>>>>> '-organism' => 'Trypanosoma Brucei' );
>>>>>>>>
>>>>>>>>
>>>>>>>> while (my $input = $str->next_seq())
>>>>>>>> {
>>>>>>>>  #Blast a sequence against a database:
>>>>>>>>  #Alternatively, you could  pass in a file with many
>>>>>>>>  #sequences rather than loop through sequence one at a time
>>>>>>>>  #Remove the loop starting 'while (my $input = $str->next_seq())'
>>>>>>>>  #and swap the two lines below for an example of that.
>>>>>>>>
>>>>>>>>           open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE $input;
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  my $r = $factory->submit_blast($input);
>>>>>>>>  # The program stops here: it does not return any value and it does
>>>>>>>>  # not enter the while loop. Please help me in this regard.
>>>>>>>>              open(OUTFILE,'>',$debugfile);
>>>>>>>>              print OUTFILE $r;
>>>>>>>>              close(OUTFILE);
>>>>>>>>
>>>>>>>>
>>>>>>>>  print STDERR "waiting...." if($v>0);
>>>>>>>>
>>>>>>>>  while ( my @rids = $factory->each_rid ) {
>>>>>>>>    open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "while entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>   foreach my $rid ( @rids ) {
>>>>>>>>
>>>>>>>>             open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "foreach entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>      my $rc = $factory->retrieve_blast($rid);
>>>>>>>>
>>>>>>>>      if( !ref($rc) )
>>>>>>>>      {
>>>>>>>>      if( $rc < 0 )
>>>>>>>>      {
>>>>>>>>      $factory->remove_rid($rid);
>>>>>>>>      }
>>>>>>>>       open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "if entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>       print STDERR "." if ( $v > 0 );
>>>>>>>>       sleep 5;
>>>>>>>>      }
>>>>>>>>     else {
>>>>>>>>            open(OUTFILE,'>',$debugfile);
>>>>>>>>             print OUTFILE "else entered";
>>>>>>>>            close(OUTFILE);
>>>>>>>>
>>>>>>>>        my $result = $rc->next_result();
>>>>>>>>       #save the output
>>>>>>>>      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
>>>>>>>>
>>>>>>>>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>        print BLASTDEBUGFILE $result->next_hit();
>>>>>>>>        close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>      my $filename =
>>>>>>>> $serverpath."/blastdata_".time().$result->query_name()."\.out";
>>>>>>>>
>>>>>>>>       # open(DEBUGFILE,'>',$debugfile);
>>>>>>>>       # open(new,'>',$filename);
>>>>>>>>       # @arra=<new>;
>>>>>>>>       # print DEBUGFILE @arra;
>>>>>>>>       # close(DEBUGFILE);
>>>>>>>>       # close(new);
>>>>>>>>
>>>>>>>>       $factory->save_output($filename);
>>>>>>>>
>>>>>>>>     # open(BLASTDEBUGFILE,'>',$debugfile);
>>>>>>>>     # print BLASTDEBUGFILE  "Hello $rid";
>>>>>>>>     # close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>     $factory->remove_rid($rid);
>>>>>>>>
>>>>>>>>     open(BLASTDEBUGFILE,'>',$blastdebugfile);
>>>>>>>>     print BLASTDEBUGFILE  $organism;
>>>>>>>>      close(BLASTDEBUGFILE);
>>>>>>>>
>>>>>>>>  # open(OUTFILE,'>',$outfile);
>>>>>>>>  # print OUTFILE "Test2 $result->database_name()";
>>>>>>>>  # close(OUTFILE);
>>>>>>>>
>>>>>>>> #$hit = $result->next_hit;
>>>>>>>> #open(new,'>',$debugfile);
>>>>>>>> #print $hit;
>>>>>>>> #close(new);
>>>>>>>>
>>>>>>>>  while ( my $hit = $result->next_hit ) {
>>>>>>>>
>>>>>>>>          next unless ( $v > 0);
>>>>>>>>
>>>>>>>>        #     open(OUTFILE,'>',$debugfile);
>>>>>>>>         #    print OUTFILE "$hit in while hits";
>>>>>>>>          #  close(OUTFILE);
>>>>>>>>
>>>>>>>>     my $sequ = $gb->get_Seq_by_version($hit->name);
>>>>>>>>         my $dna = $sequ->seq();        # get the sequence as a string
>>>>>>>>                push(@seqs,$dna);
>>>>>>>>        }
>>>>>>>>      }
>>>>>>>>    }
>>>>>>>>  }
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  #open(OUTFILE,'>',$debugfile);
>>>>>>>>  #print OUTFILE $seqs[0];
>>>>>>>>  #close(OUTFILE);
>>>>>>>>
>>>>>>>> return(@seqs);
>>>>>>>>
>>>>>>>> }
>>>>>>>>
>>>>>>>> open(OUTFILE, '>',$outfile) || die ;
>>>>>>>>
>>>>>>>> print OUTFILE "<HTML>\n
>>>>>>>> <head><title>RNAi Result</title>
>>>>>>>> <meta http-equiv=\"expires\" content=\"0\"></head>\n
>>>>>>>> <body>\n
>>>>>>>> <p><font face=\"Courier, monospace font set\">
>>>>>>>> Inputsequence: <br>";
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  print OUTFILE substr ($in{'Inputseq'}, $i, 1);
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      print OUTFILE " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      print OUTFILE "<br>\n";
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> print OUTFILE "</font> <p>";
>>>>>>>>
>>>>>>>> $z=@compseqs;
>>>>>>>>
>>>>>>>> for($k=1;$k<$z;$k++) {
>>>>>>>>  print OUTFILE "<font face=\"Courier, monospace font
>>>>>>>> set\"><p>Compare
>>>>>>>> Sequence: <br>";
>>>>>>>>
>>>>>>>>  for ($i=0; $i<length ($compseqs[$k]); $i++) {
>>>>>>>>
>>>>>>>>      print OUTFILE substr ($compseqs[$k], $i, 1);
>>>>>>>>
>>>>>>>>      if ( ($i+1)%10==0){
>>>>>>>>          print OUTFILE " ";
>>>>>>>>      }
>>>>>>>>      if ( ($i+1)%60==0){
>>>>>>>>          print OUTFILE "<br>\n";
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>  print OUTFILE "<p></font>";
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<p>
>>>>>>>> Window: <br>$in{'Windowsize'}
>>>>>>>> <p>
>>>>>>>> <p>
>>>>>>>> Threshold: <br>$in{'Threshold'}
>>>>>>>> <p>";
>>>>>>>> my $j=0;
>>>>>>>>
>>>>>>>> for ($i=0; $i<length ($in{'Inputseq'}); $i++) {
>>>>>>>>
>>>>>>>>  if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
>>>>>>>>      if ($out[$i]->{similar}<=$in{'Threshold'}){
>>>>>>>>          $j=$in{'Windowsize'};
>>>>>>>>      }
>>>>>>>>      $height=$out[$i]->{similar}*5;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ($j>0) {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"green\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>      $j--;
>>>>>>>>  }
>>>>>>>>  else {
>>>>>>>>      print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
>>>>>>>> height=\"5\">";
>>>>>>>>      $outstring .= "<font color=\"red\">".substr ($in{'Inputseq'},
>>>>>>>> $i,
>>>>>>>> 1)."</font>";
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  if ( ($i+1)%10==0){
>>>>>>>>      $outstring .= " ";
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%60==0){
>>>>>>>>      $outstring .= "<br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>>  if ( ($i+1)%800==0){
>>>>>>>>      print OUTFILE "<br><br>\n";
>>>>>>>>
>>>>>>>>  }
>>>>>>>> }
>>>>>>>>
>>>>>>>> print OUTFILE "<br><br><font face=\"Courier, monospace font
>>>>>>>> set\">$outstring</font>";
>>>>>>>>
>>>>>>>> #foreach (@out) {
>>>>>>>> #print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matches<p>";
>>>>>>>> #if ($_->{similar}<=$in{'Threshold'}){
>>>>>>>>
>>>>>>>> #    }
>>>>>>>> #}
>>>>>>>>
>>>>>>>> print OUTFILE "</BODY>\n</HTML>\n";
>>>>>>>>
>>>>>>>> close OUTFILE;
>>>>>>>>
>>>>>>>> #nameprint();
>>>>>>>>
>>>>>>>> sub parse_form {
>>>>>>>>  local ($buffer, @pairs, $pair, $name, $value);
>>>>>>>>  # Read in text
>>>>>>>>  $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
>>>>>>>>  if ($ENV{'REQUEST_METHOD'} eq "POST")
>>>>>>>>  {
>>>>>>>>      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
>>>>>>>>  }
>>>>>>>>  else
>>>>>>>>  {
>>>>>>>>      $buffer = $ENV{'QUERY_STRING'};
>>>>>>>>  }
>>>>>>>>  @pairs = split(/&/, $buffer);
>>>>>>>>  foreach $pair (@pairs)
>>>>>>>>  {
>>>>>>>>      ($name, $value) = split(/=/, $pair);
>>>>>>>>      $value =~ tr/+/ /;
>>>>>>>>      $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
>>>>>>>>      $in{$name} = $value;
>>>>>>>>  }
>>>>>>>> }
>>>>>>>> _______________________________________________
>>>>>>>> Bioperl-l mailing list
>>>>>>>> Bioperl-l at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From maj at fortinbras.us  Fri Jan 22 12:34:59 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 07:34:59 -0500
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
Message-ID: <BB6A0E3FAC154E8FB690E5749375A1BC@NewLife>

I'm down with that.

----- Original Message ----- 
From: "Jason Stajich" <jason at bioperl.org>
To: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 1:17 AM
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO


> I'm considering adding an initialization parameter (and get/set
> accessor) to Bio::AlignIO that would allow setting the alphabet.  This
> is then passed to Bio::LocatableSeq creation so that _guess_alphabet
> isn't called. This will allow removal of warnings about empty
> sequences, because _guess_alphabet won't be called on a sequence if we
> have explicitly set the alphabet.
> 
> This worked great on my local install and tests pass.  Any objections
> or concerns?
> 
> Basically, it means that when you make an AlignIO you can specify the
> alphabet, i.e.
> 
> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
>                            -file => 'genome.fasaln');
> 
> I have some alignments with empty sequences and I think turning off
> the warnings is appropriate where I force the alphabet choice. It
> should also give a very modest speedup.
> 
> -jason
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip


From avilella at gmail.com  Fri Jan 22 13:07:26 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 13:07:26 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
Message-ID: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>

Hi,

I would like to write a script that merges fragments in a Bio::SimpleAlign
object on the basis of some $seq->display_name rule.

I basically want to start with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.234     QWERTYU-------------------
seq2.345     ----------ASDFGH----------
seq2.456     -------------------ZXCVBNM

And end with something like this:

seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM

Can people suggest any Bio::SimpleAlign methods that would help here?

Cheers,

Albert.


From maj at fortinbras.us  Fri Jan 22 13:31:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 08:31:54 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
Message-ID: <EF1FEC1B43C146B6BBF827EA56171777@NewLife>

Here's one of my favorite tricks for this: XOR mask on gap symbol.
MAJ

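use strict;
use warnings;
use Bio::SeqIO;
use Bio::Seq;

# -format is given explicitly because there is no file name to guess from
my $seqio = Bio::SeqIO->new( -fh => \*DATA, -format => 'fasta' );

# XOR each sequence with a full-length gap mask: gaps become "\0", so
# XORing the masked strings together overlays the fragments.
# (A one-character '-' mask would only affect the first column.)
my $first = $seqio->next_seq->seq;
my $mask  = '-' x length($first);
my $acc   = $first ^ $mask;
while ( my $seq = $seqio->next_seq ) {
    $acc ^= ( $seq->seq ^ $mask );
}
# XOR with the mask again to turn "\0" back into '-'
my $mrg = Bio::Seq->new( -id  => 'merged',
                         -seq => $acc ^ $mask );
print $mrg->seq, "\n";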


__END__
>seq2.234     
QWERTYU-------------------
>seq2.345     
----------ASDFGH----------
>seq2.456     
-------------------ZXCVBNM
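
The masking idea can also be tried on plain strings, without BioPerl. What
follows is only a minimal sketch, assuming that all fragments have the same
length and never carry residues in the same column. The merge_fragments
helper and the "name before the first dot" grouping rule are made up for
illustration; neither is a Bio::SimpleAlign method.

```perl
use strict;
use warnings;

# Merge equal-length gapped fragments by XOR-masking the gap symbol.
# Assumption: no two fragments have a residue in the same column.
sub merge_fragments {
    my ( $gap, @frags ) = @_;
    my $len  = length $frags[0];
    my $mask = $gap x $len;      # full-length mask, one gap per column
    my $acc  = "\0" x $len;      # all-NUL accumulator means "all gaps"
    for my $frag (@frags) {
        die "fragments differ in length\n" unless length($frag) == $len;
        $acc ^= ( $frag ^ $mask );    # gaps -> "\0", residues overlay
    }
    return $acc ^ $mask;         # "\0" -> gap, residues restored
}

# Hypothetical grouping rule: the part of the name before the first dot.
my @aln = (
    [ 'seq1.123', 'QWERTYUIOPASDFGHJKLZXCVBNM' ],
    [ 'seq2.234', 'QWERTYU-------------------' ],
    [ 'seq2.345', '----------ASDFGH----------' ],
    [ 'seq2.456', '-------------------ZXCVBNM' ],
);

my ( %group, @order );
for my $row (@aln) {
    my ( $name, $seq ) = @$row;
    ( my $key = $name ) =~ s/\..*//;
    push @order, $key unless $group{$key};
    push @{ $group{$key} }, $seq;
}

for my $key (@order) {
    printf "%-12s %s\n", $key, merge_fragments( '-', @{ $group{$key} } );
}
```

With the alignment from the original question, the seq2 group merges to
QWERTYU---ASDFGH---ZXCVBNM and seq1 comes back unchanged. If two fragments
do share a residue column, the XOR produces garbage at that column, so an
overlap check beforehand would be prudent.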



From cjfields at illinois.edu  Fri Jan 22 13:34:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:34:07 -0600
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
Message-ID: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>

Sounds good to me.  The warnings are a bit too tight on this module anyway.

I still think we have plans towards refactoring some of this, not sure how far along they are:

http://www.bioperl.org/wiki/Align_Refactor

chris



From cjfields at illinois.edu  Fri Jan 22 13:40:57 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 07:40:57 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <EF1FEC1B43C146B6BBF827EA56171777@NewLife>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
Message-ID: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>

Maybe something for the cookbook/scrapbook?

chris



From holland at eaglegenomics.com  Fri Jan 22 10:51:52 2010
From: holland at eaglegenomics.com (Richard Holland)
Date: Fri, 22 Jan 2010 10:51:52 +0000
Subject: [Bioperl-l] [BioSQL-l] SwissProt DE lines and UniProt XML /
	TagTree as XML in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <8FECCBDE-2DE1-40EE-B5A4-73BDAC893E2D@eaglegenomics.com>

Nice idea. Currently, BioJava just stores the complete section as a string without parsing it, but it provides a parser module for converting it into useful tag/value format within a user's program (but not to be stored in BioSQL).

On 21 Jan 2010, at 12:33, Peter wrote:

> Hi all,
> 
> This is cross posted to try and ensure relevant people see it.
> I suggest we continue the discussion on the BioSQL list
> (for how to serialise structured annotation to BioSQL), and/or
> the OpenBio list (for things like file format naming conventions).
> 
> I am hoping we (Bio*) can be consistent in how we parse and load
> into BioSQL the SwissProt DE lines (known as "swiss" format in
> both BioPerl and Biopython's SeqIO, and by EMBOSS) or the
> equivalent UniProt XML tags (which we are tentatively going to
> call the "uniprot" format in Biopython's SeqIO - comments?).
> 
> Like BioPerl (etc), Biopython can parse plain text SwissProt ("swiss")
> files and load them into BioSQL. Biopython currently treats the DE
> comment lines as a long string, as BioPerl used to:
> 
> http://lists.open-bio.org/pipermail/bioperl-l/2009-May/030041.html
> http://lists.open-bio.org/pipermail/biosql-l/2009-May/001514.html
> 
> I understand that BioPerl now turns the SwissProt DE lines into a
> TagTree, and for storing this in BioSQL this gets serialised as XML.
> I would like Biopython to handle this the same way (although rather
> than a Perl TagTree, we'd use a Python structure of course), and
> would appreciate clarification of what exactly was implemented
> (e.g. which bit of the BioPerl source code should we look at,
> and could you show a worked example?).
> 
> Andrea Pierleoni (CC'd - not sure if he is on the BioSQL or
> Open-Bio lists yet) has started work on parsing UniProt XML
> files for Biopython. Here the DE comment lines are already
> provided broken up with XML markup. Hopefully their nested
> structure matches what BioPerl was doing with the SwissProt
> DE lines.
> 
> Regards,
> 
> Peter
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biosql-l

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: holland at eaglegenomics.com
http://www.eaglegenomics.com/


From andrea at biocomp.unibo.it  Fri Jan 22 12:18:32 2010
From: andrea at biocomp.unibo.it (Andrea Pierleoni)
Date: Fri, 22 Jan 2010 13:18:32 +0100 (CET)
Subject: [Bioperl-l] SwissProt DE lines and UniProt XML / TagTree as XML
	in BioSQL
In-Reply-To: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
References: <320fb6e01001210433n6f42e617g6519ee2790d6add5@mail.gmail.com>
Message-ID: <2b6e30c4628585042366646a7b46386e.squirrel@lipid.biocomp.unibo.it>

I think the point here is a little broader, since the SwissProt DE lines
are not the only ones that carry complex, structured data.
Defining a common, language-independent way to store structured data in
the comment and *_qualifier_value tables of the current BioSQL schema would
be very useful.
XML looks like a good candidate to me, and the UniProt XML format can be
used as a reference or as a template to start from.
Each Bio* project would then parse this structured data and expose it in
its own native data structures.

Andrea




From avilella at gmail.com  Fri Jan 22 16:04:13 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 16:04:13 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
Message-ID: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>

Is there (or should there be) a 'have_pairwise_overlap' method similar to this?

# $seq1 and $seq3 have matching ids
my $seq1 = $aln->each_seq_by_id($seq1->display_id);
my $seq3 = $aln->each_seq_by_id($seq3->display_id);

my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
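
To make clearer what such a check would compute, here is a rough sketch on
plain gapped strings (for instance what $seq->seq returns for two rows of
the same alignment). The function names are hypothetical and this is not an
existing Bio::SimpleAlign method. It assumes '-' is the gap symbol and that
both strings have the same length.

```perl
use strict;
use warnings;

# Count the columns in which both gapped strings carry a residue.
sub overlap_columns {
    my ( $s1, $s2 ) = @_;
    die "sequences differ in length\n" unless length($s1) == length($s2);
    my ( $m1, $m2 ) = ( $s1, $s2 );
    for my $m ( $m1, $m2 ) {
        $m =~ tr/-/\x00/;        # gap     -> NUL
        $m =~ tr/\x00/\x01/c;    # residue -> \x01
    }
    my $both = $m1 & $m2;        # \x01 only where both have a residue
    return ( $both =~ tr/\x01// );    # tr without replacement just counts
}

# True if the two sequences share at least one residue column.
sub have_pairwise_overlap {
    my ( $s1, $s2 ) = @_;
    return overlap_columns( $s1, $s2 ) > 0;
}

my @pair = ( 'QWERTYU-------------------', '----------ASDFGH----------' );
print have_pairwise_overlap(@pair) ? "overlap\n" : "no overlap\n";
```

For the two seq2 fragments above this reports no overlap, which is the case
where merging them column by column is safe.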



From maj at fortinbras.us  Fri Jan 22 16:02:55 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 11:02:55 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
Message-ID: <BE7957A2791345DAB092D997A4656AA8@NewLife>

http://www.bioperl.org/wiki/Merge_gapped_sequences_across_a_common_region
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "Albert Vilella" <avilella at gmail.com>; <bioperl-l at lists.open-bio.org>
Sent: Friday, January 22, 2010 8:40 AM
Subject: Re: [Bioperl-l] Merging fragments in a simplealign


> May be something for the cook/scrapbook?
> 
> chris


From avilella at gmail.com  Fri Jan 22 17:50:57 2010
From: avilella at gmail.com (Albert Vilella)
Date: Fri, 22 Jan 2010 17:50:57 +0000
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
	<358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
Message-ID: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>

Or, to rephrase my question: what is the closest existing equivalent of the
code below?

On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:

> Is there/should be a 'have_pairwise_overlap' method similar to this?
>
> # $seq1 and $seq3 have matching ids
> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>
> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);
>
>
> On Fri, Jan 22, 2010 at 1:40 PM, Chris Fields <cjfields at illinois.edu>wrote:
>
>> May be something for the cook/scrapbook?
>>
>> chris
>>
>> On Jan 22, 2010, at 7:31 AM, Mark A. Jensen wrote:
>>
>> > Here's one of my favorite tricks for this: XOR mask on gap symbol.
>> > MAJ
>> >
>> > use Bio::SeqIO;
>> > use Bio::Seq;
>> > use strict;
>> > my $seqio = Bio::SeqIO->new( -fh => \*DATA );
>> >
>> > my $acc = $seqio->next_seq->seq ^ '-';
>> > while ($_ = $seqio->next_seq ) {
>> >   $acc ^= ($_->seq ^ '-');
>> > }
>> > my $mrg = Bio::Seq->new( -id => 'merged',
>> >   -seq => $acc ^ '-' );
>> > 1;
>> >
>> >
>> > __END__
>> >> seq2.234
>> > QWERTYU-------------------
>> >> seq2.345
>> > ----------ASDFGH----------
>> >> seq2.456
>> > -------------------ZXCVBNM
>> >
>> > ----- Original Message ----- From: "Albert Vilella" <avilella at gmail.com
>> >
>> > To: <bioperl-l at lists.open-bio.org>
>> > Sent: Friday, January 22, 2010 8:07 AM
>> > Subject: [Bioperl-l] Merging fragments in a simplealign
>> >
>> >
>> >> Hi,
>> >> I would like to write a script that merges fragments in a
>> Bio::SimpleAlign
>> >> object on the basis of
>> >> some $seq->display_name rule.
>> >> I basically want to start with something like this:
>> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> >> seq2.234     QWERTYU-------------------
>> >> seq2.345     ----------ASDFGH----------
>> >> seq2.456     -------------------ZXCVBNM
>> >> And end with something like this:
>> >> seq1.123     QWERTYUIOPASDFGHJKLZXCVBNM
>> >> seq2.mrg     QWERTYU---ASDFGH---ZXCVBNM
>> >> Can people suggest any Bio::SimpleAlign methods that would help here?
>> >> Cheers,
>> >> Albert.
>> >> _______________________________________________
>> >> Bioperl-l mailing list
>> >> Bioperl-l at lists.open-bio.org
>> >> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> >>
>>
>>
>


From jay at jays.net  Fri Jan 22 18:30:57 2010
From: jay at jays.net (Jay Hannah)
Date: Fri, 22 Jan 2010 12:30:57 -0600
Subject: [Bioperl-l] Bio::BroodComb - RFC
In-Reply-To: <BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>
References: <638696D6-7529-4717-A05A-F1E8FF1C5A8F@jays.net>
	<BE9B5C61-D45E-4DC3-B543-52D96DAB5685@illinois.edu>
Message-ID: <EAD0FFCE-6DDF-4723-8D08-70ECF157FAAA@jays.net>

On Jan 21, 2010, at 10:31 PM, Chris Fields wrote:
> Did you want to release it to CPAN?  I'll take a closer look at the docs to get an idea of what you are doing with it, but from my perspective I can see this becoming a nice general use tool akin to Bio::Perl, maybe a bit more lightweight.

Yes, I was thinking I would. No one has (yet) told me it's the worst idea ever, so I'm feeling encouraged.  :)

Given smallish inputs / databases (up to a few million rows) where some lightweight schema + SQLite + BioPerl can get the job done, it's nice to have a little easy-to-run toolbox. New tables and Roles bolt on easily, so I'll be adding them as they surface at $work[1]. 

Thanks for your interest.   :)

Jay Hannah
http://github.com/jhannah/bio-broodcomb
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From dalalhina at gmail.com  Fri Jan 22 17:31:09 2010
From: dalalhina at gmail.com (hina dalal)
Date: Fri, 22 Jan 2010 17:31:09 +0000
Subject: [Bioperl-l] Bioperl installation failed
Message-ID: <425f75df1001220931t49f5c768j97d91d2dd1757f19@mail.gmail.com>

Hi


I have installed Perl from ActiveState and am now trying to install BioPerl,
but I cannot do it, neither from PPM (it shows the error "Ppm install failed:
404 not found") nor by CPAN / manual installation. It does not allow me
to download nmake, reporting that "the version of this file is not compatible
with the version of windows you are running. Check your computer system
information to see whether you need 32 bit or 64 bit of this program." I am
using Windows Vista.


Please help.


Regards


Hina


From H.Dalal at sms.ed.ac.uk  Fri Jan 22 17:34:55 2010
From: H.Dalal at sms.ed.ac.uk (Hina Dalal)
Date: Fri, 22 Jan 2010 17:34:55 +0000
Subject: [Bioperl-l] BioPerl installation failed: please help
Message-ID: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>

Hi

I have installed Perl from ActiveState and am now trying to install
BioPerl, but I cannot do it, neither from PPM (it shows the error "Ppm
install failed: 404 not found") nor by CPAN manual installation. It
does not allow me to download nmake, reporting that "the version of
this file is not compatible with the version of windows you are
running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program."

Please help.

Regards

Hina


-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


From jason at bioperl.org  Fri Jan 22 19:18:30 2010
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 22 Jan 2010 11:18:30 -0800
Subject: [Bioperl-l] forcing alphabet in Bio::AlignIO
In-Reply-To: <55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
References: <D8B9904F-B381-4153-BFCC-FBA76E55E4C5@bioperl.org>
	<55F51BAA-7DA5-4F32-B680-DCAE1714A5F1@illinois.edu>
Message-ID: <59EC9331-FB2F-4338-AD58-2D501A528A18@bioperl.org>

Done, as of r16739. Look forward to the refactor work too.

-jason
On Jan 22, 2010, at 5:34 AM, Chris Fields wrote:

> Sounds good to me.  The warnings are a bit too tight on this module  
> anyway.
>
> I still think we have plans towards refactoring some of this, not  
> sure how far along they are:
>
> http://www.bioperl.org/wiki/Align_Refactor
>
> chris
>
> On Jan 22, 2010, at 12:17 AM, Jason Stajich wrote:
>
>> I'm considering adding an initialization parameter (and get/set
>> accessor) to Bio::AlignIO that would allow setting the alphabet.
>> This is then passed to Bio::LocatableSeq creation so that
>> _guess_alphabet isn't called. That allows removal of the warnings
>> about empty sequences, because _guess_alphabet won't be called on a
>> sequence if we have explicitly set the alphabet.
>>
>> This worked great on my local install and tests pass.  Any  
>> objections or concerns?
>>
>> basically it means when you make an AlignIO you can specify the  
>> alphabet i.e.
>>
>> my $in = Bio::AlignIO->new(-format => 'fasta', -alphabet => 'dna',
>>                            -file => 'genome.fasaln');
>>
>> I have some alignments with empty sequences and I think turning off  
>> the warnings is appropriate where I force the alphabet choice. It  
>> should also have a very modest speedup benefit too.
>>
>> -jason
>> --
>> Jason Stajich
>> jason.stajich at gmail.com
>> jason at bioperl.org
>> http://fungalgenomes.org/
>> http://twitter.com/hyphaltip
>

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From cjfields at illinois.edu  Fri Jan 22 19:22:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 22 Jan 2010 13:22:43 -0600
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com>
	<EF1FEC1B43C146B6BBF827EA56171777@NewLife>
	<058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu>
	<358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com>
	<358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
Message-ID: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>

This could exist, but should go into a general Utilities module.  Part of the Align refactoring was to pull a good number of the methods into a general utilities module, so this would fit into that category.
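
Until such a utility exists, the check can be sketched in plain Perl on the gapped strings themselves (for alignment members, the strings returned by $seq->seq). This is only a sketch under assumptions: the names have_pairwise_overlap and merge_fragments are hypothetical and not existing Bio::SimpleAlign methods, '-' is assumed to be the only gap symbol, and the merge reuses the XOR-mask idea from earlier in this thread with a full-length mask.

```perl
use strict;
use warnings;

# Hypothetical helper: true if two aligned (gapped) strings have a
# residue in at least one common column. Assumes '-' is the gap symbol.
sub have_pairwise_overlap {
    my ($s1, $s2) = @_;
    die "aligned strings differ in length\n"
        unless length($s1) == length($s2);
    for my $i (0 .. length($s1) - 1) {
        return 1
            if substr($s1, $i, 1) ne '-' && substr($s2, $i, 1) ne '-';
    }
    return 0;
}

# Hypothetical helper: merge mutually non-overlapping fragments.
# XOR against a full-length gap mask turns gaps into NULs, so the
# fragments can be XOR-accumulated and then unmasked.
sub merge_fragments {
    my @frags = @_;
    for my $i (0 .. $#frags) {
        for my $j ($i + 1 .. $#frags) {
            die "fragments $i and $j overlap\n"
                if have_pairwise_overlap($frags[$i], $frags[$j]);
        }
    }
    my $mask = '-' x length($frags[0]);
    my $acc  = "\0" x length($frags[0]);
    $acc ^= ($_ ^ $mask) for @frags;
    return $acc ^ $mask;
}

# The three seq2 fragments from the original question.
my @seq2 = (
    'QWERTYU' . '-' x 19,
    '-' x 10 . 'ASDFGH' . '-' x 10,
    '-' x 19 . 'ZXCVBNM',
);
print merge_fragments(@seq2), "\n";    # QWERTYU---ASDFGH---ZXCVBNM
```

Refusing to merge when any pair overlaps is a design choice of this sketch; with overlapping residues the XOR accumulation would silently produce garbage characters.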

chris

On Jan 22, 2010, at 11:50 AM, Albert Vilella wrote:

> Or, to rephrase my question: what is the closest existing way to do what
> the code below does?
> 
> On Fri, Jan 22, 2010 at 4:04 PM, Albert Vilella <avilella at gmail.com> wrote:
> 
>> Is there/should be a 'have_pairwise_overlap' method similar to this?
>> 
>> # $seq1 and $seq3 have matching ids
>> my $seq1 = $aln->each_seq_by_id($seq1->display_id);
>> my $seq3 = $aln->each_seq_by_id($seq3->display_id);
>> 
>> my $ret = $aln->have_pairwise_overlap($seq1,$seq3);


From maj at fortinbras.us  Fri Jan 22 19:29:07 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 14:29:07 -0500
Subject: [Bioperl-l] Merging fragments in a simplealign
In-Reply-To: <14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>
References: <358f4d651001220507g56051c54p6741d682ca148164@mail.gmail.com><EF1FEC1B43C146B6BBF827EA56171777@NewLife><058430B4-F149-438C-A9DD-C9E24545FDBB@illinois.edu><358f4d651001220804o3a96252ctc4b721771668f1ba@mail.gmail.com><358f4d651001220950v605a5b55kef0302ff6270f82e@mail.gmail.com>
	<14824B66-2112-46A1-98BB-02FC592A3A9B@illinois.edu>
Message-ID: <0F7B7E5FE70D4C5CB34B27045561823C@NewLife>

I'd recommend making an enhancement request via Bugzilla, so we don't forget-
MAJ


From maj at fortinbras.us  Fri Jan 22 19:33:41 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 14:33:41 -0500
Subject: [Bioperl-l] BioPerl installation failed: please help
In-Reply-To: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>
References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk>
Message-ID: <2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife>

Hina-- 
See the protocol at 
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows#Comand-line_Installation
for ActiveState installation. If it doesn't work, please let us know at which 
step the failure happened.
cheers, MAJ


From maj at fortinbras.us  Fri Jan 22 20:13:15 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 22 Jan 2010 15:13:15 -0500
Subject: [Bioperl-l] BioPerl installation failed: please help
In-Reply-To: <20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
References: <20100122173455.c19sbarmswswgswc@www.sms.ed.ac.uk><2ABEC492CD49450EAE4BFC7BA763E3DB@NewLife>
	<20100122200118.053j5kc36skow0wg@www.sms.ed.ac.uk>
Message-ID: <9E5DE384E2C8416B8373E390ABDB7DFE@NewLife>

Ok Hina,
I'm not seeing any issues with the presence or availability of 
http://bioperl.org/DIST
from my machine. Can you access that url in a browser? If not, the king of the 
King's
Buildings may not be allowing access. Also, can you do the following:

C:> ppm-shell
ppm> repo list

Note the number of the repo that corresponds to bioperl (if any) and do

ppm> repo describe n

where 'n' is that number, and send the output along.

cheers, MAJ

----- Original Message ----- 
From: "Hina Dalal" <H.Dalal at sms.ed.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Sent: Friday, January 22, 2010 3:01 PM
Subject: Re: [Bioperl-l] BioPerl installation failed: please help


Hi Mark

warm regards

I was following exactly that protocol, but the problem is that when I try
to do it from PPM and reach the step "install BioPerl", it shows the
error "Ppm install failed: 404 not found" at the end. When I tried CPAN /
manual installation, I couldn't download nmake; it shows that "the
version of this file is not compatible with the version of windows you are
running. Check your computer system information to see whether you
need 32 bit or 64 bit of this program and then contact the software
publisher."


What should I do? Please help.

Regards

Hina




From pengyu.ut at gmail.com  Mon Jan 25 01:29:59 2010
From: pengyu.ut at gmail.com (Peng Yu)
Date: Sun, 24 Jan 2010 19:29:59 -0600
Subject: [Bioperl-l] Transcribe in bioperl
Message-ID: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>

I found the function 'translate' in bioperl. But I don't find
'transcribe'. Is there such a function?


From jason at bioperl.org  Mon Jan 25 02:06:48 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 24 Jan 2010 18:06:48 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
Message-ID: <BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>

What exactly do you want to do?
spliced_seq for a feature would be the closest thing...

-jason
On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:

> I found the function 'translate' in bioperl. But I don't find
> 'transcribe'. Is there such a function?

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From pengyu.ut at gmail.com  Mon Jan 25 02:22:12 2010
From: pengyu.ut at gmail.com (Peng Yu)
Date: Sun, 24 Jan 2010 20:22:12 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com>
	<BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
Message-ID: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>

To convert from T to U. I could use perl's builtin function. But it is
semantically far away from 'transcribe'. If there is a function with
name 'transcribe', it will be better.

On Sun, Jan 24, 2010 at 8:06 PM, Jason Stajich <jason at bioperl.org> wrote:
> What exactly do you want to do?
> spliced_seq for a feature would be the closest thing...
>
> -jason
> On Jan 24, 2010, at 5:29 PM, Peng Yu wrote:
>
>> I found the function 'translate' in bioperl. But I don't find
>> 'transcribe'. Is there such a function?
>
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
>
>


From maj at fortinbras.us  Mon Jan 25 02:48:33 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Sun, 24 Jan 2010 21:48:33 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
Message-ID: <FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>

Not a bad idea, a semantics-preserving/checking thing.
transcribe() could return an object with alphabet == 'rna'
and the T's converted to U's, or bork if called against an object with
alphabet != 'dna'.
I can add such a thing to Bio::PrimarySeqI (where all these doodads seem to
be stashed), if desired.



From cjfields at illinois.edu  Mon Jan 25 04:39:43 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 22:39:43 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
	<FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
Message-ID: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>

I think the main reason there hasn't been a transcribe() is that very few users ask for it.  Most just use a quick '$seq =~ tr/T/U/', or use spliced_seq() and/or translate() (i.e. they don't care about the intermediate mRNA).  I don't have a problem with adding a transcribe method to PrimarySeq, but (and Mark has already picked up on this) it should be constrained to DNA only and return RNA.  And there might be a case for adding the analogous reverse_translate().  

Also worth adding this to the proper interface class (PrimarySeqI, I think) so all Seq/PrimarySeq will have it (or have to implement their own).

chris



From cjfields at illinois.edu  Mon Jan 25 04:43:07 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 22:43:07 -0600
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org>
	<366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com>
	<FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
	<B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
Message-ID: <489E0B85-0BC3-45DB-8660-494CF69F35FF@illinois.edu>


On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:

> ...And there might be a case for adding the analogous reverse_translate().  

Bah.  Meant reverse_transcribe().  Ah well.

chris


From dan.kortschak at adelaide.edu.au  Mon Jan 25 05:33:28 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Mon, 25 Jan 2010 16:03:28 +1030
Subject: [Bioperl-l] BEDTools module
Message-ID: <1264397608.4898.9.camel@epistle>

Hi All,

A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
and Ira Hall is now available in the bioperl-run subversion repository
(bioperl-run/trunk r16754).

Using BEDTools you can, among other things:

      * Intersect two BED files in search of overlapping features.
      * Merge overlapping features.
      * Screen for paired-end (PE) overlaps between PE sequences and
        existing genomic features.
      * Calculate the depth and breadth of sequence coverage across
        defined "windows" in a genome.

(see <http://code.google.com/p/bedtools/> for manuals and downloads).
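
To make the first item concrete, here is a rough plain-Perl illustration of what intersecting two sets of BED features means. It is not BEDTools and not the wrapper's API: intersect_bed is a hypothetical helper, features are assumed to be [chrom, start, end] triples in BED's 0-based, half-open coordinates, and the all-against-all loop is only sensible for small inputs.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BEDTools or the wrapper: report the
# overlapping segment for every pair of features from two lists.
# Each feature is [chrom, start, end] with 0-based, half-open coordinates.
sub intersect_bed {
    my ($list_a, $list_b) = @_;
    my @hits;
    for my $fa (@$list_a) {
        for my $fb (@$list_b) {
            next unless $fa->[0] eq $fb->[0];    # same chromosome only
            my $start = $fa->[1] > $fb->[1] ? $fa->[1] : $fb->[1];
            my $end   = $fa->[2] < $fb->[2] ? $fa->[2] : $fb->[2];
            # Half-open intervals that merely touch do not overlap.
            push @hits, [ $fa->[0], $start, $end ] if $start < $end;
        }
    }
    return @hits;
}

my @bed_a = ( [ 'chr1', 100, 200 ], [ 'chr2', 50, 80 ] );
my @bed_b = ( [ 'chr1', 150, 250 ], [ 'chr1', 200, 300 ] );

# Prints one tab-separated line: chr1 150 200
print join("\t", @$_), "\n" for intersect_bed(\@bed_a, \@bed_b);
```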

BEDTools is a suite of 17 command-line executables. The module attempts to
provide their commands and options comprehensively and can return Bio::SeqIO
or Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
where specific handling has not been implemented; please give feedback
on desired features for this).

cheers
Dan


From cjfields at illinois.edu  Mon Jan 25 05:35:06 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Sun, 24 Jan 2010 23:35:06 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
Message-ID: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>

Just a quick question for those using DNAStatistics.  I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:

>seq1
GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq2
GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>seq3
GGTACCAGCAGGTGGTCCGCCTA------------------------------
>seq4
--------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC

Since seq3 and seq4 don't overlap, the distance can't be calculated.  In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.  Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere?
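
To illustrate the case in plain Perl: the sketch below computes an uncorrected p-distance over the columns where both sequences have a residue and falls back to a placeholder when there are none. It is not the Bio::Align::DNAStatistics code; p_distance is a hypothetical helper, '-' is assumed to be the gap symbol, and the placeholder is left as a parameter (for example 'NA' or a numeric sentinel) since that choice is the open question.

```perl
use strict;
use warnings;

# Hypothetical helper: uncorrected p-distance between two aligned
# strings, counting only columns where both have a residue.
# Returns $placeholder (default 'NA') when no such column exists,
# which avoids the division by zero.
sub p_distance {
    my ($s1, $s2, $placeholder) = @_;
    $placeholder = 'NA' unless defined $placeholder;
    die "aligned strings differ in length\n"
        unless length($s1) == length($s2);
    my ($shared, $diff) = (0, 0);
    for my $i (0 .. length($s1) - 1) {
        my $x = substr($s1, $i, 1);
        my $y = substr($s2, $i, 1);
        next if $x eq '-' || $y eq '-';    # skip columns with a gap
        $shared++;
        $diff++ if uc($x) ne uc($y);
    }
    return $shared ? $diff / $shared : $placeholder;
}

# seq3 and seq4 from the data above do not share any column.
my $seq3 = 'GGTACCAGCAGGTGGTCCGCCTA' . '-' x 30;
my $seq4 = '-' x 26 . 'CGCACGCGCGTGTTTGCGGGCAGCCGC';
print p_distance($seq3, $seq4), "\n";        # NA
print p_distance($seq3, $seq4, -1), "\n";    # -1
```

A string placeholder makes the missing value obvious but forces downstream code to test for it before doing arithmetic; a numeric sentinel keeps the matrix numeric but can be mistaken for a real distance.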

chris


From jason at bioperl.org  Mon Jan 25 05:58:03 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sun, 24 Jan 2010 21:58:03 -0800
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
In-Reply-To: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
Message-ID: <B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>

It could also return -1 which is used as place holder for NA in other  
programs that generate distance matrices.
-jason

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From maj at fortinbras.us  Mon Jan 25 13:17:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 08:17:54 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
References: <366c6f341001241729t155483a7q65631b2495c338b0@mail.gmail.com><BB7EADC2-3463-4B03-A707-E6BB18618246@bioperl.org><366c6f341001241822t790c19eeo5f5facffa0c35600@mail.gmail.com><FEE5A7EBA9EF4D7A87DA96FCEDB4030A@NewLife>
	<B2FD3BBC-A32F-475C-BF2E-FD0A51F0D96B@illinois.edu>
Message-ID: <ED0F320909EF4DB99FF0C91423F83209@NewLife>

transcribe() and rev_transcribe added to Bio::PrimarySeqI, plus tests in 
t/Seq.t, @ r16757
MAJ
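
For anyone not yet on that revision, the behaviour discussed in this thread can be approximated in plain Perl. This is a sketch under assumptions, not the code committed to Bio::PrimarySeqI: the two functions here are stand-ins that work on a sequence string plus an alphabet name, and the committed methods may differ in details such as error handling.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the behaviour discussed in this thread:
# each takes a sequence string and its alphabet and returns the
# converted string together with the new alphabet.
sub transcribe {
    my ($seq, $alphabet) = @_;
    die "transcribe() needs a 'dna' sequence, got '$alphabet'\n"
        unless $alphabet eq 'dna';
    (my $rna = $seq) =~ tr/Tt/Uu/;    # T -> U, case preserved
    return ($rna, 'rna');
}

sub rev_transcribe {
    my ($seq, $alphabet) = @_;
    die "rev_transcribe() needs an 'rna' sequence, got '$alphabet'\n"
        unless $alphabet eq 'rna';
    (my $dna = $seq) =~ tr/Uu/Tt/;    # U -> T, case preserved
    return ($dna, 'dna');
}

my ($rna, $alpha) = transcribe('ATGTTTtaa', 'dna');
print "$rna ($alpha)\n";    # AUGUUUuaa (rna)
```

The alphabet check is the point of having a named method rather than a bare tr/T/U/: calling it on a protein or RNA sequence fails loudly instead of silently rewriting characters.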


From cjfields at illinois.edu  Mon Jan 25 13:23:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 07:23:12 -0600
Subject: [Bioperl-l] BEDTools module
In-Reply-To: <1264397608.4898.9.camel@epistle>
References: <1264397608.4898.9.camel@epistle>
Message-ID: <0F5CE93E-0E6C-4317-806B-A463A9B0917E@illinois.edu>

Great work Dan!  

chris

On Jan 24, 2010, at 11:33 PM, Dan Kortschak wrote:

> Hi All,
> 
> A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
> and Ira Hall is now available in the bioperl-run subversion repository
> (bioperl-run/trunk r16754).
> 
> Using BEDTools you can, among other things:
> 
>      * Intersect two BED files in search of overlapping features.
>      * Merge overlapping features.
>      * Screen for paired-end (PE) overlaps between PE sequences and
>        existing genomic features.
>      * Calculate the depth and breadth of sequence coverage across
>        defined "windows" in a genome.
> 
> (see <http://code.google.com/p/bedtools/> for manuals and downloads).
> 
> BEDTools is a suite of 17 command-line executables. The module attempts to
> provide options comprehensively and can return Bio::SeqIO or
> Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
> where specific handling has not been implemented - please give feedback
> on desired features for this).
> 
> cheers
> Dan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Mon Jan 25 13:27:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 07:27:26 -0600
Subject: [Bioperl-l] Distance between non-overlapping sequences in
	DNAStatistics
In-Reply-To: <B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>
References: <192B6949-26CA-45EA-A4E6-FD89F216CA84@illinois.edu>
	<B0F205C8-BA0C-4BF8-9969-5B8AF7172342@bioperl.org>
Message-ID: <D46CA8B2-780B-4AA5-B9E3-07EADC0D79C1@illinois.edu>

That works for me, just want to ensure we're DTRT.  I'll change it over.

chris

On Jan 24, 2010, at 11:58 PM, Jason Stajich wrote:

> It could also return -1 which is used as place holder for NA in other programs that generate distance matrices.
> -jason
> On Jan 24, 2010, at 9:35 PM, Chris Fields wrote:
> 
>> Just a quick question for those using DNAStatistics.  I just fixed a bug in Bio::Align::DNAStatistics that failed with a div by zero error (bug 2901) on this data:
>> 
>>> seq1
>> GGTACCAGCAGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>>> seq2
>> GGTACCAGCTGGTGGGCCGCCTACTGCGCACGCGCGGGTTTGCGGGCAGCCGC
>>> seq3
>> GGTACCAGCAGGTGGTCCGCCTA------------------------------
>>> seq4
>> --------------------------CGCACGCGCGTGTTTGCGGGCAGCCGC
>> 
>> Since seq3 and seq4 don't overlap, the distance can't be calculated.  In our case, I replace the score with 'NA' as a placeholder, but I'm worried about downstream app breakage.  Anyone have an objection to using 'NA' here, or know of ways this may lead to problems elsewhere?
>> 
>> chris
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> --
> Jason Stajich
> jason.stajich at gmail.com
> jason at bioperl.org
> http://fungalgenomes.org/
> http://twitter.com/hyphaltip
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
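
The non-overlap case in this thread can be reproduced with a short
pure-Perl sketch. p_distance is a hypothetical helper, not the
Bio::Align::DNAStatistics code: it computes an uncorrected p-distance
over columns where neither sequence has a gap and returns -1 (the
placeholder Jason suggested) when there are no such columns, instead of
dividing by zero.

```perl
use strict;
use warnings;

# Hypothetical helper: uncorrected p-distance over the columns where
# neither aligned sequence has a gap. Returns -1 when no column can be
# compared, so the caller never sees a division by zero.
sub p_distance {
    my ( $s1, $s2 ) = @_;
    die "sequences must be aligned to the same length\n"
        unless length($s1) == length($s2);

    my ( $compared, $diffs ) = ( 0, 0 );
    for my $i ( 0 .. length($s1) - 1 ) {
        my ( $x, $y ) = ( substr( $s1, $i, 1 ), substr( $s2, $i, 1 ) );
        next if $x eq '-' or $y eq '-';    # skip gapped columns
        $compared++;
        $diffs++ if uc($x) ne uc($y);
    }
    return $compared ? $diffs / $compared : -1;
}

# seq3 and seq4 from the bug report: 53 columns, no shared residues.
my $seq3 = 'GGTACCAGCAGGTGGTCCGCCTA' . ( '-' x 30 );
my $seq4 = ( '-' x 26 ) . 'CGCACGCGCGTGTTTGCGGGCAGCCGC';

print p_distance( $seq3, $seq4 ), "\n";    # prints -1
```

A numeric placeholder keeps the matrix numeric for downstream parsers,
at the cost that callers must remember -1 is not a real distance.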


From maj at fortinbras.us  Mon Jan 25 13:41:38 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 08:41:38 -0500
Subject: [Bioperl-l] BEDTools module
In-Reply-To: <1264397608.4898.9.camel@epistle>
References: <1264397608.4898.9.camel@epistle>
Message-ID: <8D494783F87E4C32BD797008E260C3C2@NewLife>

Rock 'n' roll, Dan!
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 12:33 AM
Subject: [Bioperl-l] BEDTools module


> Hi All,
> 
> A wrapper and parser for the BEDTools utilities suite of Aaron Quinlan
> and Ira Hall is now available in the bioperl-run subversion repository
> (bioperl-run/trunk r16754).
> 
> Using BEDTools you can, among other things:
> 
>      * Intersect two BED files in search of overlapping features.
>      * Merge overlapping features.
>      * Screen for paired-end (PE) overlaps between PE sequences and
>        existing genomic features.
>      * Calculate the depth and breadth of sequence coverage across
>        defined "windows" in a genome.
> 
> (see <http://code.google.com/p/bedtools/> for manuals and downloads).
> 
> BEDTools is a suite of 17 command-line executables. The module attempts to
> provide options comprehensively and can return Bio::SeqIO or
> Bio::SeqFeature::Collection objects where appropriate (or Bio::Root::IO
> where specific handling has not been implemented - please give feedback
> on desired features for this).
> 
> cheers
> Dan
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>


From rtbio.2009 at gmail.com  Mon Jan 25 13:43:19 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 14:43:19 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
Message-ID: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>

Hello Mark,Chris and all,

This is Roopa again. I have a small problem again. I am working on Remote
blast. The program works well. But the problem is this.  The program
accesses the server and gets the output correctly. I am trying to send the
result sequences into an array and I found that always the first sequence
among the Result sequences is missing. The code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".
time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the count of the array was 0 and nothing was printed. It was the same when
the number of output sequences was 3: the count of the array was 2 and only
two sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.


From rtbio.2009 at gmail.com  Mon Jan 25 13:44:57 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 14:44:57 +0100
Subject: [Bioperl-l] remote blast bioperl
Message-ID: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>

Hello all,

I have a small problem again. I am working on Remote blast. The program
works well. But the problem is this.  The program accesses the server and
gets the output correctly. I am trying to send the result sequences into an
array and I found that always the first sequence among the Result sequences
is missing. The code is

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");


while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);


   my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {
      open(OUTFILE,'>',$debugfile);
            #   print OUTFILE "while entered";
              close(OUTFILE);
     foreach my $rid ( @rids ) {

               open(OUTFILE,'>',$debugfile);
 #  print OUTFILE "foreach entered";
              close(OUTFILE);

        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
         open(OUTFILE,'>',$debugfile);
              # print OUTFILE "if entered";
              close(OUTFILE);
         print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
              open(OUTFILE,'>',$debugfile);
              # print OUTFILE "else entered";
              close(OUTFILE);

          my $result = $rc->next_result();
         #save the output
        $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".
time()."\.out";


         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

       open(BLASTDEBUGFILE,'>',$blastdebugfile);
       print BLASTDEBUGFILE  $organism;
        close(BLASTDEBUGFILE);

    # open(OUTFILE,'>',$outfile);
    # print OUTFILE "Test2 $result->database_name()";
    # close(OUTFILE);

#$hit = $result->next_hit;
#open(new,'>',$debugfile);
#print $hit;
#close(new);
$dummy=0;

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);

          #     open(OUTFILE,'>',$debugfile);
           #    print OUTFILE "$hit in while hits";
            #  close(OUTFILE);
 my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dummy;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=@seqs;
 open(OUTFILE,'>',$debugfile);
             #  print OUTFILE $warum;
               print OUTFILE @seqs;

              close(OUTFILE);
return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";


In the code above I was trying to debug by printing the count of the array
and the sequences themselves. When the BLAST output contained 1 sequence,
the count of the array was 0 and nothing was printed. It was the same when
the number of output sequences was 3: the count of the array was 2 and only
two sequences were printed.

Please help me in sorting out this problem.

Regards,
Roopa.


From cjfields at illinois.edu  Mon Jan 25 14:05:44 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Mon, 25 Jan 2010 08:05:44 -0600
Subject: [Bioperl-l] remote blast bioperl
In-Reply-To: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>
References: <c7cac1601001250544u18ba2e60oc64b44259d1906cf@mail.gmail.com>
Message-ID: <7E402CC5-9C66-4315-B437-7C4EC2317371@illinois.edu>

Roopa,

We have received all 4+ of your posts.  There is absolutely no need for you to keep repeatedly posting the same thing to the list.  Be patient, we'll try to get to you as soon as we can!

chris

On Jan 25, 2010, at 7:44 AM, Roopa Raghuveer wrote:

> Hello all,
> 
> I have a small problem again. I am working on Remote blast. The program works well. But the problem is this.  The program accesses the server and gets the output correctly. I am trying to send the result sequences into an array and I found that always the first sequence among the Result sequences is missing. The code is
> 
>  my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' , '-organism' => "$organ\[ORGN]");
> 
> 
> while (my $input = $str->next_seq())
> {
>    #Blast a sequence against a database:
>     #Alternatively, you could  pass in a file with many
>     #sequences rather than loop through sequence one at a time
>     #Remove the loop starting 'while (my $input = $str->next_seq())'
>     #and swap the two lines below for an example of that.
> 
>              open(OUTFILE,'>',$debugfile);
>                print OUTFILE $input;
>               close(OUTFILE);
> 
> 
>    my $r = $factory->submit_blast($input);
> 
>                 open(OUTFILE,'>',$debugfile);
>              #   print OUTFILE $r;
>                 close(OUTFILE);
> 
> 
>    print STDERR "waiting...." if($v>0);
> 
>   while ( my @rids = $factory->each_rid ) {
>       open(OUTFILE,'>',$debugfile);
>             #   print OUTFILE "while entered";
>               close(OUTFILE);
>      foreach my $rid ( @rids ) {
> 
>                open(OUTFILE,'>',$debugfile);
>  #  print OUTFILE "foreach entered";
>               close(OUTFILE);
> 
>         my $rc = $factory->retrieve_blast($rid);
> 
>         if( !ref($rc) )
>         {
>         if( $rc < 0 )
>         {
>         $factory->remove_rid($rid);
>         }
>          open(OUTFILE,'>',$debugfile);
>               # print OUTFILE "if entered";
>               close(OUTFILE);
>          print STDERR "." if ( $v > 0 );
>          sleep 5;
>         }
>        else {
>               open(OUTFILE,'>',$debugfile);
>               # print OUTFILE "else entered";
>               close(OUTFILE);
> 
>           my $result = $rc->next_result();
>          #save the output
>         $blastdebugfile = $serverpath."/blastdebug_".time().".txt";
> 
>           open(BLASTDEBUGFILE,'>',$blastdebugfile);
>           print BLASTDEBUGFILE $result->next_hit();
>           close(BLASTDEBUGFILE);
> 
>         my $filename = $serverpath."/blastdata_".
> time()."\.out";
> 
> 
>          # open(DEBUGFILE,'>',$debugfile);
>          # open(new,'>',$filename);
>          # @arra=<new>;
>          # print DEBUGFILE @arra;
>          # close(DEBUGFILE);
>          # close(new);
> 
>          $factory->save_output($filename);
> 
>        # open(BLASTDEBUGFILE,'>',$debugfile);
>        # print BLASTDEBUGFILE  "Hello $rid";
>        # close(BLASTDEBUGFILE);
> 
>        $factory->remove_rid($rid);
> 
>        open(BLASTDEBUGFILE,'>',$blastdebugfile);
>        print BLASTDEBUGFILE  $organism;
>         close(BLASTDEBUGFILE);
> 
>     # open(OUTFILE,'>',$outfile);
>     # print OUTFILE "Test2 $result->database_name()";
>     # close(OUTFILE);
> 
> #$hit = $result->next_hit;
> #open(new,'>',$debugfile);
> #print $hit;
> #close(new);
> $dummy=0;
> 
>    while ( my $hit = $result->next_hit ) {
> 
>             next unless ( $v >= 0);
> 
>           #     open(OUTFILE,'>',$debugfile);
>            #    print OUTFILE "$hit in while hits";
>             #  close(OUTFILE);
>  my $sequ = $gb->get_Seq_by_version($hit->name);
>            my $dna = $sequ->seq(); # get the sequence as a string
>         $dummy++;
>              open(OUTFILE,'>',$debugfile);
>           #     print OUTFILE $dummy;
>               close(OUTFILE);
>           push(@seqs,$dna);
>          }
>         }
>       }
>     }
>   }
> 
> $warum=@seqs;
>  open(OUTFILE,'>',$debugfile);
>              #  print OUTFILE $warum;
>                print OUTFILE @seqs;
> 
>               close(OUTFILE);
> return(@seqs);
> }
> 
> open(OUTFILE, '>',$outfile) || die ;
> 
> print OUTFILE "<HTML>\n
> <head><title>RNAi Result</title>
> <meta http-equiv=\"expires\" content=\"0\"></head>\n
> <body>\n
> <p><font face=\"Courier, monospace font set\">
> Inputsequence: <br>";
> 
> 
> Here in the above code, I was trying to debug the code and trying to get the count of the array and even the sequence. But when the output data was giving 1 sequence, the count of the array was 0 and when I tried to print the output sequence I could not get any. It was the same when the no of output sequences was  3, I tried to print the sequences but was getting the count of the array as 2 and was printing only two sequences.
> 
> Please help me in sorting out this problem.
> 
> Regards,
> Roopa.


From jiann-jy at hotmail.com  Mon Jan 25 02:03:55 2010
From: jiann-jy at hotmail.com (JY)
Date: Sun, 24 Jan 2010 18:03:55 -0800 (PST)
Subject: [Bioperl-l] how to retrieve accession number by taxon id??
Message-ID: <4cef88b5-fa53-4e63-9167-30075c10a058@k19g2000yqc.googlegroups.com>

i need to retrieve accession number and sequence to complete one of my
part in my project, but how to retrieve accession number  by the taxon
id.


From lpaulet at ual.es  Mon Jan 25 20:25:55 2010
From: lpaulet at ual.es (Lorenzo Carretero-Paulet)
Date: Mon, 25 Jan 2010 21:25:55 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <4B5DFE53.2000201@ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and
returns the corresponding reports in txt, xml and html format. I'm
experiencing problems with the latter, as the program returns the
following error message:

"Can't call method "next_result" without a package or object reference 
at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e 
$E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e 
$E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                 -file   => ">$outputfilenameH");
while( my $result = _$blast_report_->next_result ) { # get a result from 
Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


From lpaulet at ual.es  Mon Jan 25 20:31:08 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 21:31:08 +0100
Subject: [Bioperl-l] HTMLResultWriter
Message-ID: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>

Hi all,

I'm trying to generate a subroutine that performs a BLAST search and
returns the corresponding reports in txt, xml and html format. I'm
experiencing problems with the latter, as the program returns the
following error message:

"Can't call method "next_result" without a package or object reference at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my $blast_report =qx/$blast -p blastp -d $database -i $query -e  
$E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e  
$E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                  -file   => ">$outputfilenameH");
while( my $result = $blast_report->next_result ) { # get a result from  
Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


From dan.kortschak at adelaide.edu.au  Mon Jan 25 21:00:37 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 07:30:37 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
Message-ID: <1264453237.4552.3.camel@epistle>

A reverse_translate to IUPAC degenerate codes is not a bad idea,
particularly for PCR primer design.

Dan

On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org
wrote:
> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:
> 
> > ...And there might be a case for adding the analogous
> reverse_translate().  
> 
> Bah.  Meant reverse_transcribe().  Ah well.
> 
> chris


From maj at fortinbras.us  Mon Jan 25 21:07:49 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 16:07:49 -0500
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
References: <20100125213108.zws18jpd8gwwkssk@webmail.ual.es>
Message-ID: <F5772AAC495D475DBEEEF2311B16F941@NewLife>

Lorenzo--
your $blast_report is set to be (some of) the text returned
by a system call of a blast program; this isn't going to be
an object of any kind, and so no functions can be
called from it (as at "$blast_report->next_result"). You need
to parse the text generated by the blast call using Bio::SearchIO
to get a Bio::Search::Result::BlastResult object.
you could do

my @blast_lines = qx/ ...your blast call... /;
open my $bf, '>', 'my.blast' or die "Cannot write my.blast: $!";
print $bf @blast_lines;   # no comma after the filehandle
close $bf;
my $blast_result = Bio::SearchIO->new(-file   => 'my.blast',
                                      -format => 'blast');

and carry on from there. But why not look at
Bio::Tools::Run::StandAloneBlast or
Bio::Tools::Run::StandAloneBlastPlus
to run your blasts within perl? These wrap the blast
programs and deliver BioPerl objects, rather than
plain text output.
cheers MAJ
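
To see why the error message reads the way it does, here is a small
sketch that needs no BLAST installation. It assumes, as the -o option in
the posted call suggests, that the external program writes its report to
a file and prints nothing to standard output; "perl -e 1" is only a
stand-in for such a program.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in for a command that writes its report elsewhere and prints
# nothing to STDOUT. $^X is the path of the running perl.
my $perl         = $^X;
my $blast_report = qx/$perl -e 1/;

print "captured ", length($blast_report), " characters\n";
print "is an object? ", ( blessed($blast_report) ? "yes" : "no" ), "\n";

# Calling a method on that plain, empty string fails at run time.
my $ok  = eval { $blast_report->next_result; 1 };
my $err = $ok ? '' : $@;
print "error: $err";
```

Backticks and qx only ever return text (or a list of lines), so
the parsing step with Bio::SearchIO is needed whichever way the report
is produced.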
----- Original Message ----- 
From: <lpaulet at ual.es>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 3:31 PM
Subject: [Bioperl-l] HTMLResultWriter


Hi all,

I'm trying to generate a subroutine that performs a BLAST search and
returns the corresponding reports in txt, xml and html format. I'm
experiencing problems with the latter, as the program returns the
following error message:

"Can't call method "next_result" without a package or object reference at..."

sub blasting    {
my ($query, $E_value) = @_;
my ($outputfilenameB, $outputfilenameX, $outputfilenameH);
$outputfilenameB=$query.".BLAST.txt";
$outputfilenameX=$query.".BLAST.xml";
$outputfilenameH=$query.".BLAST.html";
#legacy_blast.pl blastall -i query -d nr -o blast.out --path /opt/blast/bin
print qx(du -s /tmp);
my $blast_report =qx/$blast -p blastp -d $database -i $query -e
$E_value -b 20000 -o $outputfilenameB/;
my $XMLblast_report =qx/$blast -p blastp -d $database -i $query -e
$E_value -m 7 -b 20000 -o $outputfilenameX/;

my $writerhtml = new Bio::SearchIO::Writer::HTMLResultWriter();
my $outhtml = new Bio::SearchIO(-writer => $writerhtml,
                                  -file   => ">$outputfilenameH");
while( my $result = $blast_report->next_result ) { # get a result from
Bio::SearchIO parsing or build it up in memory
$outhtml->write_result($result);
}
}

Can anyone  see where the problem is?
Cheers!
Lorenzo


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From David.Messina at sbc.su.se  Mon Jan 25 21:09:24 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 25 Jan 2010 22:09:24 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <4B5DFE53.2000201@ual.es>
References: <4B5DFE53.2000201@ual.es>
Message-ID: <FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>

> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e $E_value -b 20000 -o $outputfilenameB/;

> while( my $result = _$blast_report_->next_result ) { # get a result from Bio::SearchIO parsing or build it up in memory


_$blast_report_ is not a valid variable name, as far as I know. Plus there's a space between report and the final '_' in the first of the above two lines.

Does this code compile?

Dave


From Russell.Smithies at agresearch.co.nz  Mon Jan 25 21:14:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Tue, 26 Jan 2010 10:14:15 +1300
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
References: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>

That's a fair mix of incomplete code you've supplied!!
Did you read the documentation for RemoteBlast? The example there will do 99% of what you want.
http://search.cpan.org/~cjfields/BioPerl-1.6.1/Bio/Tools/Run/RemoteBlast.pm

I'm not entirely sure what you're trying to do (as you've left out a bit of your code) but I assume you're trying to retrieve and print the sequence for each hit.

Here's something that works, not sure exactly what/why you want to print but it should get you a bit further.

--Russell


================================
#!perl -w

use Bio::Tools::Run::RemoteBlast;
use Bio::DB::GenBank;
use Bio::SeqIO;    # used below to read the query file

use CGI ':standard';

use strict;

my $q = new CGI;

my @params = (
               -prog         => 'blastn',
               -data         => 'nr',
               -expect       => '1e-30',
               -entrez_query => 'Homo sapiens [ORGN]',
               -readmethod   => 'SearchIO'
);

my $gb = Bio::DB::GenBank->new;

my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

#$v is just to turn on and off the messages
my $v = 1;

my $str = Bio::SeqIO->new( -file => 'test.faa', -format => "fasta" );

while ( my $input = $str->next_seq() ) {

  my $r = $factory->submit_blast($input);

  print STDERR "waiting..." if ( $v > 0 );
  while ( my @rids = $factory->each_rid ) {
    foreach my $rid (@rids) {
      my @seqs = ();
      my $rc   = $factory->retrieve_blast($rid);
      if ( !ref($rc) ) {
        if ( $rc < 0 ) {
          $factory->remove_rid($rid);
        }
        print STDERR "." if ( $v > 0 );
        sleep 5;
      }
      else {
        my $result = $rc->next_result();

        #save the blast output
        my $filename = $result->query_accession . '.out';
        $factory->save_output($filename);
        $factory->remove_rid($rid);
        print "\nQuery Name: ", $result->query_name(), "\n";
        while ( my $hit = $result->next_hit ) {

          # store the hit sequences
          push @seqs, $gb->get_Seq_by_version( $hit->name );

          next unless ( $v > 0 );
          print "\thit name is ", $hit->name, "\n";
          while ( my $hsp = $hit->next_hsp ) {
            print "\t\tscore is ", $hsp->score, "\n";
          }
        }

        ## print the seqs you've retrieved??
        open( OUTFILE, '>', $result->query_accession . '.htm' )
          or die "Cannot write HTML output: $!";
        print OUTFILE $q->start_html('RNAi Result'),
          $q->h1('RNAi Result'),
          $q->h2('Input'),
          $q->pre( toString($input) ),
          $q->h2('Output');

        foreach (@seqs) {

          #there's probably a better way of printing the seq
          print OUTFILE $q->pre( toString($_) );
        }
        print OUTFILE $q->end_html;
        close OUTFILE;
      }
    }
  }
}

sub toString {
  my $s = shift;
  return '>' . $s->display_id . " " . $s->desc . "\n" . $s->seq;
}


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From biopython at maubp.freeserve.co.uk  Mon Jan 25 21:24:33 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 25 Jan 2010 21:24:33 +0000
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
Message-ID: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>

On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak
<dan.kortschak at adelaide.edu.au> wrote:
> A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.

I would say it could be a bad idea. For any protein string there are
multiple possible back translations, and this cannot be captured
fully as a nucleotide string even using the IUPAC ambiguity chars.

We debated this back and forth for Biopython, and decided to leave it
out. It wasn't possible for a simple back translate to a simple string to
handle the use cases we considered, and other options like returning
a regular expression covering all possible back translations were too
complex (for a core sequence method/function).
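
A small illustration of the over-generation problem, as a sketch rather than
anything taken from BioPerl or Biopython: assuming the standard genetic code,
collapsing the six leucine codons into a single IUPAC codon also admits two
phenylalanine codons. The helper names consensus and realizations are made up
for this example.

```perl
use strict;
use warnings;

# IUPAC nucleotide codes, keyed by the sorted set of bases each one covers.
my %iupac = (
    A  => 'A', C  => 'C', G  => 'G', T  => 'T',
    AG => 'R', CT => 'Y', CG => 'S', AT => 'W', GT => 'K', AC => 'M',
    CGT => 'B', AGT => 'D', ACT => 'H', ACG => 'V', ACGT => 'N',
);
my %expand = reverse %iupac;    # IUPAC code -> bases it stands for

# Hypothetical helper: collapse a list of codons into one IUPAC codon.
sub consensus {
    my @codons = @_;
    my $out = '';
    for my $pos ( 0 .. 2 ) {
        my %seen;
        $seen{ substr( $_, $pos, 1 ) } = 1 for @codons;
        $out .= $iupac{ join '', sort keys %seen };
    }
    return $out;
}

# Hypothetical helper: list every unambiguous codon an IUPAC codon matches.
sub realizations {
    my ($codon) = @_;
    my @out = ('');
    for my $code ( split //, $codon ) {
        my @next;
        for my $prefix (@out) {
            push @next, $prefix . $_ for split //, $expand{$code};
        }
        @out = @next;
    }
    my @sorted = sort @out;
    return @sorted;
}

my @leu    = qw(CTT CTC CTA CTG TTA TTG);    # leucine, standard genetic code
my $amb    = consensus(@leu);
my %is_leu = map { $_ => 1 } @leu;
my @extra  = grep { !$is_leu{$_} } realizations($amb);

print "IUPAC consensus: $amb\n";                        # YTN
print "non-leucine codons it also matches: @extra\n";   # TTC TTT
```

So a per-position ambiguity string for leucine is a superset of the true
back translations; the same happens for serine and arginine, whose codons
also fall into two families.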

Peter


From jason at bioperl.org  Mon Jan 25 21:26:55 2010
From: jason at bioperl.org (Jason Stajich)
Date: Mon, 25 Jan 2010 13:26:55 -0800
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <98995830-DC7F-4404-A216-874EF5799DB6@bioperl.org>

It was already implemented several years ago -- reverse_translate:
Bio::Tools::CodonTable -> revtranslate


   my $myCodonTable = Bio::Tools::CodonTable->new();
   my $seqobj       = Bio::PrimarySeq->new(-seq => 'FHGERHEL');
   my $iupac_str    = $myCodonTable->reverse_translate_all($seqobj);


Chris had meant to say reverse_transcribe of RNA -> DNA FWIW.

-jason
On Jan 25, 2010, at 1:24 PM, Peter wrote:

> On Mon, Jan 25, 2010 at 9:00 PM, Dan Kortschak
> <dan.kortschak at adelaide.edu.au> wrote:
>> A reverse_translate to IUPAC degenerate codes is not a bad idea,
>> particularly for PCR primer design.
>
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.
>
> We debated this back and forth for Biopython, and decided to leave it
> out. It wasn't possible for a simple back translate to a simple string to
> handle the use cases we considered, and other options like returning
> a regular expression covering all possible back translations were too
> complex (for a core sequence method/function).
>
> Peter
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From maj at fortinbras.us  Mon Jan 25 21:19:24 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Mon, 25 Jan 2010 16:19:24 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264453237.4552.3.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
Message-ID: <72B106F0D5FF4F1E858CC9BD1EF33142@NewLife>

I think we have that functionality in Bio::Tools::SeqPattern, 
courtesy of Bruno V---
----- Original Message ----- 
From: "Dan Kortschak" <dan.kortschak at adelaide.edu.au>
To: <bioperl-l at lists.open-bio.org>
Sent: Monday, January 25, 2010 4:00 PM
Subject: Re: [Bioperl-l] Transcribe in bioperl


>A reverse_translate to IUPAC degenerate codes is not a bad idea,
> particularly for PCR primer design.
> 
> Dan
> 
> On Mon, 2010-01-25 at 09:05 -0500, bioperl-l-request at lists.open-bio.org
> wrote:
>> On Jan 24, 2010, at 10:39 PM, Chris Fields wrote:
>> 
>> > ...And there might be a case for adding the analogous
>> reverse_translate().  
>> 
>> Bah.  Meant reverse_transcribe().  Ah well.
>> 
>> chris
> 


From dan.kortschak at adelaide.edu.au  Mon Jan 25 21:38:44 2010
From: dan.kortschak at adelaide.edu.au (Dan Kortschak)
Date: Tue, 26 Jan 2010 08:08:44 +1030
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org>
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com>
Message-ID: <1264455524.4552.23.camel@epistle>

Good to see that these ideas have been considered.

I'd be interested to see this discussion, or at least the point dealing
with the problems that might arise. I'm at a loss as to how ambiguity
codes can't completely describe all possible coding sequences for any
given codon table (via Bio::Tools::CodonTable, which in fact already has
the revtranslate that could be fitted into a Bio::PrimarySeq method). To
answer Mark and Jason's comments: I think that /if/ a reverse_translate
method exists, it makes logical sense to have it tied to a sequence
object, calling the B:T:CT method on the seq object itself rather than
only in Bio::Tools, no? Pete, can you provide an example of the
problems?

thanks
Dan

On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> I would say it could be a bad idea. For any protein string there are
> multiple possible back translations, and this cannot be captured
> fully as a nucleotide string even using the IUPAC ambiguity chars.


From lpaulet at ual.es  Mon Jan 25 21:53:07 2010
From: lpaulet at ual.es (lpaulet at ual.es)
Date: Mon, 25 Jan 2010 22:53:07 +0100
Subject: [Bioperl-l] HTMLResultWriter
In-Reply-To: <FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>
References: <4B5DFE53.2000201@ual.es>
	<FA18BF1A-7E2C-447B-9D81-2B597B76A77A@sbc.su.se>
Message-ID: <20100125225307.2zl2cn2hkcsgccso@webmail.ual.es>

Thanks Dave and Mark.

Quoting Dave Messina <David.Messina at sbc.su.se>:

>> my _$blast_report _=qx/$blast -p blastp -d $database -i $query -e   
>> $E_value -b 20000 -o $outputfilenameB/;
>
>> while( my $result = _$blast_report_->next_result ) { # get a result  
>>  from Bio::SearchIO parsing or build it up in memory
>
>
> _$blast_report_ is not a valid variable name, as far as I know. Plus
> there's a space between report and the final '_' in the first of the
> above two lines.
>
> Does this code compile?
>
> Dave
>
>
>


From rtbio.2009 at gmail.com  Mon Jan 25 22:35:32 2010
From: rtbio.2009 at gmail.com (Roopa Raghuveer)
Date: Mon, 25 Jan 2010 23:35:32 +0100
Subject: [Bioperl-l] Regarding blast in Bioperl
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>
References: <c7cac1601001250543w646d26dat9d0eb16019079945@mail.gmail.com>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC908B@exchsth.agresearch.co.nz>
Message-ID: <c7cac1601001251435k7b75ffbbj64cfa36faf8d89bb@mail.gmail.com>

Hello Russell,

Thank you very much for your reply. My problem is this: the remote BLAST
runs fine with my code, and I get the .out file listing the sequences
producing significant alignments. But when I try to retrieve the sequences
into the array @seqs, I get every sequence except the first hit. If the .out
file shows 3 hits, I can retrieve only 2 of them. If there is only one
significant hit for my sequence, its name and description appear in the .out
file, but nothing reaches the array: the array count is 0 and the array
holds no sequence.

I hope that makes the problem clearer.

Here comes my code,

use Bio::SearchIO;
use Bio::Search::Result::BlastResult;
use Bio::Perl;
use Bio::Tools::Run::RemoteBlast;
use Bio::Seq;
use Bio::SeqIO;
use Bio::DB::GenBank;

$serverpath = "/srv/www/htdocs/rain/RNAi";
$serverurl = "http://141.84.66.66/rain/RNAi";
$outfile = $serverpath."/rnairesult_".time().".html";
$nuc = $serverpath."/nuc".time().".txt";
$debugfile = $serverpath."/debug_".time().".txt";
$blastdebugfile = $serverpath."/blastdebug_".time().".txt";

my $outstring ="";

&parse_form;

print "Content-type: text/html\n\n";
print "<HTML>\n";
print "<head><title>RNAi Result</title>";
print "<META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl/rnairesult_".time().".html\"> \n";
print "</head>\n";
print "<body>\n";
print " Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>";
print " Please be patient, runtime can be up to 5 minutes<br>";
print " This page will automatically reload in 30 seconds.";
print "</BODY>\n";
print "</HTML>\n";

defined(my $pid = fork) or die "Can't fork: $!";
exit if $pid;
open STDIN, '/dev/null' or die "Can't read /dev/null: $!";
open STDOUT, '>/dev/null' or die "Can't write to /dev/null: $!";
open STDERR, '>&STDOUT' or die "Can't dup stdout: $!";


open(OUTFILE, '>',$outfile);

print OUTFILE "<HTML>\n
 <head><title>RNAi Result</title>
 <META HTTP-EQUIV=\"Refresh\" CONTENT=\"30;
URL=$serverurl//rnairesult_".time().".html\"> \n
 <meta http-equiv=\"expires\" content=\"0\">
 </head>\n
 <body>\n
  Your results will appear <a
href=$serverurl/rnairesult_".time().".html>here</a><br>
  Please be patient, runtime can be up to 5 minutes <br>
 This page will automatically reload in 30 seconds  <br>
 </BODY>\n
 </HTML>\n";

close(OUTFILE);


@compseqs = blastcode($in{'Inputseq'},$in{'Organism'});

$in{'Inputseq'} =~ s/>.*$//m;
$in{'Inputseq'} =~ s/[^TAGC]//gim;
$in{'Inputseq'} =~ tr/actg/ACTG/;

@out = similar($in{'Inputseq'}, \@compseqs, $in{'Windowsize'},
$in{'Threshold'});


sub blastcode
{

$inpu1= $_[0];

$organ= $_[1];

open(NUC,'>',$nuc);
print NUC $inpu1,"\n";
close(NUC);

 my $prog = 'blastn';
 my $db   = 'refseq_rna';
 my $e_val= '1e-10';
 my $organism= $organ;

$gb = new Bio::DB::GenBank;

 my @params = ( '-prog' => $prog,
         '-data' => $db,
         '-expect' => $e_val,
         '-readmethod' => 'SearchIO',
         '-Organism'   => $organism );

            # open(OUTFILE,'>',$debugfile);
             #  print OUTFILE @params;
             # close(OUTFILE);


my $factory = Bio::Tools::Run::RemoteBlast->new(@params, -ENTREZ_QUERY =>
"$organ\[ORGN]");

 #my $factory = Bio::Tools::Run::RemoteBlast->new(@params);

  #change a parameter

 #$Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = 'Trypanosoma Brucei[ORGN]';

#change a parameter
# $Bio::Tools::Run::RemoteBlast::HEADER{'ENTREZ_QUERY'} = '$input2[ORGN]';

  my $v = 1;
  #$v is just to turn on and off the messages

 my $str = Bio::SeqIO->new('-file' => $nuc , '-format' => 'fasta' ,
'-organism' => "$organ\[ORGN]");

while (my $input = $str->next_seq())
{
   #Blast a sequence against a database:
    #Alternatively, you could  pass in a file with many
    #sequences rather than loop through sequence one at a time
    #Remove the loop starting 'while (my $input = $str->next_seq())'
    #and swap the two lines below for an example of that.

             open(OUTFILE,'>',$debugfile);
               print OUTFILE $input;
              close(OUTFILE);
 my $r = $factory->submit_blast($input);

                open(OUTFILE,'>',$debugfile);
             #   print OUTFILE $r;
                close(OUTFILE);


   print STDERR "waiting...." if($v>0);

  while ( my @rids = $factory->each_rid ) {

     foreach my $rid ( @rids ) {


        my $rc = $factory->retrieve_blast($rid);

        if( !ref($rc) )
        {
        if( $rc < 0 )
        {
        $factory->remove_rid($rid);
        }
       print STDERR "." if ( $v > 0 );
         sleep 5;
        }
       else {
          my $result = $rc->next_result();
         #save the output

      $blastdebugfile = $serverpath."/blastdebug_".time().".txt";

          open(BLASTDEBUGFILE,'>',$blastdebugfile);
          print BLASTDEBUGFILE $result->next_hit();
          close(BLASTDEBUGFILE);

        my $filename = $serverpath."/blastdata_".time()."\.out";

         # open(DEBUGFILE,'>',$debugfile);
         # open(new,'>',$filename);
         # @arra=<new>;
         # print DEBUGFILE @arra;
         # close(DEBUGFILE);
         # close(new);

         $factory->save_output($filename);

       # open(BLASTDEBUGFILE,'>',$debugfile);
       # print BLASTDEBUGFILE  "Hello $rid";
       # close(BLASTDEBUGFILE);

       $factory->remove_rid($rid);

   while ( my $hit = $result->next_hit ) {

            next unless ( $v >= 0);


       my $sequ = $gb->get_Seq_by_version($hit->name);
           my $dna = $sequ->seq(); # get the sequence as a string
        $dummy++;
             open(OUTFILE,'>',$debugfile);
             open(OUTFILE,'>',$debugfile);
          #     print OUTFILE $dna;
              close(OUTFILE);
          push(@seqs,$dna);
         }
        }
      }
    }
  }

$warum=scalar(@seqs);
              open(OUTFILE,'>',$debugfile);
               print OUTFILE $warum;
             #  print OUTFILE @seqs;
              close(OUTFILE);
      return(@seqs);
}

open(OUTFILE, '>',$outfile) || die ;

print OUTFILE "<HTML>\n
<head><title>RNAi Result</title>
<meta http-equiv=\"expires\" content=\"0\"></head>\n
<body>\n
<p><font face=\"Courier, monospace font set\">
Inputsequence: <br>";

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

        print OUTFILE substr ($in{'Inputseq'}, $i, 1);

        if ( ($i+1)%10==0){
                print OUTFILE " ";
        }
        if ( ($i+1)%60==0){
                print OUTFILE "<br>\n";
        }
}


print OUTFILE "</font> <p>";

$z=@compseqs;

for($k=0;$k<$z;$k++) {
        print OUTFILE "<font face=\"Courier, monospace font set\"><p>Compare
Sequence: <br>";

        for ($i=0; $i<length ($compseqs[$k]); $i++) {

                print OUTFILE substr ($compseqs[$k], $i, 1);

                if ( ($i+1)%10==0){
                        print OUTFILE " ";
                }
                if ( ($i+1)%60==0){
                        print OUTFILE "<br>\n";
                }
        }
        print OUTFILE "<p></font>";
}

print OUTFILE "<p>
Window: <br>$in{'Windowsize'}
<p>
<p>
Threshold: <br>$in{'Threshold'}
<p>";
my $j=0;

for ($i=0; $i<length ($in{'Inputseq'}); $i++) {

        if ($i<=length ($in{'Inputseq'})-$in{'Windowsize'}){
                if ($out[$i]->{similar}<=$in{'Threshold'}){
                        $j=$in{'Windowsize'};
                }
                $height=$out[$i]->{similar}*5;
        }

        if ($j>0) {
                print OUTFILE "<img src=\"$serverurl/blue.gif\" width=\"1\"
height=\"5\">";
                $outstring .= "<font color=\"green\">".substr
($in{'Inputseq'}, $i, 1)."</font>";
                $j--;
        }
        else {
                print OUTFILE "<img src=\"$serverurl/red.gif\" width=\"1\"
height=\"5\">";
                $outstring .= "<font color=\"red\">".substr
($in{'Inputseq'}, $i, 1)."</font>";
        }

        if ( ($i+1)%10==0){
                $outstring .= " ";
        }
        if ( ($i+1)%60==0){
                $outstring .= "<br>\n";

        }
        if ( ($i+1)%800==0){
                print OUTFILE "<br><br>\n";

        }
}

print OUTFILE "<br><br><font face=\"Courier, monospace font
set\">$outstring</font>";

#foreach (@out) {
#print OUTFILE "<p>Sequence: $_->{sequence}: $_->{similar} matchs<p>";
#if ($_->{similar}<=$in{'Threshold'}){

#       }
#}

print OUTFILE "</BODY>\n</HTML>\n";

close OUTFILE;

#nameprint();

sub parse_form {
    local ($buffer, @pairs, $pair, $name, $value);
    # Read in text
    $ENV{'REQUEST_METHOD'} =~ tr/a-z/A-Z/;
    if ($ENV{'REQUEST_METHOD'} eq "POST")
    {
      read(STDIN, $buffer, $ENV{'CONTENT_LENGTH'});
    }
    else
    {
        $buffer = $ENV{'QUERY_STRING'};
    }
    @pairs = split(/&/, $buffer);
    foreach $pair (@pairs)
    {
        ($name, $value) = split(/=/, $pair);
        $value =~ tr/+/ /;
        $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
        $in{$name} = $value;
    }
}

Regards,
Roopa.
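
One thing worth checking in the code above, offered as a guess from reading
it rather than a confirmed diagnosis: the debug line
"print BLASTDEBUGFILE $result->next_hit();" calls next_hit once before the
"while ( my $hit = $result->next_hit )" loop starts. If next_hit advances an
internal cursor, as iterators usually do, that call would use up the first
hit, which matches the symptom (3 hits give 2 sequences, 1 hit gives 0). The
sketch below imitates this with a plain Perl closure; make_iterator is a
made-up stand-in, not the Bio::SearchIO API.

```perl
use strict;
use warnings;

# Made-up stand-in for a result object: each call hands out the next item,
# then undef once the list is exhausted.
sub make_iterator {
    my @items = @_;
    return sub { return shift @items };
}

# Case 1: a debugging peek before the loop silently eats the first hit.
my $next_hit = make_iterator(qw(hit1 hit2 hit3));
my $peeked   = $next_hit->();
my @seqs;
while ( my $hit = $next_hit->() ) {
    push @seqs, $hit;
}
print "after peeking:   @seqs\n";    # hit2 hit3

# Case 2: collect the hits into an array first, then debug from the copy.
$next_hit = make_iterator(qw(hit1 hit2 hit3));
my @hits;
while ( my $hit = $next_hit->() ) {
    push @hits, $hit;
}
print "first hit (debug): $hits[0]\n";
print "without peeking: @hits\n";    # hit1 hit2 hit3
```

If this is the cause, moving the debug print inside the while loop (printing
$hit there) or dropping it should bring the first hit back.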



From ajmackey at gmail.com  Tue Jan 26 13:24:43 2010
From: ajmackey at gmail.com (Aaron Mackey)
Date: Tue, 26 Jan 2010 08:24:43 -0500
Subject: [Bioperl-l] Transcribe in bioperl
In-Reply-To: <1264455524.4552.23.camel@epistle>
References: <mailman.32682.1264428347.2696.bioperl-l@lists.open-bio.org> 
	<1264453237.4552.3.camel@epistle>
	<320fb6e01001251324r3e8ec3adt3c6d6f16a4839f56@mail.gmail.com> 
	<1264455524.4552.23.camel@epistle>
Message-ID: <24c96eca1001260524s3d46e850hfdcc461e22210972@mail.gmail.com>

There's also Bio::Tools::IUPAC; given a sequence with IUPAC ambiguity codes,
it provides a SeqIO stream that enumerates all the possible unambiguous
realizations.  Not the right solution for every situation, but quite useful
when you need it.

-Aaron


On Mon, Jan 25, 2010 at 4:38 PM, Dan Kortschak <
dan.kortschak at adelaide.edu.au> wrote:

> Good to see that these ideas have been considered.
>
> I'd be interested to see this discussion, or at least the point dealing
> with the problems that might arise. I'm at a loss as to how ambiguity
> codes can't completely describe all possible coding sequences for any
> given codon table (via Bio::Tools::CodonTable - in fact this already has
> the revtranslate that could be fitted into a Bio::PrimarySeq method - to
> answer Mark and Jason's comments, I think that /if/ a reverse_translate
> method exists, it makes logical sense to have it tied to a sequence
> object, calling the B:T:CT method on the seq object itself rather than
> only in Bio::Tools, no?). Pete, can you provide an example of the
> problems?
>
> thanks
> Dan
>
> On Mon, 2010-01-25 at 21:24 +0000, Peter wrote:
> > I would say it could be a bad idea. For any protein string there are
> > multiple possible back translations, and this cannot be captured
> > fully as a nucleotide string even using the IUPAC ambiguity chars.
>
>


From nml5566 at gmail.com  Tue Jan 26 21:10:54 2010
From: nml5566 at gmail.com (Nathan Liles)
Date: Tue, 26 Jan 2010 15:10:54 -0600
Subject: [Bioperl-l] SVN access
Message-ID: <4B5F5A5E.2070406@gmail.com>

Does anyone know who I need to talk to for getting developer access for 
the Bioperl SVN? I want to submit a patch to the genbank2gff3 converter.

Thanks,
Nathan


From Russell.Smithies at agresearch.co.nz  Wed Jan 27 01:40:40 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:40:40 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>

Grrrrrr, I hate eutils!!!!

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------


Nice error message though :-)
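
For transient failures like the connection refusal above, one common
workaround is to wrap the call in a retry loop. This is only a sketch:
with_retries and the flaky stand-in sub are invented for the example, and
the real code being retried would be whatever eutils request the script
makes.

```perl
use strict;
use warnings;

# Hypothetical helper: run $code up to $tries times, pausing $delay seconds
# after each failure, and rethrow the last error if every attempt dies.
sub with_retries {
    my ( $code, $tries, $delay ) = @_;
    my $error;
    for my $attempt ( 1 .. $tries ) {
        my @result;
        my $ok = eval { @result = $code->(); 1 };
        return @result if $ok;
        $error = $@;
        warn "attempt $attempt failed: $error";
        sleep $delay if $delay && $attempt < $tries;
    }
    die "giving up after $tries attempts: $error";
}

# Stand-in for a flaky remote call: dies twice, then returns some ids.
my $calls = 0;
my $flaky = sub {
    $calls++;
    die "Connection refused\n" if $calls < 3;
    return ( 101, 102, 103 );
};

my @ids = with_retries( $flaky, 5, 0 );    # delay 0 keeps the demo fast
print "got @ids after $calls calls\n";     # got 101 102 103 after 3 calls
```

In a real script a delay of a few seconds between attempts would be kinder
to the server than the zero used here.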


--Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> Sent: Monday, 11 January 2010 10:05 a.m.
> To: 'Chris Fields'
> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> been finding that with large queries, chunks of the resulting data is
> missing.
> For example, before Xmas I was creating species-specific databases by
> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> the fasta sequences in chunks of 500.
> Very regularly, in the middle of the fasta there would be a message about
> resource unavailable eg.
>   >test_sequence_1
>   TACGATCATCGCTResource UnavailableTACGACTCTGCT
>   >test_sequence_2
>   TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> 
> Often this wasn't detected until formatdb complained about invalid
> characters.
> Inquiries to NCBI as to why this was happening and what to do about it
> returned stupid answers ("do each sequence manually thru the web
> interface", or "use eUtils").
> As we have a nice fast network connection, I now prefer to download very
> large gzip files (i.e. all of refseq) and extract what I need.
> 
> I can't help but think that NCBI could solve a lot of problems if they
> gzipped the output from eUtils queries - it's something I've requested
> regularly for the last 5 years or so!!
> 
> --Russell
> 
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Monday, 11 January 2010 9:50 a.m.
> > To: Smithies, Russell
> > Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> >
> > One could also use Bio::DB::Taxonomy, which indexes the same files or
> > (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > details).
> >
> > chris
> >
> > On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >
> > > An alternate non-BioPerly way (that may be faster given NCBI's
> > > flakiness lately) would be to download the gi_taxid_nucl.zip or
> > > gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/,
> > > load them into a hash and do lookups.
> > > In that same dir, taxdump.tar.gz contains a file called names.dmp
> > > which lists taxids and descriptions (and synonyms)
> > >
> > > If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > > could do this:
> > >
> > >   my $taxid  = $gi_taxid_nucl{$accession};
> > >   my $org_name = $names{$taxid};
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >> Bhakti,
> > >> The following example (using EUtilities) may serve your purpose:
> > >>
> > >> use Bio::DB::EUtilities;
> > >>
> > >> my (%taxa, @taxa);
> > >> my (%names, %idmap);
> > >>
> > >> # these are protein ids; nuc ids will work by changing
> > >> # -dbfrom => 'nucleotide' (probably)
> > >>
> > >> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>
> > >> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>                                       -db => 'taxonomy',
> > >>                                       -dbfrom => 'protein',
> > >>                                       -correspondence => 1,
> > >>                                       -id => \@ids);
> > >>
> > >> # iterate through the LinkSet objects
> > >> while (my $ds = $factory->next_LinkSet) {
> > >>    $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >> }
> > >>
> > >> @taxa = @taxa{@ids};
> > >>
> > >> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>        -db    => 'taxonomy',
> > >>        -id    => \@taxa );
> > >>
> > >> while (local $_ = $factory->next_DocSum) {
> > >>    $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >> ($_->get_contents_by_name('ScientificName'))[0];
> > >> }
> > >>
> > >> foreach (@ids) {
> > >>    $idmap{$_} = $names{$taxa{$_}};
> > >> }
> > >>
> > >> # %idmap is
> > >> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >> #    68536103 => 'Corynebacterium jeikeium K411'
> > >> #    730439 => 'Bacillus caldolyticus'
> > >> #    89318838 => undef    (this record has been removed from the db)
> > >>
> > >> 1;
> > >>
> > >> You probably will need to break up your 30000 into chunks
> > >> (say, 1000-3000 each), and do the above on each chunk with a
> > >>
> > >> sleep 3;
> > >>
> > >> or so separating the queries.
> > >> MAJ
> > >> ----- Original Message -----
> > >> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > >> To: <bioperl-l at lists.open-bio.org>
> > >> Sent: Friday, December 25, 2009 9:46 PM
> > >> Subject: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >>
> > >>> Hi,
> > >>>
> > >>> Does anyone know how to retrieve the "Source" or the "Species name"
> > >>> given the accession number using Bioperl.   I have these 30,000
> > >>> accession numbers for which I need to get the source organisms.  Any
> > >>> kind of help will be appreciated.
> > >>>
> > >>> Thanks
> > >>>
> > >>> BD
> > >
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From cjfields at illinois.edu  Wed Jan 27 01:46:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 19:46:26 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
Message-ID: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>

It's unfortunate but I have heard this problem popping up quite a bit more frequently lately.  Not to push too many buttons but NCBI isn't very forthcoming with help these days; they have become quite insular.  Not sure if they're short-staffed due to budget or if there are other issues.

chris

On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:

> Grrrrrr, I hate eutils!!!!
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
> 
> 
> Nice error message though :-)
> 
> 
> --Russell
> 
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
>> Sent: Monday, 11 January 2010 10:05 a.m.
>> To: 'Chris Fields'
>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>> number?
>> 
>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
>> been finding that with large queries, chunks of the resulting data are
>> missing.
>> For example, before Xmas I was creating species-specific databases by
>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
>> the fasta sequences in chunks of 500.
>> Very regularly, in the middle of the fasta there would be a message about
>> "Resource Unavailable", e.g.
>>
>>  >test_sequence_1
>>  TACGATCATCGCTResource UnavailableTACGACTCTGCT
>>  >test_sequence_2
>>  TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>> 
>> Often this wasn't detected until formatdb complained about invalid
>> characters.
>> Inquiries to NCBI as to why this was happening and what to do about it
>> returned stupid answers ("do each sequence manually thru the web
>> interface", or "use eUtils").
>> As we have a nice fast network connection, I now prefer to download very
>> large gzip files (i.e. all of refseq) and extract what I need.
>> 
>> I can't help but think that NCBI could solve a lot of problems if they
>> gzipped the output from eUtils queries - it's something I've requested
>> regularly for the last 5 years or so!!
>> 
>> --Russell
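
The corruption described above can be caught before formatdb sees the file. What follows is only a minimal sketch, not part of the original workflow: the sub name find_bad_fasta_records and the default set of allowed characters (IUPAC nucleotide codes plus '*' and '-') are assumptions made for illustration, and protein data would need a different pattern.

```perl
use strict;
use warnings;

# Return the headers of FASTA records whose sequence lines contain characters
# outside the allowed set. $allowed is optional; the default (IUPAC nucleotide
# codes, '*' and '-') is an assumption of this sketch.
sub find_bad_fasta_records {
    my ($fh, $allowed) = @_;
    $allowed = qr/^[ACGTUNRYSWKMBDHV*-]+$/i unless defined $allowed;
    my (@bad, $header);
    my $flagged = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;          # ignore blank lines
        if ($line =~ /^>(.*)/) {           # new record: remember its header
            $header  = $1;
            $flagged = 0;
            next;
        }
        # A stray message such as "Resource Unavailable" contains a space and
        # letters that are not nucleotide codes, so it fails the pattern.
        if ($line !~ $allowed and not $flagged) {
            push @bad, defined $header ? $header : '(no header)';
            $flagged = 1;                  # report each record only once
        }
    }
    return @bad;
}
```

Run on the two-record example above, this should report only test_sequence_1, so that chunk of 500 could be fetched again instead of being loaded.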
>> 
>> 
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Monday, 11 January 2010 9:50 a.m.
>>> To: Smithies, Russell
>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>> number?
>>> 
>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
>>> details).
>>> 
>>> chris
>>> 
>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
>>> 
>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash
>>>> and do lookups.
>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
>>>> lists taxids and descriptions (and synonyms).
>>>> 
>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
>>>> could do this:
>>>> 
>>>>  my $taxid    = $gi_taxid_nucl{$gi};   # these files are keyed by GI number, not accession
>>>>  my $org_name = $names{$taxid};
>>>> 
>>>> --Russell
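
A rough sketch of how the two hashes suggested above might be filled. It assumes the unpacked gi_taxid file holds one "GI<TAB>taxid" pair per line and that names.dmp uses "<TAB>|<TAB>" as its field separator with a "scientific name" class in the fourth field. The sub names are made up for illustration. Since the full GI table is very large, the sketch keeps only the GIs asked for.

```perl
use strict;
use warnings;

# Read "gi<TAB>taxid" lines, keeping only the GIs that are keys of %$wanted,
# so the whole table never has to sit in memory.
sub load_gi_taxid {
    my ($fh, $wanted) = @_;
    my %gi_taxid;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid && exists $wanted->{$gi};
        $gi_taxid{$gi} = $taxid;
    }
    return \%gi_taxid;
}

# Read names.dmp-style lines: fields separated by "\t|\t", each line ending
# in "\t|". Only the "scientific name" row is kept for each taxid.
sub load_scientific_names {
    my ($fh) = @_;
    my %names;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;                # drop the trailing field terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $names{$taxid} = $name;
    }
    return \%names;
}

# Possible use (file names are assumptions about what the archives unpack to):
#   open my $gi_fh,    '<', 'gi_taxid_nucl.dmp' or die $!;
#   open my $names_fh, '<', 'names.dmp'         or die $!;
#   my $gi_taxid = load_gi_taxid($gi_fh, { map { $_ => 1 } @my_gis });
#   my $names    = load_scientific_names($names_fh);
#   my $org_name = $names->{ $gi_taxid->{$some_gi} };
```

Note that the lookup key is a GI number. Accession numbers would first have to be converted to GIs, which these files do not do.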
>>>> 
>>>> 
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>> number?
>>>>> 
>>>>> Bhakti,
>>>>> The following example (using EUtilities) may serve your purpose:
>>>>> 
>>>>> use Bio::DB::EUtilities;
>>>>> 
>>>>> my (%taxa, @taxa);
>>>>> my (%names, %idmap);
>>>>> 
>>>>> # these are protein ids; nuc ids will work by changing
>>>>> # -dbfrom => 'nucleotide' (probably)
>>>>> 
>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>>>>> 
>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>>>>                                      -db => 'taxonomy',
>>>>>                                      -dbfrom => 'protein',
>>>>>                                      -correspondence => 1,
>>>>>                                      -id => \@ids);
>>>>> 
>>>>> # iterate through the LinkSet objects, mapping each submitted id to its taxid
>>>>> while (my $ds = $factory->next_LinkSet) {
>>>>>   $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0];
>>>>> }
>>>>> 
>>>>> # ids with no taxonomy link (e.g. removed records) may be undefined here; skip them
>>>>> @taxa = grep { defined } @taxa{@ids};
>>>>> 
>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>>>>                                     -db    => 'taxonomy',
>>>>>                                     -id    => \@taxa );
>>>>> 
>>>>> # map each taxid to its scientific name
>>>>> while (my $docsum = $factory->next_DocSum) {
>>>>>   $names{($docsum->get_contents_by_name('TaxId'))[0]} =
>>>>>     ($docsum->get_contents_by_name('ScientificName'))[0];
>>>>> }
>>>>> 
>>>>> foreach my $id (@ids) {
>>>>>   $idmap{$id} = defined $taxa{$id} ? $names{$taxa{$id}} : undef;
>>>>> }
>>>>> 
>>>>> # %idmap is
>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
>>>>> #    730439 => 'Bacillus caldolyticus'
>>>>> #    89318838 => undef    (this record has been removed from the db)
>>>>> 
>>>>> 1;
>>>>> 
>>>>> You probably will need to break up your 30000 into chunks
>>>>> (say, 1000-3000 each), and do the above on each chunk with a
>>>>> 
>>>>> sleep 3;
>>>>> 
>>>>> or so separating the queries.
>>>>> MAJ
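
The chunk-and-sleep advice above might be sketched as below. This is only an illustration: process_in_chunks and its $worker callback are hypothetical names, and the callback stands in for whatever runs the elink/esummary steps on one chunk of ids.

```perl
use strict;
use warnings;

# Call $worker->(\@chunk) on successive slices of @$ids, sleeping $pause
# seconds between calls. Returns the concatenated results of all calls.
sub process_in_chunks {
    my ($ids, $chunk_size, $pause, $worker) = @_;
    die "chunk size must be positive\n" unless $chunk_size > 0;
    my @results;
    for (my $start = 0; $start < @$ids; $start += $chunk_size) {
        my $end = $start + $chunk_size - 1;
        $end = $#$ids if $end > $#$ids;            # last chunk may be short
        push @results, $worker->([ @{$ids}[$start .. $end] ]);
        sleep $pause if $pause && $end < $#$ids;   # no pause after the last chunk
    }
    return @results;
}

# Possible use, with the numbers suggested above (lookup_names is a
# placeholder for the real per-chunk query):
#   my %idmap = process_in_chunks(\@all_ids, 1000, 3,
#                                 sub { lookup_names($_[0]) });
```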
>>>>> ----- Original Message -----
>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>> Sent: Friday, December 25, 2009 9:46 PM
>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession number?
>>>>> 
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Does anyone know how to retrieve the "Source" or the "Species name"
>>>>>> given the accession number using Bioperl.   I have these 30,000
>>>>>> accession numbers for which I need to get the source organisms.  Any
>>>>>> kind of help will be appreciated.
>>>>>> 
>>>>>> Thanks
>>>>>> 
>>>>>> BD


From Russell.Smithies at agresearch.co.nz  Wed Jan 27 01:59:15 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 14:59:15 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>

I've had a wide selection of errors lately:

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
STACK: get_desc.pl:32
-----------------------------------------------------------

And I never get a good explanation from NCBI or suggestions on how to avoid it.
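
Short of a fix on the NCBI side, one common workaround for transient failures like this is to retry after a pause. The following is a minimal sketch under that assumption, not anything NCBI has recommended here. The helper name with_retries and its arguments are invented for illustration.

```perl
use strict;
use warnings;

# Run $code, retrying when it dies. The pause doubles after each failure.
# Returns what $code returns; rethrows the last error if every attempt fails.
sub with_retries {
    my ($code, $max_tries, $delay) = @_;
    $delay = 0 unless defined $delay;
    my $last_error;
    for my $attempt (1 .. $max_tries) {
        my @result;
        my $ok = eval { @result = $code->(); 1 };   # 1 marks a clean return
        return wantarray ? @result : $result[0] if $ok;
        $last_error = $@;
        last if $attempt == $max_tries;             # out of attempts
        sleep $delay if $delay;
        $delay *= 2;                                # back off a little more each time
    }
    die $last_error;
}

# Possible use around the call that threw in the trace above:
#   my @ids = with_retries(sub { $factory->get_ids }, 5, 3);
```

Whether a second get_ids call on the same object actually re-sends the request has not been checked, so it may be safer to build a fresh Bio::DB::EUtilities object inside the closure.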


--Russell
	

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 2:46 p.m.
> To: Smithies, Russell
> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> It's unfortunate but I have heard this problem popping up quite a bit more
> frequently lately.  Not to push too many buttons but NCBI isn't very
> forthcoming with help these days; they have become quite insular.  Not
> sure if they're short-staffed due to budget or if there are other issues.
> 
> chris
> 


From cjfields at illinois.edu  Wed Jan 27 02:42:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 26 Jan 2010 20:42:22 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
Message-ID: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>

Makes me wonder if they're pushing more users towards the SOAP-based services and away from eutils.

chris

On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:

> I've had a wide selection of errors lately:
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> STACK: get_desc.pl:32
> -----------------------------------------------------------
> 
> And I never get a good explanation from NCBI or suggestions on how to avoid it.
> 
> 
> --Russell
> 	
> 


From Russell.Smithies at agresearch.co.nz  Wed Jan 27 02:45:58 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Wed, 27 Jan 2010 15:45:58 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>

Batch Entrez (http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi) still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Wednesday, 27 January 2010 3:42 p.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Makes me wonder if they're pushing more users towards the SOAP-based
> services and away from eutils.
> 
> chris
> 
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> 
> > I've had a wide selection of errors lately:
> >
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource
> temporarily unavailable)
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw
> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > STACK: Bio::Tools::EUtilities::parse_data
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > STACK: Bio::Tools::EUtilities::get_ids
> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > STACK: Bio::DB::EUtilities::get_ids
> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > STACK: get_desc.pl:32
> > -----------------------------------------------------------
> >
> > And I never get a good explanation from NCBI or suggestions on how to
> avoid it.
> >
> >
> > --Russell
> >
> >
> >> -----Original Message-----
> >> From: Chris Fields [mailto:cjfields at illinois.edu]
> >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> >> To: Smithies, Russell
> >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >> number?
> >>
> >> It's unfortunate but I have heard this problem popping up quite a bit
> more
> >> frequently lately.  Not to push too many buttons but NCBI isn't very
> >> forthcoming with help these days; they have become quite insular.  Not
> >> sure if they're short-staffed due to budget or if there are other
> issues.
> >>
> >> chris
> >>
> >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> >>
> >>> Grrrrrr, I hate eutils!!!!
> >>>
> >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111
> >> (Connection refused)
> >>> STACK: Error::throw
> >>> STACK: Bio::Root::Root::throw
> >> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> >>> STACK: Bio::Tools::EUtilities::parse_data
> >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> >>> STACK: Bio::Tools::EUtilities::get_ids
> >> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> >>> STACK: Bio::DB::EUtilities::get_ids
> >> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> >>> STACK: get_desc.pl:32
> >>> -----------------------------------------------------------
> >>>
> >>>
> >>> Nice error message though :-)
> >>>
> >>>
> >>> --Russell
> >>>
> >>>> -----Original Message-----
> >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> >>>> To: 'Chris Fields'
> >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-
> bio.org'
> >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> >>>> number?
> >>>>
> >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've
> >> often
> >>>> been finding that with large queries, chunks of the resulting data is
> >>>> missing.
> >>>> For example, before Xmas I was creating species-specific databases by
> >>>> using eUtils to get a list of GI numbers back for a taxid, then
> >> retrieving
> >>>> the fasta sequences in chunks of 500.
> >>>> Very regularly, in the middle of the fasta there would be a message
> >> about
> >>>> resource unavailable eg.
> >>>>> test_sequence_1
> >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> >>>>> test_sequence_2
> >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> >>>>
> >>>> Often this wasn't detected until formatdb complained about invalid
> >>>> characters.
> >>>> Inquiries to NCBI as to why this was happening and what to do about
> it
> >>>> returned stupid answers ("do each sequence manually thru the web
> >>>> interface", or "use eUtils").
> >>>> As we have a nice fast network connection, I now prefer to download
> >> very
> >>>> large gzip files (i.e. all of refseq) and extract what I need.
> >>>>
> >>>> I can't help but think that NCBI could solve a lot of problems if
> they
> >>>> gzipped the output from eUtils queries - it's something I've
> requested
> >>>> regularly for the last 5 years or so!!
> >>>>
> >>>> --Russell
> >>>>
> >>>>
> >>>>> -----Original Message-----
> >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> >>>>> To: Smithies, Russell
> >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-
> bio.org'
> >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> accession
> >>>>> number?
> >>>>>
> >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files
> or
> >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
> >> the
> >>>>> details).
> >>>>>
> >>>>> chris
> >>>>>
> >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> >>>>>
> >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
> >>>> flakiness
> >>>>> lately) would be to download the gi_taxid_nucl.zip or
> >> gi_taxid_prot.zip
> >>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a
> hash
> >>>> and
> >>>>> do lookups.
> >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
> >>>> which
> >>>>> lists taxids and descriptions (and synonyms)
> >>>>>>
> >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so
> I
> >>>>> could do this:
> >>>>>>
> >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> >>>>>> my $org_name = $names{$taxid};
> >>>>>>
> >>>>>> --Russell
> >>>>>>
> >>>>>>
> >>>>>>> -----Original Message-----
> >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> >> accession
> >>>>>>> number?
> >>>>>>>
> >>>>>>> Bhakti,
> >>>>>>> The following example (using EUtilities) may serve your purpose:
> >>>>>>>
> >>>>>>> use Bio::DB::EUtilities;
> >>>>>>>
> >>>>>>> my (%taxa, @taxa);
> >>>>>>> my (%names, %idmap);
> >>>>>>>
> >>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom =>
> >>>>>>> 'nucleotide',
> >>>>>>> # (probably)
> >>>>>>>
> >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> >>>>>>>
> >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> >>>>>>>                                     -db => 'taxonomy',
> >>>>>>>                                     -dbfrom => 'protein',
> >>>>>>>                                     -correspondence => 1,
> >>>>>>>                                     -id => \@ids);
> >>>>>>>
> >>>>>>> # iterate through the LinkSet objects
> >>>>>>> while (my $ds = $factory->next_LinkSet) {
> >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> >>>>>>> }
> >>>>>>>
> >>>>>>> @taxa = @taxa{@ids};
> >>>>>>>
> >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> >>>>>>>      -db    => 'taxonomy',
> >>>>>>>      -id    => \@taxa );
> >>>>>>>
> >>>>>>> while (local $_ = $factory->next_DocSum) {
> >>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
> >>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
> >>>>>>> }
> >>>>>>>
> >>>>>>> foreach (@ids) {
> >>>>>>>  $idmap{$_} = $names{$taxa{$_}};
> >>>>>>> }
> >>>>>>>
> >>>>>>> # %idmap is
> >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> >>>>>>> #    730439 => 'Bacillus caldolyticus'
> >>>>>>> #    89318838 => undef    (this record has been removed from the
> db)
> >>>>>>>
> >>>>>>> 1;
> >>>>>>>
> >>>>>>> You probably will need to break up your 30000 into chunks
> >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> >>>>>>>
> >>>>>>> sleep 3;
> >>>>>>>
> >>>>>>> or so separating the queries.
> >>>>>>> MAJ
> >>>>>>> ----- Original Message -----
> >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> >>>>>>> To: <bioperl-l at lists.open-bio.org>
> >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
> >>>>> number?
> >>>>>>>
> >>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species
> name"
> >>>>>>> given
> >>>>>>>> the accession number using Bioperl.   I have these 30,000
> accession
> >>>>>>> numbers
> >>>>>>>> for which I need to get the source organisms.  Any kind of help
> >> will
> >>>>> be
> >>>>>>>> appreciated.
> >>>>>>>>
> >>>>>>>> Thanks
> >>>>>>>>
> >>>>>>>> BD

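The hash-lookup idea quoted above (gi_taxid_nucl plus names.dmp) can be sketched roughly as follows. This is only an illustration under stated assumptions: the sub names are made up for the example, the toy data stands in for the real dump files, and the GI in it is invented. Note that the gi_taxid files are keyed by GI number rather than by accession, so accessions would first have to be mapped to GIs.

```perl
use strict;
use warnings;

# Build gi => taxid from a filehandle on gi_taxid_nucl.dmp
# (two tab-separated columns: GI, taxid).
sub load_gi_taxid {
    my ($fh) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Build taxid => scientific name from a filehandle on names.dmp.
# Fields are separated by "\t|\t" and each line ends in "\t|".
# Synonyms and common names are skipped.
sub load_scientific_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $name_for{$taxid} = $name;
    }
    return \%name_for;
}

# Toy data standing in for the real dump files (the GI below is invented).
my $gi_dump    = "12345\t9606\n";
my $names_dump = "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n"
               . "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n";

open my $gi_fh,    '<', \$gi_dump    or die "Cannot open in-memory file: $!";
open my $names_fh, '<', \$names_dump or die "Cannot open in-memory file: $!";

my $taxid_for = load_gi_taxid($gi_fh);
my $name_for  = load_scientific_names($names_fh);

my $gi       = 12345;
my $taxid    = $taxid_for->{$gi};
my $org_name = defined $taxid ? $name_for->{$taxid} : undef;
print "$gi => ", (defined $org_name ? $org_name : 'unknown'), "\n";
```

With the real files you would open the unpacked .dmp files instead of the in-memory strings. The full gi_taxid_nucl file is very large, so a plain hash may need a lot of memory; loading only the GIs you care about, or using an on-disk index, are possible alternatives.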

From maj at fortinbras.us  Wed Jan 27 15:14:22 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Wed, 27 Jan 2010 10:14:22 -0500
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com><1CE23DE1068C4FA2BD543D167A1AA901@NewLife><18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz><F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz><18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz><4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu><18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
Message-ID: <C1C922A99DF24679955608955B2A73B1@NewLife>

Precisely the MO behind SoapEU...get the jump on 'em.
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Smithies, Russell" <Russell.Smithies at agresearch.co.nz>
Cc: <bioperl-l at lists.open-bio.org>; "'Mark A. Jensen'" <maj at fortinbras.us>
Sent: Tuesday, January 26, 2010 9:42 PM
Subject: Re: [Bioperl-l] how to retrieve organism name from accession number?


> Makes me wonder if they're pushing more users towards the SOAP-based services 
> and away from eutils.
>
> chris
>
> On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
>
>> I've had a wide selection of errors lately:
>>
>> ------------- EXCEPTION: Bio::Root::Exception -------------
>> MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource 
>> temporarily unavailable)
>> STACK: Error::throw
>> STACK: Bio::Root::Root::throw 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>> STACK: Bio::Tools::EUtilities::parse_data 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>> STACK: Bio::Tools::EUtilities::get_ids 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>> STACK: Bio::DB::EUtilities::get_ids 
>> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>> STACK: get_desc.pl:32
>> -----------------------------------------------------------
>>
>> And I never get a good explanation from NCBI or suggestions on how to avoid 
>> it.
>>
>>
>> --Russell
>>
>>
>>> -----Original Message-----
>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>> Sent: Wednesday, 27 January 2010 2:46 p.m.
>>> To: Smithies, Russell
>>> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>> number?
>>>
>>> It's unfortunate but I have heard this problem popping up quite a bit more
>>> frequently lately.  Not to push too many buttons but NCBI isn't very
>>> forthcoming with help these days; they have become quite insular.  Not
>>> sure if they're short-staffed due to budget or if there are other issues.
>>>
>>> chris
>>>
>>> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
>>>
>>>> Grrrrrr, I hate eutils!!!!
>>>>
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111
>>> (Connection refused)
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw
>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
>>>> STACK: Bio::Tools::EUtilities::parse_data
>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
>>>> STACK: Bio::Tools::EUtilities::get_ids
>>> /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
>>>> STACK: Bio::DB::EUtilities::get_ids
>>> /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
>>>> STACK: get_desc.pl:32
>>>> -----------------------------------------------------------
>>>>
>>>>
>>>> Nice error message though :-)
>>>>
>>>>
>>>> --Russell
>>>>
>>>>> -----Original Message-----
>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
>>>>> Sent: Monday, 11 January 2010 10:05 a.m.
>>>>> To: 'Chris Fields'
>>>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>> number?
>>>>>
>>>>> I've started to go off eUtils recently (not BioPerl's fault) as I've
>>> often
>>>>> been finding that with large queries, chunks of the resulting data is
>>>>> missing.
>>>>> For example, before Xmas I was creating species-specific databases by
>>>>> using eUtils to get a list of GI numbers back for a taxid, then
>>> retrieving
>>>>> the fasta sequences in chunks of 500.
>>>>> Very regularly, in the middle of the fasta there would be a message
>>> about
>>>>> resource unavailable eg.
>>>>>> test_sequence_1
>>>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
>>>>>> test_sequence_2
>>>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
>>>>>
>>>>> Often this wasn't detected until formatdb complained about invalid
>>>>> characters.
>>>>> Inquiries to NCBI as to why this was happening and what to do about it
>>>>> returned stupid answers ("do each sequence manually thru the web
>>>>> interface", or "use eUtils").
>>>>> As we have a nice fast network connection, I now prefer to download
>>> very
>>>>> large gzip files (i.e. all of refseq) and extract what I need.
>>>>>
>>>>> I can't help but think that NCBI could solve a lot of problems if they
>>>>> gzipped the output from eUtils queries - it's something I've requested
>>>>> regularly for the last 5 years or so!!
>>>>>
>>>>> --Russell
>>>>>
>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
>>>>>> Sent: Monday, 11 January 2010 9:50 a.m.
>>>>>> To: Smithies, Russell
>>>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
>>>>>> number?
>>>>>>
>>>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
>>>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
>>> the
>>>>>> details).
>>>>>>
>>>>>> chris
>>>>>>
>>>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
>>>>>>
>>>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
>>>>> flakiness
>>>>>> lately) would be to download the gi_taxid_nucl.zip or
>>> gi_taxid_prot.zip
>>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash
>>>>> and
>>>>>> do lookups.
>>>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
>>>>> which
>>>>>> lists taxids and descriptions (and synonyms)
>>>>>>>
>>>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
>>>>>> could do this:
>>>>>>>
>>>>>>> my $taxid  = $gi_taxid_nucl{$accession};
>>>>>>> my $org_name = $names{$taxid};
>>>>>>>
>>>>>>> --Russell
>>>>>>>
>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
>>>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
>>>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
>>>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
>>>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
>>> accession
>>>>>>>> number?
>>>>>>>>
>>>>>>>> Bhakti,
>>>>>>>> The following example (using EUtilities) may serve your purpose:
>>>>>>>>
>>>>>>>> use Bio::DB::EUtilities;
>>>>>>>>
>>>>>>>> my (%taxa, @taxa);
>>>>>>>> my (%names, %idmap);
>>>>>>>>
>>>>>>>> # these are protein ids; nuc ids will work by changing -dbfrom =>
>>>>>>>> 'nucleotide',
>>>>>>>> # (probably)
>>>>>>>>
>>>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
>>>>>>>>
>>>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
>>>>>>>>                                     -db => 'taxonomy',
>>>>>>>>                                     -dbfrom => 'protein',
>>>>>>>>                                     -correspondence => 1,
>>>>>>>>                                     -id => \@ids);
>>>>>>>>
>>>>>>>> # iterate through the LinkSet objects
>>>>>>>> while (my $ds = $factory->next_LinkSet) {
>>>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
>>>>>>>> }
>>>>>>>>
>>>>>>>> @taxa = @taxa{@ids};
>>>>>>>>
>>>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
>>>>>>>>      -db    => 'taxonomy',
>>>>>>>>      -id    => \@taxa );
>>>>>>>>
>>>>>>>> while (local $_ = $factory->next_DocSum) {
>>>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
>>>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
>>>>>>>> }
>>>>>>>>
>>>>>>>> foreach (@ids) {
>>>>>>>>  $idmap{$_} = $names{$taxa{$_}};
>>>>>>>> }
>>>>>>>>
>>>>>>>> # %idmap is
>>>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
>>>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
>>>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
>>>>>>>> #    730439 => 'Bacillus caldolyticus'
>>>>>>>> #    89318838 => undef    (this record has been removed from the db)
>>>>>>>>
>>>>>>>> 1;
>>>>>>>>
>>>>>>>> You probably will need to break up your 30000 into chunks
>>>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
>>>>>>>>
>>>>>>>> sleep 3;
>>>>>>>>
>>>>>>>> or so separating the queries.
>>>>>>>> MAJ
>>>>>>>> ----- Original Message -----
>>>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
>>>>>>>> To: <bioperl-l at lists.open-bio.org>
>>>>>>>> Sent: Friday, December 25, 2009 9:46 PM
>>>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
>>>>>> number?
>>>>>>>>
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name"
>>>>>>>> given
>>>>>>>>> the accession number using Bioperl.   I have these 30,000 accession
>>>>>>>> numbers
>>>>>>>>> for which I need to get the source organisms.  Any kind of help
>>> will
>>>>>> be
>>>>>>>>> appreciated.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> BD

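The advice quoted above (break the ids into chunks and sleep between queries), combined with the intermittent esearch failures reported in this thread, suggests a batching loop with a simple retry. The following is a minimal sketch, not BioPerl code: fetch_in_chunks and its named arguments are hypothetical, and the fetch callback stands in for whatever EUtilities call you actually make on each chunk.

```perl
use strict;
use warnings;

# Process the ids in batches of chunk_size, pausing 'pause' seconds between
# batches and retrying a failed batch until max_tries attempts were made.
# 'fetch' is a caller-supplied code ref (hypothetical here) that takes an
# array ref of ids and returns a hash ref of id => value, dying on failure.
sub fetch_in_chunks {
    my (%arg) = @_;
    my @ids = @{ $arg{ids} };
    my %result;
    while (my @chunk = splice @ids, 0, $arg{chunk_size}) {
        my $tries = 0;
        while (1) {
            my $got = eval { $arg{fetch}->(\@chunk) };
            if ($got) {
                %result = (%result, %$got);
                last;
            }
            die "Giving up after $tries tries: $@" if ++$tries >= $arg{max_tries};
            sleep $arg{pause};
        }
        sleep $arg{pause} if @ids;    # be polite between batches
    }
    return \%result;
}

# Demo with a fake fetcher whose first call fails, to exercise the retry.
my $calls = 0;
my $fake_fetch = sub {
    my ($chunk) = @_;
    die "simulated outage\n" if ++$calls == 1;
    my %names;
    $names{$_} = "name_$_" for @$chunk;
    return \%names;
};

my $names = fetch_in_chunks(
    ids        => [ 1 .. 7 ],
    chunk_size => 3,
    pause      => 0,          # use a few seconds against a real server
    max_tries  => 3,
    fetch      => $fake_fetch,
);
printf "%d ids resolved in %d calls\n", scalar(keys %$names), $calls;
```

In real use the callback would wrap the elink/esummary steps for one chunk, and the pause would be a few seconds rather than 0.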

From bhakti.dwivedi at gmail.com  Wed Jan 27 19:42:06 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Wed, 27 Jan 2010 14:42:06 -0500
Subject: [Bioperl-l] Designing primers from multiple sequence alignment of
	amino acid sequences
Message-ID: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>

Hi,

I have to design primers from multiple sequence alignments of amino acid
sequences.  The sequences I am working with are quite diverged, and the
available primer design programs (such as CODEHOP/iCODEHOP) often fail to
find any primer sets.  But when I look at the alignments manually, I can see
regions that I could use to make primers.

So I designed the degenerate primers the old-fashioned way: selecting the
conserved regions (6-10 aa long) from the alignment, back-translating the
selected regions to DNA using the appropriate codon usage table, and finally
checking the primer sets (potential forward and reverse primers) using tools
like OLIGOANALYZER.  In the end I did find a few good primer sets, but
whether they work in practice is something I will have to wait and see.

While doing this manually, I really felt the need to automate it (I worked
with several alignments, not just one).  I was wondering if there is any way
BioPerl can help me here, or whether writing a Perl script from scratch is
the only way to go.

I would appreciate your suggestions/comments.  Thanks!  (Apologies for the
long email.)


Regards
Bhakti


From Kevin.M.Brown at asu.edu  Wed Jan 27 20:23:57 2010
From: Kevin.M.Brown at asu.edu (Kevin Brown)
Date: Wed, 27 Jan 2010 13:23:57 -0700
Subject: [Bioperl-l] Designing primers from multiple sequence alignment
	ofamino acid sequences
In-Reply-To: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>
References: <b643abd21001271142y1734a9dua9c55aa88049d7bc@mail.gmail.com>
Message-ID: <1A4207F8295607498283FE9E93B775B4068498DB@EX02.asurite.ad.asu.edu>

BioPerl is a collection of tools, not a full-blown application.
Most of what you want can be done with the objects available within
the toolkit, but the application (a Perl script) would still need to be
written to put those objects to use. You could run clustalw from within
Perl to align the sequences (Bio::Tools::Run::Alignment::Clustalw), find
the conserved regions (Bio::SimpleAlign), reverse translate them
(Bio::Tools::CodonTable), and then come up with an algorithm for primer
analysis and selection (or call another application such as primer3
(Bio::Tools::Run::Primer3) from within Perl).
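
To make the middle two steps concrete (finding conserved regions and reverse translating them), here is a rough plain-Perl sketch. It deliberately does not use the BioPerl modules named above, which would do the real work; the sub names, the toy alignment, and the minimum block length of 4 (rather than 6-10 aa) are all assumptions chosen to keep the example short. It treats a column as conserved only if every sequence has the same residue, and it back-translates with IUPAC-degenerate codons for the standard genetic code.

```perl
use strict;
use warnings;

# IUPAC-degenerate codons for the standard genetic code. Leu, Arg and Ser
# have six codons each, so YTN, MGN and WSN also match some codons for
# other residues; that over-coverage is a simplification of this sketch.
my %DEGENERATE_CODON = (
    A => 'GCN', C => 'TGY', D => 'GAY', E => 'GAR', F => 'TTY',
    G => 'GGN', H => 'CAY', I => 'ATH', K => 'AAR', L => 'YTN',
    M => 'ATG', N => 'AAY', P => 'CCN', Q => 'CAR', R => 'MGN',
    S => 'WSN', T => 'ACN', V => 'GTN', W => 'TGG', Y => 'TAY',
);

# Number of bases each IUPAC symbol stands for.
my %FOLD = (
    A => 1, C => 1, G => 1, T => 1,
    R => 2, Y => 2, M => 2, K => 2, S => 2, W => 2,
    H => 3, B => 3, D => 3, V => 3, N => 4,
);

# Return [zero-based start column, peptide] for every run of at least
# $min_len ungapped columns that are identical in all aligned sequences.
sub conserved_blocks {
    my ($min_len, @aligned) = @_;
    my @blocks;
    my ($start, $run) = (0, '');
    for my $i (0 .. length($aligned[0]) - 1) {
        my $residue    = substr $aligned[0], $i, 1;
        my $mismatches = grep { substr($_, $i, 1) ne $residue } @aligned;
        if ($residue ne '-' && !$mismatches) {
            $start = $i if $run eq '';
            $run  .= $residue;
            next;
        }
        push @blocks, [$start, $run] if length($run) >= $min_len;
        $run = '';
    }
    push @blocks, [$start, $run] if length($run) >= $min_len;
    return @blocks;
}

# Back-translate a peptide into a degenerate oligo.
sub reverse_translate {
    my ($peptide) = @_;
    my $dna = '';
    for my $aa (split //, uc $peptide) {
        die "No codon known for residue '$aa'\n"
            unless exists $DEGENERATE_CODON{$aa};
        $dna .= $DEGENERATE_CODON{$aa};
    }
    return $dna;
}

# Number of distinct sequences a degenerate oligo represents.
sub degeneracy {
    my ($oligo) = @_;
    my $n = 1;
    $n *= $FOLD{$_} for split //, $oligo;
    return $n;
}

# Toy alignment (invented); real input would come from an alignment object.
my @alignment = ('MKWDKAYLT', 'MRWDKAYIT', 'MQWDKAYLT');
for my $block (conserved_blocks(4, @alignment)) {
    my ($offset, $peptide) = @$block;
    my $oligo = reverse_translate($peptide);
    printf "col %d  %s  %s  %d-fold\n",
        $offset + 1, $peptide, $oligo, degeneracy($oligo);
}
```

The degeneracy figure is one simple way to rank candidate regions before handing them to a primer-analysis step; a reverse primer would additionally need the reverse complement of the oligo.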

Kevin Brown
Center for Innovations in Medicine
Biodesign Institute
Arizona State University  

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org 
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of 
> Bhakti Dwivedi
> Sent: Wednesday, January 27, 2010 12:42 PM
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Designing primers from multiple sequence 
> alignment ofamino acid sequences
> 
> Hi,
> 
> I have to design primers from the multiple sequence 
> alignments of amino acid
> sequences.  The sequences I am working with are quite 
> diverged and often the
> available primer design programs (such as CODEHOP/iCODEHOP) 
> fail to find any
> primer sets.   But, when I look  at the alignment manually, I 
> could see the
> regions that I could use to make primers.
> 
> So I  designed the degenerate primers the old-fashioned way, 
> starting from
> selecting the conserved regions (6-10aa long) from the alignment  to
> translating the selected regions to DNA using the appropriate 
> codon usage
> table, and then finally checking the primer sets (potential 
> forward and
> reverse primers) using tools like OLIGOANALYZER.  In the end, 
> I did find few
> good primer sets, but getting them to work in reality is 
> something I will
> have to wait and see.
> 
> While doing this process manually, I really felt the need to 
> automate it (it
> was not just one alignment I did, I worked with several of 
> those).   I was
> wondering if there is anyway bioperl can help me here, or 
> making a perl
> script is the only way to go.
> 
> I would appreciate your suggestions/comments.  Thanks!  
> (apologize for a
> long email..)
> 
> 
> Regards
> Bhakti


From mike.stubbington at bbsrc.ac.uk  Thu Jan 28 15:41:49 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 15:41:49 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
Message-ID: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>

Dear all,

I am attempting to blast some primers against the mouse genome. I have created a local mouse genome blast database and I can search against it using 'blastn' at the command line. 

I have perl code that creates an array of bioperl sequence objects called @primers

I then create a StandAloneBlastPlus factory using the following code:

	my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
		-db_dir => '/Users/stubbing/localBlast/',
		-db_name => 'MouseGenome'
	);

and then attempt to blast my primers using this:

	my @shortPrimers;
	my $count=1;
	foreach (@primers) {
		my $currentSeq = $_;
		print "Checking primer $count/$primerNumber ";
		if ($_->length < 40) {
			push(@shortPrimers,$_);
			print "Too short!\n";
		}
		else {
			print "BLASTing...";
			my $blastResult = $blastFactory->blastn(-query => $currentSeq);
		}
		$count++;
	}

This fails with the following error:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA> line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike

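A general Perl note that may help when reading an error like the one above, offered as an assumption rather than a diagnosis (it does not rely on how CommandExts builds its message): text such as "Illegal seek" is the string form of $!, and $! is only meaningful when a program could not be started at all. When a child process starts and then exits with an error, the useful information is in $?. A minimal sketch using the built-in system(), with a hypothetical helper name:

```perl
use strict;
use warnings;

# Run an external command and describe how it ended. $! is only consulted
# when system() returns -1, i.e. when the program could not be started.
sub run_and_report {
    my (@cmd) = @_;
    my $rc = system(@cmd);
    return "failed to start: $!"                    if $rc == -1;
    return sprintf('killed by signal %d', $? & 127) if $? & 127;
    return sprintf('exited with status %d', $? >> 8);
}

# $^X is the running perl, used here so the demo needs no other program.
print run_and_report($^X, '-e', 'exit 3'), "\n";
print run_and_report($^X, '-e', 'exit 0'), "\n";
```

Running the same blastn command line by hand and checking its exit status and stderr is the equivalent check outside Perl.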

From maj at fortinbras.us  Thu Jan 28 15:56:14 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 10:56:14 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
Message-ID: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>

Mike - please try updating your bioperl-live (the core) to the latest code 
(revision 16761 or so).
CommandExts is a work in progress; from the stack errors it looks like you've 
got an older version.
Try it then ping us back, if you would--
Thanks
Mark
----- Original Message ----- 
From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
To: <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 28, 2010 10:41 AM
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error 
running blastn


Dear all,

I am attempting to blast some primers against the mouse genome. I have created a 
local mouse genome blast database and I can search against it using 'blastn' at 
the command line.

I have perl code that creates an array of bioperl sequence objects called 
@primers

I then create a StandAloneBlastPlus factory using the following code:

	my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
		-db_dir => '/Users/stubbing/localBlast/',
		-db_name => 'MouseGenome'
	);

and then attempt to blast my primers using this:

	my @shortPrimers;
	my $count=1;
	foreach (@primers) {
		my $currentSeq = $_;
		print "Checking primer $count/$primerNumber ";
		if ($_->length < 40) {
			push(@shortPrimers,$_);
			print "Too short!\n";
		}
		else {
			print "BLASTing...";
			my $blastResult = $blastFactory->blastn(-query => $currentSeq);
		}
		$count++;
	}

This fails with the following error:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
/usr/local/ncbi/blast/bin/blastn : Illegal seek at 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA> 
line 532.

STACK Bio::Tools::Run::WrapperBase::_run 
/Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
/Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

Line 63 in my code is (as you might expect) the one that calls blastn on my 
factory object.

I'd appreciate any help you might be able to provide to shed light on this.

Thanks in advance,

Mike


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From mike.stubbington at bbsrc.ac.uk  Thu Jan 28 16:18:12 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Thu, 28 Jan 2010 16:18:12 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
In-Reply-To: <56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
	<56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
Message-ID: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>

Hi,

Thanks for the suggestion. Unfortunately it still fails - error as follows:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> line 532.

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

M

On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:

> Mike - please try updating your bioperl-live (the core) to the latest code 
> (revision 16761 or so).
> CommandExts is a work in progress; from the stack errors it looks like you've 
> got an older version.
> Try it then ping us back, if you would--
> Thanks
> Mark


From maj at fortinbras.us  Thu Jan 28 16:28:52 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 11:28:52 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk>
	<56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID: <C7FF329BCA044F19B3D690FE67319192@NewLife>

Thanks Mike-- will have a look asap- cheers MAJ


From cjfields at illinois.edu  Thu Jan 28 18:26:27 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 12:26:27 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
Message-ID: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>

Russell,

Just curious, but have you tried setting the return email parameter
(-email)?  NCBI recently stated that all queries would eventually
require a return email of some sort (not sure if it's validated or not).
I think that was set for around late spring.  I'm changing the code in
svn to require it for that very purpose.

chris  


On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi still works if you don't mind a bit of manual button clicking. It's handling chunks of 100,000 records OK (today).
> 
> --Russell
> 
> > -----Original Message-----
> > From: Chris Fields [mailto:cjfields at illinois.edu]
> > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > To: Smithies, Russell
> > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > number?
> > 
> > Makes me wonder if they're pushing more users towards the SOAP-based
> > services and away from eutils.
> > 
> > chris
> > 
> > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > 
> > > I've had a wide selection of errors lately:
> > >
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > STACK: get_desc.pl:32
> > > -----------------------------------------------------------
> > >
> > > And I never get a good explanation from NCBI or suggestions on how to
> > > avoid it.
> > >
> > >
> > > --Russell
> > >
> > >
> > >> -----Original Message-----
> > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > >> To: Smithies, Russell
> > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >> number?
> > >>
> > >> It's unfortunate but I have heard this problem popping up quite a bit
> > >> more frequently lately.  Not to push too many buttons but NCBI isn't very
> > >> forthcoming with help these days; they have become quite insular.  Not
> > >> sure if they're short-staffed due to budget or if there are other issues.
> > >>
> > >> chris
> > >>
> > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > >>
> > >>> Grrrrrr, I hate eutils!!!!
> > >>>
> > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > >>> STACK: Error::throw
> > >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > >>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > >>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > >>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > >>> STACK: get_desc.pl:32
> > >>> -----------------------------------------------------------
> > >>>
> > >>>
> > >>> Nice error message though :-)
> > >>>
> > >>>
> > >>> --Russell
> > >>>
> > >>>> -----Original Message-----
> > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > >>>> To: 'Chris Fields'
> > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-
> > bio.org'
> > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > >>>> number?
> > >>>>
> > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've
> > >>>> often been finding that with large queries, chunks of the resulting
> > >>>> data are missing.
> > >>>> For example, before Xmas I was creating species-specific databases by
> > >>>> using eUtils to get a list of GI numbers back for a taxid, then
> > >>>> retrieving the fasta sequences in chunks of 500.
> > >>>> Very regularly, in the middle of the fasta there would be a message
> > >>>> about resource unavailable, e.g.
> > >>>> >test_sequence_1
> > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > >>>> >test_sequence_2
> > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > >>>>
> > >>>> Often this wasn't detected until formatdb complained about invalid
> > >>>> characters.
> > >>>> Inquiries to NCBI as to why this was happening and what to do about it
> > >>>> returned stupid answers ("do each sequence manually thru the web
> > >>>> interface", or "use eUtils").
> > >>>> As we have a nice fast network connection, I now prefer to download
> > >>>> very large gzip files (i.e. all of refseq) and extract what I need.
> > >>>>
> > >>>> I can't help but think that NCBI could solve a lot of problems if they
> > >>>> gzipped the output from eUtils queries - it's something I've requested
> > >>>> regularly for the last 5 years or so!!
> > >>>>
> > >>>> --Russell
> > >>>>
> > >>>>
> > >>>>> -----Original Message-----
> > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > >>>>> To: Smithies, Russell
> > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-
> > bio.org'
> > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > accession
> > >>>>> number?
> > >>>>>
> > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for
> > >>>>> the details).
> > >>>>>
> > >>>>> chris
> > >>>>>
> > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > >>>>>
> > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's
> > >>>>>> flakiness lately) would be to download the gi_taxid_nucl.zip or
> > >>>>>> gi_taxid_prot.zip files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/,
> > >>>>>> load them into a hash and do lookups.
> > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp
> > >>>>>> which lists taxids and descriptions (and synonyms).
> > >>>>>>
> > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > >>>>>> could do this:
> > >>>>>>
> > >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> > >>>>>> my $org_name = $names{$taxid};
> > >>>>>>
> > >>>>>> --Russell
> > >>>>>>
> > >>>>>>
> > >>>>>>> -----Original Message-----
> > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from
> > >> accession
> > >>>>>>> number?
> > >>>>>>>
> > >>>>>>> Bhakti,
> > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > >>>>>>>
> > >>>>>>> use Bio::DB::EUtilities;
> > >>>>>>>
> > >>>>>>> my (%taxa, @taxa);
> > >>>>>>> my (%names, %idmap);
> > >>>>>>>
> > >>>>>>> # these are protein ids; nuc ids will (probably) work by changing
> > >>>>>>> # -dbfrom to 'nucleotide'
> > >>>>>>>
> > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > >>>>>>>
> > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > >>>>>>>                                     -db => 'taxonomy',
> > >>>>>>>                                     -dbfrom => 'protein',
> > >>>>>>>                                     -correspondence => 1,
> > >>>>>>>                                     -id => \@ids);
> > >>>>>>>
> > >>>>>>> # iterate through the LinkSet objects
> > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> @taxa = @taxa{@ids};
> > >>>>>>>
> > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > >>>>>>>      -db    => 'taxonomy',
> > >>>>>>>      -id    => \@taxa );
> > >>>>>>>
> > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > >>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
> > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> foreach (@ids) {
> > >>>>>>>  $idmap{$_} = $names{$taxa{$_}};
> > >>>>>>> }
> > >>>>>>>
> > >>>>>>> # %idmap is
> > >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> > >>>>>>> #    730439 => 'Bacillus caldolyticus'
> > >>>>>>> #    89318838 => undef    (this record has been removed from the db)
> > >>>>>>>
> > >>>>>>> 1;
> > >>>>>>>
> > >>>>>>> You probably will need to break up your 30000 into chunks
> > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > >>>>>>>
> > >>>>>>> sleep 3;
> > >>>>>>>
> > >>>>>>> or so separating the queries.
> > >>>>>>> MAJ
> > >>>>>>> ----- Original Message -----
> > >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > >>>>>>> To: <bioperl-l at lists.open-bio.org>
> > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
> > >>>>> number?
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Hi,
> > >>>>>>>>
> > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name"
> > >>>>>>>> given the accession number using Bioperl.   I have these 30,000
> > >>>>>>>> accession numbers for which I need to get the source organisms.
> > >>>>>>>> Any kind of help will be appreciated.
> > >>>>>>>>
> > >>>>>>>> Thanks
> > >>>>>>>>
> > >>>>>>>> BD
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
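
The offline lookup Russell describes in the quoted text (hashes built from gi_taxid_nucl and names.dmp) might be sketched roughly as below. This is only an illustration: the sub names and the sample rows are made up, the file layouts are assumed to be tab-separated "gi, taxid" pairs and "\t|\t"-delimited names.dmp rows, and the GI-to-taxid file is keyed by GI number, so accessions would first have to be mapped to GIs.

```perl
use strict;
use warnings;

# Parse "gi<TAB>taxid" lines (assumed layout of the gi_taxid dump) into a hash.
sub load_gi_taxid {
    my ($fh) = @_;
    my %taxid_for;
    while (my $line = <$fh>) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        next unless defined $taxid;
        $taxid_for{$gi} = $taxid;
    }
    return \%taxid_for;
}

# Parse names.dmp rows (fields separated by "\t|\t", rows ending in "\t|")
# and keep only rows whose name class is "scientific name", so synonyms
# do not overwrite the primary name.
sub load_scientific_names {
    my ($fh) = @_;
    my %name_for;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\t\|$//;
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        next unless defined $class && $class eq 'scientific name';
        $name_for{$taxid} = $name;
    }
    return \%name_for;
}

# Tiny made-up samples standing in for the real (very large) files.
my $gi_data    = "1001\t9001\n1002\t9002\n";
my $names_data =
    "9001\t|\tExamplebacter primus\t|\t\t|\tscientific name\t|\n"
  . "9002\t|\tExamplebacter secundus\t|\t\t|\tscientific name\t|\n"
  . "9002\t|\tE. secundus\t|\t\t|\tsynonym\t|\n";

open my $gi_fh, '<', \$gi_data or die "cannot open GI sample: $!";
open my $names_fh, '<', \$names_data or die "cannot open names sample: $!";

my $taxid_for = load_gi_taxid($gi_fh);
my $name_for  = load_scientific_names($names_fh);

for my $gi (sort keys %$taxid_for) {
    my $taxid = $taxid_for->{$gi};
    my $name  = defined $name_for->{$taxid} ? $name_for->{$taxid} : 'unknown';
    print "$gi\t$taxid\t$name\n";
}
```

With the real dumps the handles would be opened on the unpacked files instead of in-memory strings; memory use for the full GI table is likely to be substantial.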


From maj at fortinbras.us  Thu Jan 28 18:47:04 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 13:47:04 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
Message-ID: <FD6E9A89F6034CCB856E22553ED893D7@NewLife>

Hi Mike,
Believe I found the real bug causing the problem (was not accounting for
the db_dir parameter). Crashes should now also throw much more helpful
errors. Please try the code at r16774, and shout back.
thanks --
MAJ
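
Since BioPerl exceptions are raised with die, a loop like the one in the original post could trap a failed call per primer instead of aborting the whole run. The following is a rough sketch only; blast_each is a hypothetical helper, and it assumes the factory's blastn(-query => ...) call and sequence objects with a display_id method, as used earlier in this thread.

```perl
use strict;
use warnings;

# Run blastn for each sequence and collect results and error messages
# separately, so that one failing query does not stop the loop.
sub blast_each {
    my ($factory, @seqs) = @_;
    my (%result_for, %error_for);
    for my $seq (@seqs) {
        my $id = $seq->display_id;
        my $result;
        # eval returns 1 only if blastn did not throw
        if (eval { $result = $factory->blastn(-query => $seq); 1 }) {
            $result_for{$id} = $result;
        }
        else {
            $error_for{$id} = $@ || 'unknown error';
        }
    }
    return (\%result_for, \%error_for);
}

# Possible usage with the objects from the original script:
#   my ($ok, $failed) = blast_each($blastFactory, @primers);
#   warn "$_ failed: $failed->{$_}" for sort keys %$failed;
```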


From cjfields at illinois.edu  Thu Jan 28 19:00:26 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:00:26 -0600
Subject: [Bioperl-l] EUtilities policy change
Message-ID: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>

All,

Per NCBI's recent change in eutils user policy (effective June 1):

http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html

Both the tool and email parameters ('-tool', '-email') are now required
when making requests.  Note this will significantly break all modules
requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
and Taxonomy stuff as well, IIRC).  This also applies to web services
(SOAP-based access).  Mark, not sure how this affects your SOAP-based
modules.

I have reconfigured Bio::DB::EUtilities to follow this policy; the
default tool setting has been 'bioperl' and will remain that way.
However, there has been no default email, therefore setting this is now
required for future requests unless we (the bioperl devs) decide there
is a safe default email to utilize.  My gut tells me, however, that
falling back to a default email opens up a can of worms for the devs and
is very likely a 'BAD IDEA'(TM).  

Regardless, be aware that, after June 1, NCBI will very likely exclude
requests with no email and will notify users who are considered to be
violating their policies.

I will likely make further changes to Bio::DB::EUtilities in the
meantime to ensure that using the tools by default will not violate
NCBI's policy (i.e. override this at your own risk).  

chris
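
For scripts that call eutils through Bio::DB::EUtilities, passing both parameters explicitly might look like the sketch below. The email address, tool name and search term are placeholders, not values from this thread, and running it needs BioPerl installed plus network access to NCBI.

```perl
use strict;
use warnings;
use Bio::DB::EUtilities;

# Placeholder values: substitute your own contact address and script name.
my $factory = Bio::DB::EUtilities->new(
    -eutil  => 'esearch',
    -db     => 'protein',
    -term   => 'Mycobacterium tuberculosis H37Rv[Organism]',
    -retmax => 5,
    -email  => 'you@example.org',
    -tool   => 'my_eutils_script',
);

# get_ids() triggers the actual request to NCBI.
my @ids = $factory->get_ids;
print "$_\n" for @ids;
```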


From maj at fortinbras.us  Thu Jan 28 19:05:43 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:05:43 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
Message-ID: <8F49B5ED151143FA86E977B4D4F44265@NewLife>

Thanks Chris-- 
The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
I agree that a default email is a bad idea (tm) (unless maybe it's 
hilmar's...?). I'd say a warning on unset email parameters is a responsible
"there be dragons" sort of treatment.
MAJ


From cjfields at illinois.edu  Thu Jan 28 19:18:22 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:18:22 -0600
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <8F49B5ED151143FA86E977B4D4F44265@NewLife>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu>
	<8F49B5ED151143FA86E977B4D4F44265@NewLife>
Message-ID: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>

I think warning is fine for now.  I've reimplemented that so it occurs
lazily (warns only when a request is actually made).

Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
We'll obviously have to address this in the test suite as well in some
way, maybe ask for an email if network tests are requested.

chris 
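
The "warn lazily" idea can be illustrated with a small stand-alone sketch. LazyEmailClient is a hypothetical class, not the actual Bio::DB::EUtilities code, and warning only once per object is a choice made for this example; request() just returns a string where a real client would contact the server.

```perl
use strict;
use warnings;

package LazyEmailClient;   # hypothetical, for illustration only

sub new {
    my ($class, %args) = @_;
    # Construction never warns, even if -email is missing.
    my $self = {
        email  => $args{-email},
        tool   => $args{-tool} || 'BioPerl',
        warned => 0,
    };
    return bless $self, $class;
}

sub request {
    my ($self, $query) = @_;
    # Warn at request time, and only the first time.
    if (!defined $self->{email} && !$self->{warned}++) {
        warn "No -email set; NCBI may reject such requests in future\n";
    }
    # Stand-in for the real network call.
    return "tool=$self->{tool};query=$query";
}

package main;

my $client = LazyEmailClient->new(-tool => 'BioPerl');   # silent
my $result = $client->request('some query');             # warns here
print "$result\n";
```

Deferring the check means objects that are built but never used for a remote call (for example in offline tests) stay quiet.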

On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote:
> Thanks Chris-- 
> The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
> I agree that a default email is a bad idea (tm) (unless maybe it's 
> hilmar's...?). I'd say a warning on unset email parameters is a responsible
> "there be dragons" sort of treatment.
> MAJ


From Russell.Smithies at agresearch.co.nz  Thu Jan 28 19:25:38 2010
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 29 Jan 2010 08:25:38 +1300
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
	<1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
Message-ID: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>

Yes, I usually set the 'tool' and 'email' parameters.
I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...
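
In case it helps anyone hitting eutils directly rather than through
BioPerl, here is a rough sketch of building an esearch URL that always
carries 'tool' and 'email'. The helper names (uri_encode, esearch_url) and
the example values are made up for illustration; the sketch only builds the
URL string and makes no request.

```perl
use strict;
use warnings;

# Percent-encode a value for use in a URL query string (ASCII input assumed).
sub uri_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord $1)/ge;
    return $s;
}

# Build an esearch URL, refusing to continue if tool or email is missing.
sub esearch_url {
    my (%p) = @_;
    my @pairs;
    for my $key (qw(db term tool email)) {
        die "missing required parameter '$key'\n"
            unless defined $p{$key} && length $p{$key};
        push @pairs, $key . '=' . uri_encode($p{$key});
    }
    my $base = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi';
    return $base . '?' . join('&', @pairs);
}

my $url = esearch_url(
    db    => 'protein',
    term  => 'Mycobacterium tuberculosis[Organism]',
    tool  => 'my_script',            # placeholder tool name
    email => 'user@example.org',     # placeholder address
);
print "$url\n";
```

Dying on a missing email is stricter than the warning discussed on the
list; that is a choice made for this sketch only.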

--Russell

> -----Original Message-----
> From: Chris Fields [mailto:cjfields at illinois.edu]
> Sent: Friday, 29 January 2010 7:26 a.m.
> To: Smithies, Russell
> Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> number?
> 
> Russell,
> 
> Just curious, but have you tried setting the return email parameter
> (-email)?  NCBI recently stated that all queries would eventually
> require a return email of some sort (not sure if it's validated or not).
> I think that was set for around late spring.  I'm changing the code in
> svn to require it for that very purpose.
> 
> chris
> 
> 
> On Wed, 2010-01-27 at 15:45 +1300, Smithies, Russell wrote:
> > Batch-entrez http://www.ncbi.nlm.nih.gov/portal/utils/batchentrez_p.cgi
> > still works if you don't mind a bit of manual button clicking. It's
> > handling chunks of 100,000 records OK (today).
> >
> > --Russell
> >
> > > -----Original Message-----
> > > From: Chris Fields [mailto:cjfields at illinois.edu]
> > > Sent: Wednesday, 27 January 2010 3:42 p.m.
> > > To: Smithies, Russell
> > > Cc: 'bioperl-l at lists.open-bio.org'; 'Mark A. Jensen'
> > > Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > number?
> > >
> > > Makes me wonder if they're pushing more users towards the SOAP-based
> > > services and away from eutils.
> > >
> > > chris
> > >
> > > On Jan 26, 2010, at 7:59 PM, Smithies, Russell wrote:
> > >
> > > > I've had a wide selection of errors lately:
> > > >
> > > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > > MSG: NCBI esearch fatal error: Search Backend failed: Error 11 (Resource temporarily unavailable)
> > > > > STACK: Error::throw
> > > > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > > STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > > STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > > STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > > STACK: get_desc.pl:32
> > > > > -----------------------------------------------------------
> > > >
> > > > > And I never get a good explanation from NCBI or suggestions on how to
> > > > > avoid it.
> > > >
> > > >
> > > > --Russell
> > > >
> > > >
> > > >> -----Original Message-----
> > > >> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >> Sent: Wednesday, 27 January 2010 2:46 p.m.
> > > >> To: Smithies, Russell
> > > >> Cc: 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > > > >> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > > >> number?
> > > > >>
> > > > >> It's unfortunate but I have heard this problem popping up quite a bit more
> > > > >> frequently lately.  Not to push too many buttons but NCBI isn't very
> > > > >> forthcoming with help these days; they have become quite insular.  Not
> > > > >> sure if they're short-staffed due to budget or if there are other issues.
> > > >>
> > > >> chris
> > > >>
> > > >> On Jan 26, 2010, at 7:40 PM, Smithies, Russell wrote:
> > > >>
> > > >>> Grrrrrr, I hate eutils!!!!
> > > >>>
> > > > >>> ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > >>> MSG: NCBI esearch fatal error: Search Backend failed: Error 111 (Connection refused)
> > > > >>> STACK: Error::throw
> > > > >>> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > >>> STACK: Bio::Tools::EUtilities::parse_data /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:332
> > > > >>> STACK: Bio::Tools::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/Tools/EUtilities.pm:441
> > > > >>> STACK: Bio::DB::EUtilities::get_ids /usr/lib/perl5/site_perl/5.8.8/Bio/DB/EUtilities.pm:363
> > > > >>> STACK: get_desc.pl:32
> > > > >>> -----------------------------------------------------------
> > > >>>
> > > >>>
> > > >>> Nice error message though :-)
> > > >>>
> > > >>>
> > > >>> --Russell
> > > >>>
> > > >>>> -----Original Message-----
> > > >>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>> bounces at lists.open-bio.org] On Behalf Of Smithies, Russell
> > > >>>> Sent: Monday, 11 January 2010 10:05 a.m.
> > > >>>> To: 'Chris Fields'
> > > > >>>> Cc: 'Bhakti Dwivedi'; 'Mark A. Jensen'; 'bioperl-l at lists.open-bio.org'
> > > > >>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > > >>>> number?
> > > > >>>>
> > > > >>>> I've started to go off eUtils recently (not BioPerl's fault) as I've often
> > > > >>>> been finding that with large queries, chunks of the resulting data are
> > > > >>>> missing.
> > > > >>>> For example, before Xmas I was creating species-specific databases by
> > > > >>>> using eUtils to get a list of GI numbers back for a taxid, then retrieving
> > > > >>>> the fasta sequences in chunks of 500.
> > > > >>>> Very regularly, in the middle of the fasta there would be a message about
> > > > >>>> resource unavailable, e.g.
> > > > >>>> >test_sequence_1
> > > > >>>> TACGATCATCGCTResource UnavailableTACGACTCTGCT
> > > > >>>> >test_sequence_2
> > > > >>>> TACGTACTACGATCGATCATCACTATCGTCATACTACTACTGACT
> > > > >>>>
> > > > >>>> Often this wasn't detected until formatdb complained about invalid
> > > > >>>> characters.
> > > > >>>> Inquiries to NCBI as to why this was happening and what to do about it
> > > > >>>> returned stupid answers ("do each sequence manually thru the web
> > > > >>>> interface", or "use eUtils").
> > > > >>>> As we have a nice fast network connection, I now prefer to download very
> > > > >>>> large gzip files (i.e. all of refseq) and extract what I need.
> > > > >>>>
> > > > >>>> I can't help but think that NCBI could solve a lot of problems if they
> > > > >>>> gzipped the output from eUtils queries - it's something I've requested
> > > > >>>> regularly for the last 5 years or so!!
> > > >>>>
> > > >>>> --Russell
> > > >>>>
> > > >>>>
> > > >>>>> -----Original Message-----
> > > >>>>> From: Chris Fields [mailto:cjfields at illinois.edu]
> > > >>>>> Sent: Monday, 11 January 2010 9:50 a.m.
> > > >>>>> To: Smithies, Russell
> > > > >>>>> Cc: 'Mark A. Jensen'; 'Bhakti Dwivedi'; 'bioperl-l at lists.open-bio.org'
> > > > >>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > > >>>>> number?
> > > > >>>>>
> > > > >>>>> One could also use Bio::DB::Taxonomy, which indexes the same files or
> > > > >>>>> (alternatively) makes the eutil calls (see Bio::DB::Taxonomy POD for the
> > > > >>>>> details).
> > > >>>>>
> > > >>>>> chris
> > > >>>>>
> > > >>>>> On Jan 10, 2010, at 2:34 PM, Smithies, Russell wrote:
> > > >>>>>
> > > > >>>>>> An alternate non-BioPerly way (that may be faster given NCBI's flakiness
> > > > >>>>>> lately) would be to download the gi_taxid_nucl.zip or gi_taxid_prot.zip
> > > > >>>>>> files from ftp://ftp.ncbi.nih.gov/pub/taxonomy/, load them into a hash and
> > > > >>>>>> do lookups.
> > > > >>>>>> In that same dir, taxdump.tar.gz contains a file called names.dmp which
> > > > >>>>>> lists taxids and descriptions (and synonyms)
> > > > >>>>>>
> > > > >>>>>> If it was me, I'd split gi_taxid_nucl and names.dmp into hashes so I
> > > > >>>>>> could do this:
> > > > >>>>>>
> > > > >>>>>> my $taxid  = $gi_taxid_nucl{$accession};
> > > > >>>>>> my $org_name = $names{$taxid};
> > > >>>>>>
> > > >>>>>> --Russell
> > > >>>>>>
> > > >>>>>>
> > > >>>>>>> -----Original Message-----
> > > >>>>>>> From: bioperl-l-bounces at lists.open-bio.org [mailto:bioperl-l-
> > > >>>>>>> bounces at lists.open-bio.org] On Behalf Of Mark A. Jensen
> > > >>>>>>> Sent: Saturday, 26 December 2009 4:52 p.m.
> > > >>>>>>> To: Bhakti Dwivedi; bioperl-l at lists.open-bio.org
> > > > >>>>>>> Subject: Re: [Bioperl-l] how to retrieve organism name from accession
> > > > >>>>>>> number?
> > > > >>>>>>>
> > > > >>>>>>> Bhakti,
> > > > >>>>>>> The following example (using EUtilities) may serve your purpose:
> > > >>>>>>>
> > > >>>>>>> use Bio::DB::EUtilities;
> > > >>>>>>>
> > > >>>>>>> my (%taxa, @taxa);
> > > >>>>>>> my (%names, %idmap);
> > > >>>>>>>
> > > > >>>>>>> # these are protein ids; nuc ids will work by changing
> > > > >>>>>>> # -dbfrom => 'nucleotide' (probably)
> > > >>>>>>>
> > > >>>>>>> my @ids = qw(1621261 89318838 68536103 20807972 730439);
> > > >>>>>>>
> > > >>>>>>> my $factory = Bio::DB::EUtilities->new(-eutil => 'elink',
> > > >>>>>>>                                     -db => 'taxonomy',
> > > >>>>>>>                                     -dbfrom => 'protein',
> > > >>>>>>>                                     -correspondence => 1,
> > > >>>>>>>                                     -id => \@ids);
> > > >>>>>>>
> > > >>>>>>> # iterate through the LinkSet objects
> > > >>>>>>> while (my $ds = $factory->next_LinkSet) {
> > > >>>>>>>  $taxa{($ds->get_submitted_ids)[0]} = ($ds->get_ids)[0]
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> @taxa = @taxa{@ids};
> > > >>>>>>>
> > > >>>>>>> $factory = Bio::DB::EUtilities->new(-eutil => 'esummary',
> > > >>>>>>>      -db    => 'taxonomy',
> > > >>>>>>>      -id    => \@taxa );
> > > >>>>>>>
> > > >>>>>>> while (local $_ = $factory->next_DocSum) {
> > > >>>>>>>  $names{($_->get_contents_by_name('TaxId'))[0]} =
> > > >>>>>>> ($_->get_contents_by_name('ScientificName'))[0];
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> foreach (@ids) {
> > > >>>>>>>  $idmap{$_} = $names{$taxa{$_}};
> > > >>>>>>> }
> > > >>>>>>>
> > > >>>>>>> # %idmap is
> > > >>>>>>> #    1621261 => 'Mycobacterium tuberculosis H37Rv'
> > > >>>>>>> #    20807972 => 'Thermoanaerobacter tengcongensis MB4'
> > > >>>>>>> #    68536103 => 'Corynebacterium jeikeium K411'
> > > >>>>>>> #    730439 => 'Bacillus caldolyticus'
> > > > >>>>>>> #    89318838 => undef    (this record has been removed from the db)
> > > >>>>>>>
> > > >>>>>>> 1;
> > > >>>>>>>
> > > >>>>>>> You probably will need to break up your 30000 into chunks
> > > >>>>>>> (say, 1000-3000 each), and do the above on each chunk with a
> > > >>>>>>>
> > > >>>>>>> sleep 3;
> > > >>>>>>>
> > > >>>>>>> or so separating the queries.
> > > >>>>>>> MAJ
> > > >>>>>>> ----- Original Message -----
> > > >>>>>>> From: "Bhakti Dwivedi" <bhakti.dwivedi at gmail.com>
> > > >>>>>>> To: <bioperl-l at lists.open-bio.org>
> > > >>>>>>> Sent: Friday, December 25, 2009 9:46 PM
> > > > >>>>>>> Subject: [Bioperl-l] how to retrieve organism name from accession
> > > > >>>>>>> number?
> > > > >>>>>>>
> > > > >>>>>>>
> > > > >>>>>>>> Hi,
> > > > >>>>>>>>
> > > > >>>>>>>> Does anyone know how to retrieve the "Source" or the "Species name" given
> > > > >>>>>>>> the accession number using Bioperl?  I have these 30,000 accession numbers
> > > > >>>>>>>> for which I need to get the source organisms.  Any kind of help will be
> > > > >>>>>>>> appreciated.
> > > >>>>>>>>
> > > >>>>>>>> Thanks
> > > >>>>>>>>
> > > >>>>>>>> BD


From cjfields at illinois.edu  Thu Jan 28 19:30:12 2010
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 28 Jan 2010 13:30:12 -0600
Subject: [Bioperl-l] how to retrieve organism name from accession number?
In-Reply-To: <18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
References: <b643abd20912251846n2aef6b91jd3ce41f8bf1b2a8d@mail.gmail.com>
	<1CE23DE1068C4FA2BD543D167A1AA901@NewLife>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E2D@exchsth.agresearch.co.nz>
	<F88D394E-3521-478D-8C96-2D29070EBDFD@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C613B7E64@exchsth.agresearch.co.nz>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC95F4@exchsth.agresearch.co.nz>
	<4EDF3781-7694-4A81-97E8-E953D6DC07E5@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC961E@exchsth.agresearch.co.nz>
	<5FF9B845-2B6A-4B8C-B8AE-A12E0CBCA2D6@illinois.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC964C@exchsth.agresearch.co.nz>
	<1264703187.5473.10.camel@cjfields.igb.uiuc.edu>
	<18DF7D20DFEC044098A1062202F5FFF32C61AC9AF3@exchsth.agresearch.co.nz>
Message-ID: <1264707012.5473.51.camel@cjfields.igb.uiuc.edu>

Russell,

Okay, just wanted to make sure.  The email/tool requirements weren't
actually enforced up until now, which is forcing us to do a bit of
re-work on the various tools that don't have it set by default (at least
warn users unaware of it).  

And I agree, gzipped archives would be nice!
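
For the record, the download-and-look-up approach Russell described further
down the thread can be sketched roughly as below. It assumes the dump
layouts are gi <TAB> taxid for gi_taxid_*.dmp and "\t|\t"-separated fields
for names.dmp. The sample rows and the helper names (load_gi_taxid,
load_names) are made up for illustration, and the data is gzipped in memory
here only so the sketch runs without downloading anything.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw($GunzipError);

# Illustrative sample rows only, not a real download.
my $gi_dump = "1621261\t83332\n730439\t1394\n";
my $names_dump =
      "83332\t|\tMycobacterium tuberculosis H37Rv\t|\t\t|\tscientific name\t|\n"
    . "1394\t|\tBacillus caldolyticus\t|\t\t|\tscientific name\t|\n"
    . "1394\t|\tsome other name\t|\t\t|\tsynonym\t|\n";

# Stand-in for a gzipped gi_taxid file fetched from the FTP site.
my $gi_gz;
gzip \$gi_dump => \$gi_gz or die "gzip failed: $GzipError\n";

# Read a gzipped gi -> taxid table into a hash.
sub load_gi_taxid {
    my ($gz_ref) = @_;
    my $z = IO::Uncompress::Gunzip->new($gz_ref)
        or die "gunzip failed: $GunzipError\n";
    my %map;
    while (defined(my $line = $z->getline)) {
        chomp $line;
        my ($gi, $taxid) = split /\t/, $line;
        $map{$gi} = $taxid;
    }
    $z->close;
    return \%map;
}

# Keep only the 'scientific name' rows from names.dmp-style text.
sub load_names {
    my ($text) = @_;
    my %names;
    for my $line (split /\n/, $text) {
        $line =~ s/\t\|$//;    # strip the trailing field terminator
        my ($taxid, $name, undef, $class) = split /\t\|\t/, $line;
        $names{$taxid} = $name
            if defined $class && $class eq 'scientific name';
    }
    return \%names;
}

my $gi2taxid = load_gi_taxid(\$gi_gz);
my $names    = load_names($names_dump);
my $org_name = $names->{ $gi2taxid->{1621261} };
print "$org_name\n";
```

Note that the table is keyed by GI number, so accession numbers would need
to be mapped to GIs first. The full gi_taxid files are large, so holding
them in a plain hash may need a lot of memory.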

chris

On Fri, 2010-01-29 at 08:25 +1300, Smithies, Russell wrote:
> Yes, I usually set the 'tool' and 'email' parameters.
> I went to NCBI back in 2006 and did their "PowerScripting" course where they pointed out a lot of the requirements for using eUtils. I think I requested results returned gzipped back then as well...
> 
> --Russell
> 


From maj at fortinbras.us  Thu Jan 28 19:55:31 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Thu, 28 Jan 2010 14:55:31 -0500
Subject: [Bioperl-l] EUtilities policy change
In-Reply-To: <1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
References: <1264705226.5473.35.camel@cjfields.igb.uiuc.edu><8F49B5ED151143FA86E977B4D4F44265@NewLife>
	<1264706302.5473.48.camel@cjfields.igb.uiuc.edu>
Message-ID: <CD70565A9D3F44C4A0D7BA6462E021E0@NewLife>

Ok, SoapEU now warns on no email; passes email on to the fetch stage
during autofetch -- cheers MAJ
----- Original Message ----- 
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl-l" <bioperl-l at lists.open-bio.org>
Sent: Thursday, January 28, 2010 2:18 PM
Subject: Re: [Bioperl-l] EUtilities policy change


>I think warning is fine for now.  I've reimplemented that so it occurs
> lazily (warns only when a request is actually made).
> 
> Will also change the tool to 'BioPerl' (currently 'bioperl', all lc).
> We'll obviously have to address this in the test suite as well in some
> way, maybe ask for an email if network tests are requested.
> 
> chris 
> 
> On Thu, 2010-01-28 at 14:05 -0500, Mark A. Jensen wrote:
>> Thanks Chris-- 
>> The soap modules currently set tool to "SoapEUtilities(BioPerl)". 
>> I agree that a default email is a bad idea (tm) (unless maybe it's 
>> hilmar's...?). I'd say a warning on unset email parameters is a responsible
>> "there be dragons" sort of treatment.
>> MAJ
>> ----- Original Message ----- 
>> From: "Chris Fields" <cjfields at illinois.edu>
>> To: "BioPerl-l" <bioperl-l at lists.open-bio.org>
>> Cc: "Mark A. Jensen" <maj at fortinbras.us>
>> Sent: Thursday, January 28, 2010 2:00 PM
>> Subject: EUtilities policy change
>> 
>> 
>> > All,
>> > 
>> > Per NCBI's recent change in eutils user policy (effective June 1):
>> > 
>> > http://bioperl.org/pipermail/bioperl-l/2009-December/031698.html
>> > 
>> > Both the tool and email parameters ('-tool', '-email') are now required
>> > when making requests.  Note this will significantly break all modules
>> > requiring remote access to eutils (Bio::DB::GenBank/GenPept, some Biblio
>> > and Taxonomy stuff as well, IIRC).  This also applies to web services
>> > (SOAP-based access).  Mark, not sure how this affects your SOAP-based
>> > modules.
>> > 
>> > I have reconfigured Bio::DB::EUtilities to follow this policy; the
>> > default tool setting has been 'bioperl' and will remain that way.
>> > However, there has been no default email, therefore setting this is now
>> > required for future requests unless we (the bioperl devs) decide there
>> > is a safe default email to utilize.  My gut tells me, however, that
>> > falling back to a default email opens up a can of worms for the devs and
>> > is very likely a 'BAD IDEA'(TM).  
>> > 
>> > Regardless, be aware that, after June 1, NCBI will very likely exclude
>> > requests with no email and will notify users who are considered to be
>> > violating their policies.
>> > 
>> > I will likely make further changes to Bio::DB::EUtilities in the
>> > meantime to ensure that using the tools by default will not violate
>> > NCBI's policy (e.g. override this at your own risk).  
>> > 
>> > chris
>> > 
>> > 
>> >
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
>
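
For anyone finding this thread later, here is a minimal sketch of what
supplying both parameters might look like with Bio::DB::EUtilities. The
helper names (eutil_args, search_ids) and the placeholder address are made
up for illustration and are not part of BioPerl; '-email' and '-tool' are
the parameters discussed above. The search itself needs BioPerl and network
access, so it sits in a sub that is not called here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): refuses to build a parameter
# list without an email address, and fills in a tool name if none is given.
sub eutil_args {
    my (%args) = @_;
    die "NCBI eutils policy: please supply -email\n"
        unless defined $args{-email} && $args{-email} =~ /\S+\@\S+/;
    $args{-tool} = 'BioPerl' unless defined $args{-tool};
    return %args;
}

# Runs an esearch and returns the matching IDs. Bio::DB::EUtilities is
# loaded lazily, so eutil_args() can be used without BioPerl installed.
sub search_ids {
    my (%args) = @_;
    require Bio::DB::EUtilities;
    my $factory = Bio::DB::EUtilities->new(
        eutil_args( -eutil => 'esearch', %args ) );
    return $factory->get_ids;
}

# Example call (placeholder email; uncomment to query NCBI):
# my @ids = search_ids(
#     -db     => 'protein',
#     -term   => 'BRCA1 AND human',
#     -email  => 'you@example.org',
#     -retmax => 10,
# );
# print "$_\n" for @ids;
```

Dying on a missing email is stricter than the warning agreed on in the
thread; swap the die for a warn if you prefer that behaviour.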


From chapmanb at 50mail.com  Thu Jan 28 20:35:05 2010
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 28 Jan 2010 15:35:05 -0500
Subject: [Bioperl-l] OpenBio solution challenge: Project updates at BOSC 2010
Message-ID: <20100128203505.GG40046@sobchak.mgh.harvard.edu>

Hello all;
The BOSC 2010 organizing committee is hard at work getting prepared for this
July's meeting in Boston:

http://www.open-bio.org/wiki/BOSC_2010

One of the items we've traditionally had at the conference is a project 
update from each of the OpenBio affiliated groups. This year, we're thinking
about organizing these talks around a central theme: the OpenBio solution
challenge. We start with a biological question of general interest, and each
of the project talks would focus around how you would solve that problem 
using your toolkit and programming language.

This is meant to provide a challenge for OpenBio contributors, a nice tutorial
style overview of various projects and approaches for other programmers, and a
fun opportunity to compete and learn from other projects. Conference attendees
will vote on their favorite solution, with the winner receiving fame and
fortune (warning: fortune not guaranteed).

For this to be successful, it of course requires interest and enthusiasm from
y'all fine folks involved with the projects. Specifically:

- Is there interest from your group in participating in the challenge? You'll
  want at least a few people to work on it, and someone to give a presentation 
  at BOSC.

- Do you have suggestions on a good theme or specific biological problem to
  tackle? We'll hope to pick something in a sweet spot that is challenging 
  enough to be of interest, yet reasonable for presentation and preparation.

Let's discuss ideas and get this together. Since the schedule for BOSC is
developing rapidly, please give us an idea if you're interested by
February 12th, and copy responses to the BOSC mailing list as a central 
place for discussion.

bosc at open-bio.org

Thanks,
Brad, Michael, and the BOSC organizing committee


From markw at illuminae.com  Thu Jan 28 21:17:44 2010
From: markw at illuminae.com (Mark Wilkinson)
Date: Thu, 28 Jan 2010 13:17:44 -0800
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project
 updates at BOSC 2010
In-Reply-To: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
Message-ID: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>


Brad, this sounds exciting!

One thing strikes me, though - by asking for the sub-projects to propose
the "grand challenge" themselves the one thing you can guarantee is that
the "grand challenge" is solvable (or more likely, already solved!)

Other "grand challenge" kinds of meetings have an independent third party
pose the problem that has to be solved, and then all groups work toward a
solution and compare their results.  This would, IMO, be more revealing of
the "state of the art" in each Open-Bio project, and point out where the
weaknesses are that we should be focusing on...  Someone (for example,
you!) could act as the moderator to ensure that the "grand challenge" was
at least a reasonable one, within the scope of what an Open-Bio project
*should* be able to solve...

Just my CAD $0.02

Mark




-- 
Mark D Wilkinson, PI Bioinformatics
Assistant Professor, Medical Genetics
The James Hogg iCAPTURE Centre for Cardiovascular and Pulmonary Research
Providence Heart + Lung Institute
University of British Columbia - St. Paul's Hospital
Vancouver, BC, Canada


From HWillis at scripps.edu  Fri Jan 29 01:03:10 2010
From: HWillis at scripps.edu (Scooter Willis)
Date: Thu, 28 Jan 2010 20:03:10 -0500
Subject: [Bioperl-l] [Biojava-dev] [MOBY-dev] OpenBio solution
 challenge: Project updates at BOSC 2010
In-Reply-To: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
	<op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
Message-ID: <716E205A-5196-409F-A7BC-EF0F52AA997A@scripps.edu>

Brad

I agree with Mark that a particular problem may be biased towards one
toolkit/language. Another approach would be to list a collection of
problems and have each group pick one to present. That could be more
interesting to the audience, who would be exposed to different problems and
the various strengths of each toolkit. It could also help guide future
development in the other toolkits, since everyone would benefit from
learning about the other APIs and/or programming languages. Each group
would register the problem it is going to present. The problems not picked
would then become the surprise challenge, where each group has 24 hours to
put together either a presentation or an actual solution.

Scooter




From biopython at maubp.freeserve.co.uk  Fri Jan 29 10:36:40 2010
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 29 Jan 2010 10:36:40 +0000
Subject: [Bioperl-l] [MOBY-dev] OpenBio solution challenge: Project
	updates at BOSC 2010
In-Reply-To: <op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
References: <20100128203505.GG40046@sobchak.mgh.harvard.edu>
	<op.u69hfujinbznux@dd0710001l.icapture.ubc.ca>
Message-ID: <320fb6e01001290236l1ad02515w403a19f94dbb6d15@mail.gmail.com>

Hi all,

This is a great topic but should we continue it on just the one mailing list?
Is there a suitable BOSC list, or how about the general Open Bio list?

On Thu, Jan 28, 2010 at 9:17 PM, Mark Wilkinson <markw at illuminae.com> wrote:
>
> Brad, this sounds exciting!
>
> One thing strikes me, though - by asking for the sub-projects to propose
> the "grand challenge" themselves the one thing you can guarantee is that
> the "grand challenge" is solvable (or more likely, already solved!)
>
> Other "grand challenge" kinds of meetings have an independent third party
> pose the problem that has to be solved, and then all groups work toward a
> solution and compare their results.  This would, IMO, be more revealing of
> the "state of the art" in each Open-Bio project, and point out where the
> weaknesses are that we should be focusing on...  Someone (for example,
> you!) could act as the moderator to ensure that the "grand challenge" was
> at least a reasonable one, within the scope of what an Open-Bio project
> *should* be able to solve...
>
> Just my CAD $0.02
>
> Mark

One possible problem with having Brad act as moderator is his ties to
Biopython (plus it would be a shame if we'd be one man down for trying
to solve the challenges - grin). Having a project representative "sign off"
on the challenge might work - or simply the whole of the BOSC committee
which is quite balanced. Alternatively some kind of panel of challenges does
seem a good way to reduce individual project bias (as suggested by Scooter),
but there will still need to be a judging committee.

I'm curious what kind of challenges the BOSC committee had in mind -
would something like taking a newly sequenced bacterium and producing
an automated annotation as a GenBank, EMBL, or GFF file be too
ambitious for example? There are already several major projects
to do this e.g. RAST http://rast.nmpdr.org/

Peter
(@Biopython)


From mike.stubbington at bbsrc.ac.uk  Fri Jan 29 13:25:25 2010
From: mike.stubbington at bbsrc.ac.uk (mike stubbington (BI))
Date: Fri, 29 Jan 2010 13:25:25 +0000
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
 error running blastn
In-Reply-To: <FD6E9A89F6034CCB856E22553ED893D7@NewLife>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
	<FD6E9A89F6034CCB856E22553ED893D7@NewLife>
Message-ID: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>

Hi Mark,

Thanks for your continued help.

It now fails with this:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::]

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

If I change the factory creation to:
my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
	-db_name => '/Users/stubbing/localBlast/MouseGenome'
);

it fails with 

------------- EXCEPTION -------------
MSG: DB name not valid
STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516
STACK toplevel ./5CTest.pl:45
-------------------------------------

However I can run the following successfully from the command line:

blastn -db  /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta

Is there something wrong with how I'm referring to the blast database when I construct my factory?

Thanks again,

M
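
A quick pre-flight check can at least rule out a missing database before
the factory is built. The sketch below is an illustration only: the helper
names (split_db_path, nucl_db_present, make_factory) are invented here, and
it assumes a nucleotide database whose index (.nin) or alias (.nal) file
sits directly in the database directory. It does not work around any bug in
the module itself.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);
use File::Spec;

# Split a full database path into the directory and base name
# that -db_dir and -db_name expect.
sub split_db_path {
    my ($path) = @_;
    my ($name, $dir) = fileparse($path);
    return ($dir, $name);
}

# True if a nucleotide index (.nin) or alias (.nal) file exists for the
# database; assumed to be what "No alias or index file" refers to.
sub nucl_db_present {
    my ($dir, $name) = @_;
    for my $ext (qw(nin nal)) {
        return 1 if -e File::Spec->catfile($dir, "$name.$ext");
    }
    return 0;
}

# Hypothetical wrapper: check the files first, then build the factory.
# BioPerl is only loaded once the check has passed.
sub make_factory {
    my ($path) = @_;
    my ($dir, $name) = split_db_path($path);
    die "No .nin/.nal file for '$name' in '$dir'\n"
        unless nucl_db_present($dir, $name);
    require Bio::Tools::Run::StandAloneBlastPlus;
    return Bio::Tools::Run::StandAloneBlastPlus->new(
        -db_dir  => $dir,
        -db_name => $name,
    );
}

# my $blastFactory = make_factory('/Users/stubbing/localBlast/MouseGenome');
```

If the check passes and the factory still fails, the problem is more likely
in how the wrapper hands the path to blastn than in the database itself.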


On 28 Jan 2010, at 18:47, Mark A. Jensen wrote:

> Hi Mike,
> Believe I found the real bug causing the problem (was not accounting for
> the db_dir parameter). Crashes should now also throw much more helpful
> errors. Please try the code at r16774, and shout back.
> thanks --
> MAJ
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 11:18 AM
> Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
> error running blastn
> 
> 
> Hi,
> 
> Thanks for the suggestion. Unfortunately it still fails - error as follows:
> 
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running 
> /usr/local/ncbi/blast/bin/blastn : Illegal seek at 
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> 
> line 532.
> 
> STACK Bio::Tools::Run::WrapperBase::_run 
> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD 
> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
> 
> M
> 
> On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:
> 
>> Mike - please try updating your bioperl-live (the core) to the latest code
>> (revision 16761 or so).
>> CommandExts is a work in progress; from the stack errors it looks like you've
>> got an older version.
>> Try it then ping us back, if you would--
>> Thanks
>> Mark
>> ----- Original Message ----- 
>> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Thursday, January 28, 2010 10:41 AM
>> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error
>> running blastn
>> 
>> 
>> Dear all,
>> 
>> I am attempting to blast some primers against the mouse genome. I have
>> created a local mouse genome blast database and I can search against it
>> using 'blastn' at the command line.
>> 
>> I have perl code that creates an array of bioperl sequence objects called
>> @primers
>> 
>> I then create a StandAloneBlastPlus factory using the following code...
>> 
>> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
>>     -db_dir  => '/Users/stubbing/localBlast/',
>>     -db_name => 'MouseGenome'
>> );
>> 
>> and then attempt to blast my primers using this...
>> 
>> my @shortPrimers;
>> my $count = 1;
>> foreach (@primers) {
>>     my $currentSeq = $_;
>>     print "Checking primer $count/$primerNumber ";
>>     if ($_->length < 40) {
>>         push(@shortPrimers, $_);
>>         print "Too short!\n";
>>     }
>>     else {
>>         print "BLASTing...";
>>         my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>>     }
>>     $count++;
>> }
>> 
>> This fails with the following error...
>> 
>> ------------- EXCEPTION -------------
>> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem 
>> running
>> /usr/local/ncbi/blast/bin/blastn : Illegal seek at
>> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA>
>> line 532.
>> 
>> STACK Bio::Tools::Run::WrapperBase::_run
>> /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
>> STACK Bio::Tools::Run::StandAloneBlastPlus::run
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD
>> /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
>> STACK toplevel ./5CTest.pl:63
>> -------------------------------------
>> 
>> Line 63 in my code is (as you might expect) the one that calls blastn on my
>> factory object.
>> 
>> I'd appreciate any help you might be able to provide to shed light on this.
>> 
>> Thanks in advance,
>> 
>> Mike
>> 
>> 
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
>> 
> 
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> 
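
While a wrapper bug like this is being chased, it can help to trap the
exception per query, so that one failing search does not abort the whole
primer run. A rough sketch follows; partition_by_length and blast_each are
hypothetical helpers, not BioPerl functions. It only assumes that the
sequence objects have a length method and that the factory has a blastn
method taking -query, as in the code quoted above.

```perl
use strict;
use warnings;

# Split sequence objects (anything with a ->length method) into those
# shorter than $min_len and those long enough to search.
sub partition_by_length {
    my ($seqs, $min_len) = @_;
    my (@short, @ok);
    for my $seq (@$seqs) {
        if ($seq->length < $min_len) { push @short, $seq }
        else                         { push @ok,    $seq }
    }
    return (\@short, \@ok);
}

# Run blastn for each sequence, trapping exceptions per query. Results
# and error messages are keyed by the 1-based position in the input list.
sub blast_each {
    my ($factory, $seqs) = @_;
    my (%result, %error);
    my $n = 0;
    for my $seq (@$seqs) {
        $n++;
        my $res;
        my $ok = eval { $res = $factory->blastn(-query => $seq); 1 };
        if ($ok) { $result{$n} = $res }
        else     { $error{$n}  = "$@" }
    }
    return (\%result, \%error);
}

# Usage with the variables from the original script:
# my ($short, $ok)       = partition_by_length(\@primers, 40);
# my ($results, $errors) = blast_each($blastFactory, $ok);
# warn "primer $_ failed: $errors->{$_}" for sort { $a <=> $b } keys %$errors;
```

Collecting the errors instead of dying also makes it easier to paste the
full set of messages into a bug report.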


From maj at fortinbras.us  Fri Jan 29 13:36:54 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:36:54 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife>
	<05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk>
	<FD6E9A89F6034CCB856E22553ED893D7@NewLife>
	<ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
Message-ID: <DF05D2C7E8CC4CF18E6AE56077EB738A@NewLife>

Hi Mike-
Well, at least we're getting more informative errors. I think it's
still my bad; will look again. Both of your calls should work.
(thanks for the positive control too)
Thanks for your patience and the help--
MAJ


From maj at fortinbras.us  Fri Jan 29 13:47:48 2010
From: maj at fortinbras.us (Mark A. Jensen)
Date: Fri, 29 Jan 2010 08:47:48 -0500
Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
	error running blastn
In-Reply-To: <ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
References: <1DB83E7D-DFDE-4BE7-9EC0-F78B8875DF55@bbsrc.ac.uk><56CC8EF7EEAD4E28A3B822ADA8AD74B1@NewLife><05171AE0-71F2-4102-8131-70C6019D0172@bbsrc.ac.uk><FD6E9A89F6034CCB856E22553ED893D7@NewLife>
	<ECDEAECD-2367-4718-86E4-3AABD8FE203E@bbsrc.ac.uk>
Message-ID: <2B7BF6CD46AE441AB24203E169D9C503@NewLife>

Mike et al--
I've entered this as Bug #3003 on http://bugzilla.bioperl.org;
we'll do further ping-pongs on this issue via the comment facility
there--
cheers MAJ
----- Original Message ----- 
From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: <bioperl-l at lists.open-bio.org>; "Brian Osborne" <bosborne11 at verizon.net>
Sent: Friday, January 29, 2010 8:25 AM
Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek 
error running blastn


Hi Mark,

Thanks for your continued help.

It now fails with this:

------------- EXCEPTION -------------
MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : BLAST Database error: No alias or index file found for nucleotide database [MouseGenome] in search path [/Volumes/stubbing/PerlScripts/5CTest/trunk::]

STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1004
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
STACK toplevel ./5CTest.pl:63
-------------------------------------

If I change the factory creation to:
my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
    -db_name => '/Users/stubbing/localBlast/MouseGenome'
);

it fails with

------------- EXCEPTION -------------
MSG: DB name not valid
STACK Bio::Tools::Run::StandAloneBlastPlus::new /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:516
STACK toplevel ./5CTest.pl:45
-------------------------------------

However I can run the following successfully from the command line:

blastn -db  /Users/stubbing/localBlast/MouseGenome -query querySequence.fasta

Is there something wrong with how I'm referring to the blast database when I 
construct my factory?

Thanks again,

M


On 28 Jan 2010, at 18:47, Mark A. Jensen wrote:

> Hi Mike,
> Believe I found the real bug causing the problem (was not accounting for
> the db_dir parameter). Crashes should now also throw much more helpful
> errors. Please try the code at r16774, and shout back.
> thanks --
> MAJ
> ----- Original Message ----- 
> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
> To: "Mark A. Jensen" <maj at fortinbras.us>
> Cc: <bioperl-l at lists.open-bio.org>
> Sent: Thursday, January 28, 2010 11:18 AM
> Subject: Re: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek
> error running blastn
>
>
> Hi,
>
> Thanks for the suggestion. Unfortunately it still fails - error as follows:
>
> ------------- EXCEPTION -------------
> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 1000, <DATA> line 532.
>
> STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:1005
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
> STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
> STACK toplevel ./5CTest.pl:63
> -------------------------------------
>
> M
>
> On 28 Jan 2010, at 15:56, Mark A. Jensen wrote:
>
>> Mike - please try updating your bioperl-live (the core) to the latest code
>> (revision 16761 or so).
>> CommandExts is a work in progress; from the stack errors it looks like you've
>> got an older version.
>> Try it then ping us back, if you would--
>> Thanks
>> Mark
>> ----- Original Message ----- 
>> From: "mike stubbington (BI)" <mike.stubbington at bbsrc.ac.uk>
>> To: <bioperl-l at lists.open-bio.org>
>> Sent: Thursday, January 28, 2010 10:41 AM
>> Subject: [Bioperl-l] Bio::Tools::Run::StandAloneBlastPlus - Illegal seek error running blastn
>>
>>
>> Dear all,
>>
>> I am attempting to blast some primers against the mouse genome. I have created a
>> local mouse genome blast database and I can search against it using 'blastn' at
>> the command line.
>>
>> I have perl code that creates an array of bioperl sequence objects called
>> @primers
>>
>> I then create a StandAloneBlastPlus factory using the following code...
>>
>> my $blastFactory = Bio::Tools::Run::StandAloneBlastPlus->new(
>>     -db_dir => '/Users/stubbing/localBlast/',
>>     -db_name => 'MouseGenome'
>> );
>>
>> and then attempt to blast my primers using this...
>>
>> my @shortPrimers;
>> my $count=1;
>> foreach (@primers) {
>>     my $currentSeq = $_;
>>     print "Checking primer $count/$primerNumber ";
>>     if ($_->length < 40) {
>>         push(@shortPrimers,$_);
>>         print "Too short!\n";
>>     }
>>     else {
>>         print "BLASTing...";
>>         my $blastResult = $blastFactory->blastn(-query => $currentSeq);
>>     }
>>     $count++;
>> }
>>
>> This fails with the following error...
>>
>> ------------- EXCEPTION -------------
>> MSG: /usr/local/ncbi/blast/bin/blastn call crashed: There was a problem running /usr/local/ncbi/blast/bin/blastn : Illegal seek at /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm line 989, <DATA> line 532.
>>
>> STACK Bio::Tools::Run::WrapperBase::_run /Library/Perl/5.10.0/Bio/Tools/Run/WrapperBase/CommandExts.pm:994
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1236
>> STACK Bio::Tools::Run::StandAloneBlastPlus::run /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus/BlastMethods.pm:267
>> STACK Bio::Tools::Run::StandAloneBlastPlus::AUTOLOAD /Library/Perl/5.10.0/Bio/Tools/Run/StandAloneBlastPlus.pm:1233
>> STACK toplevel ./5CTest.pl:63
>> -------------------------------------
>>
>> Line 63 in my code is (as you might expect) the one that calls blastn on my
>> factory object.
>>
>> I'd appreciate any help you might be able to provide to shed light on this.
>>
>> Thanks in advance,
>>
>> Mike


_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From help at gmod.org  Fri Jan 29 22:03:48 2010
From: help at gmod.org (Dave Clements, GMOD Help Desk)
Date: Fri, 29 Jan 2010 14:03:48 -0800
Subject: [Bioperl-l] 2010 GMOD Summer School - Americas
In-Reply-To: <71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
References: <71ee57c71001291351q47994b82w10dffb390dbf2837@mail.gmail.com>
	<71ee57c71001291354m68548823s3e3fbd2e49e9b332@mail.gmail.com>
	<71ee57c71001291356p5e7f1aadi2bf437c93014a393@mail.gmail.com>
	<71ee57c71001291357h67112e2fkcf835687e59f66ae@mail.gmail.com>
	<71ee57c71001291358k74781b08n232534d8895c5ec1@mail.gmail.com>
	<71ee57c71001291400y28e40eb6i112ea91df977dc67@mail.gmail.com>
	<71ee57c71001291400n6133982eh3a02293ff741900b@mail.gmail.com>
	<71ee57c71001291401y505b56baic61c11754d88a444@mail.gmail.com>
	<71ee57c71001291402s23e3f2e9w2562d6acf85bd4ae@mail.gmail.com>
	<71ee57c71001291402h2ec67300r4fc7a3b2375f4080@mail.gmail.com>
Message-ID: <71ee57c71001291403s19be18f3s3a1d5a314c74def@mail.gmail.com>

Hello all,

I am pleased to announce that we are now accepting applications for:

  2010 GMOD Summer School - Americas
    6-9 May 2010
    NESCent, Durham, NC, USA
    http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

This will be a hands-on multi-day course aimed at teaching new GMOD
users/administrators how to get GMOD Components up and running. The
course will introduce participants to the GMOD project and then focus
on installation, configuration and integration of popular GMOD
Components. The course will be held May 6-9, at NESCent in Durham, NC.

These components will be covered:
   * Apollo - genome annotation editor
   * Chado - a modular and extensible database schema
   * Galaxy - workflow system
   * GBrowse - the Generic Genome Browser
   * GBrowse_syn - a generic synteny browser
   * JBrowse - genome browser
   * MAKER - genome annotation pipeline
   * Tripal - web front end for Chado

The deadline for applying is the end of Friday, February 22. Admission
is competitive and is based on the strength of the application
(especially the statement of interest). In 2009 there were over 50
applications for the 25 slots. Any applications received after the
deadline will be placed on the waiting list.

See the course page for details and an application link:
 http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas

Thanks,

Dave Clements
GMOD Help Desk

PS: We are also investigating holding a GMOD course in the
Asia/Pacific region, sometime this fall. Watch the GMOD mailing lists
and the GMOD News page/RSS feed for updates.
--
Please keep responses on the list!
http://gmod.org/wiki/2010_GMOD_Summer_School_-_Americas
http://gmod.org/wiki/GMOD_News
Was this helpful? http://gmod.org/wiki/Help_Desk_Feedback


From bhakti.dwivedi at gmail.com  Sat Jan 30 22:38:40 2010
From: bhakti.dwivedi at gmail.com (Bhakti Dwivedi)
Date: Sat, 30 Jan 2010 17:38:40 -0500
Subject: [Bioperl-l] how to map blast results on to the genome?
Message-ID: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>

Does anyone know how I can graphically map BLAST results (-m 8 tabular format)
to the genome using BioPerl?

Thanks

Bhakti


From jason at bioperl.org  Sat Jan 30 23:56:14 2010
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 30 Jan 2010 15:56:14 -0800
Subject: [Bioperl-l] how to map blast results on to the genome?
In-Reply-To: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>
References: <b643abd21001301438i1c509c22gf6ddc6735a100ab1@mail.gmail.com>
Message-ID: <68937A7D-291F-419A-9ED7-7A87D9B4C78A@bioperl.org>

Did you try Bio::Graphics and read the HOWTO on it? See http://bioperl.org/wiki/HOWTO:Graphics
On Jan 30, 2010, at 2:38 PM, Bhakti Dwivedi wrote:

> Does anyone know how I can graphically map BLAST results (-m 8 tabular format)
> to the genome using BioPerl?
>
> Thanks
>
> Bhakti

--
Jason Stajich
jason.stajich at gmail.com
jason at bioperl.org
http://fungalgenomes.org/
http://twitter.com/hyphaltip


From David.Messina at sbc.su.se  Sun Jan 31 17:43:52 2010
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 31 Jan 2010 18:43:52 +0100
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
	<18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
Message-ID: <BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet.

I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave


From rui.faria at upf.edu  Sun Jan 31 17:17:09 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 18:17:09 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
Message-ID: <18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>

Hi Dave,

Some time ago we reported the codeml bug that causes errors when the user supplies their own tree file. Have you had a chance to look at it?

We basically wanted to know your opinion on where the problem may be, since we are not the most experienced "perlers" on the planet :) 

I'm asking because we have to deal with this right now. If someone could check where the problem is, so we can tell whether it has an easy solution, that would be a great help.

Best,

Rui


-----Original Message-----
From Dave Messina <David.Messina at sbc.su.se>
Sent Thu 31/12/2009 11:55 AM
To Rui Faria <rui.faria at upf.edu>
Cc Jason Stajich <jason at bioperl.org>; sandraneto_ at hotmail.com; bioperl-l List <bioperl-l at lists.open-bio.org>
Subject Re: question about a PAML module

Hi Rui and Sandra,

Could you file this as a bug report at 

http://bugzilla.open-bio.org/enter_bug.cgi?product=Bioperl

?

Once you've created the bug report with a brief description of the problem and submitted it, please attach the following to the bug report:
- sample input files (a sequence file and a tree file, probably)
- a script which reproduces the problem
- the output (error messages) like you show below

When I updated the code to work with the current version, I didn't exhaustively test all of the different modes of running codeml, so I appreciate you reporting this.

There was another, similar issue reported a few days ago. I will try to take a look at both of these bug reports soon.


Dave


From rui.faria at upf.edu  Sun Jan 31 18:56:56 2010
From: rui.faria at upf.edu (Rui Faria)
Date: Sun, 31 Jan 2010 19:56:56 +0100 (CET)
Subject: [Bioperl-l] question about a PAML module
In-Reply-To: <BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>
References: <17885902.1262198478831.JavaMail.oracle@rif1.s.upf.edu>
	<31992102.1262223390984.JavaMail.oracle@rif2.s.upf.edu>
	<DF84D43D-C6E7-4349-BD8A-C40DF7F3D29E@sbc.su.se>
	<18165610.1264958229480.JavaMail.oracle@rif1.s.upf.edu>
	<BE2530C8-9FE3-4A30-9D60-8EF6F808FB74@sbc.su.se>
Message-ID: <11398434.1264964216856.JavaMail.oracle@rif1.s.upf.edu>

Many thanks!

We hope that one day, when we become experts, we can return the favor!

Rui

-----Original Message-----
From Dave Messina <David.Messina at sbc.su.se>
Sent Sun 31/01/2010 06:43 PM
To Rui Faria <rui.faria at upf.edu>
Cc Jason Stajich <jason at bioperl.org>; sandraneto_ at hotmail.com; bioperl-l List <bioperl-l at lists.open-bio.org>
Subject Re: question about a PAML module

Hey Rui,

My apologies for keeping you waiting on this. I started looking at it on Friday, and while I believe it'll be a relatively easy fix, I haven't got to the bottom of it yet.

I'll look at it some more tomorrow and hopefully get it sorted in the next day or two.

Dave