From brunovecchi at yahoo.com.ar  Fri Aug  1 00:16:16 2008
From: brunovecchi at yahoo.com.ar (Bruno Vecchi)
Date: Fri, 01 Aug 2008 01:16:16 -0300
Subject: [Bioperl-l] Bio::Biblio doesn't find articles [SOLVED]
Message-ID: <48928E10.7090903@yahoo.com.ar>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080801/f000f7b8/attachment.html>

From Kevin.Clancy at invitrogen.com  Fri Aug  1 18:30:30 2008
From: Kevin.Clancy at invitrogen.com (Clancy, Kevin)
Date: Fri, 1 Aug 2008 15:30:30 -0700
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
Message-ID: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>

Hi Folks
I am using the Windows version of BioPerl 1.5.2_100. I was recently
compiling a tool with ActiveState's PerlApp which included BioPerl
modules. I received an error for the Bio::SeqIO module, which was
calling for the Bio::SeqIO::staden::read method(?) on lines 312-314 of
the Bio::SeqIO.pm module. I don't appear to have a copy of the staden
module under the Bio::SeqIO directory and it doesn't appear to be
present in the current BioPerl trunk. I simply commented this out of my
SeqIO.pm file to perform my build and it's all running normally. Was this
simply a reference to a nonexistent module or am I missing something?
Thank you for your help.
kevin

Kevin Clancy, PhD
Senior Scientist, Informatic Sciences
Invitrogen Corp
Carlsbad, CA 92008
Phone: (768) 268 8356
Email: kevin.clancy at invitrogen.com 


From jason at bioperl.org  Sat Aug  2 08:58:05 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Aug 2008 07:58:05 -0500
Subject: [Bioperl-l] Inframe stop codon
In-Reply-To: <516747.39380.qm@web36405.mail.mud.yahoo.com>
References: <516747.39380.qm@web36405.mail.mud.yahoo.com>
Message-ID: <F3DAE939-82AC-4EB8-8667-F6612478477E@bioperl.org>

[regarding PAML analyses]

You would need to translate the cDNA sequence and identify where the
stop codon is, then remove that codon or remove that sequence from
your bulk analyses.  It depends on why you think the stop codon is in
the sequence - mis-annotation, a pseudogene, or what?  If this is a
small percentage of a lot of sequences I would probably just skip
these.  If this is the terminal stop codon that is being included in
the sequences, you just need to remove the last codon from the
sequences before providing them to PAML.  The Seq HOWTO has many
examples of how to manipulate a sequence object with substr, trunc,
as well as just the simple seq() method that gives you the sequence
as a string, which you can manipulate, then update the sequence
object afterwards.  As in
my $str = $seq->seq;
# remove the last codon from this cDNA sequence
substr($str, -3, 3,'');
$seq->seq($str);

Alternatively you can use trunc to truncate the sequence
my $trunc = $seq->trunc(1,$seq->length -3);
$seq = $trunc;


You can translate the sequence with the $seq->translate method, then
test for the presence of a stop codon (this is exactly what the
pairwise_kaks script in the scripts/utilities/ directory does).  If
you have a stop codon you need to figure out whether it is at the end
of the sequence or not.  If it is the terminal codon, you can just lop
off the last codon on all your sequences, but if it is internal, you
need to decide what you want to do with this sequence.
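
The terminal-versus-internal check described above can be sketched in plain Perl, without $seq->translate. This is only an illustration under assumptions not stated in the thread: the cDNA string is already in frame 1, the standard genetic code applies (stops are TAA, TAG, TGA), and stop_codon_report is a hypothetical helper name, not a BioPerl function.

```perl
use strict;
use warnings;

# Hypothetical helper: scan an in-frame cDNA string codon by codon and
# report where stop codons fall. Assumes the standard genetic code.
sub stop_codon_report {
    my ($cdna) = @_;
    my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);
    my $ncodons = int(length($cdna) / 3);   # ignore a trailing partial codon
    my @internal;
    my $terminal = 0;
    for my $i (0 .. $ncodons - 1) {
        my $codon = uc substr($cdna, 3 * $i, 3);
        next unless $is_stop{$codon};
        if ($i == $ncodons - 1) { $terminal = 1 }
        else                    { push @internal, $i + 1 }   # 1-based codon number
    }
    return { terminal => $terminal, internal => \@internal };
}

# Toy sequence: ATG AAA TAG CCC TGA
my $report = stop_codon_report('ATGAAATAGCCCTGA');
printf "terminal stop: %s; internal stops at codons: %s\n",
    $report->{terminal} ? 'yes' : 'no',
    @{ $report->{internal} } ? join(',', @{ $report->{internal} }) : 'none';
```

A sequence with an empty internal list and a terminal stop only needs its last codon removed; one with internal stops needs the case-by-case decision discussed above.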

If there are multiple stop codons, I am not sure it is appropriate to  
run PAML here, unless you are interested in some sort of pseudo-rate  
calculation that has many of the codons omitted.  Otherwise you may  
just want to calculate a DNA substitution rate for the sequences to  
make comparison.

I suggest working a single file by hand to get the appropriate steps  
down and then coding it up will be easier.

I am sure folks on the list can help too so it is important to post  
to the mailing list - I don't see any messages from you on the list  
about this query.

-jason
On Aug 2, 2008, at 5:42 AM, Tannistha wrote:

>
> Hi Jason,
>
> Please suggest me how to filter the inframe stop codons,  
> aa_to_dna_aln returns the sequence with in-frame stop codons.
> I have posted my query along with the input files to the forum.
>
> Thanks for your earlier advice, runmode =0 is working for me.
>
> Look forward to your reply
>
> Best Regards
> Tannistha
>
>
> Dr. Tannistha Nandi
> email: tannistha3 at yahoo.com
>
>
>


From David.Messina at sbc.su.se  Sun Aug  3 15:10:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 3 Aug 2008 21:10:18 +0200
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
References: <Acj0JjSSmx+wYhbUS0GAoF0WhwWjzg==>
	<28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
Message-ID: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>

Hi Kevin,

The staden module is an oddball one, to be sure.

A search on the BioPerl website turns up this FAQ entry:
http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F

Also the Windows install page
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

says:

> Some external programs such as Staden <http://www.bioperl.org/wiki/Staden> and
> the EMBOSS <http://www.bioperl.org/wiki/EMBOSS> suite of programs can only
> be installed on Windows by using Cygwin <http://www.cygwin.com/> and its gcc
> C compiler <http://gcc.gnu.org/> (see Bioperl in Cygwin, below)
>


In any case, the staden module (and associated external libraries) is used
only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr
binary formats. So your edit shouldn't cause you any problems otherwise.

Dave

From cjfields at uiuc.edu  Sun Aug  3 16:20:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 15:20:52 -0500
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
References: <Acj0JjSSmx+wYhbUS0GAoF0WhwWjzg==>
	<28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
	<628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
Message-ID: <BA800345-8498-46CF-AD94-E8FD9098DEBB@uiuc.edu>

This seems to be a problem with PerlApp and eval{}; judging by a quick  
Google search this isn't the only module affected.  The line in  
question is wrapped in an eval{} to check for the availability of  
Bio::SeqIO::staden::read (but not die on it).

BTW, the eval was moved into the relevant plugin modules post-1.5.2,  
so the eval{} is checked when the module is loaded dynamically (i.e.  
when a format requiring it is passed in).  It was causing other issues  
with ActivePerl installations and was redundant, so it was removed.

http://bugzilla.open-bio.org/show_bug.cgi?id=2295
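
For readers unfamiliar with the idiom, here is a minimal sketch of probing for an optional module inside eval{} so that its absence is not fatal. It is not the actual code from lines 312-314 of Bio/SeqIO.pm, only an illustration of the general pattern; the variable name $have_staden is made up for the example.

```perl
use strict;
use warnings;

# Try to load an optional module without dying when it is absent.
# eval {} traps the failure from require; the trailing 1 makes success explicit.
my $have_staden = eval { require Bio::SeqIO::staden::read; 1 } ? 1 : 0;

if ($have_staden) {
    print "staden reader available: binary trace formats can be read\n";
}
else {
    print "staden reader not installed: skipping trace-format support\n";
}
```

On a system without the staden extension this prints the second message and carries on, which is the "check but do not die" behaviour described above.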

chris

On Aug 3, 2008, at 2:10 PM, Dave Messina wrote:

> Hi Kevin,
>
> The staden module is an oddball one, to be sure.
>
> A search on the BioPerl website turns up this FAQ entry:
> http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F
>
> Also the Windows install page
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> says:
>
>> Some external programs such as Staden <http://www.bioperl.org/wiki/Staden> and
>> the EMBOSS <http://www.bioperl.org/wiki/EMBOSS> suite of programs can only
>> be installed on Windows by using Cygwin <http://www.cygwin.com/> and its gcc
>> C compiler <http://gcc.gnu.org/> (see Bioperl in Cygwin, below)
>>
>
>
> In any case, the staden module (and associated external libraries) is used
> only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr
> binary formats. So your edit shouldn't cause you any problems otherwise.
>
> Dave

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From btemperton at googlemail.com  Sat Aug  2 16:05:37 2008
From: btemperton at googlemail.com (Benbo)
Date: Sat, 2 Aug 2008 13:05:37 -0700 (PDT)
Subject: [Bioperl-l]  Finding possible primers regex
Message-ID: <18792782.post@talk.nabble.com>


Hi there, 
I'm trying to write a perl script to scan an aligned multiple entry fasta
file and find possible primers. So far I've produced a string which contains
bases which match all sequences and * where they don't match e.g.
1) TTAGCCTAA
2) TTAGCAGAA
3) TTACCCTAA

would give TTA*C**AA.

I want to parse this string and pull out all sequences which are 18-21 bp in
length and have no more than 4 * in them.

So far, I've got this:

while($fragment_match =~ /([GTAC*]{18,21})/g){
print "$1\n";
}

hoping to match all fragments 18-21 characters in length. However even that
doesn't work as it has essentially chunked it into 21 char blocks, rather
than what I hoped for of
0-18
0-19
0-20
0-21
1-19
1-20
1-21
1-22

etc.

Can anyone let me know if this is already possible in BioPerl, or how one
would go about it with regex. Sadly I'm fairly new to perl and getting to
grips with BioPerl, so please treat me gently :).

Many thanks,

Ben


-- 
View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Aug  4 00:08:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 23:08:51 -0500
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <18792782.post@talk.nabble.com>
References: <18792782.post@talk.nabble.com>
Message-ID: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>

On Aug 2, 2008, at 3:05 PM, Benbo wrote:

>
> Hi there,
> I'm trying to write a perl script to scan an aligned multiple entry  
> fasta
> file and find possible primers. So far I've produced a string which  
> contains
> bases which match all sequences and * where they don't match e.g.
> 1) TTAGCCTAA
> 2) TTAGCAGAA
> 3) TTACCCTAA
>
> would give TTA*C**AA.
>
> I want to parse this string and pull out all sequences which are  
> 18-21 bp in
> length and have no more than 4 * in them.
>
> So far, I've got this:
>
> while($fragment_match =~ /([GTAC*]{18,21})/g){
> print "$1\n";
> }
>
> hoping to match all fragments 18-21 characters in length. However  
> even that
> doesn't work as it has essentially chunked it into 21 char blocks,  
> rather
> than what I hoped for of
> 0-18
> 0-19
> 0-20
> 0-21
> 1-19
> 1-20
> 1-21
> 1-22
>
> etc.
>
> Can anyone let me know if this is already possible in BioPerl, or  
> how one
> would go about it with regex. Sadly I'm fairly new to perl and  
> getting to
> grips with BioPerl, so please treat me gently :).
>
> Many thanks,
>
> Ben

There is a trick to this which is discussed more extensively in
'Mastering Regular Expressions'.  Essentially you have to embed code
into the regex and trick the parser into backtracking using a negative
lookahead.  The match itself fails (i.e. no match is returned), but
the embedded code is executed for each match attempt.

The following script is a slight modification of one I used.  It
checks the consensus string from the input alignment (in aligned FASTA
format here), extracts the alignment slice for each match, then spits
the alignment out to STDOUT in clustalw format.  This should work for
perl 5.8 and up, but it's only been tested on perl 5.10.  You should
be able to adapt this to fit what you want.

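use strict;
use warnings;
use Bio::AlignIO;

# alignment file in aligned FASTA format, given on the command line
my $file = shift or die "Usage: $0 alignment.fasta\n";

my $in = Bio::AlignIO->new(-file => $file,
                            -format => 'fasta');
my $out = Bio::AlignIO->new(-fh => \*STDOUT,
                            -format => 'clustalw');

while (my $aln = $in->next_aln) {
     # columns that are not 100% conserved come back as '?'
     my $c = $aln->consensus_string(100);
     my @matches;
     # (?!) always fails, forcing the engine to backtrack through every
     # 18-21 character window; the code block records the ones we want
     $c =~ m/
         ([GTAC?]{18,21})
         (?{my $match = check_match($1);
            push @matches, [$match,
                            pos(),
                            length($match)]
               if defined $match;})
         (?!)
         /xig;
     for my $match (@matches) {
         # pos() is the offset just past the window, i.e. its 1-based end column
         my ($hit, $st, $end) = ($match->[0],
                                 $match->[1] - $match->[2] + 1,
                                 $match->[1]);
         my $newaln = $aln->slice($st, $end);
         $out->write_aln($newaln);
     }
}

# return the window if it has at most 4 '?' characters, undef otherwise
sub check_match {
     my $match = shift;
     return unless $match;
     my $ct = $match =~ tr/?/?/;
     # explicit undef: a bare "return $match if ..." would fall through
     # and hand back a defined (empty) value for rejected windows
     return $ct <= 4 ? $match : undef;
}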
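
For comparison, the same enumeration of overlapping windows can be sketched without embedded regex code, using two nested loops over substr. This is an alternative illustration, not the script above: candidate_windows is a hypothetical helper, it works on a plain string rather than an alignment object, and the wildcard character is passed in so it fits either '*' or '?'.

```perl
use strict;
use warnings;

# Hypothetical helper: list every window of $min..$max characters in
# $consensus that contains at most $max_wild wildcard characters.
# Each hit is [0-based start, window length, window text].
sub candidate_windows {
    my ($consensus, $min, $max, $max_wild, $wild) = @_;
    my @hits;
    my $len = length $consensus;
    for my $start (0 .. $len - $min) {
        for my $size ($min .. $max) {
            last if $start + $size > $len;   # window would run off the end
            my $window = substr($consensus, $start, $size);
            my $count  = () = $window =~ /\Q$wild\E/g;   # number of wildcards
            push @hits, [$start, $size, $window] if $count <= $max_wild;
        }
    }
    return @hits;
}

# Toy input from the question: windows of 4-5 characters, at most one '*'.
for my $hit (candidate_windows('TTA*C**AA', 4, 5, 1, '*')) {
    printf "%d\t%d\t%s\n", @$hit;
}
```

For real primer candidates the call would use 18 and 21 as the length bounds and 4 as the wildcard limit, as in the original question.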


chris


From heikki at sanbi.ac.za  Mon Aug  4 02:42:57 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 4 Aug 2008 08:42:57 +0200
Subject: [Bioperl-l] Bio::Coordinate::Pair
In-Reply-To: <C4B4F0DB.498%pchanpa@emory.edu>
References: <C4B4F0DB.498%pchanpa@emory.edu>
Message-ID: <200808040842.57466.heikki@sanbi.ac.za>

Prashanth,

Your example coordinates do not perform a conversion; they more or less
report the locations of your features in some third coordinate system.

The way to think of coordinate pairs is to use them as HSPs. You tell the
pair object what the matching segment is in the pair of sequences.


The synopsis in Bio::Coordinate::Pair class file gives the following example:


use Bio::Location::Simple;
use Bio::Coordinate::Pair;

my $match1 = Bio::Location::Simple->new
    (-seq_id => 'propeptide', -start => 21, -end => 40, -strand=>1 );
my $match2 = Bio::Location::Simple->new
    (-seq_id => 'peptide', -start => 1, -end => 20, -strand=>1 );
my $pair = Bio::Coordinate::Pair->new(-in => $match1,
				      -out => $match2
    );
# location to match
$pos = Bio::Location::Simple->new
    (-start => 25, -end => 25, -strand=> -1 );

$res = $pair->map($pos);
print $res->match->start; # 5


In other words, region 21-40 in the propeptide matches locations 1-20 in the 
final peptide. Therefore conversion from 25 gives 5:

     signalp        21  25             40
--------------------|---|--------------|
                    1   5  pep         20

I hope this clarifies it.

The advantage of using these objects over manual conversion is that the code 
has been debugged (none of the all-too-easy +/-1 errors) and that they can be
chained together.
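
The arithmetic behind the example can be sketched in a few lines of plain Perl. This is not how Bio::Coordinate::Pair is implemented; map_position is a hypothetical helper that ignores strand and gaps and only shows the offset calculation for a single matching segment.

```perl
use strict;
use warnings;

# Hypothetical helper: map a position from the "in" segment to the "out"
# segment of an ungapped, same-strand pair. Returns undef when the
# position lies outside the "in" segment (nothing to map it onto).
sub map_position {
    my ($pos, $in_start, $in_end, $out_start) = @_;
    return if $pos < $in_start || $pos > $in_end;
    return $pos - $in_start + $out_start;
}

# Synopsis example: propeptide 21-40 pairs with peptide 1-20.
print map_position(25, 21, 40, 1), "\n";            # 25 - 21 + 1 = 5

# Tutorial example: 1000-2000 pairs with 1100-2100.
print map_position(1500, 1000, 2000, 1100), "\n";   # 1500 - 1000 + 1100 = 1600
my $outside = map_position(500, 1000, 2000, 1100);
print defined $outside ? $outside : 'no match', "\n";
```

In the tutorial snippet quoted below, position 500 lies outside the 1000-2000 input segment, so under this reading there is nothing to convert it to. That is an inference from the numbers, not something stated in the thread.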

Yours,

      -Heikki


On Tuesday 29 July 2008 22:07:55 Prashanth Athri wrote:
> Dear Professor Lehvaslaiho:
>
> I had a quick question about the module- Bio::Coordinate::Pair
>
> The BioPerl tutorial has the following example:
>
> $input_coordinates = Bio::Location::Simple->new
> (-seq_id => 'propeptide', -start => 1000, -end => 2000, -strand=>1 );
>
> $output_coordinates = Bio::Location::Simple->new
> (-seq_id => 'peptide', -start => 1100, -end => 2100, -strand=>1 );
>
> $pair = Bio::Coordinate::Pair->new
> (-in => $input_coordinates , -out => $output_coordinates );
>
> $pos = Bio::Location::Simple->new (-start => 500, -end => 500 );
>
> $res = $pair->map($pos);
> $converted_start = $res->start;
>
> The way I understand it, $converted_start should return "1600". But when I
> run this snippet, it returns "500". Could you please let me know how
> $pair->map($pos) is processed?
>
> I appreciate your time and thanks in advance.
>
> Regards,
> Prashanth


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From lengjingmao at gmail.com  Tue Aug  5 03:36:23 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 15:36:23 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>
Message-ID: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>

Hi there,

I have a sequence dataset which contains about 200 sequences. There are some identical sequences in it. Are there any BioPerl modules which can remove those identical sequences?

thanks a lot. 
yours,
shaohua


From bernd.web at gmail.com  Tue Aug  5 05:49:55 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 5 Aug 2008 11:49:55 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
References: <18792782.post@talk.nabble.com>
	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
Message-ID: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>

Hi,

There is a BioPerl Utility script doing this.
See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.

" scripts/utilities/bp_nrdb.PLS
    Make a non-redundant database based on sequence, not id. Requires
Digest::MD5."

Alternatively, you can make a hash using the sequences as keys.
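
A minimal sketch of the hash approach, in plain Perl. It assumes the records are already in memory as [id, sequence] pairs (reading them from a file is left out), treats sequences as identical only when the strings match exactly apart from case, and uses unique_by_sequence as a made-up helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: keep the first record seen for each distinct
# sequence string. Records are [id, sequence] pairs; comparison is
# case-insensitive and exact (no clustering of near-identical sequences).
sub unique_by_sequence {
    my @records = @_;
    my %seen;
    return grep { !$seen{ uc $_->[1] }++ } @records;
}

my @records = (
    [ seq1 => 'ATGGCC' ],
    [ seq2 => 'ATGGCA' ],
    [ seq3 => 'atggcc' ],   # same as seq1 apart from case
);

# Print the surviving records in FASTA layout.
print join("\n", map { ">$_->[0]\n$_->[1]" } unique_by_sequence(@records)), "\n";
```

For 200 sequences keying on the full sequence string is fine; for very large sets a digest of the sequence could be used as the key instead to save memory.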


Regards,
Bernd

On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com> wrote:
> Hi, there ,
>
> I have a sequence dataset which contains about 200 sequences. there are some identical sequences in this. is there any bioperl modules  which can remove those identical sequences?
>
> thanks a lot.
> yours,
> shaohua

From diriano at uni-potsdam.de  Tue Aug  5 06:28:58 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riano Pachon)
Date: Tue, 05 Aug 2008 12:28:58 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
Message-ID: <48982B6A.4050304@uni-potsdam.de>

Hi all,

Or you might try a non-bioperl solution that works pretty well, check:

http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86

Best,

Diego

Bernd Web wrote:
> Hi,
> 
> There is a BioPerl Utility script doing this.
> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.
> 
> " scripts/utilities/bp_nrdb.PLS
>     Make a non-redundant database based on sequence, not id. Requires
> Digest::MD5."
> 
> Alternatively, you can make a hash using the sequences as keys.
> 
> 
> Regards,
> Bernd


-- 
___________________________________
Diego Mauricio Riaño Pachón
Biologist - PhD student
AG Mueller-Roeber
Institute for Biochemistry and Biology
University of Potsdam

Address: Karl-Liebknecht-Str. 24-25
	 Haus 20
	 14476 Golm
	 Germany

Tel:	 +49 331 977 2809
Fax:	 +49 331 977 2512

web:	http://www.geocities.com/dmrp.geo

From cjfields at uiuc.edu  Tue Aug  5 11:19:54 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Aug 2008 10:19:54 -0500
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <48982B6A.4050304@uni-potsdam.de>
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
	<48982B6A.4050304@uni-potsdam.de>
Message-ID: <4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>

Here are two links which go into detail (the last is a specific  
implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From cjfields at uiuc.edu  Tue Aug  5 11:19:54 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Aug 2008 10:19:54 -0500
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <48982B6A.4050304@uni-potsdam.de>
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
	<48982B6A.4050304@uni-potsdam.de>
Message-ID: <4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>

Here are two links which go into detail (the last is a specific  
implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego
>
> Bernd Web wrote:
>> Hi,
>> There is a BioPerl Utility script doing this.
>> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities  
>> header.
>> " scripts/utilities/bp_nrdb.PLS
>>    Make a non-redundant database based on sequence, not id. Requires
>> Digest::MD5."
>> Alternatively, you can make a hash using the sequences as keys.
>> Regards,
>> Bernd
>> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com>  
>> wrote:
>>> Hi, there ,
>>>
>>> I have a sequence dataset which contains about 200 sequences.  
>>> there are some identical sequences in this. is there any bioperl  
>>> modules  which can remove those identical sequences?
>>>
>>> thanks a lot.
>>> yours,
>>> shaohua
>>> ----- Original Message -----
>>> From: "Benbo" <btemperton at googlemail.com>
>>> To: <Bioperl-l at lists.open-bio.org>
>>> Sent: Sunday, August 03, 2008 4:05 AM
>>> Subject: [Bioperl-l] Finding possible primers regex
>>>
>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple  
>>>> entry fasta
>>>> file and find possible primers. So far I've produced a string  
>>>> which contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are  
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However  
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char  
>>>> blocks, rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or  
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> -- 
> ___________________________________
> Diego Mauricio Riaño Pachón
> Biologist - PhD student
> AG Mueller-Roeber
> Institute for Biochemistry and Biology
> University of Potsdam
>
> Address: Karl-Liebknecht-Str. 24-25
> 	 Haus 20
> 	 14476 Golm
> 	 Germany
>
> Tel:	 +49 331 977 2809
> Fax:	 +49 331 977 2512
>
> web:	http://www.geocities.com/dmrp.geo

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From lengjingmao at gmail.com  Tue Aug  5 11:24:22 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 23:24:22 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
	<48982B6A.4050304@uni-potsdam.de>
	<4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>
Message-ID: <3A95AD6D18A749F3B73C135CCC8E7C90@6B2F7FFC298C46F>

hi,
thanks a lot for the help!
cheers,
shaohua
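
For anyone finding this thread later: Bernd's suggestion in the quoted
text (a hash using the sequences as keys) might look roughly like the
sketch below. It is plain Perl with made-up records; the record layout
and the name unique_by_sequence are assumptions for illustration, not
part of BioPerl. With Bio::SeqIO, the key would presumably come from
$seq->seq on each object returned by next_seq.

```perl
use strict;
use warnings;

# Hypothetical input: [id, sequence] pairs standing in for parsed records.
my @records = (
    [ 'seq1', 'TTAGCCTAA' ],
    [ 'seq2', 'TTAGCAGAA' ],
    [ 'seq3', 'ttagcctaa' ],    # same as seq1 apart from case
    [ 'seq4', 'TTACCCTAA' ],
);

# Keep the first record seen for each distinct sequence.
sub unique_by_sequence {
    my @in = @_;
    my %seen;
    my @unique;
    for my $rec (@in) {
        my ($id, $seq) = @$rec;
        my $key = uc $seq;        # treat upper and lower case as identical
        next if $seen{$key}++;    # skip sequences already recorded
        push @unique, $rec;
    }
    return @unique;
}

my @unique = unique_by_sequence(@records);
print join("\t", @$_), "\n" for @unique;
```

Upper-casing the key is a design choice: drop it if case carries meaning
in your data (soft-masked regions, for instance).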
----- Original Message ----- 
From: "Chris Fields" <cjfields at uiuc.edu>
To: "Diego Mauricio Riano Pachon" <diriano at uni-potsdam.de>
Cc: "Bernd Web" <bernd.web at gmail.com>; <Bioperl-l at lists.open-bio.org>; "Shaohua Fan" <lengjingmao at gmail.com>
Sent: Tuesday, August 05, 2008 11:19 PM
Subject: Re: [Bioperl-l] how to remove indentical sequences from a dataset


Here are two links which go into detail (the last is a specific  
implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego
>
> Bernd Web wrote:
>> Hi,
>> There is a BioPerl Utility script doing this.
>> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities  
>> header.
>> " scripts/utilities/bp_nrdb.PLS
>>    Make a non-redundant database based on sequence, not id. Requires
>> Digest::MD5."
>> Alternatively, you can make a hash using the sequences as keys.
>> Regards,
>> Bernd
>> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com>  
>> wrote:
>>> Hi, there ,
>>>
>>> I have a sequence dataset which contains about 200 sequences.  
>>> there are some identical sequences in this. is there any bioperl  
>>> modules  which can remove those identical sequences?
>>>
>>> thanks a lot.
>>> yours,
>>> shaohua
>>> ----- Original Message -----
>>> From: "Benbo" <btemperton at googlemail.com>
>>> To: <Bioperl-l at lists.open-bio.org>
>>> Sent: Sunday, August 03, 2008 4:05 AM
>>> Subject: [Bioperl-l] Finding possible primers regex
>>>
>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple  
>>>> entry fasta
>>>> file and find possible primers. So far I've produced a string  
>>>> which contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are  
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However  
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char  
>>>> blocks, rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or  
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>
>
> -- 
> ___________________________________
> Diego Mauricio Riaño Pachón
> Biologist - PhD student
> AG Mueller-Roeber
> Institute for Biochemistry and Biology
> University of Potsdam
>
> Address: Karl-Liebknecht-Str. 24-25
> Haus 20
> 14476 Golm
> Germany
>
> Tel: +49 331 977 2809
> Fax: +49 331 977 2512
>
> web: http://www.geocities.com/dmrp.geo

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From martin.senger at gmail.com  Tue Aug  5 22:53:07 2008
From: martin.senger at gmail.com (Martin Senger)
Date: Wed, 6 Aug 2008 10:53:07 +0800
Subject: [Bioperl-l] Bio::Biblio doesn't find articles
Message-ID: <4d93f07c0808051953k4cb7511cg5ec4cd93f53cfd0f@mail.gmail.com>

I am afraid that the server that serves the MEDLINE database to the
Bio::Biblio module (using the SOAP protocol), and that is running at EBI,
may not be fully supported. I am not working at EBI anymore and I have
stopped monitoring their servers. I am still their collaborator - but I am
not, unfortunately, involved in the MEDLINE tools anymore.

I would be happy to continue to maintain the Bio::Biblio module but it
relies on a server that I no longer control.

Cheers,
Martin

-- 
Martin Senger
email: martin.senger at gmail.com,m.senger at cgiar.org
skype: martinsenger

From Russell.Smithies at agresearch.co.nz  Wed Aug  6 17:20:04 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 7 Aug 2008 09:20:04 +1200
Subject: [Bioperl-l] not BioPerl
Message-ID: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>

Has anyone taken a look at the new Perl interface to the NCBI C++
Toolkit?
Unfortunately, I can't even get their examples working as I'm behind a
firewall and documentation on setting proxy stuff is virtually
non-existent  :-(


Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Wed Aug  6 17:33:27 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 6 Aug 2008 16:33:27 -0500
Subject: [Bioperl-l] not BioPerl
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
Message-ID: <E4A73CE7-9288-4245-99D5-90F488698D6C@illinois.edu>

Looks like they're binary releases for 32- and 64-bit linux (quite  
large, at 25 MB).  Would be nice to have the C++ bindings for other  
OS's (my guess is this was set up via swig).

I have access to a linux cluster, so I may give this a try soon.

chris

On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:

> Has anyone taken a look at the new Perl interface to the NCBI C++
> Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existent  :-(
>
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
>
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From vinaykmittal at gatech.edu  Wed Aug  6 16:56:22 2008
From: vinaykmittal at gatech.edu (Mittal, Vinay K)
Date: Wed, 6 Aug 2008 16:56:22 -0400 (EDT)
Subject: [Bioperl-l] Error installing Biopel Module
Message-ID: <469631287.3995201218056182383.JavaMail.root@mail5.gatech.edu>

Hi,

I just installed ActivePerl 5.10.0 and was trying to install the BioPerl modules. While installing BioPerl through the package manager (ppm), I get the following errors:

ppm install failed: Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core

I don't know what the problem is. I have never used Bioperl Modules before.

Thanks.


-- 
--------
Vinay Kumar Mittal
MS,Bioinformatics
Georgia Institute of Technology

From rfrancis at ichr.uwa.edu.au  Wed Aug  6 21:11:28 2008
From: rfrancis at ichr.uwa.edu.au (Richard Francis)
Date: Thu, 07 Aug 2008 09:11:28 +0800
Subject: [Bioperl-l] AlignIO::clustalw match_line query
Message-ID: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>

Dear List,

I wonder if you can help.

I'm having trouble finding out on which criteria the conserved and
semi-conserved substitution decisions for a match line produced from the
match_line function in AlignIO are based.

I note that match_line produces the same output as an alignment match
line would from ClustalW and indeed is used in the AlignIO::clustalw
module, but are the substitution decisions based on the same Venn
diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color
ie are they faithful to the generation of the match line from within
ClustalW itself?

I need to know this as part of a paper I'm writing so I would really
appreciate your help with this.

Kind regards and thanks in advance,

Richard Francis
#####################################################################################
This e-mail message has been scanned for Viruses and Content and cleared 
by MailMarshal
#####################################################################################


From jason at bioperl.org  Wed Aug  6 22:26:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Aug 2008 19:26:06 -0700
Subject: [Bioperl-l] AlignIO::clustalw match_line query
In-Reply-To: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
References: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
Message-ID: <A7F598FF-C34F-466E-A73B-D714872201C8@bioperl.org>

Implemented independently, but it was based on what the clustalw  
documentation says.
The main code is in the match_line function in Bio::SimpleAlign.

See the CONSERVATION_GROUPS hash, which looks like the excerpt below. A
'strong' (":") on the match line is coded where the residues seen in a
column all fall within a single group, for example only 'S', 'T', or 'A'.

    %CONSERVATION_GROUPS = (
            'strong' => [ qw(
                                STA
                                NEQK
                                NHQK
                                NDEQ
                                QHRK
                                MILV
                                MILF
                                HY
                                FYW )],
            'weak'   => [ qw(
                                CSA
                                ATV
                                SAG
                                STNK
                                STPA
                                SGND
                                SNDEQK
                                NDEQHK
                                NEQHRK
                                FVLIM
                                HFY )],
    );

It was checked against clustalw output by hand when it was  
implemented. If you know of any inconsistencies, let us know.
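
To make the rule concrete, here is a simplified sketch of how a match
line could be derived from those groups. It is not the Bio::SimpleAlign
code: the names column_symbol, all_in_group and match_line_sketch are
made up for illustration, and treating any column that contains a gap
as blank is an assumption.

```perl
use strict;
use warnings;

# Groups copied from the excerpt above.
my %groups = (
    strong => [qw(STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW)],
    weak   => [qw(CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY)],
);

# True if every residue in @$residues occurs in the group string $group.
sub all_in_group {
    my ($group, $residues) = @_;
    return !grep { index($group, $_) < 0 } @$residues;
}

# Return the match-line character for one alignment column.
sub column_symbol {
    my ($column) = @_;
    my %seen;
    $seen{ uc $_ } = 1 for split //, $column;
    my @residues = keys %seen;
    return ' ' if grep { !/^[A-Z]$/ } @residues;    # gap or other symbol
    return '*' if @residues == 1;                   # fully conserved
    return ':' if grep { all_in_group($_, \@residues) } @{ $groups{strong} };
    return '.' if grep { all_in_group($_, \@residues) } @{ $groups{weak} };
    return ' ';
}

# Build the match line for a list of equal-length aligned sequences.
sub match_line_sketch {
    my @seqs = @_;
    my $line = '';
    for my $i (0 .. length($seqs[0]) - 1) {
        $line .= column_symbol(join '', map { substr($_, $i, 1) } @seqs);
    }
    return $line;
}

print match_line_sketch('SAC', 'TAS', 'AAA'), "\n";    # prints ":*."
```

In the example, the first column (S, T, A) fits the strong group STA,
the second is identical in all rows, and the third (C, S, A) only fits
the weak group CSA.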

-jason
On Aug 6, 2008, at 6:11 PM, Richard Francis wrote:

> Dear List,
>
> I wonder if you can help.
>
> I'm having trouble finding out on which criteria the conserved and
> semi-conserved substitution decisions for a match line produced  
> from the
> match_line function in AlignIO are based.
>
> I note that match_line produces the same output as an alignment match
> line would from ClustalW and indeed is used in the AlignIO::clustalw
> module, but are the substitution decisions based on the same Venn
> diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color
> ie are they faithful to the generation of the match line from within
> ClustalW itself?
>
> I need to know this as part of a paper I'm writing so I would really
> appreciate your help with this.
>
> Kind regards and thanks in advance,
>
> Richard Francis


From betts at embl.de  Thu Aug  7 08:42:59 2008
From: betts at embl.de (Matthew Betts)
Date: Thu, 7 Aug 2008 14:42:59 +0200 (CEST)
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
Message-ID: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>


Hi,

Has anyone tried to draw secondary structure with Bio::Graphics? i.e. two 
different types of glyph with different colours on the same track.

Could use a hash reference to get the different glyph types (would be nice 
if there was a cylinder glyph and a thick arrow glyph), or heterogeneous 
segments to get the different colours, but I can't see how to do both at 
the same time.

Any example code or suggestions on how I could implement it would be 
great.

Thanks,

Matthew

-- 
Matthew Betts PhD, Russell Group (Structural Bioinformatics)
EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
phone: +49 (0)6221 387 8305; mailto:betts at embl.de

From cain.cshl at gmail.com  Thu Aug  7 10:08:39 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 7 Aug 2008 10:08:39 -0400
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
In-Reply-To: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
References: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
Message-ID: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>

Hi Matthew,

I don't have any code examples, but people have used GBrowse for
protein secondary structure, which uses Bio::Graphics underneath the
hood.

If you want to put more than one glyph and/or more than one color in a
track, it is fairly easy.  You just need to provide a callback for
each option when you create the track, like this:

  $panel->add_track($features_array_ref,
                    -glyph       => sub {
                        # code to set the glyph according to
                        # the attributes of the feature
                    },
                    -bgcolor     => sub {
                        # code to set the color
                    },
                    -fgcolor     => 'black',
                 ...etc...
  );

For more information, see the biographics howto:

  http://www.bioperl.org/wiki/HOWTO:Graphics
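
The callback idea can be tried out without Bio::Graphics installed. In
the sketch below, MockFeature and resolve_option are hypothetical
stand-ins written for illustration only; they are not part of
Bio::Graphics, which does its own option lookup when it draws each
feature. The glyph and color names are just example values.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature object; only display_name is modelled.
package MockFeature;
sub new          { my ($class, %args) = @_; return bless {%args}, $class }
sub display_name { return $_[0]{display_name} }

package main;

# Track options: plain values are used as-is, code refs are called per feature.
my %options = (
    -glyph   => sub { $_[0]->display_name eq 'strand' ? 'arrow' : 'generic' },
    -bgcolor => sub { $_[0]->display_name eq 'strand' ? 'cyan'  : 'magenta' },
    -fgcolor => 'black',
);

# Hypothetical helper: resolve one option for one feature.
sub resolve_option {
    my ($name, $feature) = @_;
    my $value = $options{$name};
    return ref($value) eq 'CODE' ? $value->($feature) : $value;
}

for my $name (qw(helix strand)) {
    my $feature = MockFeature->new(display_name => $name);
    printf "%-6s glyph=%s bgcolor=%s fgcolor=%s\n", $name,
        map { resolve_option($_, $feature) } qw(-glyph -bgcolor -fgcolor);
}
```

The point is only that the same option can hold either a constant or a
subroutine, and the subroutine decides per feature.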

Scott


On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts <betts at embl.de> wrote:
>
> Hi,
>
> Has anyone tried to draw secondary structure with Bio::Graphics? i.e. two
> different types of glyph with different colours on the same track.
>
> Could use a hash reference to get the different glyph types (would be nice
> if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
> segments to get the different colours, but I can't see how to do both at
> the same time.
>
> Any example code or suggestions on how I could implement it would be
> great.
>
> Thanks,
>
> Matthew
>
> --
> Matthew Betts PhD, Russell Group (Structural Bioinformatics)
> EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
> phone: +49 (0)6221 387 8305; mailto:betts at embl.de


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory

From betts at embl.de  Thu Aug  7 12:27:28 2008
From: betts at embl.de (Matthew Betts)
Date: Thu, 7 Aug 2008 18:27:28 +0200 (CEST)
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
In-Reply-To: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
References: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
	<536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0808071823310.8169@pc-russell13>


Hi Scott,

Thanks for that, was a great help - I didn't realise I could use a code 
ref for anything other than the glyph name.

I'm now doing this:

$panel->add_track(
                  '-bgcolor' => sub {
                                     my($feature) = @_;
                                     $feature->display_name eq 'strand' ? 'cyan' : 'magenta';
                                    },
                  '-strand_arrow' => sub {
                                          my($feature) = @_;
                                          $feature->display_name eq 'strand' ? 1 : 0;
                                         },
);

Matthew


On Thu, 7 Aug 2008, Scott Cain wrote:

> Hi Matthew,
> 
> I don't have any code examples, but people have used GBrowse for
> protein secondary structure, which uses Bio::Graphics underneath the
> hood.
> 
> If you want to put more than one glyph and/or more than one color in a
> track, it is fairly easy.  You just need to provide a callback for
> each option when you create the track, like this:
> 
>   $panel->add_track($features_array_ref,
>                     -glyph       => sub {
>                         # code to set the glyph according to
>                         # the attributes of the feature
>                     },
>                     -bgcolor     => sub {
>                         # code to set the color
>                     },
>                     -fgcolor     => 'black',
>                  ...etc...
>   );
> 
> For more information, see the biographics howto:
> 
>   http://www.bioperl.org/wiki/HOWTO:Graphics
> 
> Scott
> 
> 
> 
> On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts <betts at embl.de> wrote:
> >
> > Hi,
> >
> > Has anyone tried to draw secondary structure with Bio::Graphics? i.e. two
> > different types of glyph with different colours on the same track.
> >
> > Could use a hash reference to get the different glyph types (would be nice
> > if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
> > segments to get the different colours, but I can't see how to do both at
> > the same time.
> >
> > Any example code or suggestions on how I could implement it would be
> > great.
> >
> > Thanks,
> >
> > Matthew
> >
> > --
> > Matthew Betts PhD, Russell Group (Structural Bioinformatics)
> > EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
> > phone: +49 (0)6221 387 8305; mailto:betts at embl.de

From jay at jays.net  Thu Aug  7 12:32:29 2008
From: jay at jays.net (Jay Hannah)
Date: Thu, 07 Aug 2008 11:32:29 -0500
Subject: [Bioperl-l] not BioPerl
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
Message-ID: <489B239D.8060305@jays.net>

Smithies, Russell wrote:
> Has anyone taken a look at the new Perl interface to the NCBI C++ Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existent  :-(
>   

Do people actually use the NCBI C++ Toolkit for things outside of NCBI?  
What?

I tried to leverage it a year or so ago for an Entrez/sequence/search 
project and got nowhere.

j


From jcherry at ncbi.nlm.nih.gov  Thu Aug  7 13:06:28 2008
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 7 Aug 2008 13:06:28 -0400 (EDT)
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
Message-ID: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>

For those who may be wondering what this is about, a Perl interface to the 
NCBI C++ Toolkit is available at 
ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/.  The C++ Toolkit is 
the main code base that we develop and use at NCBI.  It includes many 
things that may be of interest to BioPerl users, such as sequence analysis 
algorithms, means for interacting with NCBI databases, and facilities for 
reading, writing, and manipulating NCBI data model objects (usually 
defined by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and 
readable from ASN.1 and XML).

Russell, I think you can make things work from behind a firewall by 
setting some environment variables: set CONN_FIREWALL to 1, possibly set 
CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT 
as appropriate.  Please email me if you can't get things to work.  I'll 
see that decent instructions for this are included in the next release.
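
As a sketch of the above, the variables could be set from within the
Perl script itself, before the toolkit module is loaded. The proxy host
and port shown are placeholders, and whether CONN_STATELESS is needed
will depend on the network.

```perl
use strict;
use warnings;

# Set the connection variables before the toolkit wrapper is loaded.
# The proxy host and port are placeholders, not real values.
BEGIN {
    $ENV{CONN_FIREWALL}        = 1;
    $ENV{CONN_STATELESS}       = 1;                      # possibly needed
    $ENV{CONN_HTTP_PROXY_HOST} = 'proxy.example.org';    # placeholder
    $ENV{CONN_HTTP_PROXY_PORT} = 8080;                   # placeholder
}

# Load and use the NCBI toolkit Perl module here
# (its module name is not given in this thread).

# Show what was set, for checking.
print "$_=$ENV{$_}\n" for sort grep { /^CONN_/ } keys %ENV;
```

Setting the same variables in the shell before starting Perl should be
equivalent.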

Josh Cherry


On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:

> Has anyone taken a look at the new Perl interface to the NCBI C++
> Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existent  :-(
>
>
> Russell Smithies


From tristan.lefebure at gmail.com  Thu Aug  7 13:35:24 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 7 Aug 2008 13:35:24 -0400
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
Message-ID: <200808071335.24668.tristan.lefebure@gmail.com>

Hi list,

I'm using a script very similar to bp_taxonomy2tree.pl distributed with 
BioPerl (with the only difference that I'm using taxids instead of taxon 
names). Basically, the script generates a taxonomic tree given a list of 
taxids using the NCBI taxonomy db. For each taxon, it generates a taxon 
object, and then merges this object into a tree object that keeps growing. It 
runs very well with a small number of taxa, but with many taxa (>1000), it is 
very very very slow (about a week for 3000 taxa).

The slowness is due to the function merge_lineage (line 65), which merges the 
existing tree object with a new taxon object. I guess that the difficulty 
with a big tree (i.e. more than 1000 leaves) is to find the nodes in common 
between the tree and the new taxon object...

Would you have any idea of how to get around the problem? Should I look under 
the hood of merge_lineage to try to improve it for large trees?
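
One possible direction, assuming the cost really does come from
searching the growing tree for shared nodes: keep a hash index of the
nodes already in the tree, so each lineage is merged with one hash
lookup per node. The sketch below uses plain hashes rather than
Bio::Tree objects and is not how merge_lineage is implemented; the
lineages, the %node_for index and merge_lineage_sketch are made up for
illustration.

```perl
use strict;
use warnings;

# Hypothetical lineages, root first. Real code would key on taxids,
# since taxon names are not guaranteed to be unique.
my @lineages = (
    [qw(root Bacteria Firmicutes Streptococcus)],
    [qw(root Bacteria Firmicutes Lactococcus)],
    [qw(root Bacteria Proteobacteria Escherichia)],
);

my %node_for;    # name -> node, so existing nodes are found by hash lookup
my $root;

# Merge one lineage into the tree, creating only the nodes that are missing.
sub merge_lineage_sketch {
    my ($lineage) = @_;
    my $parent;
    for my $name (@$lineage) {
        my $node = $node_for{$name};
        if (!$node) {
            $node = $node_for{$name} = { name => $name, children => [] };
            if ($parent) { push @{ $parent->{children} }, $node }
            else         { $root = $node }
        }
        $parent = $node;
    }
    return;
}

merge_lineage_sketch($_) for @lineages;

# Print the tree as an indented outline.
sub print_tree {
    my ($node, $depth) = @_;
    print '  ' x $depth, $node->{name}, "\n";
    print_tree($_, $depth + 1) for @{ $node->{children} };
}
print_tree($root, 0);
```

With the index, the work per taxon depends on the depth of its lineage
rather than on the size of the tree built so far.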

Thanks!

Version: bioperl-1.5.2_102
OS: GNU/Linux

-Tristan

From cjfields at illinois.edu  Thu Aug  7 13:38:53 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Aug 2008 12:38:53 -0500
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
In-Reply-To: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
References: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
Message-ID: <E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>

Josh,

Thanks for the update.  I saw that these are only binaries for linux  
32/64-bit.  Are there plans to either support other OS's (OS X, Win,  
etc) or to maybe make a release with the XS-bindings so users can work  
towards that?  With additional support I can see this easily fitting  
into several spots in BioPerl, but otherwise I'm unsure.

chris

On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote:

> For those who may be wondering what this is about, a Perl interface  
> to the NCBI C++ Toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/ 
> .  The C++ Toolkit is the main code base that we develop and use at  
> NCBI.  It includes many things that may be of interest to BioPerl  
> users, such as sequence analysis algorithms, means for interacting  
> with NCBI databases, and facilities for reading, writing, and  
> manipulating NCBI data model objects (usually defined by ASN.1  
> specifications; writeable as ASN.1, XML, and JSON, and readable from  
> ASN.1 and XML).
>
> Russell, I think you can make things work from behind a firewall by  
> setting some environment variables: set CONN_FIREWALL to 1, possibly  
> set CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and  
> CONN_HTTP_PROXY_PORT as appropriate.  Please email me if you can't  
> get things to work.  I'll see that decent instructions for this are  
> included in the next release.
>
> Josh Cherry
>
>
> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:
>
>> Has anyone taken a look at the new Perl interface to the NCBI C++
>> Toolkit?
>> Unfortunately, I can't even get their examples working as I'm  
>> behind a
>> firewall and documentation on setting proxy stuff is virtually
>> non-existent  :-(
>>
>>
>> Russell Smithies
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From jcherry at ncbi.nlm.nih.gov  Thu Aug  7 14:04:17 2008
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 7 Aug 2008 14:04:17 -0400 (EDT)
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
In-Reply-To: <E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>
References: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
	<E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>
Message-ID: <Pine.LNX.4.64.0808071359430.13868@widget0.be-md.ncbi.nlm.nih.gov>

Chris,

Support for other OS's is definitely a possibility, depending on community 
feedback (how useful are the wrappers in general, and how much demand is 
there for them on other platforms?).  I wish I could magically make them 
available for Windows and OS X, but there are some technical issues to 
work out.

Josh


On Thu, 7 Aug 2008, Chris Fields wrote:

> Josh,
>
> Thanks for the update.  I saw that these are only binaries for linux 
> 32/64-bit.  Are there plans to either support other OS's (OS X, Win, etc) or 
> to maybe make a release with the XS-bindings so users can work towards that? 
> With additional support I can see this easily fitting into several spots in 
> BioPerl, but otherwise I'm unsure.
>
> chris
>
> On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote:
>
>> For those who may be wondering what this is about, a Perl interface to the 
>> NCBI C++ Toolkit is available at 
>> ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/.  The C++ Toolkit is 
>> the main code base that we develop and use at NCBI.  It includes many 
>> things that may be of interest to BioPerl users, such as sequence analysis 
>> algorithms, means for interacting with NCBI databases, and facilities for 
>> reading, writing, and manipulating NCBI data model objects (usually defined 
>> by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and readable 
>> from ASN.1 and XML).
>> 
>> Russell, I think you can make things work from behind a firewall by setting 
>> some environment variables: set CONN_FIREWALL to 1, possibly set 
>> CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT 
>> as appropriate.  Please email me if you can't get things to work.  I'll see 
>> that decent instructions for this are included in the next release.
>> 
>> Josh Cherry
>> 
>> 
>> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:
>> 
>>> Has anyone taken a look at the new Perl interface to the NCBI C++
>>> Toolkit?
>>> Unfortunately, I can't even get their examples working as I'm behind a
>>> firewall and documentation on setting proxy stuff is virtually
>>> non-existent  :-(
>>> 
>>> 
>>> Russell Smithies
>> 
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>

From bix at sendu.me.uk  Thu Aug  7 18:20:29 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Aug 2008 23:20:29 +0100
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
 large trees
In-Reply-To: <200808071335.24668.tristan.lefebure@gmail.com>
References: <200808071335.24668.tristan.lefebure@gmail.com>
Message-ID: <489B752D.2080209@sendu.me.uk>

Tristan Lefebure wrote:
> I'm using a script very similar to bp_taxonomy2tree.pl distributed with 
> BioPerl (with the only difference that I'm using taxids instead of taxon 
> names). Basically, the script generates a taxonomic tree given a list of 
> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon 
> object, and then merges this object into a tree object that keeps growing. It 
> runs very well with a small number of taxa, but with many taxa (>1000), it is 
> very very very slow (about a week for 3000 taxa).
> 
> The slowness is due to the function merge_lineage (line 65), which merges the 
> existing tree object with a new taxon object. I guess that the difficulty 
> with a big tree (i.e. more than 1000 leaves) is to find the nodes in common 
> between the tree and the new taxon object...
> 
> Would you have any idea of how to get around the problem? Should I look under 
> the hood of merge_lineage to try to improve it for large trees?

Yes, please do. It might have been me that wrote that, in which case I 
didn't do anything fancy or consider the above problem.

From cjfields at illinois.edu  Thu Aug  7 20:42:16 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Aug 2008 19:42:16 -0500
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
In-Reply-To: <489B752D.2080209@sendu.me.uk>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
Message-ID: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>


On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:

> Tristan Lefebure wrote:
>> I'm using a script very similar to bp_taxonomy2tree.pl distributed  
>> with BioPerl (with the only difference that I'm using taxids  
>> instead of taxon names). Basically, the script generates a  
>> taxonomic tree given a list of taxids using the NCBI taxonomy db.  
>> For each taxon, it generates a taxon object, and then merges this
>> object into a tree object that keeps growing. It runs very well with
>> a small number of taxa, but with many taxa (>1000), it is very very  
>> very slow (about a week for 3000 taxa).
>> The slowness is due to the  function merge_lineage (line 65), which  
>> merges the existing tree object with a new taxon object. I guess  
>> that the difficulty with a big tree (i.e. more than 1000 leaves) is
>> to find the nodes in common between the tree and the new taxon  
>> object...
>> Would you have any idea of how to get around the problem? Should I  
>> look under the hood of merge_lineage to try to improve it for large  
>> trees?
>
> Yes, please do. It might have been me that wrote that, in which case  
> I didn't do anything fancy or consider the above problem.

Actually I thought that was fixed; wasn't some caching added for that  
script at one point?

chris

From bix at sendu.me.uk  Fri Aug  8 03:50:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 08 Aug 2008 08:50:50 +0100
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
 large trees
In-Reply-To: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
	<7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
Message-ID: <489BFADA.1060308@sendu.me.uk>

Chris Fields wrote:
> 
> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
> 
>> Tristan Lefebure wrote:
>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed 
>>> with BioPerl (with the only difference that I'm using taxids instead 
>>> of taxon names). Basically, the script generates a taxonomic tree 
>>> given a list of taxids using the NCBI taxonomy db. For each taxon, it 
>>> generates a taxon object, and then merges this object into a tree object
>>> that keeps growing. It runs very well with a small number of taxa, 
>>> but with many taxa (>1000), it is very very very slow (about a week 
>>> for 3000 taxa).
>>> The slowness is due to the  function merge_lineage (line 65), which 
>>> merges the existing tree object with a new taxon object. I guess that 
>>> the difficulty with a big tree (i.e. more than 1000 leaves) is to find
>>> the nodes in common between the tree and the new taxon object...
>>> Would you have any idea of how to get around the problem? Should I 
>>> look under the hood of merge_lineage to try to improve it for large 
>>> trees?
>>
>> Yes, please do. It might have been me that wrote that, in which case I 
>> didn't do anything fancy or consider the above problem.
> 
> Actually I thought that was fixed;

Oh yeah. Looks like I did something related to 'speedup for 
merge_lineage()' on the 18th Dec 2006. Tristan, check out
Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
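
The thread does not say what the 2006 speedup actually changed. As a general
illustration of Tristan's guess (that the cost lies in finding the nodes
already shared with the tree), here is a minimal sketch in plain Perl. It is
not the BioPerl implementation: merge_lineages, %index and the abbreviated
taxid lineages are all made up for the example. The idea is to keep a hash
from id to node so that finding the attachment point never needs a tree walk.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Merge lineages (array refs of ids, root -> leaf) into one tree.
# %index maps every id already in the tree to its node, so locating the
# attachment point is a hash lookup rather than a walk over the whole tree.
sub merge_lineages {
    my @lineages = @_;
    my ( %index, $root );
    for my $lineage (@lineages) {
        my $parent;
        for my $id (@$lineage) {
            my $node = $index{$id};
            if ( !$node ) {
                # unseen id: create the node and hang it under its parent
                $node = $index{$id} = { id => $id, children => [] };
                if ($parent) { push @{ $parent->{children} }, $node }
                else         { $root ||= $node }
            }
            $parent = $node;
        }
    }
    return ( $root, \%index );
}

# Toy, abbreviated lineages (hypothetical input, not fetched from NCBI).
my ( $root, $index ) = merge_lineages(
    [ 131567, 2759, 33208, 9606 ],
    [ 131567, 2759, 33208, 10090 ],
    [ 131567, 2, 562 ],
);
printf "%d nodes, root %s has %d children\n",
    scalar( keys %$index ), $root->{id}, scalar @{ $root->{children} };
```

With an index like this, each new lineage costs time proportional to its own
length, independent of how large the tree has grown.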

From tristan.lefebure at gmail.com  Fri Aug  8 12:02:32 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Fri, 8 Aug 2008 12:02:32 -0400
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
In-Reply-To: <489BFADA.1060308@sendu.me.uk>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
	<7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
	<489BFADA.1060308@sendu.me.uk>
Message-ID: <d31f7c40808080902o70fbf78dub50c9ba660355ba0@mail.gmail.com>

Yes indeed, with the svn code it took 10 minutes (compared to one week!)
Thanks, -Tristan

On Fri, Aug 8, 2008 at 3:50 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Chris Fields wrote:
>
>>
>> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
>>
>>  Tristan Lefebure wrote:
>>>
>>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed with
>>>> BioPerl (with the only difference that I'm using taxids instead of taxon
>>>> names). Basically, the script generates a taxonomic tree given a list of
>>>> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon
>>>> object, and then merges this object into a tree object that keeps growing. It
>>>> runs very well with a small number of taxa, but with many taxa (>1000), it
>>>> is very very very slow (about a week for 3000 taxa).
>>>> The slowness is due to the  function merge_lineage (line 65), which
>>>> merges the existing tree object with a new taxon object. I guess that the
>>>> difficulty with a big tree (i.e. more than 1000 leaves) is to find the nodes
>>>> in common between the tree and the new taxon object...
>>>> Would you have any idea of how to get around the problem? Should I look
>>>> under the hood of merge_lineage to try to improve it for large trees?
>>>>
>>>
>>> Yes, please do. It might have been me that wrote that, in which case I
>>> didn't do anything fancy or consider the above problem.
>>>
>>
>> Actually I thought that was fixed;
>>
>
> Oh yeah. Looks like I did something related to 'speedup for
> merge_lineage()' on the 18th Dec 2006. Tristan, check out
> Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
>

From rvos at interchange.ubc.ca  Fri Aug  8 19:59:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Fri, 8 Aug 2008 16:59:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>

Hi,

while going through a large genbank file
(ftp://ftp.ncbi.nlm.nih.gov/genbank/gbpri21.seq.gz) I ran into malloc
errors. Just for the record (I doubt this does anyone any good), I
got:

perl(391) malloc: *** vm_allocate(size=8421376) failed (error code=3)
perl(391) malloc: *** error: can't allocate region
perl(391) malloc: *** set a breakpoint in szone_error to debug
Out of memory!

What I was trying to do is go through the file, and only write out
those seq objects that aren't human, and that have CDS features, i.e.:

################################################

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $dir = shift @ARGV; # the directory with *.gz files
my $out = shift @ARGV; # the directory to write to...
mkdir $out if not -d $out; # ...which may need to be created

opendir my $dirhandle, $dir or die $!;
for my $archive ( readdir $dirhandle ) {
    next if $archive !~ /\.gz$/;
    my $file = $archive;
    $file =~ s/\.gz$//;

    # external call to the gunzip utility,
    # such that we keep the archive
    system( "gunzip -c \"${dir}/${archive}\" > \"${dir}/${file}\"" );

    # object that parses genbank files,
    # returns Bio::Seq objects
    my $reader = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => "${dir}/${file}"
    );

    # object that receives Bio::Seq objects,
    # writes genbank files
    my $writer = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => ">${out}/${file}",
    );
    while ( my $seq = $reader->next_seq ) {
        my $name = $seq->species->binomial;
        if ( $name ne 'Homo sapiens' ) {

            # search for coding sequences among the features
            my $HasCDS = 0;
            FEATURE: for my $f ( $seq->get_SeqFeatures ) {
                if ( $f->primary_tag eq 'CDS' ) {
                    $HasCDS++;
                    last FEATURE;
                }
            }

            # write the sequence to file
            if ( $HasCDS ) {
                $writer->write_seq( $seq );
            }
        }
    }

    # delete the extracted, unfiltered file
    unlink "${dir}/${file}";
}

################################################

Okay, so it runs out of memory. Can I do something to fix that? Should
I flush on either of the I/O objects after each $seq? Could there be
memory leaks in the Bio::Seq objects? Should I $seq->DESTROY them
explicitly or something like that?

Thanks,

Rutger

-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com

From David.Messina at sbc.su.se  Sat Aug  9 07:04:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Aug 2008 13:04:04 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
Message-ID: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>

Hi Rutger,
I ran your script on the same genbank file and, while I did not run out of
memory, I did see what appears to be a memory leak. Even when I manually
undef'd the reader and writer object every 1000 records, memory usage
continued to grow.

I can't quite figure out what's going on, though.

If I run a different program using SeqIO (the simple sequence converter from
the SeqIO HOWTO) on the same input file, I don't see this same runaway
growth.

Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the
sequence converter holds steady at about 12MB of real memory, whereas on
5.10 it grows, albeit slowly, for as long as the program is executing. When
I killed it about 20% of the way through the file, it was up to  about 44MB
of real memory.

Anyone else have a chance to look at this?


Dave

From rvos at interchange.ubc.ca  Sat Aug  9 07:36:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 04:36:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
Message-ID: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>

Hi Dave,

thanks for the reply. The memory usage is in fact much more atrocious
than just 44 MB: I'm actually looping over all 36 such archives (the
GenBank primates), and on my MacBook it steadily increases to over 1 GB
of memory. What seemed to help somewhat was calling
$reader->flush and $writer->flush after each seq, at least to the
extent that I make it through that one file, but last time I tried I
didn't get much further: the whole terminal process died shortly after
instead. I seem to vaguely recall that even if perl free()'s memory,
that doesn't necessarily mean that the memory is returned to the OS
for the runtime of the program - depending on the OS and perl version.
What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.

Rutger

On Sat, Aug 9, 2008 at 4:04 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> Hi Rutger,
> I ran your script on the same genbank file and, while I did not run out of
> memory, I did see what appears to be a memory leak. Even when I manually
> undef'd the reader and writer object every 1000 records, memory usage
> continued to grow.
>
> I can't quite figure out what's going on, though.
> If I run a different program using SeqIO (the simple sequence converter from
> the SeqIO HOWTO) on the same input file, I don't see this same runaway
> growth.
> Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the
> sequence converter holds steady at about 12MB of real memory, whereas on
> 5.10 it grows, albeit slowly, for as long as the program is executing. When
> I killed it about 20% of the way through the file, it was up to  about 44MB
> of real memory.
> Anyone else have a chance to look at this?
>
> Dave
>


-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com

From David.Messina at sbc.su.se  Sat Aug  9 08:58:56 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Aug 2008 14:58:56 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
Message-ID: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>

>
>  I seem to vaguely recall that even if perl free()'s memory that doesn't
> necessarily mean that the memory is returned to the OS for the runtime of
> the program


I believe that's correct.


> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>

perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.


Dave

From cjfields at illinois.edu  Sat Aug  9 09:56:19 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Aug 2008 08:56:19 -0500
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
Message-ID: <57147D88-ABE6-44E0-8D76-790B0C735438@illinois.edu>

There is definitely a memory leak.  I can confirm it on OSX 10.4/10.5  
using bioperl-live.

I'll try looking into it this weekend, but I can't promise when it'll  
be fixed; my laptop is on the fritz.

chris

On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:

>>
>> I seem to vaguely recall that even if perl free()'s memory that  
>> doesn't
>> necessarily mean that the memory is returned to the OS for the  
>> runtime of
>> the program
>
>
> I believe that's correct.
>
>
>
>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>
>
> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>
>
> Dave

From cjfields at illinois.edu  Sat Aug  9 10:15:23 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Aug 2008 09:15:23 -0500
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
Message-ID: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>

Forgot to mention, maybe we can file this as a bug?  It's a pretty  
serious one but it should be easy to narrow down; the change had to be  
introduced fairly recently.

chris

On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:

>>
>> I seem to vaguely recall that even if perl free()'s memory that  
>> doesn't
>> necessarily mean that the memory is returned to the OS for the  
>> runtime of
>> the program
>
>
> I believe that's correct.
>
>
>
>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>
>
> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>
>
> Dave

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat Aug  9 12:00:46 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 9 Aug 2008 12:00:46 -0400
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
	<9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
Message-ID: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>

This smells of circular references somewhere. I think the first point  
I would go looking is the species storing - does the problem go away  
if you turn that off? Maybe the version of weaken() is at play here?

	-hilmar
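
For readers unfamiliar with the failure mode: a minimal sketch, in plain Perl
with only the core Scalar::Util module, of how a circular reference keeps
objects alive and how weaken() breaks the cycle. The Node class and make_pair
are hypothetical stand-ins, not BioPerl code, and nothing here shows where the
cycle in SeqIO actually sits.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical stand-in class; this is NOT a BioPerl class.
package Node;
our $destroyed = 0;                 # counts objects that were actually freed
sub new     { my ($class) = @_; return bless {}, $class }
sub DESTROY { $destroyed++ }

package main;

# Build a parent/child pair that point at each other.
# With $weak false the pair forms a strong cycle and is never freed.
sub make_pair {
    my ($weak) = @_;
    my $parent = Node->new;
    my $child  = Node->new;
    $parent->{child} = $child;
    $child->{parent} = $parent;
    weaken( $child->{parent} ) if $weak;   # back-reference no longer counts
    return;                                # both lexicals go out of scope here
}

make_pair(0) for 1 .. 1000;
my $leaky = $Node::destroyed;       # objects freed despite strong cycles

$Node::destroyed = 0;
make_pair(1) for 1 .. 1000;
my $clean = $Node::destroyed;       # objects freed with a weakened back-reference

print "strong cycle: $leaky freed, weakened: $clean freed\n";
```

Perl frees an object only when its reference count drops to zero, so the
strong pairs stay allocated until the interpreter exits. That matches the
steady growth reported in this thread.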

On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:

> Forgot to mention, maybe we can file this as a bug?  It's a pretty  
> serious one but it should be easy to narrow down; the change had to  
> be introduced fairly recently.
>
> chris
>
> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>
>>>
>>> I seem to vaguely recall that even if perl free()'s memory that  
>>> doesn't
>>> necessarily mean that the memory is returned to the OS for the  
>>> runtime of
>>> the program
>>
>>
>> I believe that's correct.
>>
>>
>>
>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>
>>
>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>
>>
>> Dave
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Aug  9 12:07:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 9 Aug 2008 12:07:30 -0400
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>
References: <18792782.post@talk.nabble.com>
	<33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>
Message-ID: <591AE8EB-4D45-4859-A93E-EA9BF01CA9C6@gmx.net>

This looks like a neat trick. Do you think it's worth including as a  
SimpleAlign method (obviously w/o the printing to STDOUT)? I can  
imagine that a lot of people might appreciate it.

	-hilmar

On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:

> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>
>>
>> Hi there,
>> I'm trying to write a perl script to scan an aligned multiple entry  
>> fasta
>> file and find possible primers. So far I've produced a string which  
>> contains
>> bases which match all sequences and * where they don't match e.g.
>> 1) TTAGCCTAA
>> 2) TTAGCAGAA
>> 3) TTACCCTAA
>>
>> would give TTA*C**AA.
>>
>> I want to parse this string and pull out all sequences which are  
>> 18-21 bp in
>> length and have no more than 4 * in them.
>>
>> So far, I've got this:
>>
>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>> print "$1\n";
>> }
>>
>> hoping to match all fragments 18-21 characters in length. However  
>> even that
>> doesn't work as it has essentially chunked it into 21 char blocks,  
>> rather
>> than what I hoped for of
>> 0-18
>> 0-19
>> 0-20
>> 0-21
>> 1-19
>> 1-20
>> 1-21
>> 1-22
>>
>> etc.
>>
>> Can anyone let me know if this is already possible in BioPerl, or  
>> how one
>> would go about it with regex. Sadly I'm fairly new to perl and  
>> getting to
>> grips with BioPerl, so please treat me gently :).
>>
>> Many thanks,
>>
>> Ben
>
> There is a trick to this which is discussed more extensively in  
> 'Mastering Regular Expressions'.  Essentially you have to embed code  
> into the regex and trick the parser into backtracking using a  
> negative lookahead.  The match itself fails (i.e. no match is  
> returned), but the embedded code is executed for each match attempt,
>
> The following script is a slight modification of one I used which  
> checks the consensus string from the input alignment (in aligned  
> FASTA format here), extracts the alignment slice using that match,  
> then spit the alignment out to STDOUT in clustalw format.  This  
> should work for perl 5.8 and up, but it's only been tested on perl  
> 5.10.  You should be able to use this to fit what you want.
>
> my $in = Bio::AlignIO->new(-file => $file,
>                           -format => 'fasta');
> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>                           -format => 'clustalw');
>
> while (my $aln = $in->next_aln) {
>    my $c = $aln->consensus_string(100);
>    my @matches;
>    $c =~ m/
>        ([GTAC?]{18,21})
>        (?{my $match = check_match($1);
>           push @matches, [$match,
>                           pos(),
>                           length($match)]
>              if defined $match;})
>        (?!)
>        /xig;
>    for my $match (@matches) {
>        my ($hit, $st, $end) = ($match->[0],
>                                $match->[1] - $match->[2] + 1,
>                                $match->[1]);
>        my $newaln = $aln->slice($st, $end);
>        $out->write_aln($newaln);
>    }
> }
>
> sub check_match {
>    my $match = shift;
>    return unless $match;
>    my $ct = $match =~ tr/?/?/;
>    return $match if $ct <= 4;
> }
>
>
> chris
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
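
The overlapping-window enumeration quoted above can also be sketched without
embedded code. This is a different technique from Chris's (?{...}) trick: a
capturing lookahead inside a loop over the allowed lengths. It uses plain Perl
on a toy consensus string. candidate_windows and its parameters are
hypothetical names, and '?' stands for a mismatch column as in
consensus_string output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return [start, fragment] pairs (1-based start) for every window of
# $min..$max characters that contains at most $max_wild '?' characters.
sub candidate_windows {
    my ( $consensus, $min, $max, $max_wild ) = @_;
    my @hits;
    for my $len ( $min .. $max ) {
        # The lookahead consumes nothing, so the regex engine retries at
        # every position and overlapping windows are all reported.
        while ( $consensus =~ /(?=([GTAC?]{$len}))/gi ) {
            my $fragment = $1;
            my $wild     = ( $fragment =~ tr/?// );    # count wildcards
            push @hits, [ pos($consensus) + 1, $fragment ]
                if $wild <= $max_wild;
        }
    }
    return @hits;
}

# Toy input: Ben's three-sequence example, written with '?' for mismatches.
my @hits = candidate_windows( 'TTA?C??AA', 4, 5, 1 );
print join( "\n", map { "$_->[0]\t$_->[1]" } @hits ), "\n";
```

For primer-sized windows the call would be
candidate_windows($consensus, 18, 21, 4). Each start/length pair can then be
turned into an alignment slice as in the quoted script.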

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From rvos at interchange.ubc.ca  Sat Aug  9 13:47:33 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 10:47:33 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
	<9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
	<897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>
Message-ID: <2bb9b24a0808091047t46a6bfa8r7e11a3a1537180@mail.gmail.com>

I am sure my version of weaken() works as advertised. Is there a way
to turn off species storing from outside the code base or do you mean
I go and start commenting bits out in Bio::SeqIO::genbank (or
wherever)?

On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
> This smells of circular references somewhere. I think the first point I
> would go looking is the species storing - does the problem go away if you
> turn that off? Maybe the version of weaken() is at play here?
>
>        -hilmar
>
> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>
>> Forgot to mention, maybe we can file this as a bug?  It's a pretty serious
>> one but it should be easy to narrow down; the change had to be introduced
>> fairly recently.
>>
>> chris
>>
>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>>
>>>>
>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't
>>>> necessarily mean that the memory is returned to the OS for the runtime
>>>> of
>>>> the program
>>>
>>>
>>> I believe that's correct.
>>>
>>>
>>>
>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>>
>>>
>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>>
>>>
>>> Dave
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Marie-Claude Hofmann
>> College of Veterinary Medicine
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>


-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com

From hartzell at alerce.com  Sat Aug  9 14:33:51 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 9 Aug 2008 11:33:51 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
Message-ID: <18589.58127.57270.352974@almost.alerce.com>


I'm pretty sure that this fixes the problem:

g.

Index: Bio/Species.pm
===================================================================
--- Bio/Species.pm	(revision 14791)
+++ Bio/Species.pm	(working copy)
@@ -340,6 +340,7 @@
 		}
 		
 		$self->{_species} = $species;
+		weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
     }
 	return $self->{_species};
 }

From cjfields at illinois.edu  Sat Aug  9 15:08:27 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Sat,  9 Aug 2008 14:08:27 -0500 (CDT)
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <20080809140827.BHN28056@expms6.cites.uiuc.edu>

I'm pretty sure it's not due to a particular version of weaken(), though it
does sound like a circular reference issue.  I have tried this with perl
5.8.6, 5.8.8, and 5.10 (all Mac OS, either 10.4 or 10.5); all have the same
memory leak.

You can try using SeqBuilder to get rid of the Bio::Species object.  I'll
give that a try when I can.  Unfortunately my laptop is now with the local
Apple geniuses awaiting a motherboard, so I can't get to it right away (I'll
give it a try on my wife's laptop).

chris

---- Original message ----
>Date: Sat, 9 Aug 2008 10:47:33 -0700
>From: "Rutger Vos" <rvos at interchange.ubc.ca>  
>Subject: Re: [Bioperl-l] malloc errors while using Bio::SeqIO?  
>To: "Hilmar Lapp" <hlapp at gmx.net>
>Cc: Chris Fields <cjfields at illinois.edu>, bioperl list <bioperl-l at lists.open-bio.org>
>
>I am sure my version of weaken() works as advertised. Is there a way
>to turn off species storing from outside the code base or do you mean
>I go and start commenting bits out in Bio::SeqIO::genbank (or
>wherever)?
>
>On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> This smells of circular references somewhere. I think the first point I
>> would go looking is the species storing - does the problem go away if you
>> turn that off? Maybe the version of weaken() is at play here?
>>
>>        -hilmar
>>
>> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>>
>>> Forgot to mention, maybe we can file this as a bug?  It's a pretty serious
>>> one but it should be easy to narrow down; the change had to be
>>> introduced fairly recently.
>>>
>>> chris
>>>
>>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>>>
>>>>>
>>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't
>>>>> necessarily mean that the memory is returned to the OS for the runtime
>>>>> of
>>>>> the program
>>>>
>>>>
>>>> I believe that's correct.
>>>>
>>>>
>>>>
>>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>>>
>>>>
>>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>>>
>>>>
>>>> Dave
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Marie-Claude Hofmann
>>> College of Veterinary Medicine
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>
>
>
>-- 
>Dr. Rutger A. Vos
>Department of zoology
>University of British Columbia
>http://www.nexml.org
>http://rutgervos.blogspot.com


From hartzell at alerce.com  Sat Aug  9 20:17:52 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 9 Aug 2008 17:17:52 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <18589.58127.57270.352974@almost.alerce.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
Message-ID: <18590.13232.892714.952555@almost.alerce.com>


George Hartzell writes:
 > 
 > I'm pretty sure that this fixes the problem:
 > 
 > g.
 > 
 > Index: Bio/Species.pm
 > ===================================================================
 > --- Bio/Species.pm	(revision 14791)
 > +++ Bio/Species.pm	(working copy)
 > @@ -340,6 +340,7 @@
 >  		}
 >  		
 >  		$self->{_species} = $species;
 > +		weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
 >      }
 >  	return $self->{_species};
 >  }

Actually, it's a bit clearer with the weaken moved up in the block so
that it's closer to where the new tree is allocated.

Chris suggested that I go ahead and commit it.

g.


From David.Messina at sbc.su.se  Sun Aug 10 05:57:07 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 10 Aug 2008 11:57:07 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <18590.13232.892714.952555@almost.alerce.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
	<18590.13232.892714.952555@almost.alerce.com>
Message-ID: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>

Nice, George -- holds steady at about 32MB now.
Much better. :)


Dave

From hartzell at alerce.com  Sun Aug 10 12:51:39 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 10 Aug 2008 09:51:39 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
	<18590.13232.892714.952555@almost.alerce.com>
	<628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
Message-ID: <18591.7323.244987.436383@almost.alerce.com>

Dave Messina writes:
 > Nice, George -- holds steady at about 32MB now.
 > Much better. :)

Good to hear.

Bonus points go to rvos@ for providing such a nice clean bug report and
test case; it made running it down much more appealing.

g.

From valiente at lsi.upc.edu  Mon Aug 11 04:09:37 2008
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 11 Aug 2008 11:09:37 +0300
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
References: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
Message-ID: <B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>

Despite the speedup for merge_lineage, the get_lca method still runs
very slowly on a large number of nodes (say, 1500 nodes), and it does
not rely on merge_lineage. In the get_lca method, all the lineages
are first collected in @paths in order to later find their $lca,
while it might be faster to process each $path as soon as it is
obtained with the get_lineage_nodes method. Any other ideas on how to
speed up the get_lca method? Thanks,

Gabriel
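
A minimal sketch of the "process each $path as soon as it is obtained"
idea, in plain Perl. It assumes each lineage is already available as an
array of ids ordered from the root down; lca_incremental is a
hypothetical name, and this is not the actual Bio::Tree code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): fold lineages one at a time
# into a running shared prefix instead of collecting all paths first.
# Each lineage is an array ref of ids, root first.
sub lca_incremental {
    my @lineages = @_;
    return unless @lineages;
    my @common = @{ $lineages[0] };
    for my $path (@lineages[1 .. $#lineages]) {
        my $i = 0;
        $i++ while $i < @common && $i < @$path && $common[$i] eq $path->[$i];
        $#common = $i - 1;      # truncate to the prefix shared so far
        last unless @common;    # nothing in common, stop early
    }
    return @common ? $common[-1] : undef;
}

my $lca = lca_incremental([qw(root a b c)], [qw(root a b d)], [qw(root a e)]);
print "lca: $lca\n";
```

The running prefix can only shrink, so the loop can stop as soon as it
is empty, and no more than one full lineage is held at a time.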

From lmanchon at univ-montp2.fr  Mon Aug 11 12:32:20 2008
From: lmanchon at univ-montp2.fr (Laurent Manchon)
Date: Mon, 11 Aug 2008 18:32:20 +0200
Subject: [Bioperl-l] protein pattern scan
Message-ID: <5.0.2.1.2.20080811182952.00bebff0@pop.univ-montp2.fr>

Hi,

do you know if it's possible to search for a protein motif in a
multi-FASTA protein file using BioPerl, and to return the motif, its
position and the name of the corresponding sequence?

thank you for your help.


+---------------------------------------------+
  Laurent Manchon
  Email: lmanchon at univ-montp2.fr
+---------------------------------------------+

From cjfields at illinois.edu  Mon Aug 11 13:32:05 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 12:32:05 -0500 (CDT)
Subject: [Bioperl-l] protein pattern scan
Message-ID: <20080811123205.BHO45474@expms6.cites.uiuc.edu>

This is covered in the FAQ:

http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F

chris
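
For the simple case, a minimal sketch in plain Perl, assuming the motif
can be written as a regular expression. scan_motif is a hypothetical
helper, not a BioPerl method, and it reports non-overlapping matches
only.

```perl
use strict;
use warnings;

# Hypothetical helper (not a BioPerl method): report every non-overlapping
# occurrence of a motif regex in one sequence as
# [sequence name, 1-based start position, matched text].
sub scan_motif {
    my ($id, $seq, $motif) = @_;
    my @hits;
    while ($seq =~ /($motif)/g) {
        push @hits, [ $id, $-[1] + 1, $1 ];
    }
    return @hits;
}

# Illustrative input: N, then anything but P, then S or T.
for my $hit (scan_motif('seq1', 'MKNGTAANPSA', qr/N[^P][ST]/)) {
    print join("\t", @$hit), "\n";
}
```

With BioPerl, the names and sequence strings would typically come from a
Bio::SeqIO stream opened in 'fasta' format (next_seq, then display_id
and seq on each sequence object).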

---- Original message ----
>Date: Mon, 11 Aug 2008 18:32:20 +0200
>From: Laurent Manchon <lmanchon at univ-montp2.fr>  
>Subject: [Bioperl-l] protein pattern scan  
>To: bioperl-l at lists.open-bio.org
>
>Hi,
>
>do you know if it's possible to search protein motif in a multifasta 
>protein file
>using bioperl to return the motif, the position and the name of the 
>corresponding sequence ?
>
>thank you for your help.
>
>
>+---------------------------------------------+
>  Laurent Manchon
>  Email: lmanchon at univ-montp2.fr
>+---------------------------------------------+
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l

From bix at sendu.me.uk  Mon Aug 11 13:44:37 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Aug 2008 18:44:37 +0100
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To: <B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>
References: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
	<B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>
Message-ID: <48A07A85.6050601@sendu.me.uk>

Gabriel Valiente wrote:
> Despite the speedup for merge_lineage, the get_lca method still runs 
> very slow on a large number of nodes (say, 1500 nodes) and it does not 
> rely on merge_lineage. In the get_lca method, all the lineages are first 
> collected in @paths in order to later find their $lca, while it might be 
> faster to process each $path as soon as it is obtained with the 
> get_lineage_nodes method.

If you try that idea out and it works, please do commit it. I've no 
further suggestions atm, but I haven't had a chance to look at it to 
remind myself what happens.

From cjfields at illinois.edu  Mon Aug 11 15:50:38 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 14:50:38 -0500 (CDT)
Subject: [Bioperl-l] Finding possible primers regex
Message-ID: <20080811145038.BHO59267@expms6.cites.uiuc.edu>

When I get the chance, I could try writing a method that accepts a regex or Bio::Tools::SeqPattern and returns an AlignIO stream or an array of SimpleAlign instances (the former could be attached to a temp file for iteration).  Any preference?

chris
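
For comparison with the quoted script, a minimal sketch of the same
window enumeration in plain Perl. It uses nested substr loops rather
than the embedded-code/negative-lookahead trick, and it returns the
coordinates along with each window. candidate_windows and its parameter
order are hypothetical, not an existing SimpleAlign method.

```perl
use strict;
use warnings;

# Hypothetical helper: list every window of $min..$max characters in a
# consensus string that holds at most $max_wild wildcard characters.
# Returns [window, 1-based start, 1-based end] records.
sub candidate_windows {
    my ($consensus, $min, $max, $max_wild, $wild) = @_;
    my @hits;
    my $len = length $consensus;
    for my $start (0 .. $len - $min) {
        for my $size ($min .. $max) {
            last if $start + $size > $len;
            my $window = substr($consensus, $start, $size);
            my $count  = () = $window =~ /\Q$wild\E/g;
            last if $count > $max_wild;   # longer windows cannot have fewer
            push @hits, [ $window, $start + 1, $start + $size ];
        }
    }
    return @hits;
}

# Toy input from the thread, with small limits so it fits on screen.
print join("\t", @$_), "\n" for candidate_windows('TTA*C**AA', 3, 4, 1, '*');
```

For the primer case the call would use 18, 21 and 4 as the limits, with
'?' as the wildcard if the string comes from consensus_string. The start
and end values are 1-based, so they can be passed to slice() as in the
quoted script.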

---- Original message ----
>Date: Sat, 9 Aug 2008 12:07:30 -0400
>From: Hilmar Lapp <hlapp at gmx.net>  
>Subject: Re: [Bioperl-l] Finding possible primers regex  
>To: Chris Fields <cjfields at uiuc.edu>
>Cc: Benbo <btemperton at googlemail.com>, Bioperl-l at lists.open-bio.org
>
>This looks like a neat trick. Do you think it's worth including as a  
>SimpleAlign method (obviously w/o the printing to STDOUT)? I can  
>imagine that a lot of people might appreciate it.
>
>	-hilmar
>
>On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:
>
>> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>>
>>>
>>> Hi there,
>>> I'm trying to write a perl script to scan an aligned multiple entry  
>>> fasta
>>> file and find possible primers. So far I've produced a string which  
>>> contains
>>> bases which match all sequences and * where they don't match e.g.
>>> 1) TTAGCCTAA
>>> 2) TTAGCAGAA
>>> 3) TTACCCTAA
>>>
>>> would give TTA*C**AA.
>>>
>>> I want to parse this string and pull out all sequences which are  
>>> 18-21 bp in
>>> length and have no more than 4 * in them.
>>>
>>> So far, I've got this:
>>>
>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>> print "$1\n";
>>> }
>>>
>>> hoping to match all fragments 18-21 characters in length. However  
>>> even that
>>> doesn't work as it has essentially chunked it into 21 char blocks,  
>>> rather
>>> than what I hoped for of
>>> 0-18
>>> 0-19
>>> 0-20
>>> 0-21
>>> 1-19
>>> 1-20
>>> 1-21
>>> 1-22
>>>
>>> etc.
>>>
>>> Can anyone let me know if this is already possible in BioPerl, or  
>>> how one
>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>> getting to
>>> grips with BioPerl, so please treat me gently :).
>>>
>>> Many thanks,
>>>
>>> Ben
>>
>> There is a trick to this which is discussed more extensively in  
>> 'Mastering Regular Expressions'.  Essentially you have to embed code  
>> into the regex and trick the parser into backtracking using a  
>> negative lookahead.  The match itself fails (i.e. no match is  
>> returned), but the embedded code is executed for each match attempt.
>>
>> The following script is a slight modification of one I used which  
>> checks the consensus string from the input alignment (in aligned  
>> FASTA format here), extracts the alignment slice using that match,  
>> then spits the alignment out to STDOUT in clustalw format.  This  
>> should work for perl 5.8 and up, but it's only been tested on perl  
>> 5.10.  You should be able to use this to fit what you want.
>>
>> my $in = Bio::AlignIO->new(-file => $file,
>>                           -format => 'fasta');
>> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>>                           -format => 'clustalw');
>>
>> while (my $aln = $in->next_aln) {
>>    my $c = $aln->consensus_string(100);
>>    my @matches;
>>    $c =~ m/
>>        ([GTAC?]{18,21})
>>        (?{my $match = check_match($1);
>>           push @matches, [$match,
>>                           pos(),
>>                           length($match)]
>>              if defined $match;})
>>        (?!)
>>        /xig;
>>    for my $match (@matches) {
>>        my ($hit, $st, $end) = ($match->[0],
>>                                $match->[1] - $match->[2] + 1,
>>                                $match->[1]);
>>        my $newaln = $aln->slice($st, $end);
>>        $out->write_aln($newaln);
>>    }
>> }
>>
>> sub check_match {
>>    my $match = shift;
>>    return unless $match;
>>    my $ct = $match =~ tr/?/?/;
>>    return $match if $ct <= 4;
>> }
>>
>>
>> chris
>>
>
>-- 
>===========================================================
>: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>===========================================================
>
>
>

From hlapp at gmx.net  Mon Aug 11 22:35:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Aug 2008 22:35:13 -0400
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <20080811145038.BHO59267@expms6.cites.uiuc.edu>
References: <20080811145038.BHO59267@expms6.cites.uiuc.edu>
Message-ID: <B79B827A-E66C-417D-A5E8-B4FE96791089@gmx.net>

Actually, now that you ask, I'm wondering whether one wouldn't
sometimes want to retain the relationship between the match and the
resulting sliced alignment? If so, neither AlignIO nor array would
accomplish that, right?

Other than that I myself don't have a strong preference either way. I  
suppose AlignIO stream is somewhat more extensible, since as you say  
it could be coupled to a file if the resulting set of alignments is  
really large.

	-hilmar

On Aug 11, 2008, at 3:50 PM, Christopher Fields wrote:

> When I can I could try generating a method which accepts a regex/ 
> Bio::Tools::SeqPattern and returns an AlignIO stream or array of  
> SimpleAlign instances (the former could be attached to a temp file  
> for iteration).  Any preference?
>
> chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mirhan at indiana.edu  Mon Aug 11 23:46:35 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Mon, 11 Aug 2008 23:46:35 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <D8FD887C-86A9-4E53-9A59-10FDE295C067@duke.edu>
Message-ID: <C4C67FDB.5282%mirhan@indiana.edu>


Hi,
Yes, it is true that it's similar to get_all_Annotations; it's basically a recursive version of it.
I wanted to provide a method to get at nested annotations without going through the whole "if it isa collection, make a recursive call" routine every time, because most of the XML elements are implemented as nested annotation collections attached to the nodes.
(I am contemplating using tagtrees instead of nested annotation collections in the future, but as of now Annotation::tagtrees is documented as a temporary implementation, so I passed on that option.)
I forgot about the interface part.
At least for my purpose, I would think it's a good function to have in the interface.
I agree that adding a recursive option to get_all_Annotations would be better.

Mira


On 8/11/08 11:28 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:33 AM, Han, Mira wrote:

> Added get_deep_Annotations in Annotation::Collection
> in order to get annotations that are within nested collections.


I hope I'm not contradicting Chris here, but we will probably want to
think about this a bit more. Your implementation won't work as it is
assuming an interface function that isn't defined on the interface
(both get_deep_Annotations() and _deep_annotation_helper()). Also, it
does nearly the same as get_all_Annotations(), and passing on the keys
to nested collections should maybe simply be an option to that method.
Alternatively, one could add an option -recurse to get_Annotation.

The other difference you note is that your method does not flatten the
nested annotations, but unless I am missing something your
implementation does flatten annotations from nested collections.

So even if we need a separate method for this, something like
get_nested_Annotations() would probably be a more appropriate name,
and if we do need a separate method, it should be compelling enough to
add it to the interface too (as otherwise your code will only work
with certain implementation classes).

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From mirhan at indiana.edu  Tue Aug 12 00:00:28 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Tue, 12 Aug 2008 00:00:28 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>
Message-ID: <C4C6831C.5287%mirhan@indiana.edu>


Oh yes, I meant get_Annotations.
I want a get_Annotations that is recursive and passes the keys to the recursive calls.
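
A minimal sketch of that behaviour, using a stand-in data model rather
than Bio::Annotation::Collection: a collection here is a hash ref
mapping a key to an array ref of values, and a value that is itself a
hash ref counts as a nested collection. get_annotations_recursive is a
hypothetical name, and the result is flattened.

```perl
use strict;
use warnings;

# Stand-in data model (an assumption, not the BioPerl classes):
# collection = { key => [ value, ... ] }, nested collection = hash ref value.
# Returns all plain values stored under the wanted keys at any depth.
sub get_annotations_recursive {
    my ($collection, @keys) = @_;
    my %wanted = map { $_ => 1 } @keys;
    my @found;
    for my $key (sort keys %$collection) {
        for my $value (@{ $collection->{$key} }) {
            if (ref($value) eq 'HASH') {
                # nested collection: pass the same keys down
                push @found, get_annotations_recursive($value, @keys);
            }
            elsif ($wanted{$key}) {
                push @found, $value;
            }
        }
    }
    return @found;
}

my $node = {
    name  => ['outer'],
    clade => [ { name => ['inner'], width => [2] } ],
};
print join(',', get_annotations_recursive($node, 'name')), "\n";
```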


On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is
already recursive)

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From hlapp at duke.edu  Mon Aug 11 23:54:43 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 11 Aug 2008 23:54:43 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4C67FDB.5282%mirhan@indiana.edu>
References: <C4C67FDB.5282%mirhan@indiana.edu>
Message-ID: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's  
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is  
already recursive)

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From mrphysh at juno.com  Tue Aug 12 10:30:36 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Tue, 12 Aug 2008 14:30:36 GMT
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
Message-ID: <20080812.083036.25924.0@webmail02.vgs.untd.com>

I am studying bioperl and making progress.  I have been struggling with the database retrieval from 
on-line databases.  This is an example................

#!/usr/bin/perl -w
 
use Bio::Perl;

 
$seq_object = get_sequence('swiss',"ROA1_HUMAN");
 
write_sequence(">roa1.fasta",'fasta',$seq_object);

exit;

This script gives

Can't locate IO/String.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at ee_bpo.pl line 12.
BEGIN failed--compilation aborted at ee_bpo.pl line 12.

I have chased around with the paths in @INC, using 'use lib'.

This is an install problem.  The original installation was with perl Makefile.PL.  I reinstalled over the old one with cpan, using commands like these:
cpan> o conf prerequisites_policy follow
cpan> i /bioperl/
cpan> install Bundle::BioPerl
cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
This all seemed to proceed smoothly.

This line on its own did not produce an error:
use Bio::Perl;

I am almost thinking that the problem is with Perl itself. But regular FTP through Perl works:

use Net::FTP; # I found this in usr/share/perl/5.8.8/Net
Used on its own, this module seems to work.

I looked in the archives and found nothing.  I think I have done my homework.  Any ideas?

I run Ubuntu on a Pentium III (and love it).  The version of Ubuntu is new.  Perl (and MySQL) came with the OS: perl 5.8.8

John Brigham in Denver.




From jay at jays.net  Tue Aug 12 11:08:59 2008
From: jay at jays.net (Jay Hannah)
Date: Tue, 12 Aug 2008 10:08:59 -0500
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
In-Reply-To: <20080812.083036.25924.0@webmail02.vgs.untd.com>
References: <20080812.083036.25924.0@webmail02.vgs.untd.com>
Message-ID: <CA21F550-DF97-41EE-BE8C-CBDFE67B0148@jays.net>

On Aug 12, 2008, at 2:30 PM, mrphysh at juno.com wrote:
> Can't locate IO/String.pm in @INC
...
> cpan> install Bundle::BioPerl
> cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> This all seemed to proceed smoothly

bioperl-1.2.1 is very old. Apparently Bundle::BioPerl is out of date?

Here's lots of info about installing BioPerl:

    http://www.bioperl.org/wiki/Getting_BioPerl

I recommend using bioperl-live directly from SVN, but I'm sort of a  
rebel like that.  :)

Alternately, you could try just doing a

    cpan> install IO::String
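
To see whether that worked, a minimal sketch in plain Perl that reports
whether a module can be loaded from @INC. module_available is a
hypothetical helper name; the two modules checked are the ones from this
thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true (1) if the named module can be loaded from
# @INC, false (0) otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # IO::String -> IO/String.pm
    my $ok = eval { require $file; 1 };
    return $ok ? 1 : 0;
}

for my $module (qw(IO::String Bio::Perl)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'found' : 'MISSING from @INC';
}
```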

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From heikki at sanbi.ac.za  Thu Aug 14 09:14:48 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 14 Aug 2008 15:14:48 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
Message-ID: <200808141514.49124.heikki@sanbi.ac.za>

A generic method for retrieving nodes from a Bio::Tree::TreeI object is 
Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id' 
attribute unless a field is given. I can retrieve nodes based on internal id 
like this:

   $tree->find_node(-internal_id => $internal_id);

I have now found Bio::Tree::TreeFunctionsI::findnode_by_id(), which retrieves by id. 
However, the POD documentation claims that it retrieves by internal id.

What needs to be done?

A. Fix the doc to speak about id
B. Fix the code to retrieve by internal_id
C. Fix the doc and create findnode_by_internal_id()
D. Remove findnode_by_id() as redundant and confusing
E. Deprecate findnode_by_id() as redundant and confusing

There are no tests for findnode_by_id(), which to me tilts the selection to E and A 
for now.

Any other opinions?

	-Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________

From hlapp at gmx.net  Thu Aug 14 18:28:20 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 14 Aug 2008 18:28:20 -0400
Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API?
In-Reply-To: <48A448DD.4000206@psb.ugent.be>
References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com>
	<48A448DD.4000206@psb.ugent.be>
Message-ID: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>

Hi Erick,

how did you determine that go-perl is specific to GO? I've found it to  
work quite well for any kind of OBO-formatted ontology.

Also, you note that BioPerl doesn't have the ability to write in  
certain formats, and to intersect and "unify" (would you mind  
explaining what you mean by that?) ontologies. It seems that your  
implementation of RDF etc export isn't really reusable or modular in  
any way, but I'd love to bring the intersection function over to  
BioPerl (BTW when you decided to roll your own ontology API, did you  
get the impression that BioPerl isn't receptive to you adding to it?).  
Would you mind pointing me to the place in the code where I would find  
that, as I can't seem to find it.

	-hilmar

On Aug 14, 2008, at 11:01 AM, Erick Antezana wrote:

> Hi Arne,
>
> if you plan to work with PERL, you might take a look at ONTO-PERL :
>
> http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn042
> http://search.cpan.org/dist/ONTO-PERL/
> http://search.cpan.org/src/EASR/ONTO-PERL-1.13/doc/example00.html
>
> ONTO-PERL has been used intensively to build the Cell Cycle Ontology.
>
> cheers,
> Erick
>
> Arne Muller wrote:
>> Dear All,
>>
>> I'm new to this list and don't know much about ontologies in general
>> (I worked a bit with GO some time ago).
>>
>> Let me explain my problem: We have several related vocabularies
>> (non-hierarchical and redundant because of different spellings etc
>> ...) to describe organs and tissues in our department, and we need to
>> map each of these vocabs to all of our other legacy vocabs that
>> describe similar concepts. We'd like to use the adult mouse anatomy
>> ontology and modify/extend it with additional terms (if necessary),
>> synonyms and dbXrefs. Most of our vocabs should be mapped as dbXrefs
>> to existing terms in the MA ontology. The goal is that different  
>> units
>> in our department use slightly different vocabulary to describe
>> samples, and we now need link these different system (always the same
>> old story ... ;-).
>>
>> For the moment I'm not planning to turn our messy legacy vocabs into
>> OBO formated ontologies and to map them via cross products and the  
>> OBO
>> relation ontology - though this might be the most  proper way to do
>> ... (comments are welcome).
>>
>> I'll have to write an "easy to use" tool that allows our data curator
>> to easily map the legacy vocabs as dbXrefs of terms in the MA
>> ontology. The question is, how am I gonna do this? I've a fairly good
>> idea of how my software (java webapp) should look like, but are there
>> any APIs and implementations of the OBO model as well as a DB schema
>> and mappings between the model and the schema?
>>
>> I've had a look into the OLS from the EBI that seems to be fairly
>> simple (which is good ;-) and that uses the oboedit.jar somewhere at
>> the back-end. I've also found something like an obo api on
>> http://wiki.geneontology.org/index.php/OBO-Edit:_Getting_the_Source_Code#.28Optional.29_Getting_the_OBO_API_from_Subclipse
>> but so far I've not found any documentation nor examples on how to  
>> get
>> started.
>>
>> I'd be happy to hear how developers and bioinformatics people use obo
>> in their own tools (I better ask before going DIY ...).
>>
>>   thanks a lot for your comments and help
>>   +kind regards,
>>
>>   Arne
>>
>
> -- 
> ==================================================================
> Erick Antezana                    http://www.cellcycleontology.org
> PhD student
> Tel:+32 (0)9 331 38 24                        fax:+32 (0)9 3313809
> VIB Department of Plant Systems Biology, Ghent University
> Technologiepark 927, 9052 Gent, BELGIUM
> erant at psb.ugent.be                  http://www.psb.ugent.be/~erant
> ==================================================================
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mjanis at chem.ucla.edu  Thu Aug 14 19:37:05 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:37:05 -0700
Subject: [Bioperl-l] Code to contribute
Message-ID: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>

Hi,

 
I've had some perl code lying around for what seems like forever and I'd
like to contribute it to bioperl, if such facilities don't already exist in
bioperl.  The code implements shuffling (DNA or RNA) keeping the
dinucleotide composition (and codon usage) intact through an Eulerian path
approach as described in Altschul and Erickson (1985).  The code seeds the
Eulerian paths by keeping the first and last nucleotide invariant in the
shuffle - which has minimal detrimental effects to the purpose of the
algorithm, in my experience.
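
To make the invariant concrete, a minimal sketch in plain Perl that
checks whether two sequences have the same dinucleotide counts. This is
not the shuffle itself and not the code offered above; both helper names
are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count overlapping dinucleotides in a sequence.
sub dinucleotide_counts {
    my ($seq) = @_;
    my %count;
    $count{ substr($seq, $_, 2) }++ for 0 .. length($seq) - 2;
    return \%count;
}

# Hypothetical helper: true if both sequences have identical
# dinucleotide counts, which a composition-preserving shuffle must keep.
sub same_dinucleotide_composition {
    my ($x, $y) = map { dinucleotide_counts($_) } @_;
    return 0 unless keys %$x == keys %$y;
    for my $di (keys %$x) {
        return 0 unless ($y->{$di} || 0) == $x->{$di};
    }
    return 1;
}

# ATAGA and AGATA share AT, TA, AG, GA and the same first and last base.
print same_dinucleotide_composition('ATAGA', 'AGATA') ? "same\n" : "differs\n";
```

A check like this could serve as a test for any shuffling routine, since
the output must pass it against the input sequence.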

 
A quick search on the bioperl website shows that there is a mutation.pls
script, and facilities for using Sean Eddy's SQUID C library, which
implements the same function (I wrote this particular function before I knew
how to use C).  As such, it's probably not as elegant as Sean Eddy's
implementation, but it works - and it's entirely in perl.

 
The bioperl developer pages suggest a post to the mailing list as the best
place to start contributing to bioperl.  Is this a useful function to add to
the project?

 
Best Regards,

 
Michael

 
-------------------------------

Michael Janis

mjanis at chem.ucla.edu

-------------------------------

 
From rvos at interchange.ubc.ca  Thu Aug 14 19:51:43 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Thu, 14 Aug 2008 16:51:43 -0700
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>
	<2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
Message-ID: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>

Sounds exciting! I bet the general advice you'll get is to i) check
out the latest code from svn ii) see which bioperl objects/interfaces
(e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
iii) write a class that performs the algorithm as some sort of
analysis factory taking the sequence object (or ideally object
interface) as an input iv) run that class by the mailing list v) check
it into svn.

On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis <mjanis at chem.ucla.edu> wrote:
> Hi,
>
>
>
> I've had some perl code lying around for what seems like forever and I'd
> like to contribute it to bioperl, if such facilities don't already exist in
> bioperl.  The code implements shuffling (DNA or RNA) keeping the
> dinucleotide composition (and codon usage) intact through an Eulerian path
> approach as described in Altschul and Erickson (1985).  The code seeds the
> Eulerian paths by keeping the first and last nucleotide invariant in the
> shuffle - which has minimal detrimental effects to the purpose of the
> algorithm, in my experience.
>
>
>
> A quick search on the bioperl website shows that there is a mutation.pls
> script, and facilities for using Sean Eddy's SQUID C library, which
> implements the same function (I wrote this particular function before I knew
> how to use C).  As such, it's probably not as elegant as Sean Eddy's
> implementation, but it works - and it's entirely in perl.
>
>
>
> The bioperl developer pages suggest a post to the mailing list as the best
> place to start contributing to bioperl.  Is this a useful function to add to
> the project?
>
>
>
> Best Regards,
>
>
>
> Michael
>
>
>
> -------------------------------
>
> Michael Janis
>
> mjanis at chem.ucla.edu
>
> -------------------------------
>
>
>
>


--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com



From mjanis at chem.ucla.edu  Thu Aug 14 19:55:04 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:55:04 -0700
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>
	<2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
Message-ID: <008701c8fe69$2cee6020$86cb2060$@ucla.edu>

Thanks, Rutger, I'll do exactly that!  (give me a few days)


Best Regards,

Michael

-------------------------------
Michael Janis
mjanis at chem.ucla.edu
-------------------------------


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos
Sent: Thursday, August 14, 2008 4:52 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Fwd: Code to contribute

Sounds exciting! I bet the general advice you'll get is to i) check
out the latest code from svn ii) see which bioperl objects/interfaces
(e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
iii) write a class that performs the algorithm as some sort of
analysis factory taking the sequence object (or ideally object
interface) as an input iv) run that class by the mailing list v) check
it into svn.
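
For readers following the thread, here is a rough plain-Perl sketch of the kind of shuffle being discussed: it rearranges a sequence while keeping every dinucleotide count, and the first and last residue, unchanged, by walking a random Eulerian path in the spirit of Altschul and Erickson (1985). It is not Michael's code and not a BioPerl API; the names dinuc_shuffle, reaches_root and dinuc_signature are made up for this illustration, and the input sequence is arbitrary.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# True if every vertex reaches $root by following its chosen final edge.
sub reaches_root {
    my ($final, $root) = @_;
    my $limit = keys %$final;
    for my $v (keys %$final) {
        my ($cur, $steps) = ($v, 0);
        while ($cur ne $root) {
            return 0 if ++$steps > $limit;    # went round a cycle
            $cur = $final->{$cur};
        }
    }
    return 1;
}

sub dinuc_shuffle {
    my ($seq) = @_;
    my @s = split //, $seq;
    return $seq if @s < 3;
    my $last = $s[-1];

    # Edge lists: for each residue, the residues that follow it.
    my %succ;
    push @{ $succ{ $s[$_] } }, $s[ $_ + 1 ] for 0 .. $#s - 1;

    # Pick a random final edge per vertex (except the last residue) until
    # the final edges form a tree leading to the last residue.
    my %final;
    do {
        %final = ();
        for my $v (grep { $_ ne $last } keys %succ) {
            my $list = $succ{$v};
            $final{$v} = $list->[ int(rand(scalar @$list)) ];
        }
    } until reaches_root(\%final, $last);

    # Shuffle the remaining edges and put the final edge at the end.
    my %order;
    for my $v (keys %succ) {
        my @edges = @{ $succ{$v} };
        if (exists $final{$v}) {
            my ($i) = grep { $edges[$_] eq $final{$v} } 0 .. $#edges;
            splice @edges, $i, 1;
            $order{$v} = [ shuffle(@edges), $final{$v} ];
        }
        else {
            $order{$v} = [ shuffle(@edges) ];
        }
    }

    # Walk the path from the first residue, using each edge once.
    my @out = ($s[0]);
    my $cur = $s[0];
    for (1 .. $#s) {
        $cur = shift @{ $order{$cur} };
        push @out, $cur;
    }
    return join '', @out;
}

# Canonical string of dinucleotide counts, handy for comparing sequences.
sub dinuc_signature {
    my ($seq) = @_;
    my %count;
    $count{ substr($seq, $_, 2) }++ for 0 .. length($seq) - 2;
    return join ',', map { $_ . '=' . $count{$_} } sort keys %count;
}

my $in  = 'ACGTTGCAACGGATTACAGGCT';
my $out = dinuc_shuffle($in);
print "$in\n$out\n";
print "same dinucleotide counts\n" if dinuc_signature($in) eq dinuc_signature($out);
```

Wrapping something like this in a class that accepts a sequence object would be the next step suggested above; the sketch deliberately stays with plain strings.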



From Russell.Smithies at agresearch.co.nz  Thu Aug 14 21:17:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 15 Aug 2008 13:17:23 +1200
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>

You forgot 2 points,

vi) write documentation/examples, and  vii) write tests

;-)


Russell

=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From mirhan at indiana.edu  Fri Aug 15 02:11:46 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Fri, 15 Aug 2008 02:11:46 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4C6831C.5287%mirhan@indiana.edu>
Message-ID: <C4CA9662.52BD%mirhan@indiana.edu>


Hi,
I've changed get_deep_Annotations() to get_nested_Annotations().
It takes the arguments -keys and -recursive,
and behaves exactly like get_Annotations() when -recursive is not set (tested by replacing the get_Annotations() calls in Annotation.t).
I made it a new function instead of modifying get_Annotations() because I wasn't sure how to modify it in a backwards-compatible way.
I thought of adding the function to the AnnotationCollectionI interface,
but it seemed that get_all_Annotations() was missing from the interface as well,
so I decided to ask whether it should be added to the interface at all.
Isn't it possible for certain implementations of the interface to have functions that are specific to that implementation?

Mira


On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:


Oh yes,
I meant get_Annotations,
I want a get_Annotations that is recursive and passes the keys to the recursive calls.


On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is
already recursive)

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at illinois.edu  Fri Aug 15 09:59:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Aug 2008 08:59:42 -0500
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>
Message-ID: <BF55B9C8-B43A-404A-8085-4C8200B9B4AA@illinois.edu>

Agreed!

We're hoping to move to a more structured core after 1.6 which will  
require decent documentation and tests for inclusion.  My feeling is  
one should feel free to add code samples to relevant pages/sections in the  
BioPerl wiki, or write up your own HOWTO (it's not terribly hard to  
do, and it adds to your karma).

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From cjfields at illinois.edu  Fri Aug 15 10:12:10 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Aug 2008 09:12:10 -0500
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4CA9662.52BD%mirhan@indiana.edu>
References: <C4CA9662.52BD%mirhan@indiana.edu>
Message-ID: <3371D481-1416-4976-9846-83CF83395039@illinois.edu>

The method get_all_annotation_keys() is present in  
AnnotationCollectionI but not get_all_Annotations(), though I doubt it  
is set up for recursive retrieval (something that might be worth  
testing).  I don't have a problem adding this in.  Hilmar, thoughts?

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From jorvis at gmail.com  Fri Aug 15 15:45:23 2008
From: jorvis at gmail.com (Joshua Orvis)
Date: Fri, 15 Aug 2008 15:45:23 -0400
Subject: [Bioperl-l] help creating de novo GFF3
Message-ID: <a58187640808151245o48af6311hb8afee40341b7ef2@mail.gmail.com>

I don't have a lot of experience with Bioperl and have used it mostly for
simple format conversions or parsing Genbank files.  I need to create a
quick script to create GFF3 and decided to give bioperl a try again instead
of just printing the columns myself but have had a few problems.  My
apologies for the narrative here but I know it can sometimes be informative
to hear 'how' a user arrived at a problem rather than just knowing the
problem itself.

Is there a documented explicit mapping between the GFF3 columns and the
predefined tags (ID, Name, etc.) and their Bioperl object attribute
equivalents?  Is it preferable to create Bio::SeqFeature::Generic objects
and pass them to Bio::Tools::GFF->write_feature or rather to create
Bio::SeqFeature::Annotated and pass them to Bio::FeatureIO::gff ?   I may be
overlooking it, but a simple tutorial showing how to create and define a new
sequence object, attach annotations to it and dump in GFF format seems to be
missing.  This seems like a basic thing to do - most of the documentation I
find is about converting between formats rather than creating new
annotation.

Here are some of the problems I (a typical naive user?) ran into when
adventuring with bioperl here.   My first attempt resulted in the string
"SEQ" as column 0 in all my GFF output.  I thought that maybe this was
because my features weren't 'attached' to a sequence, so I created a
Bio::Seq::RichSeq object and tried both (separately):

$seq->add_SeqFeature( $feat );

and

$feat->attach_seq( $seq );

Neither changed the first column of output.  Looking at the docs.bioperl.org
methods for Bio::SeqFeature::Generic I found the seq_id attribute, which
came with the warning: "This attribute should *not* be used in GFF dumping"
- but since it's the only thing I did that worked, I used it anyway.

Next I wanted to have ID tags within my last column.  I first tried setting
all the relevant attributes I could see on my features (id, primary_tag,
display_name, display_id, etc.) but none of these caused ID=? to be
created.  Next, I tried something like this:


my $feat = new Bio::SeqFeature::Annotated (
        -start        => $start,
        -end          => $end,
        -strand       => $strand,
        -primary      => 'gene',
        -seq_id       => $asmbl_id,  ## this works but is discouraged
        -tag          => { ID => $transcript->{pub_locus},
                           product_name => $transcript->{com_name},
                           ec_number => $transcript->{'ec#'},
                           gene_symbol => $transcript->{gene_sym}
                         }
);

My hopes that passing it via the -tag option would do the trick failed, as
it created a line like this instead:

10263   .       gene    58512   56983   .       +       .       iD=AN9220.4;

Notice the 'ID' -> 'iD' transformation (without any command-line warnings).

I'm still stuck on this one (Parent would be next) but overall guidance or
pointers to a tutorial/documentation I'm overlooking would be great.

JO
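
For comparison with the output above, here is a minimal plain-Perl sketch of what a GFF3 writer has to produce: nine tab-separated columns, with the reserved attribute tags (ID, Name, Parent) spelled in exactly that case and reserved characters percent-encoded in column 9. It does not use or imitate any BioPerl call; gff3_escape and gff3_line are names invented for this example, and the mRNA ID and product_name value are made up.

```perl
use strict;
use warnings;

# Percent-encode the characters that GFF3 reserves in column 9.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Build one GFF3 line from named fields; undefined columns become '.'.
sub gff3_line {
    my (%f) = @_;

    # GFF3 expects start <= end, whatever the strand.
    my ($start, $end) = sort { $a <=> $b } @f{qw(start end)};
    my $attr = $f{attributes} || {};

    # Reserved tags first (exact case), then the rest alphabetically.
    my @reserved    = grep { exists $attr->{$_} } qw(ID Name Parent);
    my %is_reserved = map { $_ => 1 } @reserved;
    my @others      = sort { $a cmp $b } grep { !$is_reserved{$_} } keys %$attr;
    my $col9 = join ';',
        map { $_ . '=' . gff3_escape($attr->{$_}) } @reserved, @others;

    my @cols = ($f{seqid}, $f{source}, $f{type}, $start, $end,
                $f{score}, $f{strand}, $f{phase}, $col9);
    return join("\t", map { defined($_) && length($_) ? $_ : '.' } @cols) . "\n";
}

print "##gff-version 3\n";
print gff3_line(
    seqid => '10263', type => 'gene', start => 56983, end => 58512,
    strand => '+', attributes => { ID => 'AN9220.4' },
);
print gff3_line(
    seqid => '10263', type => 'mRNA', start => 56983, end => 58512,
    strand => '+',
    attributes => {
        ID           => 'AN9220.4-T',          # made-up child ID
        Parent       => 'AN9220.4',
        product_name => 'example; made up',    # made-up value with a ';'
    },
);
```

A hand-rolled writer like this is only a fallback for checking what the file should look like; it says nothing about which BioPerl class is the right one to use.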

From jason at bioperl.org  Fri Aug 15 19:00:04 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Aug 2008 16:00:04 -0700
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
In-Reply-To: <200808141514.49124.heikki@sanbi.ac.za>
References: <200808141514.49124.heikki@sanbi.ac.za>
Message-ID: <A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>

I think D.

There should probably only be one find_node function.

findnode_by_id was written by Ramiro for the re-rooting code and I  
guess it wasn't checked to reduce unneeded functions.  I don't have  
any problems removing/deprecating it but will need to update the code  
that depends on it to use find_node properly.

-jason
On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote:

> A generic method for retrieving nodes from Bio::Tree::TreeI objects is
> Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id'
> attribute unless a field is given. I can retrieve nodes based on internal id
> like this:
>
>   $tree->find_node(-internal_id => $internal_id);
>
> I now found Bio::Tree::TreeFunctionsI::findnode_by_id(), which retrieves by id.
> However, the POD documentation claims that it retrieves by internal id.
>
> What needs to be done?
>
> A. Fix the doc to speak about id
> B. Fix the code to retrieve by internal_id
> C. Fix the doc and create findnode_by_internal_id()
> C. Remove findnode_by_id() as redundant and confusing
> D. Deprecate findnode_by_id() as redundant and confusing
>
> There are no tests for findnode_by_id(), which to me tilts the selection
> to D and A for now.
>
> Any other opinions?
>
> 	-Heikki
>
>
> -- 
> ______ _/      _/_____________________________________________________
>      _/      _/
>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>   _/  _/  _/  SANBI, South African National Bioinformatics Institute
>  _/  _/  _/  University of Western Cape, South Africa
>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Jason Stajich
jason at bioperl.org
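
To make option D concrete, here is a small sketch of how a redundant finder can be kept as a thin, warning wrapper around a single lookup method. ToyTree is a made-up stand-in class with nodes stored as plain hashes; it is not Bio::Tree code, and the real find_node may treat its arguments differently.

```perl
package ToyTree;
use strict;
use warnings;
use Carp qw(carp);

sub new {
    my ($class, @nodes) = @_;
    return bless { nodes => [@nodes] }, $class;
}

# The single lookup method: searches 'id' unless a field name is given.
sub find_node {
    my ($self, @args) = @_;
    my ($field, $value) = @args == 1 ? ('id', $args[0]) : @args;
    $field =~ s/^-//;
    my @hits = grep { defined($_->{$field}) && $_->{$field} eq $value }
               @{ $self->{nodes} };
    return wantarray ? @hits : $hits[0];
}

# Deprecated: kept only so that old callers keep working.
sub findnode_by_id {
    my ($self, $id) = @_;
    carp 'findnode_by_id() is deprecated; use find_node(-id => $id) instead';
    return $self->find_node(-id => $id);
}

package main;

my $tree = ToyTree->new(
    { id => 'A', internal_id => 1 },
    { id => 'B', internal_id => 2 },
);
my $by_id       = $tree->find_node('B');
my $by_internal = $tree->find_node(-internal_id => 1);
print "found $by_id->{id} and $by_internal->{id}\n";
```

Callers of the old name get a warning that points at the replacement, and there is still only one place where lookup behaviour is defined.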


From schmidtc at udel.edu  Fri Aug 15 19:42:44 2008
From: schmidtc at udel.edu (Carl Schmidt)
Date: Fri, 15 Aug 2008 19:42:44 -0400
Subject: [Bioperl-l] lazy symbol binding
Message-ID: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu>

When I attempt  bp_load_gff.pl
I get the following error:

dyld: lazy symbol binding failed: Symbol not found: _mysql_init
   Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
   Expected in: dynamic lookup

dyld: Symbol not found: _mysql_init
   Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
   Expected in: dynamic lookup

Trace/BPT trap

Any suggestions?  I apologize if this is the wrong place for posting  
such a question.

Thanks
Carl

Carl J. Schmidt
Department of Animal & Food Sciences
University of Delaware
Newark, DE 19716
schmidtc at udel.edu
http://copland.udel.edu/~schmidtc


From rvos at interchange.ubc.ca  Fri Aug 15 20:11:48 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Fri, 15 Aug 2008 17:11:48 -0700
Subject: [Bioperl-l] Fwd:  lazy symbol binding
In-Reply-To: <2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com>
References: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu>
	<2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com>
Message-ID: <2bb9b24a0808151711m5eba5500k468a0effc711f3c@mail.gmail.com>

---------- Forwarded message ----------
From: Rutger Vos <rutgeraldo at gmail.com>
Date: Fri, Aug 15, 2008 at 5:11 PM
Subject: Re: [Bioperl-l] lazy symbol binding
To: Carl Schmidt <schmidtc at udel.edu>


This looks like a faulty install of the combo of DBI/DBD::mysql

Specifically, the perl driver (DBD::mysql) for the mysql database
builds a dynamic library on installation, which the DBI interface
tries to load dynamically, but fails.

Unfortunately, this is by no means bioperl related. Try a reinstall as
per the instructions at:

http://search.cpan.org/~capttofu/DBD-mysql-4.008/lib/DBD/mysql/INSTALL.pod
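
Before and after reinstalling, a quick probe along the following lines may help tell a broken driver from a BioPerl problem. It is only a sketch under assumptions: it exercises the driver in a child perl, so that a hard crash like the one reported cannot take the calling script down, and it points at 127.0.0.1 port 1 on the assumption that no MySQL server listens there. probe_dbd_mysql is a made-up helper name, and the connection details are placeholders.

```perl
use strict;
use warnings;

# Exercise DBD::mysql in a child perl and report how the child ended.
sub probe_dbd_mysql {
    my $code = <<'PERL';
use DBI;
# No server is expected at this address; the call is only there to force
# the driver to use its MySQL client library.
DBI->connect('dbi:mysql:database=test;host=127.0.0.1;port=1',
             'nobody', '', { PrintError => 0, RaiseError => 0 });
exit 0;
PERL
    my $rc = system($^X, '-e', $code);
    return 'could not start perl' if $rc == -1;
    return 'crashed (signal ' . ($? & 127) . ')' if $? & 127;
    return 'ok' if ($? >> 8) == 0;
    return 'failed to load DBI or DBD::mysql';
}

print 'DBD::mysql probe: ', probe_dbd_mysql(), "\n";
```

A result of "ok" only means that the driver could be loaded and called without crashing; it does not show that a database is reachable.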



--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com



From hlapp at duke.edu  Sat Aug 16 13:43:46 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Sat, 16 Aug 2008 13:43:46 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <3371D481-1416-4976-9846-83CF83395039@illinois.edu>
References: <C4CA9662.52BD%mirhan@indiana.edu>
	<3371D481-1416-4976-9846-83CF83395039@illinois.edu>
Message-ID: <D096DA37-D36A-4A3E-A6C3-98254CAC6B8B@duke.edu>


On Aug 15, 2008, at 10:12 AM, Chris Fields wrote:

> The method get_all_annotation_keys() is present in  
> AnnotationCollectionI but not get_all_Annotations()

Interesting. I wonder whether that was a result of the forward and  
reverse surgeries done to the Annotation* interfaces. (I'm off-line as  
I write this so can't check.)

> , though I doubt it is set up for recursive retrieval (something  
> that might be worth testing). I don't have a problem adding this  
> in.  Hilmar, thoughts?

get_all_Annotations() has always been recursive (just as  
get_all_SeqFeatures() is for SeqI). However, the recursive behavior is  
different from the behavior that Mira wants.

Specifically, if get_all_Annotations() finds a nested collection under  
a matching tag, it will consider the entire nested collection a  
match, and return the recursively flattened-out annotation objects it  
contains. What Mira needs (if I am understanding her implementation  
correctly) is to recursively retrieve annotations if their tag matches  
the query key or set of keys. I.e., a nested collection would be  
searched for matching tags even if the tag of the collection itself  
does not match, and if it does match, only those of the contained  
annotations that have matching tags would be returned.
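
The difference between the two behaviors can be shown with a toy model in plain Perl. This is only an illustration of the description above, not the Bio::Annotation::Collection implementation: a collection is modeled as a hash of tag => list of values, a value is either a string (an annotation) or another hash (a nested collection), and the function names and sample data are made up.

```perl
use strict;
use warnings;

# Every annotation in a collection, at any depth.
sub flatten_all {
    my ($coll) = @_;
    my @out;
    for my $tag (sort keys %$coll) {
        for my $value (@{ $coll->{$tag} }) {
            push @out, ref($value) ? flatten_all($value) : $value;
        }
    }
    return @out;
}

# First behavior: match tags at the top level only; a nested collection
# under a matching tag is returned whole, flattened.
sub match_then_flatten {
    my ($coll, @keys) = @_;
    my @out;
    for my $tag (@keys) {
        for my $value (@{ $coll->{$tag} || [] }) {
            push @out, ref($value) ? flatten_all($value) : $value;
        }
    }
    return @out;
}

# Second behavior: descend into every nested collection and keep only
# annotations whose own tag matches.
sub match_at_every_level {
    my ($coll, @keys) = @_;
    my %want = map { $_ => 1 } @keys;
    my @hits;
    for my $tag (sort keys %$coll) {
        for my $value (@{ $coll->{$tag} }) {
            if (ref $value) {
                push @hits, match_at_every_level($value, @keys);
            }
            elsif ($want{$tag}) {
                push @hits, $value;
            }
        }
    }
    return @hits;
}

my $coll = {
    comment   => ['top comment'],
    clade     => [ { comment => ['inner comment'], date => ['2008-08-16'] } ],
    reference => [ { comment => ['ref comment'] } ],
};

print join(' | ', match_then_flatten($coll, 'clade')), "\n";
print join(' | ', match_at_every_level($coll, 'comment')), "\n";
```

In this model, asking for 'clade' the first way returns everything inside the nested collection, while asking for 'comment' the second way finds comments at every depth, including those under tags that do not match.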

I'm not sure whether it's better to fold both behaviors into one  
method which has an optional argument to control which one is desired,  
or to have two methods. I'm leaning towards having two methods,  
because support for an added optional argument in external  
implementations of the interface is hard to test for, as opposed to  
testing for the presence/absence of a new method. On the other hand,  
if the existing method wasn't even on the interface to begin with  
(which I'm not yet convinced about) then that shouldn't really be an  
issue.

It seems Chris you are also for adding a second method (and putting  
get_all_Annotations() (back) into the interface)? Does anyone else  
have thoughts or preferences on this?

	-hilmar


-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From heikki at sanbi.ac.za  Sun Aug 17 03:02:31 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 17 Aug 2008 09:02:31 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
In-Reply-To: <A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
References: <200808141514.49124.heikki@sanbi.ac.za>
	<A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
Message-ID: <200808170902.32485.heikki@sanbi.ac.za>


Done.

findnode_by_id() was not and is not used anywhere in BioPerl core code.

Thanks for input,

   -Heikki



-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________
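
A minimal sketch of the find_node call styles discussed in this thread, assuming BioPerl is installed. The Newick string and the variable names are illustrative only and do not come from the original posts.

```perl
use strict;
use warnings;
use Bio::TreeIO;

# A small Newick tree read from an in-memory string (illustrative data).
my $newick = "((A,B),C);\n";
open my $fh, '<', \$newick or die "cannot open in-memory handle: $!";
my $tree = Bio::TreeIO->new(-format => 'newick', -fh => $fh)->next_tree;

# With a single argument, find_node searches the 'id' attribute.
my ($by_default) = $tree->find_node('B');

# With a named field, it searches that attribute instead.
my ($by_id)       = $tree->find_node(-id => 'B');
my ($by_internal) = $tree->find_node(-internal_id => $by_default->internal_id);

print join("\t", $by_default->id, $by_id->id, $by_internal->id), "\n";
```

In list context find_node returns every matching node, which is why each result above is captured with a one-element list assignment.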

From mike.thon at gmail.com  Mon Aug 18 01:01:43 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Mon, 18 Aug 2008 07:01:43 +0200
Subject: [Bioperl-l] Build.PL options?
Message-ID: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>

Hi all - I am trying to write a port of bioperl 1.5.2 to enable its  
installation with the macports system (www.macports.org).  I'm not too  
familiar with the Build.PL build system - is there any way to disable  
the dependency resolution that the build script does (i.e. without  
patching the script)?
Thanks
Mike


From David.Messina at sbc.su.se  Mon Aug 18 03:37:53 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 18 Aug 2008 09:37:53 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
Message-ID: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>

Hi Mike,

Great to hear you're planning to add BioPerl to MacPorts. Thanks!

I'm afraid I don't know how to disable the dependency resolution, but while
you're waiting for others to chime in, here are a couple of things you
might try if you haven't already:

- The Build.PL for BioPerl is configured to automatically write out a
Makefile.PL for you. If you're more familiar with Makefile.PL, then you
could work with that.

Module::Build tries to maintain a certain level of cross-compatibility with
ExtUtils::MakeMaker; how do you disable dependency resolution with the
Makefile.PL system?


- The Build.PL API is pretty flexible, and it's easy to change just about
any behavior by passing parameters to Build.PL on the command line. See


http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/API.pod

and


http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/Cookbook.pm


I'm curious -- it seems to me that a major purpose of Build.PL and
Makefile.PL is specifying dependencies (and installing them where
necessary). Does MacPorts override that and do its own dependency-checking
for Perl modules?


Dave

From neetisomaiya at gmail.com  Mon Aug 18 07:45:38 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 18 Aug 2008 17:15:38 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
Message-ID: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>

Hi,

I am fetching data from the ent gene file of KEGG which is available here :
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent

I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
gene names and pathways in which they participate. I am getting the gene
names fine. But this method

"for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
}"

doesn't seem to be working. I am not able to get the data of the pathways in
which the gene is involved.

Can someone please suggest how I can get the pathway data of genes from the
KEGG ent file??

Thanks.

-- 
-Neeti
Even my blood says, B positive

From johnsonm at gmail.com  Mon Aug 18 09:26:26 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 08:26:26 -0500
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
Message-ID: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>

On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya <neetisomaiya at gmail.com> wrote:

> I am fetching data from the ent gene file of KEGG which is available here :
> ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent
>
> I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
> gene names and pathways in which they participate. I am getting the gene
> names fine. But this method
>
> "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
> }"
>
> doesn't seem to be working. I am not able to get the data of the pathways in
> which the gene is involved.
>
> Can someone please suggest how I can get the pathway data of genes from the
> KEGG ent file??

What exactly do you mean by "doesn't seem to be working" and what
version of BioPerl are you using?  The code below seems to function as
expected with BioPerl 1.5.2, producing output like this:

hsa04612  Antigen processing and presentation
hsa01430  Cell Communication
hsa04020  Calcium signaling pathway
hsa04080  Neuroactive ligand-receptor interaction
hsa04540  Gap junction
...
...
...

#!/wherever/bin/perl

use strict;
use warnings;

use Bio::SeqIO;


my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]);

while (my $seq = $seqio->next_seq()) {

    foreach my $pathway ($seq->annotation->get_Annotations('pathway')) {

        ## $pathway should be a Bio::Annotation::Comment
        print $pathway->text(), "\n";

    }

}

From johnsonm at gmail.com  Mon Aug 18 09:29:38 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 08:29:38 -0500
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
	<ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
Message-ID: <ebf5eb170808180629j7813a2d7w2a9b238bd455b962@mail.gmail.com>

On Mon, Aug 18, 2008 at 8:26 AM, Mark Johnson <johnsonm at gmail.com> wrote:

> What exactly do you mean by "doesn't seem to be working" and what
> version of BioPerl are you using?  The code below seems to function as
> expected with BioPerl 1.5.2, producing output like this:

Note that I downloaded
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent and
provided the path to it as the argument to the script.

From mike.thon at gmail.com  Mon Aug 18 15:48:02 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Mon, 18 Aug 2008 21:48:02 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
	<628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>
Message-ID: <5BE575B1-67E1-41B1-AB2F-728558377DDA@gmail.com>


On Aug 18, 2008, at 9:37 AM, Dave Messina wrote:

>
> I'm curious -- it seems to me that a major purpose of Build.PL and  
> Makefile.PL is specifying dependencies (and installing them where  
> necessary). Does MacPorts override that and do its own dependency- 
> checking for Perl modules?
>
Hi Dave - Thanks for the links - I will check them out.  MacPorts can
resolve dependencies that are specified for each package, much like
rpm and other Linux packaging systems, so it's probably better to
disable the dependency resolution in the BioPerl build script and let
MacPorts handle them.  It looks like I can patch the Build.PL script
pretty easily, unless I can find a better way.

Mike

From johnsonm at gmail.com  Mon Aug 18 16:53:48 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 15:53:48 -0500
Subject: [Bioperl-l] Bio::Annotation issues with BioSQL
Message-ID: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>

    I'm presently refactoring an in-house protein annotation pipeline
and converting it to use BioSQL as a data store.  I've noticed some
slightly screwy behavior with regard to how some of the
Bio::Annotation classes are handled:

- Instances of Bio::Annotation::SimpleValue and
Bio::Annotation::StructuredValue attached to the annotation collection
for a sequence feature (Bio::SeqFeature::Generic) are converted to
tags/values on the feature.
- Instances of Bio::Annotation::DBLink with attached comments lose the comment.

    I'm storing and retrieving things thusly:

my $dbadp = Bio::DB::BioDB->new(
                                -database => 'biosql',
                                -user     => $user,
                                -pass     => $pass,
                                -dbname   => $ora_instance,
                                -driver   => 'Oracle',
                               );

my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

my $seq = Bio::Seq->new(
                        -id                         => 'DEBUG001',
                        -accession_number => 'DBG001',
                        -desc                     => 'Debug Sequence',
                        -seq                      => 'GATTACA',
                        -namespace           => 'DEBUG',
                       );

my $feature = Bio::SeqFeature::Generic->new(
                                            -seq_id           => 'DEBUG001',
                                            -display_name => 'FEAT0001',
                                            -primary          => 'debug',
                                            -source           => 'test',
                                            -start              => 3,
                                            -end               => 5,
                                            -strand           => 1,
                                           );

my $dblink = Bio::Annotation::DBLink->new(
                                          -database   => 'FAKE001',
                                          -primary_id => 'FK1234567890',
                                          -comment    => 'This is a fake comment',
                                         );

$feature->annotation->add_Annotation('ANNO0001', $dblink);
$seq->add_SeqFeature($feature);

my $pseq = $dbadp->create_persistent($seq);

$pseq->store();
$adp->commit();


my $dbadp = Bio::DB::BioDB->new(
                                                    ...
                                                   );

my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

my $query = Bio::DB::Query::BioQuery->new();

$query->datacollections([
                         "Bio::PrimarySeqI s",
                        ]);

$query->where(["s.display_id like 'DEBUG%'"]);

my $result = $adp->find_by_query($query);

while (my $seq = $result->next_object()) {

    my @features = $seq->get_SeqFeatures();

    foreach my $feature (@features) {

        ## Contents of Bio::Annotation::SimpleValue and
        ## Bio::Annotation::StructuredValue have migrated to tag/value
        ## pairs on $feature and are missing from $annotation_collection.
        ##
        ## Comments have gone missing from Bio::Annotation::DBLink,
        ## but DBLinks are otherwise intact and present.
        my $annotation_collection = $feature->annotation();
        ...
        ...

    }

}

    Is bioperl-db / BioSQL trying to tell me that I shouldn't be using
Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue?  Is
there even a place in the BioSQL schema for a comment to be attached
to a DBLink?

From neetisomaiya at gmail.com  Tue Aug 19 00:31:28 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 19 Aug 2008 10:01:28 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
	<ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
Message-ID: <764978cf0808182131p620a2dedu40b651be50be5b3c@mail.gmail.com>

Thanks a lot for the reply. It was a problem of the bioperl version.

~Neeti.

On Mon, Aug 18, 2008 at 6:56 PM, Mark Johnson <johnsonm at gmail.com> wrote:

> On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya <neetisomaiya at gmail.com> wrote:
>
> > I am fetching data from the ent gene file of KEGG which is available here :
> > ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent
> >
> > I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
> > gene names and pathways in which they participate. I am getting the gene
> > names fine. But this method
> >
> > "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
> > }"
> >
> > doesn't seem to be working. I am not able to get the data of the pathways in
> > which the gene is involved.
> >
> > Can someone please suggest how I can get the pathway data of genes from the
> > KEGG ent file??
>
> What exactly do you mean by "doesn't seem to be working" and what
> version of BioPerl are you using?  The code below seems to function as
> expected with BioPerl 1.5.2, producing output like this:
>
> hsa04612  Antigen processing and presentation
> hsa01430  Cell Communication
> hsa04020  Calcium signaling pathway
> hsa04080  Neuroactive ligand-receptor interaction
> hsa04540  Gap junction
> ...
> ...
> ...
>
> #!/wherever/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::SeqIO;
>
>
> my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]);
>
> while (my $seq = $seqio->next_seq()) {
>
>    foreach my $pathway ($seq->annotation->get_Annotations('pathway')) {
>
>        ## $pathway should be a Bio::Annotation::Comment
>        print $pathway->text(), "\n";
>
>    }
>
> }
>


-- 
-Neeti
Even my blood says, B positive

From wgallin at ualberta.ca  Tue Aug 19 02:25:27 2008
From: wgallin at ualberta.ca (Warren Gallin)
Date: Tue, 19 Aug 2008 00:25:27 -0600
Subject: [Bioperl-l] EUtilities help
Message-ID: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>

Hi,

	Is there a cogent document on using Bio::DB::EUtilities with Bioperl  
1.5.2 around somewhere?

	We upgraded and now my scripts are broken when invoking it.

	Any pointers appreciated.

Thanks,

Warren Gallin

From David.Messina at sbc.su.se  Tue Aug 19 03:30:26 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 19 Aug 2008 09:30:26 +0200
Subject: [Bioperl-l] EUtilities help
In-Reply-To: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
References: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
Message-ID: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>

Hi Warren,

Are you upgrading to 1.5.2 or downgrading from bioperl-live?

If the former, you might consider going all the way to bioperl-live, whose
EUtilities support is improved significantly and documented extensively
here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

If the latter, I don't believe there is, but Chris Fields will know for sure
and will probably chime in.


Dave

To get bioperl-live
as a tarball: http://www.bioperl.org/DIST/nightly_builds/
or via Subversion: http://www.bioperl.org/wiki/Using_Subversion

From bix at sendu.me.uk  Tue Aug 19 04:34:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Aug 2008 09:34:50 +0100
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
Message-ID: <48AA85AA.7010405@sendu.me.uk>

Michael Thon wrote:
> Hi all - I am trying to write a port of bioperl 1.5.2 to enable its 
> installation with the macports system (www.macports.org).  I'm not too 
> familiar with the Build.PL build system - is there any way to disable 
> the dependency resolution that the build script does (i.e. without 
> patching the script)?

How else will you be doing the dependency resolution? If your system 
just installs all dependencies itself beforehand, then dependency 
resolution won't be invoked in Build.PL :)

Otherwise, don't run Build.PL at all (why do you need to?), or have your
system accept the default answer to every question:

echo | perl Build.PL


From mike.thon at gmail.com  Tue Aug 19 11:16:03 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Tue, 19 Aug 2008 17:16:03 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <48AA85AA.7010405@sendu.me.uk>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
	<48AA85AA.7010405@sendu.me.uk>
Message-ID: <E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>

> How else will you be doing the dependency resolution? If your system  
> just installs all dependencies itself beforehand, then dependency  
> resolution won't be invoked in Build.PL :)
> Otherwise, don't run Build.PL at all (why do you need to?),
Good point.  Now, why didn't I think of that? :)
Does Build.PL do anything other than copy Perl modules (and install  
dependencies)?
> or have your system answer default to all questions:
> echo | perl Build.PL
>


From downloadondemand at gmail.com  Thu Aug 14 19:06:04 2008
From: downloadondemand at gmail.com (N)
Date: Fri, 15 Aug 2008 02:06:04 +0300
Subject: [Bioperl-l] HOWTO:Graphics/BLAST output
Message-ID: <923c9ce30808141606k61d9cc23nb18e55dec5112ac4@mail.gmail.com>

Hello again!
I followed HOWTO:Graphics and ran into a problem.
While parsing BLAST output, one part of the file is clearly causing trouble.
The problem is that this hit has two HSPs, but the second one is in the
"wrong" strand orientation (Plus/Plus vs. Plus/Minus). How can I tell
BioPerl to use only HSPs oriented the same way as the best-scoring HSP? Or
better, although not related to this mailing list: how can I tell BLAST to
put this second HSP into a separate hit? What am I doing or thinking wrong?
I have attached a small PNG showing the problem. The second HSP is drawn in
white, but it makes no biological sense, does it?
Thanks.


BLASTN 2.2.18 [Mar-02-2008]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|145226209
         (471 letters)

Database: ../database/1000
           24 sequences; 19,482 total letters

Searching..................................................done


                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|145226176 hypothetical_protein                                     599   e-174
gi|145226174 hypothetical_protein                                      30   0.007
gi|145226175 ATP-dependent_exoDNAse_(exonuclease_V)_alpha_subuni...    26   0.11
gi|145226190 hypothetical_protein                                      24   0.43
gi|145226195 protein_of_unknown_function_DUF1526                       22   1.7
gi|145226193 hypothetical_protein                                      22   1.7
gi|145226187 aminoglycoside_nucleotidyltransferase                     22   1.7
gi|145226181 diguanylate_cyclase                                       22   1.7
gi|145226179 hypothetical_protein                                      22   1.7
gi|145226178 hypothetical_protein                                      22   1.7
gi|145226177 putative_methyl-accepting_chemotaxis_sensory_transd...    22   1.7

>gi|145226176 hypothetical_protein
          Length = 477

 Score =  599 bits (302), Expect = e-174
 Identities = 428/470 (91%)
 Strand = Plus / Plus


Query: 1   atgaatgcagacgtgtcagtcaaccagtggaatccgttagaggtagccgctgaggcgaca 60
           |||||||| ||||||||||||||||||||||||||||||||||| |||||||||||||||
Sbjct: 1   atgaatgcggacgtgtcagtcaaccagtggaatccgttagaggtggccgctgaggcgaca 60


Query: 61  atcgctgccgccacagccgcgctggtgtgggaaggcccagacagctacggggtgctggaa 120
           || ||||||||||| |||||  |||||||||||||||||||||||||||||||| ||||
Sbjct: 61  attgctgccgccacggccgcattggtgtgggaaggcccagacagctacggggtgttggag 120


Query: 121 cgggtcgccggggccacagcgaaaggcatagcaacagctcggatagccgccgaaatcatg 180
           ||||| ||||||||||||||||||||||| ||||||||||||||| ||||||||||||||
Sbjct: 121 cgggtagccggggccacagcgaaaggcatggcaacagctcggataaccgccgaaatcatg 180


Query: 181 gctgacgtcaccacctcagttcagttcactgcggccaccgaacatgcgcgcggcggcgct 240
           ||||||||||||||||||||||||||||||||||||  |||| |||||||||||||||||
Sbjct: 181 gctgacgtcaccacctcagttcagttcactgcggccgacgaagatgcgcgcggcggcgct 240


Query: 241 gtagcgggacttccggggtggctggcgccgcggtgggcggcgtccgtgcgtgccgcactg 300
           |||||||| ||||||||||||||||||||||||||||||||||||||||| | |||||||
Sbjct: 241 gtagcggggcttccggggtggctggcgccgcggtgggcggcgtccgtgcgcggcgcactg 300


Query: 301 gacgaactcgaagccgccgggcggcccggctacgccatggtcaaggcgatcacctggcct 360
           ||||| ||||||||||||||||| |||||| ||  |||| | || ||    ||  ||||
Sbjct: 301 gacgagctcgaagccgccgggcgccccggcgacatcatgatgaaagcccggacacggccg 360


Query: 361 gccttgcgcagcgtcgcggggtggacccaagacgggccgctgcaaacatggcagacggct 420
           ||  |||||||||| |||| ||||||||||||||| |||||||| |||||||||||||||
Sbjct: 361 gcactgcgcagcgtggcggtgtggacccaagacggaccgctgcagacatggcagacggct 420


Query: 421 ctaattgtgagcgaagcacggactgctctggctcaccgcgtaggcgtctg 470
           || || |||  |||||||||||||||||||||||||||||||||||||||
Sbjct: 421 ctgatcgtggacgaagcacggactgctctggctcaccgcgtaggcgtctg 470


 Score = 22.3 bits (11), Expect = 1.7
 Identities = 11/11 (100%)
 Strand = Plus / Minus


Query: 124 gtcgccggggc 134
           |||||||||||
Sbjct: 333 gtcgccggggc 323


The rest of output truncated...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tst.png
Type: image/png
Size: 1668 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080815/24a3f555/attachment.png>
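
One way to approach the strand question in this post is to keep only the HSPs whose query/hit strand pairing matches that of the best-scoring HSP. The sketch below is plain Perl and uses hashes as stand-ins for HSP objects; the hash keys and the function name are made up for illustration. With Bio::SearchIO the same values would come from $hsp->bits, $hsp->strand('query') and $hsp->strand('hit').

```perl
use strict;
use warnings;

# Return the HSPs that share the strand orientation of the best-scoring HSP.
# Orientation is the product of the two strands: +1 for Plus/Plus or
# Minus/Minus, -1 for Plus/Minus or Minus/Plus.
sub same_orientation_as_best {
    my @hsps = @_;
    return unless @hsps;

    # Highest bit score first.
    my ($best) = sort { $b->{bits} <=> $a->{bits} } @hsps;
    my $orientation = $best->{query_strand} * $best->{hit_strand};

    return grep { $_->{query_strand} * $_->{hit_strand} == $orientation } @hsps;
}

# The two HSPs from the report above: Plus/Plus (599 bits) and
# Plus/Minus (22.3 bits).
my @hsps = (
    { bits => 599,  query_strand => 1, hit_strand =>  1 },
    { bits => 22.3, query_strand => 1, hit_strand => -1 },
);

my @kept = same_orientation_as_best(@hsps);
printf "kept %d of %d HSPs\n", scalar @kept, scalar @hsps;
```

Filtering after parsing leaves the BLAST report itself untouched; only the HSPs passed on to the drawing code change.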

From UKaraoz at lbl.gov  Thu Aug 14 20:03:51 2008
From: UKaraoz at lbl.gov (Ulas Karaoz)
Date: Thu, 14 Aug 2008 17:03:51 -0700
Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for
	blasttable
Message-ID: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>

Hi,

I found out that the save_output routine in RemoteBlast.pm doesn't
save the output properly to a file when the BLAST output is a
hit table (blasttable). Might the reason be that the tabular
output has a line that starts with a # while the parser is looking for
a line starting with just BLASTN, as in the section pasted below:

if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i ||
			 $l =~/^RPS-BLAST\s*.+$/i ) {
			$seentop=1;
}
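
The suspicion above can be checked in isolation with a few lines of plain Perl. This is only a sketch: the "# BLASTN ..." line is an assumed form of the tabular header, and $relaxed is a hypothetical pattern for comparison, not the code in RemoteBlast.pm.

```perl
use strict;
use warnings;

# First alternative of the test quoted above (the RPS-BLAST branch is omitted).
my $original = qr/^(?:[T]?BLAST[NPX])\s*.+$/i;

# Hypothetical relaxed pattern: also accept a leading "#" and whitespace.
my $relaxed = qr/^(?:#\s*)?(?:T?BLAST[NPX])\s*.+$/i;

my @lines = (
    'BLASTN 2.2.18 [Mar-02-2008]',      # header of a plain-text report
    '# BLASTN 2.2.18 [Mar-02-2008]',    # assumed header of a tabular report
);

for my $l (@lines) {
    printf "%-32s original=%-8s relaxed=%s\n", $l,
        ($l =~ $original ? 'match' : 'no match'),
        ($l =~ $relaxed  ? 'match' : 'no match');
}
```

Because the original pattern is anchored at the start of the line, a header that begins with "#" can never set $seentop, which is consistent with the behaviour described.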

From erant at psb.ugent.be  Fri Aug 15 08:25:59 2008
From: erant at psb.ugent.be (Erick Antezana)
Date: Fri, 15 Aug 2008 14:25:59 +0200
Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API?
In-Reply-To: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>
References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com>
	<48A448DD.4000206@psb.ugent.be>
	<1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>
Message-ID: <48A575D7.7010709@psb.ugent.be>

Hi Hilmar,

Hilmar Lapp wrote:
> Hi Erick,
>
> how did you determine that go-perl is specific to GO? I've found it to 
> work quite well for any kind of OBO-formatted ontology.
We used go-perl during the gestation (2005) of the ontologies we built,
handled, etc., in particular while dealing with GO (the purpose for which
it was originally conceived). It is extremely useful indeed. However, as
our team needed new extensions, new modules were developed on top of the
existing ones, and that code effectively became a new development branch
which ended up as something called onto-perl. Onto-perl, as you might have
realized, is strongly influenced by go-perl. On the other hand, we had some
communication with Chris Mungall at that time, and he indicated that he
might drop further development of go-perl. Some time later he told me he
would continue, which is good since many systems are based on it. But by
then we already had a fairly independent package which has proven useful.
>
> Also, you note that BioPerl doesn't have the ability to write in 
> certain formats, and to intersect and "unify" (would you mind 
> explaining what you mean by that?) ontologies.
While working with several OBO ontologies, we needed to have them (or
parts of them) merged, intersected, or joined (joined = combined into one
ontology = "unified"). It can of course be a bit subjective, since you can
"unify" ontologies based on different features, approaches, etc., and while
building application ontologies (such as CCO) you may need to identify
identical terms coming from different ontologies and keep only one in
your integrated resource.
> It seems that your implementation of RDF etc export isn't really 
> reusable or modular in any way, 
The exports (RDF, OWL, ...) are part of the Ontology module's
functionality. Have you had any particular problems while exporting an
ontology? Please let us know so that it can be fixed or improved. On the
other hand, that module is currently undergoing a major rework (not
released yet) to accommodate a huge set of "ontologized" resources in
an RDF repository. I would also be interested in discussing it so
that we could improve it.
> but I'd love to bring the intersection function over to BioPerl 
Excellent! I think this is an appropriate time to make the diverse
developments converge so that the user community can have a standard
set of tools. We are interested in following up these discussions.
> (BTW when you decided to roll your own ontology API, did you get the 
> impression that BioPerl isn't receptive to you adding to it?).
As I mentioned, we originally took go-perl since it offered much more
functionality than BioPerl's Bio::Ontology::*
> Would you mind pointing me to the place in the code where I would find 
> that, as I can't seem to find it.
I forwarded your request to the developer who might give you more 
details about it.
>
>     -hilmar
>


cheers,
Erick

From bix at sendu.me.uk  Tue Aug 19 11:56:32 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Aug 2008 16:56:32 +0100
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>	<48AA85AA.7010405@sendu.me.uk>
	<E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>
Message-ID: <48AAED30.1090103@sendu.me.uk>

Michael Thon wrote:
>> How else will you be doing the dependency resolution? If your system 
>> just installs all dependencies itself beforehand, then dependency 
>> resolution won't be invoked in Build.PL :)
>> Otherwise, don't run Build.PL at all (why do you need to?),
>
> Good point.  Now, why didn't I think of that? :)
> Does Build.PL do anything other than copy Perl modules (and install 
> dependencies)?

It generates the 'Build' script, which provides a whole host of functions:

./Build help

The only one other than 'install' that might be relevant to you is

./Build test

to run the test suite, if testing is part of your own system. There are
other standard ways to run the tests though, that don't need you to
create the Build script.

Somewhere along the line, probably during the install function, it also
creates man files and other forms of documentation iirc, and installs
those too.

If you handle dependencies, installation and testing (or don't do 
testing) yourself, and don't care about man files (perldoc is good 
enough?) you can ignore Build.PL.

From xxq.t.xu at gmail.com  Tue Aug 19 12:57:41 2008
From: xxq.t.xu at gmail.com (XQ Xu)
Date: Tue, 19 Aug 2008 09:57:41 -0700
Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined primer
Message-ID: <3fde82050808190957y271aa52eh30e39a438cc8a8e3@mail.gmail.com>

Hi all,
I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need to use
Primer3 to calculate Tm for some pre-defined primers; however there is no
direct way to calculate Tm with Primer3. I have to call Primer3 and supply a
pre-defined primer, a template, etc to let it run and hopefully Primer3
finds a pair of primers for me, then I have to open the output and find out
what the Tm is for my pre-defined primer. Am I missing a function that can do
this quickly for me?
I know there's another module (Bio::SeqFeature::Primer) that can do this
quickly, but the Tm is calculated with different parameters; therefore it's
not good to use it while I use Primer3 to design primers.
Any input?
Thanks!
-Tony

From cjfields at illinois.edu  Tue Aug 19 13:39:03 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 19 Aug 2008 12:39:03 -0500
Subject: [Bioperl-l] EUtilities help
In-Reply-To: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>
References: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
	<628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>
Message-ID: <E67F3EE3-7D3B-497A-926F-101A06D34956@illinois.edu>

As mentioned in previous posts and in the original POD, the original  
Bio::DB::EUtilities was experimental (unstable API).  It is deprecated  
in favor of the bioperl-live interface, which splits the user agent  
and parameter handling (Bio::DB::EUtilities, EUtilParameters) from the  
parsers (Bio::Tools::EUtilities).  Unfortunately, the original design  
was too rushed so any XML changes broke the tools; I basically had to  
start from the ground up again.  Any changes to eutil output should  
now be easier to deal with (famous last words).

I am still planning on adding a few things to it (including tests for  
the parser and user agent) but it shouldn't change substantially from  
what is in the cookbook.

chris

On Aug 19, 2008, at 2:30 AM, Dave Messina wrote:

> Hi Warren,
>
> Are you upgrading to 1.5.2 or downgrading from bioperl-live?
>
> If the former, you might consider going all the way to bioperl-live,  
> whose
> EUtilities support is improved significantly and documented  
> extensively
> here:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
>
> If the latter, I don't believe there is, but Chris Fields will know  
> for sure
> and will probably chime in.
>
>
> Dave
>
> To get bioperl-live
> as a tarball: http://www.bioperl.org/DIST/nightly_builds/
> or via Subversion: http://www.bioperl.org/wiki/Using_Subversion
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From cjfields at illinois.edu  Tue Aug 19 14:00:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 19 Aug 2008 13:00:42 -0500
Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for
	blasttable
In-Reply-To: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>
References: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>
Message-ID: <4B8B416E-2DA7-4B1E-9DE2-E1301B467637@illinois.edu>

Saving tabular BLAST to a file is working for me using bioperl-live.   
NCBI recently changed tabular BLAST output which broke parsing (there  
is an extra column now, can't remember what), but it is now fixed.

chris

On Aug 14, 2008, at 7:03 PM, Ulas Karaoz wrote:

> Hi,
>
> I found out that the save_output routine in RemoteBlast.pm doesn't  
> save the output properly into a file when the Blast output is a  
> hittable(blasttable). Might the reason be the fact that the tabular  
> output has a line that starts with a # while the parser is looking  
> for a line starting with just BLASTN, as in the section pasted below:
>
> if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i ||
> 			 $l =~/^RPS-BLAST\s*.+$/i ) {
> 			$seentop=1;
> }
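
The patterns quoted above can be tried against a plain report header and a
tabular one. This is only a sketch of the hypothesis in the question: the two
header strings are illustrative, the second subroutine is a hypothetical more
tolerant variant and not the change made in bioperl-live, and the thread does
not establish that the '#' prefix was the actual cause.

```perl
use strict;
use warnings;

# The two patterns from save_output as quoted, wrapped in a helper.
sub matches_report_top {
    my ($l) = @_;
    return ( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i
          || $l =~ /^RPS-BLAST\s*.+$/i ) ? 1 : 0;
}

# Hypothetical variant that also accepts a leading "# " (assumption, not
# the real fix): the program name may follow an optional comment marker.
sub matches_report_top_or_comment {
    my ($l) = @_;
    return $l =~ /^(?:#\s*)?(?:T?BLAST[NPX]|RPS-BLAST)\s*.+$/i ? 1 : 0;
}

my $plain   = 'BLASTN 2.2.18 [Mar-02-2008]';    # illustrative header line
my $tabular = '# BLASTN 2.2.18 [Mar-02-2008]';  # same line as a table comment

for my $line ($plain, $tabular) {
    printf "%-32s quoted=%d tolerant=%d\n", $line,
        matches_report_top($line), matches_report_top_or_comment($line);
}
```

The anchored ^ is what makes the quoted patterns skip a line that starts
with '#', so $seentop would never be set for such a header.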

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Tue Aug 19 13:56:42 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 19 Aug 2008 13:56:42 -0400
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
Message-ID: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>


On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote:

>    I'm presently refactoring an in-house protein annotation pipeline
> and converting it to use BioSQL as a data store.  I've noticed some
> slightly screwy behavior with regard to how some of the
> Bio::Annotation classes are handled:
>
> -Instances of Bio::Annotation::SimpleValue and
> Bio::Annotation::StructuredValue attached to the annotation collection
> for a sequence feature (Bio::SeqFeature::Generic) are converted to
> tags/values on the feature.
>
> -Instances of Bio::Annotation::DBLink with attached comments lose  
> the comment.
> [...]
> $query->where(["s.display_id like DEBUG%'"]);

There's a single quote missing here, but I'm assuming that's a result  
of copy/paste editing?

> [...]
>    Is bioperl-db / BioSQL trying to tell me that I shouldn't be using
> Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue?

Your example code doesn't contain an example for where you are getting  
the B::A::StructuredValue object from. If you didn't create that  
yourself, it would be good to know what you did to end up with that.  
Chris Fields has written B::A::Tagtree which would be the way forward, and  
if you created the object yourself, can you take a look at that and  
see whether that class wouldn't serve your purpose as well or even  
better?

In order to be stored in BioSQL structured (hierarchical, nested)  
annotation is flattened into a string representation, because BioSQL  
can't store nested annotation collections natively. Right now if I am  
not mistaken upon retrieval this is not converted back into a  
B::A::Tagtree object but rather left flat. This is being worked on  
though, we've just discussed some issues connected with that.
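
To make the flattening concrete, here is a minimal sketch in plain Perl. The
"path=value" format, the subroutine name and the sample data are all made up
for illustration; this is not the string representation bioperl-db actually
writes.

```perl
use strict;
use warnings;

# Flatten a nested hash into "path=value" pairs (hypothetical format).
sub flatten {
    my ($node, $prefix) = @_;
    my @pairs;
    for my $key (sort keys %$node) {
        my $path = defined $prefix ? "$prefix.$key" : $key;
        if (ref $node->{$key} eq 'HASH') {
            # Recurse into nested levels, extending the path.
            push @pairs, flatten($node->{$key}, $path);
        }
        else {
            push @pairs, "$path=$node->{$key}";
        }
    }
    return @pairs;
}

# Made-up nested annotation with two related values kept together.
my %annotation = (
    comment => { topic => 'function', text => 'kinase' },
);

my $flat = join ';', flatten(\%annotation);
print "$flat\n";
```

A store that only hands the string back leaves the re-nesting to the caller,
which is the "left flat" situation described above.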

I could make B::A::StructuredValue work the same way, but I'm not sure  
what it provides that B::A::Tagtree doesn't. The latter uses  
Data::Stag under the hood, which is much cleaner, and more extensible  
in the future.

As for SimpleValue annotation versus tag/value annotation for  
seqfeatures, yes right now these are treated interchangeably for the  
purposes of BioSQL and Bioperl-db. You can do this easily too on your  
end by using Bio::SeqFeature::AnnotationAdaptor.

> Is there even a place in the BioSQL schema for a comment to be  
> attached
> to a DBLink?

No, there isn't. I thought it was, but it turns out that this isn't yet  
one of the desirable extensions to BioSQL from 1.1.x onwards, as  
documented on the wiki:

http://www.biosql.org/wiki/Enhancement_Requests

I'll add it (but feel free to do so yourself, especially if you have  
other enhancements).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From johnsonm at gmail.com  Wed Aug 20 14:43:25 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 20 Aug 2008 13:43:25 -0500
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
	<2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
Message-ID: <ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>

On Tue, Aug 19, 2008 at 12:56 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote:
> There's a single quote missing here, but I'm assuming that's a result of
> copy/paste editing?

Yes, I was a bit sloppy with the example.

> Your example code doesn't contain an example for where you are getting the
> B::A::StructuredValue object from. If you didn't create that yourself, it
> would be good to know what you did to end up with that. Chris Fields has
> written B::A::Tagtree which would be the way forward, and if you created the
> object yourself, can you take a look at that and see whether that class
> wouldn't serve your purpose as well or even better?

I created the B::A::StructuredValue myself.  I'm using it to store the
output from PSORTb, which gives a cellular localization and a score
for a protein sequence (gene), which I'm trying to keep paired
together, if possible.  I'll take a look at B::A::Tagtree, that's
probably a better fit.

> In order to be stored in BioSQL structured (hierarchical, nested) annotation
> is flattened into a string representation, because BioSQL can't store nested
> annotation collections natively. Right now if I am not mistaken upon
> retrieval this is not converted back into a B::A::Tagtree object but rather
> left flat. This is being worked on though, we've just discussed some issues
> connected with that.

The data I have isn't really deeply nested.  I just like to keep
related annotation in one object, if possible.

> I could make B::A::StructuredValue work the same way, but I'm not sure what
> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the
> hood, which is much cleaner, and more extensible in the future.

Perhaps B::A::StructuredValue should be deprecated?

> As for SimpleValue annotation versus tag/value annotation for seqfeatures,
> yes right now these are treated interchangeably for the purposes of BioSQL
> and Bioperl-db. You can do this easily too on your end by using
> Bio::SeqFeature::AnnotationAdaptor.

I'll check out the AnnotationAdaptor, but I'll probably just end up using
seqfeature tags/values.  They're functionally equivalent to
B::A::SimpleValue.

>> Is there even a place in the BioSQL schema for a comment to be attached
>> to a DBLink?
>
> No, there isn't. I thought it was, but it turns out that this isn't yet one of
> the desirable extensions to BioSQL from 1.1.x onwards, as documented on the
> wiki:
>
> http://www.biosql.org/wiki/Enhancement_Requests
>
> I'll add it (but feel free to do so yourself, especially if you have other
> enhancements).

I'll take a look at the wiki....I'll file that as a feature request if
I get there before you do it.

From cjfields at illinois.edu  Wed Aug 20 16:25:55 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 20 Aug 2008 15:25:55 -0500
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
	<2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
	<ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>
Message-ID: <9872D07D-61AB-4F0A-A477-35AA87ABF72E@illinois.edu>


On Aug 20, 2008, at 1:43 PM, Mark Johnson wrote:

> ...
>
>> I could make B::A::StructuredValue work the same way, but I'm not  
>> sure what
>> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag  
>> under the
>> hood, which is much cleaner, and more extensible in the future.
>
> Perhaps B::A::StructuredValue should be deprecated?

Probably.  The only place it was used in core was SeqIO::swiss (and  
now that uses Tagtree in bioperl-live).

Let me know if you have any problems with Bio::Annotation::Tagtree.  I  
am planning on doing some more work with it soon.

chris


From cjfields at illinois.edu  Thu Aug 21 10:26:22 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Aug 2008 09:26:22 -0500
Subject: [Bioperl-l] Annotations issue (GenBank)
Message-ID: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>

I'm working on a GenBank patch and noticed a few cases where  
annotations are being stored as all uppercase strings (CONTIG, WGS,  
etc).  I'm planning on converting these to lowercase (e.g. 'wgs',  
'contig') for consistency with other annotation tag values.  I'm  
making sure output is consistent as well.

These are used fairly infrequently so I don't think it should cause  
problems, but just in case, does anyone have a problem with this change?

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From awitney at sgul.ac.uk  Thu Aug 21 10:41:10 2008
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 21 Aug 2008 15:41:10 +0100
Subject: [Bioperl-l] adding HSP information to BLAST output graphic
	(Bio::Graphics)
Message-ID: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>


Hi,

I am going through the Bio::Graphics HOWTO on the wiki.

Looking at render_blast4.pl, the description text describes the whole  
hit and is set for the whole track, but i would like to be able to add  
HSP information such as the identity matches onto the picture, this is  
stored in the $hsp object. How would i go about adding that to the  
picture?

the relevant piece of code is:

       my $track = $panel->add_track(
                                     -glyph       => 'graded_segments',
                                     -label       => 1,
                                     -connector   => 'dashed',
                                     -bgcolor     => 'blue',
                                     -font2color  => 'red',
                                     -sort_order  => 'high_score',
                                     -description => sub {
                                       my $feature = shift;
                                       return unless $feature->has_tag('description');
                                       my ($description) = $feature->each_tag_value('description');
                                       my $score = $feature->score;
                                       "$description, score=$score";
                           #            "score=$score";
                                      },
                                    );

       next unless $hit->significance < 1E-20;

       my $feature = Bio::SeqFeature::Generic->new(
                                                   -score        => $hit->raw_score,
                                                   -display_name => $hit->name,
                                                   -tag          => {
                                                                      description => $hit->description
                                                                    },
                                                  );

       while( my $hsp = $hit->next_hsp )
         {
          $feature->add_sub_SeqFeature($hsp,'EXPAND');
         }

       $track->add_feature($feature);

thanks for any help

adam

From cjfields at illinois.edu  Thu Aug 21 12:01:11 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Aug 2008 11:01:11 -0500
Subject: [Bioperl-l] Annotations issue (GenBank)
In-Reply-To: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
References: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
Message-ID: <0ACE20F0-43E5-4E12-9437-094871766083@illinois.edu>

I went ahead and committed this; if there are any disagreements about  
it I can back it out or modify as needed.

chris

On Aug 21, 2008, at 9:26 AM, Chris Fields wrote:

> I'm working on a GenBank patch and noticed a few cases where  
> annotations are being stored as all uppercase strings (CONTIG, WGS,  
> etc).  I'm planning on converting these to lowercase (e.g. 'wgs',  
> 'contig') for consistency with other annotation tag values.  I'm  
> making sure output is consistent as well.
>
> These are used fairly infrequently so I don't think it should cause  
> problems, but just in case, does anyone have a problem with this  
> change?
>
> chris

From mshafiullah at mail.unomaha.edu  Thu Aug 21 15:35:54 2008
From: mshafiullah at mail.unomaha.edu (Mohammad Shafiullah)
Date: Thu, 21 Aug 2008 14:35:54 -0500
Subject: [Bioperl-l] bioperl-network test error
Message-ID: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>

To whom it may concern: 

Encountered the following error while running ./Build test on 
bioperl-network-1.5.2_100

Can't stat scripts: No such file or directory
 at /usr/share/perl5/Module/Build/Base.pm line 3836
t/Edge...........ok 
t/Graph-MD5......1/19 Not an ARRAY reference at 
/usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
 All 19 subtests passed 
t/Graph-Seq......1/16 Not an ARRAY reference at 
/usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
 All 16 subtests passed 
t/IO_dip_tab.....ok 
t/IO_psi.........ok 
t/Interaction....ok 
t/Node...........ok 
t/ProteinNet.....ok 

Test Summary Report
-------------------
t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
  Non-zero exit status: 9
t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
  Non-zero exit status: 255
Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33 cusr 
0.37 csys =  1.72 CPU)
Result: FAIL
Failed 2/8 test programs. 0/292 subtests failed.

Please advise on the issue. 

Sincerely,

- Mohammad

From bosborne11 at verizon.net  Thu Aug 21 16:43:35 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 21 Aug 2008 16:43:35 -0400
Subject: [Bioperl-l] bioperl-network test error
In-Reply-To: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
References: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
Message-ID: <83EB2D5B-37F2-4163-8A14-F93ECE740197@verizon.net>

Mohammad,

Try replacing the ModuleBuildBioperl.pm file that you have with the  
one that's attached. Then do this inside the bioperl-network-1.5.2_100  
directory:

./Build clean
perl Build.PL
./Build test

What do you see? Also, what's your version of Perl?


Brian O.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ModuleBuildBioperl.pm
Type: text/x-perl-script
Size: 40360 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080821/17d19dd5/attachment-0001.bin>
-------------- next part --------------


On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote:

> To whom it may concern:
>
> Encountered the following error while running ./Build test on
> bioperl-network-1.5.2_100
>
> Can't stat scripts: No such file or directory
> at /usr/share/perl5/Module/Build/Base.pm line 3836
> t/Edge...........ok
> t/Graph-MD5......1/19 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
> All 19 subtests passed
> t/Graph-Seq......1/16 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
> All 16 subtests passed
> t/IO_dip_tab.....ok
> t/IO_psi.........ok
> t/Interaction....ok
> t/Node...........ok
> t/ProteinNet.....ok
>
> Test Summary Report
> -------------------
> t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
>  Non-zero exit status: 9
> t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
>  Non-zero exit status: 255
> Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33  
> cusr
> 0.37 csys =  1.72 CPU)
> Result: FAIL
> Failed 2/8 test programs. 0/292 subtests failed.
>
> Please advise on the issue.
>
> Sincerely,
>
> - Mohammad


From bosborne11 at verizon.net  Thu Aug 21 17:57:17 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 21 Aug 2008 17:57:17 -0400
Subject: [Bioperl-l] bioperl-network test error
In-Reply-To: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
References: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
Message-ID: <AC71DAAF-C324-4D6E-8D56-EE19B1957A27@verizon.net>

Mohammad,

Take a look at this:

http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2007-06/msg00365.html

It discusses that Heap::Elem error when using Graph.

Brian O.


On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote:

> To whom it may concern:
>
> Encountered the following error while running ./Build test on
> bioperl-network-1.5.2_100
>
> Can't stat scripts: No such file or directory
> at /usr/share/perl5/Module/Build/Base.pm line 3836
> t/Edge...........ok
> t/Graph-MD5......1/19 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
> All 19 subtests passed
> t/Graph-Seq......1/16 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
> All 16 subtests passed
> t/IO_dip_tab.....ok
> t/IO_psi.........ok
> t/Interaction....ok
> t/Node...........ok
> t/ProteinNet.....ok
>
> Test Summary Report
> -------------------
> t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
>  Non-zero exit status: 9
> t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
>  Non-zero exit status: 255
> Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33  
> cusr
> 0.37 csys =  1.72 CPU)
> Result: FAIL
> Failed 2/8 test programs. 0/292 subtests failed.
>
> Please advise on the issue.
>
> Sincerely,
>
> - Mohammad


From dominic at bioinf.uni-leipzig.de  Fri Aug 22 09:35:21 2008
From: dominic at bioinf.uni-leipzig.de (Dominic Rose)
Date: Fri, 22 Aug 2008 15:35:21 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
Message-ID: <48AEC099.20106@bioinf.uni-leipzig.de>

Hi,

just a short suggestion to improve the code:

in function

sub _build_nt_matrix()

one finds the following lines:

         my $ti_index = $NucleotideIndexes{$ti};
         my $tj_index = $NucleotideIndexes{$tj};

         if( ! defined $ti_index ) {
             print "ti_index not defined for $ti\n";
             next;
         }

However, it should be possible to stop/silence the printing of that 
error message. Many alignments contain N's, which causes many, many
"ti_index not defined for N" messages. That should be avoidable.

Thanks,
Dominic

-- 
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig
WWW    http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax:   +49 341 97-16679

From heikki at sanbi.ac.za  Wed Aug 27 02:23:39 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 27 Aug 2008 08:23:39 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
In-Reply-To: <48AEC099.20106@bioinf.uni-leipzig.de>
References: <48AEC099.20106@bioinf.uni-leipzig.de>
Message-ID: <200808270823.39852.heikki@sanbi.ac.za>

Dominic,

You are absolutely right. 

I've changed 'print' into '$self->warn' in the SVN. Now it is possible to set 
$object->verbose(-1) to silence the warning or, if deemed necessary, set 
$object->verbose(2) and catch the error with an eval statement.
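
As a rough illustration of the pattern, here is a toy class that mimics the
verbosity levels mentioned above. ToyRoot is a made-up stand-in and is not
Bio::Root::Root; only the behaviour described in this message (silent at -1,
an exception at 2 that eval can catch) is being sketched.

```perl
use strict;
use warnings;

# Toy stand-in for the verbosity convention; NOT the BioPerl root class.
package ToyRoot;

sub new { my ($class) = @_; return bless { verbose => 0 }, $class }

sub verbose {
    my ($self, $level) = @_;
    $self->{verbose} = $level if defined $level;
    return $self->{verbose};
}

sub warn {
    my ($self, $msg) = @_;
    my $v = $self->verbose;
    return if $v < 0;                       # -1: stay silent
    die "EXCEPTION: $msg\n" if $v >= 2;     #  2: promote to an exception
    CORE::warn("WARNING: $msg\n");          # otherwise: ordinary warning
    return;
}

package main;

my $stats = ToyRoot->new;

$stats->verbose(-1);
$stats->warn('ti_index not defined for N');     # prints nothing

$stats->verbose(2);
my $ok     = eval { $stats->warn('ti_index not defined for N'); 1 };
my $caught = $ok ? '' : $@;                     # message is now catchable
print "caught: $caught";
```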

Thanks for reporting this,

    -Heikki

On Friday 22 August 2008 15:35:21 Dominic Rose wrote:
> Hi,
>
> just a short suggestion to improve the code:
>
> in function
>
> sub _build_nt_matrix()
>
> one finds the following lines:
>
>          my $ti_index = $NucleotideIndexes{$ti};
>          my $tj_index = $NucleotideIndexes{$tj};
>
>          if( ! defined $ti_index ) {
>              print "ti_index not defined for $ti\n";
>              next;
>          }
>
> However, it should be possible to stop/silence the printing of that
> error message. Many alignments contain N's, which causes many, many
> "ti_index not defined for N" messages. That should be avoidable.
>
> Thanks,
> Dominic

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From giles.weaver at googlemail.com  Wed Aug 27 06:39:19 2008
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Wed, 27 Aug 2008 11:39:19 +0100
Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined
	primer
In-Reply-To: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
References: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
Message-ID: <1d06cd5d0808270339p4432f91bo72f48364696caf5a@mail.gmail.com>

Hi Tony,

It isn't well documented, but Primer3 includes a program called oligotm,
which is used to calculate the Tm of short sequences (up to 32bp). You can
run it directly by typing something like "oligotm ACGTACGTACGTACGT" in the
terminal. Just typing oligotm will give you the options.

If you are using Linux, these snippets of code may help you call oligotm
from within a perl script:

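use IPC::Open3;

# Run oligotm on one sequence; returns (Tm, anything written to stderr).
sub _run_oligotm
{
    my ($class, $sequence) = @_;

    my $run = "oligotm -tp 1 -sc 1 $sequence";
    my $pid = open3(\*WTRFH, \*RDRFH, \*ERRFH, $run);
    close (WTRFH);

    my ($tm, $errors);
    while (<RDRFH>) { $tm .= $_;}
    while (<ERRFH>) { $errors .= $_;}
    waitpid($pid, 0);    # reap the child so repeated calls leave no zombies
    chomp $tm if defined $tm;
    return ($tm, $errors);
}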

You'll need to put this in a package or edit out the $class bit for it to
work.
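
For anyone who wants it outside a package, a standalone variant might look
like the sketch below. The name run_tool is hypothetical; the command is
passed as a list so the sequence never goes through a shell, and the oligotm
flags in the commented usage line are simply the ones from the snippet above.
Reading stdout fully before stderr is fine for the few lines involved here
but could stall on a tool that writes a lot to stderr.

```perl
use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

# Run a command given as a list; returns (stdout, stderr) as strings.
sub run_tool {
    my @cmd = @_;
    my $err = gensym;    # open3 needs a real glob for a separate stderr
    my $pid = open3(my $in, my $out, $err, @cmd);
    close $in;           # nothing to send to the child

    my $stdout = do { local $/; <$out> };
    my $stderr = do { local $/; <$err> };
    waitpid($pid, 0);    # reap the child

    $stdout = '' unless defined $stdout;
    $stderr = '' unless defined $stderr;
    chomp $stdout;
    return ($stdout, $stderr);
}

# Usage with oligotm (needs Primer3's oligotm on the PATH):
#   my ($tm, $errors) = run_tool('oligotm', '-tp', 1, '-sc', 1,
#                                'ACGTACGTACGTACGT');
```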

This is my first post to this list. I'm receiving the digest so replying to
posts is a bit of a faff. Can anyone recommend a better way of replying to
posts than replying to the digest, editing it and pasting the subject into
the subject field?

Giles Weaver
Unilever R&D


>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 19 Aug 2008 09:57:41 -0700
> From: "XQ Xu" <xxq.t.xu at gmail.com>
> Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined
>        primer
> To: bioperl-l at lists.open-bio.org
> Message-ID:
>        <3fde82050808190957y271aa52eh30e39a438cc8a8e3 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all,
> I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need to use
> Primer3 to calculate Tm for some pre-defined primers; however there is no
> direct way to calculate Tm with Primer3. I have to call Primer3 and supply a
> pre-defined primer, a template, etc to let it run and hopefully Primer3
> finds a pair of primers for me, then I have to open the output and find out
> what the Tm is for my pre-defined primer. Am I missing a function that can do
> this quickly for me?
> I know there's another module (Bio::SeqFeature::Primer) that can do this
> quickly, but the Tm is calculated with different parameters; therefore it's
> not good to use it while I use Primer3 to design primers.
> Any input?
> Thanks!
> -Tony
>

From awitney at sgul.ac.uk  Wed Aug 27 12:21:12 2008
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 27 Aug 2008 17:21:12 +0100
Subject: [Bioperl-l] adding HSP information to BLAST output graphic
	(Bio::Graphics)
In-Reply-To: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>
References: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>
Message-ID: <8939216F-7AFC-40BB-A416-8508E3E5D871@sgul.ac.uk>


after some digging around myself, this seems to do the trick, although  
i don't know if it will always work, as i am having to add a separate  
sort function.

On 21 Aug 2008, at 15:41, Adam Witney wrote:

>
> Hi,
>
> I am going through the Bio::Graphics HOWTO on the wiki.
>
> Looking at render_blast4.pl, the description text describes the  
> whole hit and is set for the whole track, but i would like to be  
> able to add HSP information such as the identity matches onto the  
> picture, this is stored in the $hsp object. How would i go about  
> adding that to the picture?
>
> the relevant piece of code is:
>
>      my $track = $panel->add_track(
>                                    -glyph       => 'graded_segments',
>                                    -label       => 1,
>                                    -connector   => 'dashed',
>                                    -bgcolor     => 'blue',
>                                    -font2color  => 'red',
>                                    -sort_order  => 'high_score',

                                     -part_labels => sub {
                                       my ($feature,undef,$partno) = @_;
                                       my @features = sort_features($feature->get_SeqFeatures());
                                       return $features[$partno]->num_identical.'/'.$features[$partno]->length.' (score='.$features[$partno]->score.')' if $features[$partno];
                                      },

>                                    -description => sub {
>                                      my $feature = shift;
>                                      return unless $feature->has_tag('description');
>                                      my ($description) = $feature->each_tag_value('description');
>                                      my $score = $feature->score;
>                                      "$description, score=$score";
>                          #            "score=$score";
>                                     },
>                                   );


sub sort_features {
   my @array = @_;

   if(@array < 2){return @array}

   my @sorted = sort {$a->start <=> $b->start} @array;

   return @sorted;
}
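
The helper above can be tried in isolation with stand-in objects. ToyHSP and
its labels are invented for the sketch and only provide the start accessor
the sort needs; whether $partno follows start order inside Bio::Graphics is
an assumption this sketch does not verify. Perl's sort already copes with
empty and one-element lists, so the length check is optional.

```perl
use strict;
use warnings;

# Minimal stand-in for a sub-feature: only the accessors used below.
package ToyHSP;
sub new   { my ($class, %args) = @_; return bless {%args}, $class }
sub start { return $_[0]{start} }
sub label { return $_[0]{label} }

package main;

# Same ordering as sort_features, without the explicit length guard.
# Call it in list context.
sub sort_by_start {
    my @features = @_;
    return sort { $a->start <=> $b->start } @features;
}

my @parts = sort_by_start(
    ToyHSP->new(start => 310, label => 'third'),
    ToyHSP->new(start =>  12, label => 'first'),
    ToyHSP->new(start => 145, label => 'second'),
);
my @order = map { $_->label } @parts;
print "@order\n";
```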


is this the best way to achieve this?

thanks

adam


From mauricio at open-bio.org  Thu Aug 28 13:43:04 2008
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 28 Aug 2008 12:43:04 -0500
Subject: [Bioperl-l] Pdoc updates
Message-ID: <48B6E3A8.3050305@open-bio.org>

For those who use the online Pdoc documentation (doc.bioperl.org), this 
is to let you know that the CvsWeb links at the top of any module page 
now link to the proper place in the SVN web interface.

Cheers,
Mauricio.

From jaudall at gmail.com  Fri Aug 29 02:46:20 2008
From: jaudall at gmail.com (Joshua Udall)
Date: Thu, 28 Aug 2008 23:46:20 -0700
Subject: [Bioperl-l] DB_File and assembly IO
Message-ID: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>

Bioperl -

I'm trying to read/parse a single cap3 ace file with several thousand
contigs.  I get a DB_File error at Contig247.  Here's the error:

------------- EXCEPTION -------------
MSG: Unable to tie DB_File handle
STACK Bio::SeqFeature::Collection::new
/Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
STACK Bio::Assembly::Contig::new
/Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
STACK Bio::Assembly::IO::ace::next_assembly
/Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
STACK toplevel /Users/jaudall/bin/read_ace.pl:214
-------------------------------------

Looking at the Collection::new, the error is on the middle line:

  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
    $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die "Cannot open file: $!\n" ;
  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
  return $self;

If I uncomment the $! die statement that I inserted, I get this:

'Cannot open file tree: Too many open files'

Apparently the Collection constructor is creating a new index file for each
one and the handles for each are sticking around?  That confuses me because
reading more about the Collection.pm and DB_File, it appeared to me that no
files were written by default (as I'm doing), rather the Collection objects
are all stored in memory.  I'm pretty sure the error is not a permission
error, and if it is not the open file-handles, what else should I look for?


If I 'warn' the error instead of throwing it, I get:

Can't call method "get_dup" on an undefined value at
/Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360

This kind of makes sense because the index appears not to be created and it
can't look stuff up in an undefined tied hash.  I'm stuck.

Thanks for any help and suggestions.

OSX, perl 5.8.8, bioperl-live (svn last week)

-- 
Joshua Udall
Assistant Professor
295 WIDB
Plant and Wildlife Science Dept.
Brigham Young University
Provo, UT 84602
801-422-9307
Fax: 801-422-0008
USA

From florent.angly at gmail.com  Fri Aug 29 04:40:25 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Fri, 29 Aug 2008 18:40:25 +1000
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
Message-ID: <48B7B5F9.1050608@gmail.com>

Hi Joshua,

I don't know the specifics of DB_File, but the 'Cannot open file tree: 
Too many open files' is pretty explicit.
If you're on Unix/Linux you can check the files that are open by your 
program by typing:
    lsof | grep name_of_program
There is probably a filehandle that is not closed somewhere in your code 
or the BioPerl code.
Best,

Florent




From cjfields at illinois.edu  Fri Aug 29 10:30:49 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Aug 2008 09:30:49 -0500
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <48B7B5F9.1050608@gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
	<48B7B5F9.1050608@gmail.com>
Message-ID: <5717CE96-EC24-46D9-A922-88702B1647A1@illinois.edu>

This is a known problem with Bio::Assembly and stems from having a  
DB_File tied (opened) for each Bio::Assembly::Contig (via a retained  
Bio::SeqFeature::Collection).  You can extend the number of open  
filehandles on UNIX'y flavors using ulimit (see following link), but  
I'm not sure about Win32.

http://bugzilla.open-bio.org/show_bug.cgi?id=2320

The general bug is reproducible using the following simple script.  If
needed, adjust the range end in the for loop to exceed the limit
reported by 'ulimit -n'; on Mac OS X 10.5 it is set to 2560.

---------------------------
use strict;
use warnings;
use Bio::Assembly::Contig;

# Each contig retains its own tied DB_File, so the handles accumulate.
my @contigs;

push @contigs, Bio::Assembly::Contig->new() for (1..10000);
---------------------------
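
Before parsing a large ACE file it may help to compare the number of
contigs with the current descriptor limit.  The following is only a
rough sketch: open_file_limit and fits_under_limit are hypothetical
helper names, the reserve of 16 descriptors is an arbitrary assumption,
and the call to 'ulimit -n' assumes a POSIX shell is available.

```perl
use strict;
use warnings;

# Ask a POSIX shell for the soft limit on open file descriptors.
# Returns a number, or the string 'unlimited'.
sub open_file_limit {
    my $limit = qx{sh -c 'ulimit -n'};
    chomp $limit;
    return $limit;
}

# Hypothetical check: does one tied index file per contig, plus a small
# reserve for STDIN/STDOUT/STDERR and the input file, fit under the limit?
sub fits_under_limit {
    my ($n_contigs, $limit, $reserve) = @_;
    $reserve = 16 unless defined $reserve;    # arbitrary safety margin
    return 1 if $limit eq 'unlimited';
    return $n_contigs + $reserve <= $limit ? 1 : 0;
}

my $limit = open_file_limit();
print "open-file limit: $limit\n";
print "10000 contigs fit: ",
    (fits_under_limit(10000, $limit) ? "yes" : "no"), "\n";
```

If the answer is "no", raising the limit with ulimit before starting
perl is the workaround mentioned above.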

I'll open a bug report on this for tracking (for release 1.7, along  
with any other Bio::Assembly issues).  That doesn't mean it won't get  
fixed sooner, just that we aren't under pressure with the next  
release, which already has a full plate.  IMO, I don't think there
needs to be one SF::Collection per contig; one instance should do
for the entire assembly, using the same SF::Collection passed in to
each contig and distinguishing the contig using the SeqFeature  
seq_id.  It would also be nice if we could change that to also allow  
other SeqFeature::CollectionI (i.e. Bio::DB::SeqFeature::Store and the  
like, for instance).

chris


From milan.gilic at st.t-com.hr  Sat Aug 23 17:15:39 2008
From: milan.gilic at st.t-com.hr (Milan)
Date: Sat, 23 Aug 2008 23:15:39 +0200
Subject: [Bioperl-l]  Count or weight matrix in bioperl?
Message-ID: <48B07DFB.3040201@st.t-com.hr>


From dominic at bioinf.uni-leipzig.de  Mon Aug 25 06:40:42 2008
From: dominic at bioinf.uni-leipzig.de (Dominic Rose)
Date: Mon, 25 Aug 2008 12:40:42 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
Message-ID: <48B28C2A.8050803@bioinf.uni-leipzig.de>

Hi,

just a short suggestion to improve the code:

in function

sub _build_nt_matrix()

one finds the following lines:

         my $ti_index = $NucleotideIndexes{$ti};
         my $tj_index = $NucleotideIndexes{$tj};

         if( ! defined $ti_index ) {
             print "ti_index not defined for $ti\n";
             next;
         }

However, it should be possible to suppress that message. Many
alignments contain N's, which causes a great many
"ti_index not defined for N" messages. There should be a way to switch
the message off.
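
One way this could look is sketched below as a standalone function,
not as a patch to the module.  The contents of %NucleotideIndexes, the
function name count_pairs and the $verbose flag are assumptions made
for illustration; only the idea of reporting skipped columns on request
comes from the suggestion above.

```perl
use strict;
use warnings;

# Assumed index table; the real module defines its own.
my %NucleotideIndexes = (A => 0, T => 1, C => 2, G => 3);

# Count nucleotide pairs column by column for two aligned strings.
# Columns with a character that has no index (N, gaps, ...) are skipped,
# and a message is printed only when $verbose is true.
sub count_pairs {
    my ($seq1, $seq2, $verbose) = @_;
    my %counts;
    my $skipped = 0;
    for my $k (0 .. length($seq1) - 1) {
        my $ti = uc substr($seq1, $k, 1);
        my $tj = uc substr($seq2, $k, 1);
        my $ti_index = $NucleotideIndexes{$ti};
        my $tj_index = $NucleotideIndexes{$tj};
        if (!defined $ti_index || !defined $tj_index) {
            warn "no index defined at column $k ($ti/$tj)\n" if $verbose;
            $skipped++;
            next;
        }
        $counts{"$ti$tj"}++;
    }
    return (\%counts, $skipped);
}

my ($counts, $skipped) = count_pairs('ANGT', 'ACGT', 0);
print "skipped columns: $skipped\n";
```

Returning the number of skipped columns keeps the information
available to the caller even when the messages are switched off.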

Thanks,
Dominic

-- 
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig
WWW    http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax:   +49 341 97-16679

From Kevin.Clancy at invitrogen.com  Fri Aug  1 18:30:30 2008
From: Kevin.Clancy at invitrogen.com (Clancy, Kevin)
Date: Fri, 1 Aug 2008 15:30:30 -0700
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
Message-ID: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>

Hi Folks
I am using the windows version of Bioperl 1.5.2_100. I recently was
compiling a tool using ActiveState's PerlApp which included Bioperl
modules. I received an error for the Bio::SeqIO module, which was
calling for the Bio::SeqIO::staden::read method(?) on line 312 - 314 of
the Bio::SeqIO.pm module. I don't appear to have a copy of the staden
module under the Bio::SeqIO directory and it doesn't appear to be
present in the current BioPerl trunk. I simply commented this out of my
SeqIO.pm file to perform my build and it's all running normally. Was this
simply a reference to a nonexistent module or am I missing something?
Thank you for your help.
kevin

Kevin Clancy, PhD
Senior Scientist, Informatic Sciences
Invitrogen Corp
Carlsbad, CA 92008
Phone: (768) 268 8356
Email: kevin.clancy at invitrogen.com 


From jason at bioperl.org  Sat Aug  2 08:58:05 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Aug 2008 07:58:05 -0500
Subject: [Bioperl-l] Inframe stop codon
In-Reply-To: <516747.39380.qm@web36405.mail.mud.yahoo.com>
References: <516747.39380.qm@web36405.mail.mud.yahoo.com>
Message-ID: <F3DAE939-82AC-4EB8-8667-F6612478477E@bioperl.org>

[regarding PAML analyses]

You would need to translate the cDNA sequence and identify where the  
stop codon is, then remove that codon or remove that sequence from  
your bulk analyses.  It depends on why you think the stop codon is in
the sequence: mis-annotation, a pseudogene, or something else?  If
this is a small percentage of a lot of sequences I would probably
just skip these.  If this is the terminal stop codon that is being
included in the sequences, you just need to remove the last codon
from each sequence before providing it to PAML.  The Seq HOWTO has
many examples of how to manipulate a sequence object with substr,
trunc, as well as just the simple seq() method that gives you the
sequence as a string, which you can manipulate, then update the
sequence object afterwards.  As in:

  my $str = $seq->seq;
  # remove the last codon from this cDNA sequence
  substr($str, -3, 3, '');
  $seq->seq($str);

Alternatively you can use trunc to truncate the sequence:

  my $trunc = $seq->trunc(1, $seq->length - 3);
  $seq = $trunc;


You can translate the sequence with the $seq->translate command, then
test for the presence of a stop codon (this is exactly the code that is
running in the pairwise_kaks script in the scripts/utilities/
directory).  If you have a stop codon you need to figure out whether it
is at the end of the sequence or not.  If it is the terminal codon,
you can just lop off the last codon on all your sequences, but if it
is internal, you need to decide what you want to do with this sequence.
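
As a rough illustration of that decision, the sketch below works on
plain strings so it runs without BioPerl.  It assumes the protein
string came from something like $seq->translate->seq, where '*' marks
a stop; classify_stops and strip_last_codon are hypothetical helper
names, not BioPerl methods.

```perl
use strict;
use warnings;

# Classify stop codons in a translated protein string, where '*' marks
# a stop. Returns 'none', 'terminal' (exactly one stop, at the very
# end) or 'internal' (anything else).
sub classify_stops {
    my ($protein) = @_;
    my $stops = ($protein =~ tr/*//);    # count '*' characters
    return 'none' if $stops == 0;
    return 'terminal' if $stops == 1 && substr($protein, -1) eq '*';
    return 'internal';
}

# Remove the final codon (last three bases) from a cDNA string.
sub strip_last_codon {
    my ($cdna) = @_;
    return substr($cdna, 0, length($cdna) - 3);
}

for my $protein ('MKV', 'MKV*', 'MK*V*') {
    print "$protein: ", classify_stops($protein), "\n";
}
```

Sequences classified as 'terminal' can be trimmed and passed on;
those classified as 'internal' need the case-by-case judgement
described above.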

If there are multiple stop codons, I am not sure it is appropriate to  
run PAML here, unless you are interested in some sort of pseudo-rate  
calculation that has many of the codons omitted.  Otherwise you may  
just want to calculate a DNA substitution rate for the sequences to  
make comparison.

I suggest working a single file by hand to get the appropriate steps  
down and then coding it up will be easier.

I am sure folks on the list can help too so it is important to post  
to the mailing list - I don't see any messages from you on the list  
about this query.

-jason
On Aug 2, 2008, at 5:42 AM, Tannistha wrote:

>
> Hi Jason,
>
> Please suggest me how to filter the inframe stop codons,  
> aa_to_dna_aln returns the sequence with in-frame stop codons.
> I have posted my query along with the input files to the forum.
>
> Thanks for your earlier advice, runmode =0 is working for me.
>
> Look forward to your reply
>
> Best Regards
> Tannistha
>
>
> Dr. Tannistha Nandi
> email: tannistha3 at yahoo.com
>
>
>


From David.Messina at sbc.su.se  Sun Aug  3 15:10:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 3 Aug 2008 21:10:18 +0200
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
References: <Acj0JjSSmx+wYhbUS0GAoF0WhwWjzg==>
	<28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
Message-ID: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>

Hi Kevin,

The staden module is an oddball one, to be sure.

A search on the BioPerl website turns up this FAQ entry:
http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F

Also the Windows install page
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

says:

> Some external programs such as Staden <http://www.bioperl.org/wiki/Staden> and
> the EMBOSS <http://www.bioperl.org/wiki/EMBOSS> suite of programs can only
> be installed on Windows by using Cygwin <http://www.cygwin.com/> and its gcc
> C compiler <http://gcc.gnu.org/> (see Bioperl in Cygwin, below)
>


In any case, the staden module (and associated external libraries) is used
only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr
binary formats. So your edit shouldn't cause you any problems otherwise.

Dave


From cjfields at uiuc.edu  Sun Aug  3 16:20:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 15:20:52 -0500
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
References: <Acj0JjSSmx+wYhbUS0GAoF0WhwWjzg==>
	<28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
	<628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
Message-ID: <BA800345-8498-46CF-AD94-E8FD9098DEBB@uiuc.edu>

This seems to be a problem with PerlApp and eval{}; judging by a quick  
Google search this isn't the only module affected.  The line in  
question is wrapped in an eval{} to check for the availability of  
Bio::SeqIO::staden::read (but not die on it).

BTW, the eval was moved into the relevant plugin modules post-1.5.2,  
so the eval{} is checked when the module is loaded dynamically (i.e.  
when a format requiring it is passed in).  It was causing other issues  
with ActivePerl installations and was redundant, so it was removed.
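
For readers unfamiliar with the idiom, probing for an optional module
inside eval{} looks roughly like the sketch below.  This is a generic
illustration and not the code from Bio::SeqIO; module_available is a
hypothetical helper name.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise, without
# dying when it is missing.
sub module_available {
    my ($module) = @_;
    # Accept only plain package names such as Foo::Bar.
    return 0 unless $module =~ /\A\w+(?:::\w+)*\z/;
    (my $file = "$module.pm") =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module ('File::Spec', 'Bio::SeqIO::staden::read') {
    printf "%-26s %s\n", $module,
        module_available($module) ? 'available' : 'not available';
}
```

Because the module name is only known at run time, a packaging tool
that scans the source for dependencies has to guess whether to bundle
it, which may be why such tools have trouble with this pattern.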

http://bugzilla.open-bio.org/show_bug.cgi?id=2295

chris


Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From btemperton at googlemail.com  Sat Aug  2 16:05:37 2008
From: btemperton at googlemail.com (Benbo)
Date: Sat, 2 Aug 2008 13:05:37 -0700 (PDT)
Subject: [Bioperl-l]  Finding possible primers regex
Message-ID: <18792782.post@talk.nabble.com>


Hi there, 
I'm trying to write a perl script to scan an aligned multiple entry fasta
file and find possible primers. So far I've produced a string which contains
bases which match all sequences and * where they don't match e.g.
1) TTAGCCTAA
2) TTAGCAGAA
3) TTACCCTAA

would give TTA*C**AA.

I want to parse this string and pull out all sequences which are 18-21 bp in
length and have no more than 4 * in them.

So far, I've got this:

while($fragment_match =~ /([GTAC*]{18,21})/g){
print "$1\n";
}

hoping to match all fragments 18-21 characters in length. However even that
doesn't work as it has essentially chunked it into 21 char blocks, rather
than what I hoped for of
0-18
0-19
0-20
0-21
1-19
1-20
1-21
1-22

etc.

Can anyone let me know if this is already possible in BioPerl, or how one
would go about it with regex. Sadly I'm fairly new to perl and getting to
grips with BioPerl, so please treat me gently :).

Many thanks,

Ben


-- 
View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Aug  4 00:08:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 23:08:51 -0500
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <18792782.post@talk.nabble.com>
References: <18792782.post@talk.nabble.com>
Message-ID: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>

On Aug 2, 2008, at 3:05 PM, Benbo wrote:

>
> Hi there,
> I'm trying to write a perl script to scan an aligned multiple entry  
> fasta
> file and find possible primers. So far I've produced a string which  
> contains
> bases which match all sequences and * where they don't match e.g.
> 1) TTAGCCTAA
> 2) TTAGCAGAA
> 3) TTACCCTAA
>
> would give TTA*C**AA.
>
> I want to parse this string and pull out all sequences which are  
> 18-21 bp in
> length and have no more than 4 * in them.
>
> So far, I've got this:
>
> while($fragment_match =~ /([GTAC*]{18,21})/g){
> print "$1\n";
> }
>
> hoping to match all fragments 18-21 characters in length. However  
> even that
> doesn't work as it has essentially chunked it into 21 char blocks,  
> rather
> than what I hoped for of
> 0-18
> 0-19
> 0-20
> 0-21
> 1-19
> 1-20
> 1-21
> 1-22
>
> etc.
>
> Can anyone let me know if this is already possible in BioPerl, or  
> how one
> would go about it with regex. Sadly I'm fairly new to perl and  
> getting to
> grips with BioPerl, so please treat me gently :).
>
> Many thanks,
>
> Ben

There is a trick to this which is discussed more extensively in
'Mastering Regular Expressions'.  Essentially you have to embed code
into the regex and trick the parser into backtracking using a negative
lookahead.  The match itself fails (i.e. no match is returned), but
the embedded code is executed for each match attempt.

The following script is a slight modification of one I used.  It
checks the consensus string from the input alignment (in aligned FASTA
format here), extracts the alignment slice using each match, then
writes the slice out to STDOUT in clustalw format.  This should work
for perl 5.8 and up, but it's only been tested on perl 5.10.  You
should be able to adapt this to fit what you want.

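use strict;
use warnings;
use Bio::AlignIO;

# Aligned FASTA file to scan, taken from the command line.
my $file = shift @ARGV or die "Usage: $0 alignment.fasta\n";

my $in = Bio::AlignIO->new(-file => $file,
                            -format => 'fasta');
my $out = Bio::AlignIO->new(-fh => \*STDOUT,
                            -format => 'clustalw');

while (my $aln = $in->next_aln) {
     # Columns below 100% identity are reported as '?'.
     my $c = $aln->consensus_string(100);
     my @matches;
     # The (?{...}) block records each candidate; (?!) then always
     # fails, forcing a retry at every length and start position.
     $c =~ m/
         ([GTAC?]{18,21})
         (?{my $match = check_match($1);
            push @matches, [$match,
                            pos(),
                            length($match)]
               if defined $match;})
         (?!)
         /xig;
     for my $match (@matches) {
         # pos() is the 0-based offset just past the candidate, which
         # equals its 1-based end column.
         my ($hit, $st, $end) = ($match->[0],
                                 $match->[1] - $match->[2] + 1,
                                 $match->[1]);
         my $newaln = $aln->slice($st, $end);
         $out->write_aln($newaln);
     }
}

# Return the candidate if it has at most 4 non-consensus columns.
sub check_match {
     my $match = shift;
     return unless $match;
     my $ct = $match =~ tr/?/?/;
     return $match if $ct <= 4;
     return;
}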
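
For the plain string case in the question (a consensus using '*' for
mismatches), the same enumeration can also be done without regex
tricks.  The original loop returns non-overlapping chunks because with
/g each search resumes where the previous match ended.  The sketch
below simply walks every start position and length with substr;
candidate_windows is a hypothetical name and the defaults from the
question (18-21 bp, at most 4 mismatches) are passed in as arguments.

```perl
use strict;
use warnings;

# Return every window of $min_len..$max_len characters that contains at
# most $max_mismatch '*' characters, as [0-based start, length, text].
sub candidate_windows {
    my ($consensus, $min_len, $max_len, $max_mismatch) = @_;
    my @hits;
    my $n = length $consensus;
    for my $start (0 .. $n - $min_len) {
        for my $len ($min_len .. $max_len) {
            last if $start + $len > $n;    # window would run off the end
            my $window = substr($consensus, $start, $len);
            my $stars  = ($window =~ tr/*//);    # count mismatch columns
            push @hits, [$start, $len, $window] if $stars <= $max_mismatch;
        }
    }
    return @hits;
}

# Tiny demonstration on the 9-base example, with shortened windows.
for my $hit (candidate_windows('TTA*C**AA', 3, 4, 1)) {
    my ($start, $len, $window) = @$hit;
    print "$start\t$len\t$window\n";
}
```

For real primers the call would be something like
candidate_windows($fragment_match, 18, 21, 4).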


chris


From heikki at sanbi.ac.za  Mon Aug  4 02:42:57 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 4 Aug 2008 08:42:57 +0200
Subject: [Bioperl-l] Bio::Coordinate::Pair
In-Reply-To: <C4B4F0DB.498%pchanpa@emory.edu>
References: <C4B4F0DB.498%pchanpa@emory.edu>
Message-ID: <200808040842.57466.heikki@sanbi.ac.za>

Prashanth,

Your example coordinates do not do the conversion but more or less report the
locations of your features in some third coordinate system.

The way to think of coordinate pairs is to treat them like HSPs. You tell the
pair object which segments match each other in the two sequences.


The synopsis in Bio::Coordinate::Pair class file gives the following example:


use strict;
use warnings;
use Bio::Location::Simple;
use Bio::Coordinate::Pair;

my $match1 = Bio::Location::Simple->new
    (-seq_id => 'propeptide', -start => 21, -end => 40, -strand=>1 );
my $match2 = Bio::Location::Simple->new
    (-seq_id => 'peptide', -start => 1, -end => 20, -strand=>1 );
my $pair = Bio::Coordinate::Pair->new(-in => $match1,
				      -out => $match2
    );
# location to match
my $pos = Bio::Location::Simple->new
    (-start => 25, -end => 25, -strand=> -1 );

my $res = $pair->map($pos);
print $res->match->start; # 5


In other words, region 21-40 in the propeptide matches locations 1-20 in the
final peptide. Therefore conversion from 25 gives 5:

     signalp        21  25             40
--------------------|---|--------------|
                    1   5  pep         20
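
For intuition only, the arithmetic for the simplest case (both
segments on the same strand and of equal length) can be sketched in
plain Perl.  map_position is a hypothetical function, not part of
Bio::Coordinate, and it ignores strands and partial overlaps, which
the real classes handle.

```perl
use strict;
use warnings;

# Map a single position from the input segment to the output segment.
# Returns undef when the position lies outside the input segment,
# i.e. when there is nothing to map it to.
sub map_position {
    my ($pos, $in_start, $in_end, $out_start) = @_;
    return undef if $pos < $in_start || $pos > $in_end;
    return $out_start + ($pos - $in_start);
}

# Propeptide 21-40 paired with peptide 1-20: position 25 maps to 5.
my $mapped = map_position(25, 21, 40, 1);
print "25 -> $mapped\n";

# In the tutorial example quoted below, position 500 lies outside the
# input segment 1000-2000, so there is no matching position.
my $outside = map_position(500, 1000, 2000, 1100);
print "500 -> ", (defined $outside ? $outside : 'no match'), "\n";
```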

I hope this clarifies it.

The advantage of using these objects over manual conversion is that the code
has been debugged (so none of the all-too-easy off-by-one errors) and that
they can be chained together.

Yours,

      -Heikki


On Tuesday 29 July 2008 22:07:55 Prashanth Athri wrote:
> Dear Professor Lehvaslaiho:
>
> I had a quick question about the module- Bio::Coordinate::Pair
>
> The BioPerl tutorial has the following example:
>
> $input_coordinates = Bio::Location::Simple->new
> (-seq_id => 'propeptide', -start => 1000, -end => 2000, -strand=>1 );
>
> $output_coordinates = Bio::Location::Simple->new
> (-seq_id => 'peptide', -start => 1100, -end => 2100, -strand=>1 );
>
> $pair = Bio::Coordinate::Pair->new
> (-in => $input_coordinates , -out => $output_coordinates );
>
> $pos = Bio::Location::Simple->new (-start => 500, -end => 500 );
>
> $res = $pair->map($pos);
> $converted_start = $res->start;
>
> The way I understand it, $converted_start should return "1600". But when I
> run this snippet, it returns "500". Could you please let me know how
> $pair->map($pos) is processed?
>
> I appreciate your time and thanks in advance.
>
> Regards,
> Prashanth


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From lengjingmao at gmail.com  Tue Aug  5 03:36:23 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 15:36:23 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>
Message-ID: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>

Hi, there ,

I have a sequence dataset which contains about 200 sequences. Some of the sequences in it are identical. Is there a BioPerl module which can remove those identical sequences?

thanks a lot. 
yours,
shaohua

From bernd.web at gmail.com  Tue Aug  5 05:49:55 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 5 Aug 2008 11:49:55 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
References: <18792782.post@talk.nabble.com>
	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
Message-ID: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>

Hi,

There is a BioPerl Utility script doing this.
See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.

" scripts/utilities/bp_nrdb.PLS
    Make a non-redundant database based on sequence, not id. Requires
Digest::MD5."

Alternatively, you can make a hash using the sequences as keys.
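
A minimal sketch of the hash approach, assuming the sequences have
already been read into a list of records with 'id' and 'seq' fields
(with BioPerl these could be filled from $seq->display_id and
$seq->seq); unique_by_sequence is a hypothetical name, and comparing
case-insensitively is an assumption you may not want.

```perl
use strict;
use warnings;

# Keep the first record seen for each distinct sequence string.
sub unique_by_sequence {
    my @records = @_;
    my (%seen, @unique);
    for my $rec (@records) {
        my $key = uc $rec->{seq};    # treat 'acgt' and 'ACGT' as identical
        next if $seen{$key}++;       # skip sequences already seen
        push @unique, $rec;
    }
    return @unique;
}

my @records = (
    { id => 'seq1', seq => 'ATGCCA' },
    { id => 'seq2', seq => 'ATGCCA' },    # duplicate of seq1
    { id => 'seq3', seq => 'ATGCCT' },
);
print ">$_->{id}\n$_->{seq}\n" for unique_by_sequence(@records);
```

For very long sequences the key could be a digest of the sequence
rather than the sequence itself, which is what the Digest::MD5
requirement of bp_nrdb.PLS suggests that script does.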


Regards,
Bernd



From diriano at uni-potsdam.de  Tue Aug  5 06:28:58 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riano Pachon)
Date: Tue, 05 Aug 2008 12:28:58 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
Message-ID: <48982B6A.4050304@uni-potsdam.de>

Hi all,

Or you might try a non-bioperl solution that works pretty well, check:

http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86

Best,

Diego



-- 
___________________________________
Diego Mauricio Riaño Pachón
Biologist - PhD student
AG Mueller-Roeber
Institute for Biochemistry and Biology
University of Potsdam

Address: Karl-Liebknecht-Str. 24-25
	 Haus 20
	 14476 Golm
	 Germany

Tel:	 +49 331 977 2809
Fax:	 +49 331 977 2512

web:	http://www.geocities.com/dmrp.geo


From cjfields at uiuc.edu  Tue Aug  5 11:19:54 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Aug 2008 10:19:54 -0500
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <48982B6A.4050304@uni-potsdam.de>
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
	<48982B6A.4050304@uni-potsdam.de>
Message-ID: <4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>

Here are two links which go into detail (the last is a specific  
implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego
>
> Bernd Web wrote:
>> Hi,
>> There is a BioPerl Utility script doing this.
>> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities  
>> header.
>> " scripts/utilities/bp_nrdb.PLS
>>    Make a non-redundant database based on sequence, not id. Requires
>> Digest::MD5."
>> Alternatively, you can make a hash using the sequences as keys.
>> Regards,
>> Bernd
>> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com>  
>> wrote:
>>> Hi, there ,
>>>
>>> I have a sequence dataset which contains about 200 sequences.  
>>> there are some identical sequences in this. is there any bioperl  
>>> modules  which can remove those identical sequences?
>>>
>>> thanks a lot.
>>> yours,
>>> shaohua
>>> ----- Original Message -----
>>> From: "Benbo" <btemperton at googlemail.com>
>>> To: <Bioperl-l at lists.open-bio.org>
>>> Sent: Sunday, August 03, 2008 4:05 AM
>>> Subject: [Bioperl-l] Finding possible primers regex
>>>
>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple  
>>>> entry fasta
>>>> file and find possible primers. So far I've produced a string  
>>>> which contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are  
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However  
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char  
>>>> blocks, rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or  
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> -- 
> ___________________________________
> Diego Mauricio Riaño Pachón
> Biologist - PhD student
> AG Mueller-Roeber
> Institute for Biochemistry and Biology
> University of Potsdam
>
> Address: Karl-Liebknecht-Str. 24-25
> 	 Haus 20
> 	 14476 Golm
> 	 Germany
>
> Tel:	 +49 331 977 2809
> Fax:	 +49 331 977 2512
>
> web:	http://www.geocities.com/dmrp.geo
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From lengjingmao at gmail.com  Tue Aug  5 11:24:22 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 23:24:22 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
	<48982B6A.4050304@uni-potsdam.de>
	<4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>
Message-ID: <3A95AD6D18A749F3B73C135CCC8E7C90@6B2F7FFC298C46F>

hi,
thanks a lot for the help!
cheers,
shaohua
----- Original Message ----- 
From: "Chris Fields" <cjfields at uiuc.edu>
To: "Diego Mauricio Riano Pachon" <diriano at uni-potsdam.de>
Cc: "Bernd Web" <bernd.web at gmail.com>; <Bioperl-l at lists.open-bio.org>; "Shaohua Fan" <lengjingmao at gmail.com>
Sent: Tuesday, August 05, 2008 11:19 PM
Subject: Re: [Bioperl-l] how to remove indentical sequences from a dataset


Here are two links which go into detail (the last is a specific  
implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego
>
> Bernd Web wrote:
>> Hi,
>> There is a BioPerl Utility script doing this.
>> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities  
>> header.
>> " scripts/utilities/bp_nrdb.PLS
>>    Make a non-redundant database based on sequence, not id. Requires
>> Digest::MD5."
>> Alternatively, you can make a hash using the sequences as keys.
>> Regards,
>> Bernd
>> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com>  
>> wrote:
>>> Hi, there ,
>>>
>>> I have a sequence dataset which contains about 200 sequences.  
>>> there are some identical sequences in this. is there any bioperl  
>>> modules  which can remove those identical sequences?
>>>
>>> thanks a lot.
>>> yours,
>>> shaohua
>>> ----- Original Message -----
>>> From: "Benbo" <btemperton at googlemail.com>
>>> To: <Bioperl-l at lists.open-bio.org>
>>> Sent: Sunday, August 03, 2008 4:05 AM
>>> Subject: [Bioperl-l] Finding possible primers regex
>>>
>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple  
>>>> entry fasta
>>>> file and find possible primers. So far I've produced a string  
>>>> which contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are  
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However  
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char  
>>>> blocks, rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or  
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> -- 
> ___________________________________
> Diego Mauricio Riaño Pachón
> Biologist - PhD student
> AG Mueller-Roeber
> Institute for Biochemistry and Biology
> University of Potsdam
>
> Address: Karl-Liebknecht-Str. 24-25
> Haus 20
> 14476 Golm
> Germany
>
> Tel: +49 331 977 2809
> Fax: +49 331 977 2512
>
> web: http://www.geocities.com/dmrp.geo
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From martin.senger at gmail.com  Tue Aug  5 22:53:07 2008
From: martin.senger at gmail.com (Martin Senger)
Date: Wed, 6 Aug 2008 10:53:07 +0800
Subject: [Bioperl-l] Bio::Biblio doesn't find articles
Message-ID: <4d93f07c0808051953k4cb7511cg5ec4cd93f53cfd0f@mail.gmail.com>

I am afraid that the server that serves the MEDLINE database to the
Bio::Biblio module (using the SOAP protocol), and that is running at EBI,
may not be fully supported. I am not working at EBI anymore and I have
stopped monitoring their servers. I am still their collaborator, but
unfortunately I am no longer involved in the MEDLINE tools.

I would be happy to continue to maintain the Bio::Biblio module, but it
relies on a server that I no longer control.

Cheers,
Martin

-- 
Martin Senger
email: martin.senger at gmail.com,m.senger at cgiar.org
skype: martinsenger


From Russell.Smithies at agresearch.co.nz  Wed Aug  6 17:20:04 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 7 Aug 2008 09:20:04 +1200
Subject: [Bioperl-l] not BioPerl
Message-ID: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>

Has anyone taken a look at the new Perl interface to the NCBI C++
Toolkit?
Unfortunately, I can't even get their examples working as I'm behind a
firewall and documentation on setting proxy stuff is virtually
non-existent  :-(


Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Wed Aug  6 17:33:27 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 6 Aug 2008 16:33:27 -0500
Subject: [Bioperl-l] not BioPerl
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
Message-ID: <E4A73CE7-9288-4245-99D5-90F488698D6C@illinois.edu>

Looks like they're binary releases for 32- and 64-bit linux (quite  
large, at 25 MB).  Would be nice to have the C++ bindings for other  
OS's (my guess is this was set up via swig).

I have access to a linux cluster, so I may give this a try soon.

chris

On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:

> Has anyone taken a look at the new Perl interface to the NCBI C++
> Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existent  :-(
>
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
>
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
>
>
>
>
> =======================================================================
> Attention: The information contained in this message and/or attachments
> from AgResearch Limited is intended only for the persons or entities
> to which it is addressed and may contain confidential and/or privileged
> material. Any review, retransmission, dissemination or other use of, or
> taking of any action in reliance upon, this information by persons or
> entities other than the intended recipients is prohibited by AgResearch
> Limited. If you have received this message in error, please notify the
> sender immediately.
> =======================================================================
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From vinaykmittal at gatech.edu  Wed Aug  6 16:56:22 2008
From: vinaykmittal at gatech.edu (Mittal, Vinay K)
Date: Wed, 6 Aug 2008 16:56:22 -0400 (EDT)
Subject: [Bioperl-l] Error installing Biopel Module
Message-ID: <469631287.3995201218056182383.JavaMail.root@mail5.gatech.edu>

Hi,

I just installed ActivePerl 5.10.0 and was trying to install the BioPerl modules. While installing BioPerl through the package manager (ppm), I am getting the following errors:

ppm install failed: Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core

I don't know what the problem is. I have never used Bioperl Modules before.

Thanks.


-- 
--------
Vinay Kumar Mittal
MS,Bioinformatics
Georgia Institute of Technology


From rfrancis at ichr.uwa.edu.au  Wed Aug  6 21:11:28 2008
From: rfrancis at ichr.uwa.edu.au (Richard Francis)
Date: Thu, 07 Aug 2008 09:11:28 +0800
Subject: [Bioperl-l] AlignIO::clustalw match_line query
Message-ID: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>

Dear List,

I wonder if you can help.

I'm having trouble finding out on which criteria the conserved and
semi-conserved substitution decisions for a match line produced from the
match_line function in AlignIO are based.

I note that match_line produces the same output as an alignment match
line would from ClustalW and indeed is used in the AlignIO::clustalw
module, but are the substitution decisions based on the same Venn
diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color
ie are they faithful to the generation of the match line from within
ClustalW itself?

I need to know this as part of a paper I'm writing so I would really
appreciate your help with this.

Kind regards and thanks in advance,

Richard Francis
#####################################################################################
This e-mail message has been scanned for Viruses and Content and cleared 
by MailMarshal
#####################################################################################


From jason at bioperl.org  Wed Aug  6 22:26:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Aug 2008 19:26:06 -0700
Subject: [Bioperl-l] AlignIO::clustalw match_line query
In-Reply-To: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
References: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
Message-ID: <A7F598FF-C34F-466E-A73B-D714872201C8@bioperl.org>

Implemented independently, but it was based on what the clustalw  
documentation says.
The main code is in the match_line function in Bio::SimpleAlign.

See the CONSERVATION_GROUPS hash, excerpted below. A 'strong' (":") on
the match line is coded where the residues seen in a column all belong
to one of the 'strong' groups, e.g. only 'S', 'T', or 'A'.

            'strong' => [ qw(
				STA
				NEQK
				NHQK
				NDEQ
				QHRK
				MILV
				MILF
				HY
				FYW )],
             'weak' => [ qw(
				CSA
				ATV
				SAG
				STNK
				STPA
				SGND
				SNDEQK
				NDEQHK
				NEQHRK
				FVLIM
				HFY )],);
}

It was checked against clustalw output by hand when it was  
implemented. If you know of any inconsistencies, let us know.
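
To make the lookup concrete, here is a rough standalone sketch of how such
groups can be turned into a match line. It is not the Bio::SimpleAlign code
itself: the sub names (all_in_group, column_symbol, sketch_match_line) are
invented for the example, the '*', ':', '.' and blank symbols follow the
usual ClustalW convention, and leaving any column that contains a gap blank
is an assumption.

```perl
use strict;
use warnings;

# Group lists copied from the CONSERVATION_GROUPS excerpt above.
my @STRONG = qw(STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW);
my @WEAK   = qw(CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY);

# True if every residue is a member of the given group string.
sub all_in_group {
    my ($group, @residues) = @_;
    return !grep { index($group, $_) < 0 } @residues;
}

# Symbol for one alignment column, given the residues in that column.
sub column_symbol {
    my %distinct = map { $_ => 1 } map { uc $_ } @_;
    # Assumption: a column containing a gap character gets no symbol.
    return ' ' if $distinct{'-'} || $distinct{'.'};
    return '*' if keys(%distinct) == 1;              # fully conserved
    for my $group (@STRONG) {
        return ':' if all_in_group($group, keys %distinct);
    }
    for my $group (@WEAK) {
        return '.' if all_in_group($group, keys %distinct);
    }
    return ' ';
}

# Build a match line for equal-length aligned strings.
sub sketch_match_line {
    my @seqs = @_;
    my $len  = length $seqs[0];
    return join '', map {
        my $col = $_;
        column_symbol( map { substr($_, $col, 1) } @seqs );
    } 0 .. $len - 1;
}

# Columns: S/T/A (strong), C/S/A (weak), A/A/A (identical), W/P/G (none).
print '[', sketch_match_line(qw(SCAW TSAP AAAG)), "]\n";
```

Strong groups are tested before weak ones, so a column that fits both kinds
of group is reported as ':'.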

-jason
On Aug 6, 2008, at 6:11 PM, Richard Francis wrote:

> Dear List,
>
> I wonder if you can help.
>
> I'm having trouble finding out on which criteria the conserved and
> semi-conserved substitution decisions for a match line produced from the
> match_line function in AlignIO are based.
>
> I note that match_line produces the same output as an alignment match
> line would from ClustalW and indeed is used in the AlignIO::clustalw
> module, but are the substitution decisions based on the same Venn
> diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color
> ie are they faithful to the generation of the match line from within
> ClustalW itself?
>
> I need to know this as part of a paper I'm writing so I would really
> appreciate your help with this.
>
> Kind regards and thanks in advance,
>
> Richard Francis
> #####################################################################################
> This e-mail message has been scanned for Viruses and Content and cleared
> by MailMarshal
> #####################################################################################
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From betts at embl.de  Thu Aug  7 08:42:59 2008
From: betts at embl.de (Matthew Betts)
Date: Thu, 7 Aug 2008 14:42:59 +0200 (CEST)
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
Message-ID: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>


Hi,

Has anyone tried to draw secondary structure with Bio::Graphics? i.e. two 
different types of glyph with different colours on the same track.

Could use a hash reference to get the different glyph types (would be nice 
if there was a cylinder glyph and a thick arrow glyph), or heterogeneous 
segments to get the different colours, but I can't see how to do both at 
the same time.

Any example code or suggestions on how I could implement it would be 
great.

Thanks,

Matthew

-- 
Matthew Betts PhD, Russell Group (Structural Bioinformatics)
EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
phone: +49 (0)6221 387 8305; mailto:betts at embl.de


From cain.cshl at gmail.com  Thu Aug  7 10:08:39 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 7 Aug 2008 10:08:39 -0400
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
In-Reply-To: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
References: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
Message-ID: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>

Hi Matthew,

I don't have any code examples, but people have used GBrowse for
protein secondary structure, which uses Bio::Graphics underneath the
hood.

If you want to put more than one glyph and/or more than one color in a
track, it is fairly easy.  You just need to provide a callback for
each option when you create the track, like this:

  $panel->add_track($features_array_ref,
                    -glyph       => sub {
                        # code to set the glyph according to
                        # the attributes of the feature
                    },
                    -bgcolor     => sub {
                        # code to set the color
                    },
                    -fgcolor     => 'black',
                    # ...etc...
  );

For more information, see the biographics howto:

  http://www.bioperl.org/wiki/HOWTO:Graphics
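
As a rough illustration of what such callbacks might contain, the sketch
below picks a glyph and a colour from a lookup table keyed on the feature's
primary_tag. The 'helix'/'strand' tags, the style table and the
My::StubFeature class are all made up for the example (the stub only stands
in for a real feature so the code runs without Bio::Graphics installed), and
the choice of the 'ellipse' and 'generic' glyphs is just one possibility.

```perl
use strict;
use warnings;

# Hypothetical convention: each feature's primary_tag is 'helix' or 'strand'.
my %STYLE = (
    helix  => { glyph => 'ellipse', bgcolor => 'magenta' },
    strand => { glyph => 'generic', bgcolor => 'cyan'    },
);

# Look up one drawing option for a feature; unknown tags fall back
# to a plain white generic glyph.
sub style_for {
    my ($feature, $option) = @_;
    my $style = $STYLE{ $feature->primary_tag }
             || { glyph => 'generic', bgcolor => 'white' };
    return $style->{$option};
}

# Callbacks of the kind that would be handed to add_track, e.g.
#   $panel->add_track($features_array_ref,
#                     -glyph   => $glyph_cb,
#                     -bgcolor => $bgcolor_cb);
my $glyph_cb   = sub { style_for($_[0], 'glyph')   };
my $bgcolor_cb = sub { style_for($_[0], 'bgcolor') };

# Stand-in feature class, only here so the sketch is self-contained.
{
    package My::StubFeature;
    sub new         { my ($class, %args) = @_; return bless {%args}, $class }
    sub primary_tag { return $_[0]{primary_tag} }
}

# Show what each callback returns for a few example tags.
for my $tag (qw(helix strand coil)) {
    my $feature = My::StubFeature->new(primary_tag => $tag);
    printf "%-6s glyph=%s bgcolor=%s\n",
        $tag, $glyph_cb->($feature), $bgcolor_cb->($feature);
}
```

Keeping the styles in one table means a new structure type only needs a new
table entry, not a new callback.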

Scott


On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts <betts at embl.de> wrote:
>
> Hi,
>
> Has any one tried to draw secondary structure with Bio::Graphics? i.e. two
> different types of glyph with different colours on the same track.
>
> Could use a hash reference to get the different glyph types (would be nice
> if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
> segments to get the different colours, but I can't see how to do both at
> the same time.
>
> Any example code or suggestions on how I could implement it would be
> great.
>
> Thanks,
>
> Matthew
>
> --
> Matthew Betts PhD, Russell Group (Structural Bioinformatics)
> EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
> phone: +49 (0)6221 387 8305; mailto:betts at embl.de
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


From betts at embl.de  Thu Aug  7 12:27:28 2008
From: betts at embl.de (Matthew Betts)
Date: Thu, 7 Aug 2008 18:27:28 +0200 (CEST)
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
In-Reply-To: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
References: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
	<536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0808071823310.8169@pc-russell13>


Hi Scott,

Thanks for that, was a great help - I didn't realise I could use a code 
ref for anything other than the glyph name.

I'm now doing this:

$panel->add_track(
                  '-bgcolor' => sub {
				     my($feature) = @_;
                                     $feature->display_name eq 'strand' ? 'cyan' : 'magenta';
                                    },
                  '-strand_arrow' => sub {
                                          my($feature) = @_;
                                          $feature->display_name eq 'strand' ? 1 : 0;
                                         },
);

Matthew


On Thu, 7 Aug 2008, Scott Cain wrote:

> Hi Matthew,
> 
> I don't have any code examples, but people have used GBrowse for
> protein secondary structure, which uses Bio::Graphics underneath the
> hood.
> 
> If you want to put more than one glyph and/or more than one color in a
> track, it is fairly easy.  You just need to provide a callback for
> each option when you create the track, like this:
> 
>   $panel->add_track($features_array_ref,
>                     -glyph       => sub { #code to set the glyph
> according the attributes of the feature  },
>                     -bgcolor     => sub { #code to set the color },
>                     -fgcolor     => 'black',
>                  ...etc...
>   );
> 
> For more information, see the biographics howto:
> 
>   http://www.bioperl.org/wiki/HOWTO:Graphics
> 
> Scott
> 
> 
> 
> On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts <betts at embl.de> wrote:
> >
> > Hi,
> >
> > Has any one tried to draw secondary structure with Bio::Graphics? i.e. two
> > different types of glyph with different colours on the same track.
> >
> > Could use a hash reference to get the different glyph types (would be nice
> > if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
> > segments to get the different colours, but I can't see how to do both at
> > the same time.
> >
> > Any example code or suggestions on how I could implement it would be
> > great.
> >
> > Thanks,
> >
> > Matthew
> >
> > --
> > Matthew Betts PhD, Russell Group (Structural Bioinformatics)
> > EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
> > phone: +49 (0)6221 387 8305; mailto:betts at embl.de
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 
> 
> 


From jay at jays.net  Thu Aug  7 12:32:29 2008
From: jay at jays.net (Jay Hannah)
Date: Thu, 07 Aug 2008 11:32:29 -0500
Subject: [Bioperl-l] not BioPerl
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
Message-ID: <489B239D.8060305@jays.net>

Smithies, Russell wrote:
> Has anyone taken a look at the new Perl interface to the NCBI C++ Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existent  :-(
>   

Do people actually use the NCBI C++ Toolkit for things outside of NCBI?  
What?

I tried to leverage it a year or so ago for an Entrez/sequence/search 
project and got nowhere.

j


From jcherry at ncbi.nlm.nih.gov  Thu Aug  7 13:06:28 2008
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 7 Aug 2008 13:06:28 -0400 (EDT)
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
Message-ID: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>

For those who may be wondering what this is about, a Perl interface to the 
NCBI C++ Toolkit is available at 
ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/.  The C++ Toolkit is 
the main code base that we develop and use at NCBI.  It includes many 
things that may be of interest to BioPerl users, such as sequence analysis 
algorithms, means for interacting with NCBI databases, and facilities for 
reading, writing, and manipulating NCBI data model objects (usually 
defined by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and 
readable from ASN.1 and XML).

Russell, I think you can make things work from behind a firewall by 
setting some environment variables: set CONN_FIREWALL to 1, possibly set 
CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT 
as appropriate.  Please email me if you can't get things to work.  I'll 
see that decent instructions for this are included in the next release.
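
A minimal sketch of setting those variables from inside a Perl script, rather than in the shell. Only the four variable names come from the message above; the helper name ncbi_firewall_env, the proxy host and the port are placeholders, and the assumption that the toolkit reads these variables when it first opens a connection has not been checked against its documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: collect the environment settings named in this
# message. Only the variable names themselves come from the thread.
sub ncbi_firewall_env {
    my (%opt) = @_;
    my %env = ( CONN_FIREWALL => 1 );
    $env{CONN_STATELESS}       = 1                if $opt{stateless};
    $env{CONN_HTTP_PROXY_HOST} = $opt{proxy_host} if defined $opt{proxy_host};
    $env{CONN_HTTP_PROXY_PORT} = $opt{proxy_port} if defined $opt{proxy_port};
    return %env;
}

# proxy.example.org and 3128 are placeholders for the local proxy.
my %settings = ncbi_firewall_env(
    stateless  => 1,
    proxy_host => 'proxy.example.org',
    proxy_port => 3128,
);

# Export the settings to this process (and any child processes)
# before the toolkit wrapper is used.
@ENV{ keys %settings } = values %settings;
```

Setting the same variables in the shell before starting perl should be equivalent.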

Josh Cherry


On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:

> Has anyone taken a look at the new Perl interface to the NCBI C++
> Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existent  :-(
>
>
> Russell Smithies


From tristan.lefebure at gmail.com  Thu Aug  7 13:35:24 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 7 Aug 2008 13:35:24 -0400
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
Message-ID: <200808071335.24668.tristan.lefebure@gmail.com>

Hi list,

I'm using a script very similar to bp_taxonomy2tree.pl distributed with 
BioPerl (with the only difference that I'm using taxids instead of taxon 
names). Basically, the script generates a taxonomic tree given a list of 
taxids using the NCBI taxonomy db. For each taxon, it generates a taxon 
object, and then merges this object into a tree object that keeps growing. It
runs very well with a small number of taxa, but with many taxa (>1000), it is 
very very very slow (about a week for 3000 taxa).

The slowness is due to the function merge_lineage (line 65), which merges the
existing tree object with a new taxon object. I guess that the difficulty
with a big tree (i.e. more than 1000 leaves) is to find the nodes in common
between the tree and the new taxon object...

Would you have any idea of how to get around the problem? Should I look under 
the hood of merge_lineage to try to improve it for large trees?

Thanks!

Version: bioperl-1.5.2_102
OS: GNU/Linux

-Tristan


From cjfields at illinois.edu  Thu Aug  7 13:38:53 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Aug 2008 12:38:53 -0500
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
In-Reply-To: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
References: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
Message-ID: <E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>

Josh,

Thanks for the update.  I saw that these are only binaries for linux  
32/64-bit.  Are there plans to either support other OS's (OS X, Win,  
etc) or to maybe make a release with the XS-bindings so users can work  
towards that?  With additional support I can see this easily fitting  
into several spots in BioPerl, but otherwise I'm unsure.

chris

On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote:

> For those who may be wondering what this is about, a Perl interface  
> to the NCBI C++ Toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/ 
> .  The C++ Toolkit is the main code base that we develop and use at  
> NCBI.  It includes many things that may be of interest to BioPerl  
> users, such as sequence analysis algorithms, means for interacting  
> with NCBI databases, and facilities for reading, writing, and  
> manipulating NCBI data model objects (usually defined by ASN.1  
> specifications; writeable as ASN.1, XML, and JSON, and readable from  
> ASN.1 and XML).
>
> Russell, I think you can make things work from behind a firewall by  
> setting some environment variables: set CONN_FIREWALL to 1, possibly  
> set CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and  
> CONN_HTTP_PROXY_PORT as appropriate.  Please email me if you can't  
> get things to work.  I'll see that decent instructions for this are  
> included in the next release.
>
> Josh Cherry
>
>
> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:
>
>> Has anyone taken a look at the new Perl interface to the NCBI C++
>> Toolkit?
>> Unfortunately, I can't even get their examples working as I'm  
>> behind a
>> firewall and documentation on setting proxy stuff is virtually
>> non-existent  :-(
>>
>>
>> Russell Smithies
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From jcherry at ncbi.nlm.nih.gov  Thu Aug  7 14:04:17 2008
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 7 Aug 2008 14:04:17 -0400 (EDT)
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
In-Reply-To: <E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>
References: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
	<E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>
Message-ID: <Pine.LNX.4.64.0808071359430.13868@widget0.be-md.ncbi.nlm.nih.gov>

Chris,

Support for other OS's is definitely a possibility, depending on community 
feedback (how useful are the wrappers in general, and how much demand is 
there for them on other platforms?).  I wish I could magically make them 
available for Windows and OS X, but there are some technical issues to 
work out.

Josh


On Thu, 7 Aug 2008, Chris Fields wrote:

> Josh,
>
> Thanks for the update.  I saw that these are only binaries for linux 
> 32/64-bit.  Are there plans to either support other OS's (OS X, Win, etc) or 
> to maybe make a release with the XS-bindings so users can work towards that? 
> With additional support I can see this easily fitting into several spots in 
> BioPerl, but otherwise I'm unsure.
>
> chris
>
> On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote:
>
>> For those who may be wondering what this is about, a Perl interface to the 
>> NCBI C++ Toolkit is available at 
>> ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/.  The C++ Toolkit is 
>> the main code base that we develop and use at NCBI.  It includes many 
>> things that may be of interest to BioPerl users, such as sequence analysis 
>> algorithms, means for interacting with NCBI databases, and facilities for 
>> reading, writing, and manipulating NCBI data model objects (usually defined 
>> by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and readable 
>> from ASN.1 and XML).
>> 
>> Russell, I think you can make things work from behind a firewall by setting 
>> some environment variables: set CONN_FIREWALL to 1, possibly set 
>> CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT 
>> as appropriate.  Please email me if you can't get things to work.  I'll see 
>> that decent instructions for this are included in the next release.
>> 
>> Josh Cherry
>> 
>> 
>> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:
>> 
>>> Has anyone taken a look at the new Perl interface to the NCBI C++
>>> Toolkit?
>>> Unfortunately, I can't even get their examples working as I'm behind a
>>> firewall and documentation on setting proxy stuff is virtually
>>> non-existent  :-(
>>> 
>>> 
>>> Russell Smithies
>> 
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>


From bix at sendu.me.uk  Thu Aug  7 18:20:29 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Aug 2008 23:20:29 +0100
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
 large trees
In-Reply-To: <200808071335.24668.tristan.lefebure@gmail.com>
References: <200808071335.24668.tristan.lefebure@gmail.com>
Message-ID: <489B752D.2080209@sendu.me.uk>

Tristan Lefebure wrote:
> I'm using a script very similar to bp_taxonomy2tree.pl distributed with 
> BioPerl (with the only difference that I'm using taxids instead of taxon 
> names). Basically, the script generates a taxonomic tree given a list of 
> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon 
> object, and then merges this object into a tree object that keeps growing. It
> runs very well with a small number of taxa, but with many taxa (>1000), it is 
> very very very slow (about a week for 3000 taxa).
> 
> The slowness is due to the function merge_lineage (line 65), which merges the
> existing tree object with a new taxon object. I guess that the difficulty
> with a big tree (i.e. more than 1000 leaves) is to find the nodes in common
> between the tree and the new taxon object...
> 
> Would you have any idea of how to get around the problem? Should I look under 
> the hood of merge_lineage to try to improve it for large trees?

Yes, please do. It might have been me that wrote that, in which case I 
didn't do anything fancy or consider the above problem.


From cjfields at illinois.edu  Thu Aug  7 20:42:16 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Aug 2008 19:42:16 -0500
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
In-Reply-To: <489B752D.2080209@sendu.me.uk>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
Message-ID: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>


On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:

> Tristan Lefebure wrote:
>> I'm using a script very similar to bp_taxonomy2tree.pl distributed  
>> with BioPerl (with the only difference that I'm using taxids  
>> instead of taxon names). Basically, the script generates a  
>> taxonomic tree given a list of taxids using the NCBI taxonomy db.  
>> For each taxon, it generates a taxon object, and then merges this
>> object into a tree object that keeps growing. It runs very well with
>> a small number of taxa, but with many taxa (>1000), it is very very  
>> very slow (about a week for 3000 taxa).
>> The slowness is due to the function merge_lineage (line 65), which
>> merges the existing tree object with a new taxon object. I guess
>> that the difficulty with a big tree (i.e. more than 1000 leaves) is
>> to find the nodes in common between the tree and the new taxon  
>> object...
>> Would you have any idea of how to get around the problem? Should I  
>> look under the hood of merge_lineage to try to improve it for large  
>> trees?
>
> Yes, please do. It might have been me that wrote that, in which case  
> I didn't do anything fancy or consider the above problem.

Actually I thought that was fixed; wasn't some caching added for that  
script at one point?

chris


From bix at sendu.me.uk  Fri Aug  8 03:50:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 08 Aug 2008 08:50:50 +0100
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
 large trees
In-Reply-To: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
	<7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
Message-ID: <489BFADA.1060308@sendu.me.uk>

Chris Fields wrote:
> 
> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
> 
>> Tristan Lefebure wrote:
>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed 
>>> with BioPerl (with the only difference that I'm using taxids instead 
>>> of taxon names). Basically, the script generates a taxonomic tree 
>>> given a list of taxids using the NCBI taxonomy db. For each taxon, it 
>>> generates a taxon object, and then merges this object into a tree object
>>> that keeps growing. It runs very well with a small number of taxa, 
>>> but with many taxa (>1000), it is very very very slow (about a week 
>>> for 3000 taxa).
>>> The slowness is due to the function merge_lineage (line 65), which
>>> merges the existing tree object with a new taxon object. I guess that
>>> the difficulty with a big tree (i.e. more than 1000 leaves) is to find
>>> the nodes in common between the tree and the new taxon object...
>>> Would you have any idea of how to get around the problem? Should I 
>>> look under the hood of merge_lineage to try to improve it for large 
>>> trees?
>>
>> Yes, please do. It might have been me that wrote that, in which case I 
>> didn't do anything fancy or consider the above problem.
> 
> Actually I thought that was fixed;

Oh yeah. Looks like I did something related to 'speedup for 
merge_lineage()' on the 18th Dec 2006. Tristan, check out
Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
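
For readers wondering why merging can get slow on big trees: the expensive part is finding which nodes of a new lineage are already in the tree. A toy sketch follows of how a hash keyed by taxid makes that test a single lookup. This is not the code in Bio/Tree/TreeFunctionsI.pm and not necessarily what the 2006 speedup did. The names new_tree and merge_lineage_indexed are hypothetical, and every lineage is assumed to run from a shared root down to the leaf.

```perl
use strict;
use warnings;

# Toy tree: all nodes live in one hash keyed by taxid, so checking
# whether a node already exists does not require walking the tree.
sub new_tree { return { nodes => {}, root => undef } }

# $lineage is an array ref of taxids ordered from root to leaf.
# Returns the number of nodes that had to be added.
sub merge_lineage_indexed {
    my ($tree, $lineage) = @_;
    my $parent;
    my $added = 0;
    for my $id (@$lineage) {
        if (!exists $tree->{nodes}{$id}) {
            $tree->{nodes}{$id} = { parent => $parent, children => [] };
            if (defined $parent) {
                push @{ $tree->{nodes}{$parent}{children} }, $id;
            }
            elsif (!defined $tree->{root}) {
                $tree->{root} = $id;    # first node ever seen is the root
            }
            $added++;
        }
        $parent = $id;    # descend one level along the lineage
    }
    return $added;
}
```

With this layout each merge costs time proportional to the length of the lineage, whatever the size of the tree.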


From tristan.lefebure at gmail.com  Fri Aug  8 12:02:32 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Fri, 8 Aug 2008 12:02:32 -0400
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
In-Reply-To: <489BFADA.1060308@sendu.me.uk>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
	<7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
	<489BFADA.1060308@sendu.me.uk>
Message-ID: <d31f7c40808080902o70fbf78dub50c9ba660355ba0@mail.gmail.com>

Yes indeed, with the svn code it took 10 minutes (compared to one week!)
Thanks, -Tristan

On Fri, Aug 8, 2008 at 3:50 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Chris Fields wrote:
>
>>
>> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
>>
>>  Tristan Lefebure wrote:
>>>
>>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed with
>>>> BioPerl (with the only difference that I'm using taxids instead of taxon
>>>> names). Basically, the script generates a taxonomic tree given a list of
>>>> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon
>>>> object, and then merges this object into a tree object that keeps growing. It
>>>> runs very well with a small number of taxa, but with many taxa (>1000), it
>>>> is very very very slow (about a week for 3000 taxa).
>>>> The slowness is due to the function merge_lineage (line 65), which
>>>> merges the existing tree object with a new taxon object. I guess that the
>>>> difficulty with a big tree (i.e. more than 1000 leaves) is to find the nodes
>>>> in common between the tree and the new taxon object...
>>>> Would you have any idea of how to get around the problem? Should I look
>>>> under the hood of merge_lineage to try to improve it for large trees?
>>>>
>>>
>>> Yes, please do. It might have been me that wrote that, in which case I
>>> didn't do anything fancy or consider the above problem.
>>>
>>
>> Actually I thought that was fixed;
>>
>
> Oh yeah. Looks like I did something related to 'speedup for
> merge_lineage()' on the 18th Dec 2006. Tristan, check out
> Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
>


From rvos at interchange.ubc.ca  Fri Aug  8 19:59:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Fri, 8 Aug 2008 16:59:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>

Hi,

while going through a large genbank file
(ftp://ftp.ncbi.nlm.nih.gov/genbank/gbpri21.seq.gz) I ran into malloc
errors. Just for the record (I doubt this does anyone any good), I
got:

perl(391) malloc: *** vm_allocate(size=8421376) failed (error code=3)
perl(391) malloc: *** error: can't allocate region
perl(391) malloc: *** set a breakpoint in szone_error to debug
Out of memory!

What I was trying to do is go through the file, and only write out
those seq objects that aren't human, and that have CDS features, i.e.:

################################################

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $dir = shift @ARGV; # the directory with *.gz files
my $out = shift @ARGV; # the directory to write to...
mkdir $out if not -d $out; # ...which may need to be created

opendir my $dirhandle, $dir or die $!;
for my $archive ( readdir $dirhandle ) {
    next if $archive !~ /\.gz$/;
    my $file = $archive;
    $file =~ s/\.gz$//;

    # external call to the gunzip utility,
    # such that we keep the archive
    system( "gunzip -c \"${dir}/${archive}\" > \"${dir}/${file}\"" );

    # object that parses genbank files,
    # returns Bio::Seq objects
    my $reader = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => "${dir}/${file}"
    );

    # object that receives Bio::Seq objects,
    # writes genbank files
    my $writer = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => ">${out}/${file}",
    );
    while ( my $seq = $reader->next_seq ) {
        my $name = $seq->species->binomial;
        if ( $name ne 'Homo sapiens' ) {

            # search for coding sequences among the features
            my $HasCDS = 0;
            FEATURE: for my $f ( $seq->get_SeqFeatures ) {
                if ( $f->primary_tag eq 'CDS' ) {
                    $HasCDS++;
                    last FEATURE;
                }
            }

            # write the sequence to file
            if ( $HasCDS ) {
                $writer->write_seq( $seq );
            }
        }
    }

    # delete the extracted, unfiltered file
    unlink "${dir}/${file}";
}

################################################

Okay, so it runs out of memory. Can I do something to fix that? Should
I flush on either of the I/O objects after each $seq? Could there be
memory leaks in the Bio::Seq objects? Should I $seq->DESTROY them
explicitly or something like that?

Thanks,

Rutger

-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


From David.Messina at sbc.su.se  Sat Aug  9 07:04:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Aug 2008 13:04:04 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
Message-ID: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>

Hi Rutger,
I ran your script on the same genbank file and, while I did not run out of
memory, I did see what appears to be a memory leak. Even when I manually
undef'd the reader and writer object every 1000 records, memory usage
continued to grow.

I can't quite figure out what's going on, though.

If I run a different program using SeqIO (the simple sequence converter from
the SeqIO HOWTO) on the same input file, I don't see this same runaway
growth.

Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the
sequence converter holds steady at about 12MB of real memory, whereas on
5.10 it grows, albeit slowly, for as long as the program is executing. When
I killed it about 20% of the way through the file, it was up to  about 44MB
of real memory.

Anyone else have a chance to look at this?


Dave


From rvos at interchange.ubc.ca  Sat Aug  9 07:36:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 04:36:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
Message-ID: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>

Hi Dave,

thanks for the reply. The memory usage is in fact much more atrocious
than just 44 MB: I'm actually looping over all 36 such archives (the
genbank primates), and on my macbook it steadily increases to over 1 GB
of memory. What seemed to help somewhat was calling
$reader->flush and $writer->flush after each seq, at least to the
extent that I made it through that one file, but last time I tried I
didn't get much further: the whole terminal process died shortly after
instead. I seem to vaguely recall that even if perl free()'s memory,
that doesn't necessarily mean that the memory is returned to the OS
for the runtime of the program - depending on the OS and perl version.
What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.

Rutger

On Sat, Aug 9, 2008 at 4:04 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> Hi Rutger,
> I ran your script on the same genbank file and, while I did not run out of
> memory, I did see what appears to be a memory leak. Even when I manually
> undef'd the reader and writer object every 1000 records, memory usage
> continued to grow.
>
> I can't quite figure out what's going on, though.
> If I run a different program using SeqIO (the simple sequence converter from
> the SeqIO HOWTO) on the same input file, I don't see this same runaway
> growth.
> Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the
> sequence converter holds steady at about 12MB of real memory, whereas on
> 5.10 it grows, albeit slowly, for as long as the program is executing. When
> I killed it about 20% of the way through the file, it was up to  about 44MB
> of real memory.
> Anyone else have a chance to look at this?
>
> Dave
>


-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


From David.Messina at sbc.su.se  Sat Aug  9 08:58:56 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Aug 2008 14:58:56 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
Message-ID: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>

>
>  I seem to vaguely recall that even if perl free()'s memory that doesn't
> necessarily mean that the memory is returned to the OS for the runtime of
> the program


I believe that's correct.
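
If freed memory really does stay with the process until it exits, one possible workaround (a sketch only, not something tested against the genbank files in this thread) is to handle each archive in a short-lived child process, so that whatever it allocated goes back to the OS when the child exits. The helper name run_in_child is hypothetical, and fork() is assumed to be available, as it is on OS X and Linux.

```perl
use strict;
use warnings;

# Hypothetical helper: run $work->(@args) in a child process and
# return the child's exit status (0 on success, 1 if $work died).
sub run_in_child {
    my ($work, @args) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: do the work, then exit so its memory is released.
        my $ok = eval { $work->(@args); 1 };
        warn $@ unless $ok;
        exit($ok ? 0 : 1);
    }
    # Parent: wait for the child and report how it ended.
    waitpid $pid, 0;
    return $? >> 8;
}
```

The per-archive filtering loop would then be wrapped in a sub and called as run_in_child(\&filter_archive, $archive), where filter_archive stands for whatever does the actual SeqIO work.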


> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>

perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.


Dave


From cjfields at illinois.edu  Sat Aug  9 09:56:19 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Aug 2008 08:56:19 -0500
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
Message-ID: <57147D88-ABE6-44E0-8D76-790B0C735438@illinois.edu>

There is definitely a memory leak.  I can confirm it on OSX 10.4/10.5  
using bioperl-live.

I'll try looking into it this weekend, but I can't promise when it'll  
be fixed; my laptop is on the fritz.

chris

On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:

>>
>> I seem to vaguely recall that even if perl free()'s memory that  
>> doesn't
>> necessarily mean that the memory is returned to the OS for the  
>> runtime of
>> the program
>
>
> I believe that's correct.
>
>
>
>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>
>
> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>
>
> Dave


From cjfields at illinois.edu  Sat Aug  9 10:15:23 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Aug 2008 09:15:23 -0500
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
Message-ID: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>

Forgot to mention, maybe we can file this as a bug?  It's a pretty  
serious one but it should be easy to narrow down; the change had to be  
introduced fairly recently.

chris

On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:

>>
>> I seem to vaguely recall that even if perl free()'s memory that  
>> doesn't
>> necessarily mean that the memory is returned to the OS for the  
>> runtime of
>> the program
>
>
> I believe that's correct.
>
>
>
>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>
>
> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>
>
> Dave

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat Aug  9 12:00:46 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 9 Aug 2008 12:00:46 -0400
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
	<9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
Message-ID: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>

This smells of circular references somewhere. I think the first point  
I would go looking is the species storing - does the problem go away  
if you turn that off? Maybe the version of weaken() is at play here?

	-hilmar
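
To illustrate what a circular reference does and how weaken() helps, here is a small self-contained sketch. It makes no claim about where (or whether) such a cycle exists in Bio::SeqIO or the species code. The Node class and build_pair are made up for the demonstration, and Scalar::Util is assumed to be the XS version that provides weaken().

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

our $destroyed = 0;    # counts how many Node objects have been freed

package Node;
sub new { my ($class, %args) = @_; return bless {%args}, $class }
sub DESTROY { $main::destroyed++ }

package main;

# Build two objects that point at each other, then let them go out of
# scope. With $use_weaken false the cycle keeps both alive (a leak);
# with it true the back-reference is weak and both are freed.
sub build_pair {
    my ($use_weaken) = @_;
    my $parent = Node->new(name => 'seq');
    my $child  = Node->new(name => 'species', owner => $parent);
    $parent->{species} = $child;
    weaken($child->{owner}) if $use_weaken;
    return;
}
```

After build_pair(0) the counter is still 0, because neither object's reference count ever reaches zero; after build_pair(1) it goes up by 2. Calling the leaking variant in a loop grows memory in the same steady way described in this thread.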

On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:

> Forgot to mention, maybe we can file this as a bug?  It's a pretty  
> serious one but it should be easy to narrow down; the change had to  
> be introduced fairly recently.
>
> chris
>
> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>
>>>
>>> I seem to vaguely recall that even if perl free()'s memory that  
>>> doesn't
>>> necessarily mean that the memory is returned to the OS for the  
>>> runtime of
>>> the program
>>
>>
>> I believe that's correct.
>>
>>
>>
>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>
>>
>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>
>>
>> Dave
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Aug  9 12:07:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 9 Aug 2008 12:07:30 -0400
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>
References: <18792782.post@talk.nabble.com>
	<33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>
Message-ID: <591AE8EB-4D45-4859-A93E-EA9BF01CA9C6@gmx.net>

This looks like a neat trick. Do you think it's worth including as a  
SimpleAlign method (obviously w/o the printing to STDOUT)? I can  
imagine that a lot of people might appreciate it.

	-hilmar
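
For comparison with the regex approach quoted below, here is a plain sliding-window sketch that enumerates the same kind of overlapping candidates without embedded code. It swaps the backtracking trick for two nested loops and works on a bare string rather than an alignment object. The function name candidate_windows is hypothetical, and '*' is used as the mismatch character, as in Ben's original question.

```perl
use strict;
use warnings;

# Return [start, end, fragment] for every window of $min..$max
# characters holding at most $max_mismatch '*' characters.
# Coordinates are 1-based and inclusive.
sub candidate_windows {
    my ($consensus, $min, $max, $max_mismatch) = @_;
    my @hits;
    my $len = length $consensus;
    for my $start (0 .. $len - $min) {
        for my $size ($min .. $max) {
            last if $start + $size > $len;           # window runs off the end
            my $fragment = substr $consensus, $start, $size;
            next if $fragment =~ /[^GTAC*]/i;        # skip gaps or other symbols
            my $mismatches = ($fragment =~ tr/*//);  # count the '*' columns
            next if $mismatches > $max_mismatch;
            push @hits, [ $start + 1, $start + $size, $fragment ];
        }
    }
    return @hits;
}
```

For primers as described in the thread this would be called as candidate_windows($consensus, 18, 21, 4). The start and end values could then be passed to the alignment's slice method, as the quoted script does.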

On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:

> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>
>>
>> Hi there,
>> I'm trying to write a perl script to scan an aligned multiple entry  
>> fasta
>> file and find possible primers. So far I've produced a string which  
>> contains
>> bases which match all sequences and * where they don't match e.g.
>> 1) TTAGCCTAA
>> 2) TTAGCAGAA
>> 3) TTACCCTAA
>>
>> would give TTA*C**AA.
>>
>> I want to parse this string and pull out all sequences which are  
>> 18-21 bp in
>> length and have no more than 4 * in them.
>>
>> So far, I've got this:
>>
>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>> print "$1\n";
>> }
>>
>> hoping to match all fragments 18-21 characters in length. However  
>> even that
>> doesn't work as it has essentially chunked it into 21 char blocks,  
>> rather
>> than what I hoped for of
>> 0-18
>> 0-19
>> 0-20
>> 0-21
>> 1-19
>> 1-20
>> 1-21
>> 1-22
>>
>> etc.
>>
>> Can anyone let me know if this is already possible in BioPerl, or  
>> how one
>> would go about it with regex. Sadly I'm fairly new to perl and  
>> getting to
>> grips with BioPerl, so please treat me gently :).
>>
>> Many thanks,
>>
>> Ben
>
> There is a trick to this which is discussed more extensively in  
> 'Mastering Regular Expressions'.  Essentially you have to embed code  
> into the regex and trick the parser into backtracking using a  
> negative lookahead.  The match itself fails (i.e. no match is  
> returned), but the embedded code is executed for each match attempt.
>
> The following script is a slight modification of one I used which  
> checks the consensus string from the input alignment (in aligned  
> FASTA format here), extracts the alignment slice using that match,  
> then spit the alignment out to STDOUT in clustalw format.  This  
> should work for perl 5.8 and up, but it's only been tested on perl  
> 5.10.  You should be able to use this to fit what you want.
>
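> use strict;
> use warnings;
> use Bio::AlignIO;
>
> # aligned FASTA input; taking it from the command line is an assumption
> my $file = shift @ARGV;
>
> my $in = Bio::AlignIO->new(-file => $file,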
>                           -format => 'fasta');
> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>                           -format => 'clustalw');
>
> while (my $aln = $in->next_aln) {
>    my $c = $aln->consensus_string(100);
>    my @matches;
>    $c =~ m/
>        ([GTAC?]{18,21})
>        (?{my $match = check_match($1);
>           push @matches, [$match,
>                           pos(),
>                           length($match)]
>              if defined $match;})
>        (?!)
>        /xig;
>    for my $match (@matches) {
>        my ($hit, $st, $end) = ($match->[0],
>                                $match->[1] - $match->[2] + 1,
>                                $match->[1]);
>        my $newaln = $aln->slice($st, $end);
>        $out->write_aln($newaln);
>    }
> }
>
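> sub check_match {
>    my $match = shift;
>    return unless $match;
>    my $ct = $match =~ tr/?/?/;   # count the '?' (non-consensus) columns
>    return $match if $ct <= 4;
>    return;   # explicit undef, so the caller's 'defined' test rejects it
> }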
>
>
> chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From rvos at interchange.ubc.ca  Sat Aug  9 13:47:33 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 10:47:33 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
	<9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
	<897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>
Message-ID: <2bb9b24a0808091047t46a6bfa8r7e11a3a1537180@mail.gmail.com>

I am sure my version of weaken() works as advertised. Is there a way
to turn off species storing from outside the code base or do you mean
I go and start commenting bits out in Bio::SeqIO::genbank (or
wherever)?

On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
> This smells of circular references somewhere. I think the first point I
> would go looking is the species storing - does the problem go away if you
> turn that off? Maybe the version of weaken() is at play here?
>
>        -hilmar
>
> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>
>> Forgot to mention, maybe we can file this as a bug?  It's a pretty serious
>> one but it should be easy to narrow down; the change had to be introduced
>> fairly recently.
>>
>> chris
>>
>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>>
>>>>
>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't
>>>> necessarily mean that the memory is returned to the OS for the runtime
>>>> of
>>>> the program
>>>
>>>
>>> I believe that's correct.
>>>
>>>
>>>
>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>>
>>>
>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>>
>>>
>>> Dave
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Marie-Claude Hofmann
>> College of Veterinary Medicine
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>


-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


From hartzell at alerce.com  Sat Aug  9 14:33:51 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 9 Aug 2008 11:33:51 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
Message-ID: <18589.58127.57270.352974@almost.alerce.com>


I'm pretty sure that this fixes the problem:

g.

Index: Bio/Species.pm
===================================================================
--- Bio/Species.pm	(revision 14791)
+++ Bio/Species.pm	(working copy)
@@ -340,6 +340,7 @@
 		}
 		
 		$self->{_species} = $species;
+		weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
     }
 	return $self->{_species};
 }


From cjfields at illinois.edu  Sat Aug  9 15:08:27 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Sat,  9 Aug 2008 14:08:27 -0500 (CDT)
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <20080809140827.BHN28056@expms6.cites.uiuc.edu>

I'm pretty sure it's not due to a particular version of weaken(), though it does
sound like a circular reference issue.  I have tried this with perl 5.8.6, 5.8.8,
and 5.10 (all Mac OS, either 10.4 or 10.5); all show the same memory leak.

You can try using SeqBuilder to get rid of the Bio::Species object.  I'll give that a 
try when I can.  Unfortunately my laptop is now with the local Apple geniuses 
awaiting a motherboard, so I can't get to it right away (I'll give it a try on my 
wife's laptop).

chris

---- Original message ----
>Date: Sat, 9 Aug 2008 10:47:33 -0700
>From: "Rutger Vos" <rvos at interchange.ubc.ca>  
>Subject: Re: [Bioperl-l] malloc errors while using Bio::SeqIO?  
>To: "Hilmar Lapp" <hlapp at gmx.net>
>Cc: Chris Fields <cjfields at illinois.edu>, bioperl list <bioperl-l at lists.open-bio.org>
>
>I am sure my version of weaken() works as advertised. Is there a way
>to turn off species storing from outside the code base or do you mean
>I go and start commenting bits out in Bio::SeqIO::genbank (or
>wherever)?
>
>On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> This smells of circular references somewhere. I think the first point I
>> would go looking is the species storing - does the problem go away if you
>> turn that off? Maybe the version of weaken() is at play here?
>>
>>        -hilmar
>>
>> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>>
>>> Forgot to mention, maybe we can file this as a bug?  It's a pretty serious
>>> one but it should be easy to narrow down; the change had to be introduced
>>> fairly recently.
>>>
>>> chris
>>>
>>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>>>
>>>>>
>>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't
>>>>> necessarily mean that the memory is returned to the OS for the runtime
>>>>> of
>>>>> the program
>>>>
>>>>
>>>> I believe that's correct.
>>>>
>>>>
>>>>
>>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>>>
>>>>
>>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>>>
>>>>
>>>> Dave
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Marie-Claude Hofmann
>>> College of Veterinary Medicine
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>
>
>
>-- 
>Dr. Rutger A. Vos
>Department of zoology
>University of British Columbia
>http://www.nexml.org
>http://rutgervos.blogspot.com
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l


From hartzell at alerce.com  Sat Aug  9 20:17:52 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 9 Aug 2008 17:17:52 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <18589.58127.57270.352974@almost.alerce.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
Message-ID: <18590.13232.892714.952555@almost.alerce.com>


George Hartzell writes:
 > 
 > I'm pretty sure that this fixes the problem:
 > 
 > g.
 > 
 > Index: Bio/Species.pm
 > ===================================================================
 > --- Bio/Species.pm	(revision 14791)
 > +++ Bio/Species.pm	(working copy)
 > @@ -340,6 +340,7 @@
 >  		}
 >  		
 >  		$self->{_species} = $species;
 > +		weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
 >      }
 >  	return $self->{_species};
 >  }

Actually, it's a bit clearer with the weaken moved up in the block so
that it's closer to where the new tree is allocated.

Chris suggested that I go ahead and commit it.
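
For anyone following along, here is a minimal, self-contained sketch of the
kind of cycle the patch breaks. It is not the Bio::Species code: the Node
class and build_pair() are made up for illustration, and the '_rootnode' and
'tree' keys only mirror the names in the diff. The only real API used is
weaken() and isweak() from Scalar::Util.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken isweak);

$main::destroyed = 0;    # counts DESTROY calls

{
    package Node;        # hypothetical stand-in for the tree and node objects
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub DESTROY { $main::destroyed++ }
}

# Build a tree object and a root node that point at each other, then let
# both lexicals go out of scope.
sub build_pair {
    my ($use_weaken) = @_;
    my $tree = Node->new(name => 'tree');
    my $root = Node->new(name => 'root');
    $tree->{_rootnode} = $root;    # tree -> root
    $root->{tree}      = $tree;    # root -> tree, which closes the cycle
    if ($use_weaken) {
        # Same guard as in the patch: weaken only if not already weak.
        weaken($tree->{_rootnode}) unless isweak($tree->{_rootnode});
    }
    return;
}

for my $use_weaken (0, 1) {
    $main::destroyed = 0;
    build_pair($use_weaken);
    printf "%s: %d of 2 objects freed on scope exit\n",
        $use_weaken ? 'weakened' : 'cyclic', $main::destroyed;
}
```

Without the weaken() call the two objects keep each other alive until the
interpreter exits, which in a loop over many records would look like
steadily growing memory use.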

g.


From David.Messina at sbc.su.se  Sun Aug 10 05:57:07 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 10 Aug 2008 11:57:07 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <18590.13232.892714.952555@almost.alerce.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
	<18590.13232.892714.952555@almost.alerce.com>
Message-ID: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>

Nice, George -- holds steady at about 32MB now.
Much better. :)


Dave


From hartzell at alerce.com  Sun Aug 10 12:51:39 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 10 Aug 2008 09:51:39 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
	<18590.13232.892714.952555@almost.alerce.com>
	<628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
Message-ID: <18591.7323.244987.436383@almost.alerce.com>

Dave Messina writes:
 > Nice, George -- holds steady at about 32MB now.
 > Much better. :)

Good to hear.

Bonus points go to rvos@ for providing such a nice clean bug report and
test case, it made running it down much more appealing.

g.


From valiente at lsi.upc.edu  Mon Aug 11 04:09:37 2008
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 11 Aug 2008 11:09:37 +0300
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
References: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
Message-ID: <B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>

Despite the speedup for merge_lineage, the get_lca method still runs
very slowly on a large number of nodes (say, 1500 nodes), and it does
not rely on merge_lineage. In the get_lca method, all the lineages
are first collected in @paths in order to later find their $lca,
while it might be faster to process each $path as soon as it is
obtained with the get_lineage_nodes method. Any other ideas on how to
speed up the get_lca method? Thanks,
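
A rough sketch of the "process each $path as soon as it is obtained" idea,
not the actual get_lca code. Assumptions: each lineage is an array ref of
node ids ordered root first, plain strings stand in for node objects, and
the names common_prefix() and lca_incremental() as well as the iterator are
invented for this example. Whether it is faster in practice would need
measuring.

```perl
use strict;
use warnings;

# Shrink the running common prefix against one more lineage.
sub common_prefix {
    my ($prefix, $lineage) = @_;
    my $n = 0;
    $n++ while $n < @$prefix
            && $n < @$lineage
            && $prefix->[$n] eq $lineage->[$n];
    return [ @{$prefix}[0 .. $n - 1] ];
}

# $next_lineage is a code ref that returns the next lineage (array ref),
# or undef when there are no more. Only the running prefix is kept.
sub lca_incremental {
    my ($next_lineage) = @_;
    my $prefix;
    while (defined(my $lineage = $next_lineage->())) {
        $prefix = defined $prefix
                ? common_prefix($prefix, $lineage)
                : [@$lineage];
        last unless @$prefix;    # nothing in common, no point in continuing
    }
    return ($prefix && @$prefix) ? $prefix->[-1] : undef;
}

my @lineages = (
    [qw(root A B x)],
    [qw(root A C)],
    [qw(root A B y)],
);
my $lca = lca_incremental(sub { shift @lineages });
print "LCA: $lca\n";
```

The running prefix can only get shorter, so memory stays bounded by the
first lineage rather than by all of them together.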

Gabriel


From lmanchon at univ-montp2.fr  Mon Aug 11 12:32:20 2008
From: lmanchon at univ-montp2.fr (Laurent Manchon)
Date: Mon, 11 Aug 2008 18:32:20 +0200
Subject: [Bioperl-l] protein pattern scan
Message-ID: <5.0.2.1.2.20080811182952.00bebff0@pop.univ-montp2.fr>

Hi,

do you know if it's possible to search protein motif in a multifasta 
protein file
using bioperl to return the motif, the position and the name of the 
corresponding sequence ?

thank you for your help.


+---------------------------------------------+
  Laurent Manchon
  Email: lmanchon at univ-montp2.fr
+---------------------------------------------+


From cjfields at illinois.edu  Mon Aug 11 13:32:05 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 12:32:05 -0500 (CDT)
Subject: [Bioperl-l] protein pattern scan
Message-ID: <20080811123205.BHO45474@expms6.cites.uiuc.edu>

This is covered in the FAQ:

http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F
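
As a quick illustration (not taken from the FAQ), a plain Perl regex
already gives the motif, the position and the sequence name. In this sketch
find_motif() and scan_fasta() are made-up names, the motif is just an
example pattern, and Bio::SeqIO is only loaded when a file is actually
read.

```perl
use strict;
use warnings;

# Return one record per (possibly overlapping) motif occurrence.
# Positions are 1-based. $motif is a regex, e.g. qr/N[^P][ST][^P]/.
sub find_motif {
    my ($id, $residues, $motif) = @_;
    my @hits;
    # The lookahead matches at every start position without consuming
    # characters, so overlapping occurrences are reported too.
    while ($residues =~ /(?=($motif))/g) {
        push @hits, { id => $id, motif => $1, start => pos($residues) + 1 };
    }
    return @hits;
}

# Scan every sequence in a multi-FASTA file (needs BioPerl installed).
sub scan_fasta {
    my ($file, $motif) = @_;
    require Bio::SeqIO;
    my $in = Bio::SeqIO->new(-file => $file, -format => 'fasta');
    my @hits;
    while (my $seq = $in->next_seq) {
        push @hits, find_motif($seq->display_id, $seq->seq, $motif);
    }
    return @hits;
}

# Example on a literal string; the sequence and its name are invented.
for my $hit (find_motif('toy_protein', 'MKNGTAANGSL', qr/N[^P][ST][^P]/)) {
    print join("\t", $hit->{id}, $hit->{start}, $hit->{motif}), "\n";
}
```

For a real file the call would be something like
scan_fasta('proteins.fasta', qr/.../), with the file name being whatever
you have locally.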

chris

---- Original message ----
>Date: Mon, 11 Aug 2008 18:32:20 +0200
>From: Laurent Manchon <lmanchon at univ-montp2.fr>  
>Subject: [Bioperl-l] protein pattern scan  
>To: bioperl-l at lists.open-bio.org
>
>Hi,
>
>do you know if it's possible to search protein motif in a multifasta 
>protein file
>using bioperl to return the motif, the position and the name of the 
>corresponding sequence ?
>
>thank you for your help.
>
>
>+---------------------------------------------+
>  Laurent Manchon
>  Email: lmanchon at univ-montp2.fr
>+---------------------------------------------+
>_______________________________________________
>Bioperl-l mailing list
>Bioperl-l at lists.open-bio.org
>http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bix at sendu.me.uk  Mon Aug 11 13:44:37 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Aug 2008 18:44:37 +0100
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To: <B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>
References: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
	<B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>
Message-ID: <48A07A85.6050601@sendu.me.uk>

Gabriel Valiente wrote:
> Despite the speedup for merge_lineage, the get_lca method still runs 
> very slow on a large number of nodes (say, 1500 nodes) and it does not 
> rely on merge_lineage. In the get_lca method, all the lineages are first 
> collected in @paths in order to later find their $lca, while it might be 
> faster to process each $path as soon as it is obtained with the 
> get_lineage_nodes method.

If you try that idea out and it works, please do commit it. I've no 
further suggestions atm, but I haven't had a chance to look at it to 
remind myself what happens.


From cjfields at illinois.edu  Mon Aug 11 15:50:38 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 14:50:38 -0500 (CDT)
Subject: [Bioperl-l] Finding possible primers regex
Message-ID: <20080811145038.BHO59267@expms6.cites.uiuc.edu>

When I get the chance I could try writing a method that accepts a regex or a Bio::Tools::SeqPattern and returns either an AlignIO stream or an array of SimpleAlign instances (the former could be attached to a temp file for iteration).  Any preference?

chris

---- Original message ----
>Date: Sat, 9 Aug 2008 12:07:30 -0400
>From: Hilmar Lapp <hlapp at gmx.net>  
>Subject: Re: [Bioperl-l] Finding possible primers regex  
>To: Chris Fields <cjfields at uiuc.edu>
>Cc: Benbo <btemperton at googlemail.com>, Bioperl-l at lists.open-bio.org
>
>This looks like a neat trick. Do you think it's worth including as a  
>SimpleAlign method (obviously w/o the printing to STDOUT)? I can  
>imagine that a lot of people might appreciate it.
>
>	-hilmar
>
>On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:
>
>> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>>
>>>
>>> Hi there,
>>> I'm trying to write a perl script to scan an aligned multiple entry  
>>> fasta
>>> file and find possible primers. So far I've produced a string which  
>>> contains
>>> bases which match all sequences and * where they don't match e.g.
>>> 1) TTAGCCTAA
>>> 2) TTAGCAGAA
>>> 3) TTACCCTAA
>>>
>>> would give TTA*C**AA.
>>>
>>> I want to parse this string and pull out all sequences which are  
>>> 18-21 bp in
>>> length and have no more than 4 * in them.
>>>
>>> So far, I've got this:
>>>
>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>> print "$1\n";
>>> }
>>>
>>> hoping to match all fragments 18-21 characters in length. However  
>>> even that
>>> doesn't work as it has essentially chunked it into 21 char blocks,  
>>> rather
>>> than what I hoped for of
>>> 0-18
>>> 0-19
>>> 0-20
>>> 0-21
>>> 1-19
>>> 1-20
>>> 1-21
>>> 1-22
>>>
>>> etc.
>>>
>>> Can anyone let me know if this is already possible in BioPerl, or  
>>> how one
>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>> getting to
>>> grips with BioPerl, so please treat me gently :).
>>>
>>> Many thanks,
>>>
>>> Ben
>>
>> There is a trick to this which is discussed more extensively in  
>> 'Mastering Regular Expressions'.  Essentially you have to embed code  
>> into the regex and trick the parser into backtracking using a  
>> negative lookahead.  The match itself fails (i.e. no match is  
>> returned), but the embedded code is executed for each match attempt,
>>
>> The following script is a slight modification of one I used which  
>> checks the consensus string from the input alignment (in aligned  
>> FASTA format here), extracts the alignment slice using that match,  
>> then spit the alignment out to STDOUT in clustalw format.  This  
>> should work for perl 5.8 and up, but it's only been tested on perl  
>> 5.10.  You should be able to use this to fit what you want.
>>
>> my $in = Bio::AlignIO->new(-file => $file,
>>                           -format => 'fasta');
>> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>>                           -format => 'clustalw');
>>
>> while (my $aln = $in->next_aln) {
>>    my $c = $aln->consensus_string(100);
>>    my @matches;
>>    $c =~ m/
>>        ([GTAC?]{18,21})
>>        (?{my $match = check_match($1);
>>           push @matches, [$match,
>>                           pos(),
>>                           length($match)]
>>              if defined $match;})
>>        (?!)
>>        /xig;
>>    for my $match (@matches) {
>>        my ($hit, $st, $end) = ($match->[0],
>>                                $match->[1] - $match->[2] + 1,
>>                                $match->[1]);
>>        my $newaln = $aln->slice($st, $end);
>>        $out->write_aln($newaln);
>>    }
>> }
>>
>> sub check_match {
>>    my $match = shift;
>>    return unless $match;
>>    my $ct = $match =~ tr/?/?/;
>>    return $match if $ct <= 4;
>>    return;   # explicit, so a rejected window yields undef, not the false comparison result
>> }
>>
>>
>> chris
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>-- 
>===========================================================
>: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>===========================================================
>
>
>


From hlapp at gmx.net  Mon Aug 11 22:35:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Aug 2008 22:35:13 -0400
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <20080811145038.BHO59267@expms6.cites.uiuc.edu>
References: <20080811145038.BHO59267@expms6.cites.uiuc.edu>
Message-ID: <B79B827A-E66C-417D-A5E8-B4FE96791089@gmx.net>

Actually, now that you ask I'm wondering whether one wouldn't  
sometimes want to retain the relationship between the match and the  
resulting spliced alignment? If so, neither AlignIO nor array would  
accomplish that, right?

Other than that I myself don't have a strong preference either way. I  
suppose AlignIO stream is somewhat more extensible, since as you say  
it could be coupled to a file if the resulting set of alignments is  
really large.
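
To make the first point concrete, here is a rough sketch in which every
record keeps the matched string together with its 1-based coordinates, so
a caller can still pair each match with the slice made from it (for
instance by passing start and end to $aln->slice, as in the quoted script).
This uses plain loops rather than the embedded-code regex, and the names
candidate_windows, min, max and max_ambiguous are invented for the example.

```perl
use strict;
use warnings;

# Enumerate every window of length min..max in a consensus string that
# contains only G, T, A, C or '?' and has at most max_ambiguous '?'.
sub candidate_windows {
    my ($consensus, %opt) = @_;
    my $min       = $opt{min} || 18;
    my $max       = $opt{max} || 21;
    my $max_ambig = defined $opt{max_ambiguous} ? $opt{max_ambiguous} : 4;
    my @hits;
    for my $start (0 .. length($consensus) - $min) {
        for my $len ($min .. $max) {
            last if $start + $len > length $consensus;
            my $window = substr $consensus, $start, $len;
            next unless $window =~ /\A[GTAC?]+\z/i;
            my $ambiguous = ($window =~ tr/?//);    # count '?' characters
            next if $ambiguous > $max_ambig;
            push @hits, {
                match => $window,
                start => $start + 1,       # 1-based, inclusive
                end   => $start + $len,
            };
        }
    }
    return @hits;
}

# Tiny example with short windows so the output is easy to check by eye.
for my $hit (candidate_windows('AC?GT', min => 2, max => 3, max_ambiguous => 0)) {
    print "$hit->{start}-$hit->{end}\t$hit->{match}\n";
}
```

A method on SimpleAlign could return such records with the sliced
alignment added under another key, which would keep match and alignment
together whichever container is chosen.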

	-hilmar

On Aug 11, 2008, at 3:50 PM, Christopher Fields wrote:

> When I can I could try generating a method which accepts a regex/ 
> Bio::Tools::SeqPattern and returns an AlignIO stream or array of  
> SimpleAlign instances (the former could be attached to a temp file  
> for iteration).  Any preference?
>
> chris
>
> ---- Original message ----
>> Date: Sat, 9 Aug 2008 12:07:30 -0400
>> From: Hilmar Lapp <hlapp at gmx.net>
>> Subject: Re: [Bioperl-l] Finding possible primers regex
>> To: Chris Fields <cjfields at uiuc.edu>
>> Cc: Benbo <btemperton at googlemail.com>, Bioperl-l at lists.open-bio.org
>>
>> This looks like a neat trick. Do you think it's worth including as a
>> SimpleAlign method (obviously w/o the printing to STDOUT)? I can
>> imagine that a lot of people might appreciate it.
>>
>> 	-hilmar
>>
>> On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:
>>
>>> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>>>
>>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple entry
>>>> fasta
>>>> file and find possible primers. So far I've produced a string which
>>>> contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char blocks,
>>>> rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>
>>> There is a trick to this which is discussed more extensively in
>>> 'Mastering Regular Expressions'.  Essentially you have to embed code
>>> into the regex and trick the parser into backtracking using a
>>> negative lookahead.  The match itself fails (i.e. no match is
>>> returned), but the embedded code is executed for each match attempt,
>>>
>>> The following script is a slight modification of one I used which
>>> checks the consensus string from the input alignment (in aligned
>>> FASTA format here), extracts the alignment slice using that match,
>>> then spit the alignment out to STDOUT in clustalw format.  This
>>> should work for perl 5.8 and up, but it's only been tested on perl
>>> 5.10.  You should be able to use this to fit what you want.
>>>
>>> my $in = Bio::AlignIO->new(-file => $file,
>>>                          -format => 'fasta');
>>> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>>>                          -format => 'clustalw');
>>>
>>> while (my $aln = $in->next_aln) {
>>>   my $c = $aln->consensus_string(100);
>>>   my @matches;
>>>   $c =~ m/
>>>       ([GTAC?]{18,21})
>>>       (?{my $match = check_match($1);
>>>          push @matches, [$match,
>>>                          pos(),
>>>                          length($match)]
>>>             if defined $match;})
>>>       (?!)
>>>       /xig;
>>>   for my $match (@matches) {
>>>       my ($hit, $st, $end) = ($match->[0],
>>>                               $match->[1] - $match->[2] + 1,
>>>                               $match->[1]);
>>>       my $newaln = $aln->slice($st, $end);
>>>       $out->write_aln($newaln);
>>>   }
>>> }
>>>
>>> sub check_match {
>>>   my $match = shift;
>>>   return unless $match;
>>>   my $ct = $match =~ tr/?/?/;
>>>   return $match if $ct <= 4;
>>>   return;   # explicit, so a rejected window yields undef, not the false comparison result
>>> }
>>>
>>>
>>> chris
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mirhan at indiana.edu  Mon Aug 11 23:46:35 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Mon, 11 Aug 2008 23:46:35 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <D8FD887C-86A9-4E53-9A59-10FDE295C067@duke.edu>
Message-ID: <C4C67FDB.5282%mirhan@indiana.edu>


Hi,
Yes, it is true that it's similar to get_all_Annotations; it's basically a recursive version of it.
I wanted to provide a method to get at nested annotations without going through all the if(isa collection) do recursive call.. etc. every time, because most of the xml elements are implemented as nested annotation collections on the nodes.
( I am considering using tagtrees instead of nested annotation collections in the future, but as of now, Annotation::tagtrees was documented as a temporary implementation, so I passed on that option. )
I forgot about the interface part.
At least for my purpose I would think it's a good function to have in the interface.
I agree that adding a recursive option to get_all_Annotations would be better.

Mira


On 8/11/08 11:28 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:33 AM, Han, Mira wrote:

> Added get_deep_Annotations in Annotation::Collection
> in order to get annotations that are within nested collections.


I hope I'm not contradicting Chris here, but we will probably want to
think about this a bit more. Your implementation won't work as it is
assuming an interface function that isn't defined on the interface
(both get_deep_Annotations() and _deep_annotation_helper()). Also, it
does nearly the same as get_all_Annotations(), and passing on the keys
to nested collections should maybe simply be an option to that method.
Alternatively, one could add an option -recurse to get_Annotation.

The other difference you note is that your method does not flatten the
nested annotations, but unless I am missing something your
implementation does flatten annotations from nested collections.

So even if we need a separate method for this, something like
get_nested_Annotations() would probably be a more appropriate name,
and if we do need a separate method, it should be compelling enough to
add it to the interface too (as otherwise your code will only work
with certain implementation classes).

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From mirhan at indiana.edu  Tue Aug 12 00:00:28 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Tue, 12 Aug 2008 00:00:28 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>
Message-ID: <C4C6831C.5287%mirhan@indiana.edu>


Oh yes,
I meant get_Annotations,
I want a get_Annotations that is recursive and passes the keys to the recursive calls.
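
Roughly what I have in mind, as a sketch only. ToyCollection is a made-up
stand-in so the example runs without BioPerl; it borrows the method names
get_Annotations and get_all_annotation_keys from the collection interface,
and plain strings stand in for annotation objects. get_nested_annotations
is a hypothetical name, not the committed method.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

{
    package ToyCollection;    # hypothetical, not Bio::Annotation::Collection
    sub new {
        my ($class) = @_;
        return bless { ann => {} }, $class;
    }
    sub add_Annotation {
        my ($self, $key, $value) = @_;
        push @{ $self->{ann}{$key} }, $value;
        return $self;
    }
    sub get_all_annotation_keys {
        my ($self) = @_;
        return sort keys %{ $self->{ann} };
    }
    sub get_Annotations {
        my ($self, @keys) = @_;
        @keys = $self->get_all_annotation_keys unless @keys;
        return map { @{ $self->{ann}{$_} || [] } } @keys;
    }
}

# Collect annotations stored under any of @keys at any nesting depth.
# With no keys, everything is returned. The result is a flat list.
sub get_nested_annotations {
    my ($coll, @keys) = @_;
    my %wanted = map { $_ => 1 } @keys;
    my @found;
    for my $key ($coll->get_all_annotation_keys) {
        for my $ann ($coll->get_Annotations($key)) {
            if (blessed($ann) && $ann->can('get_Annotations')) {
                # Nested collection: pass the same keys down.
                push @found, get_nested_annotations($ann, @keys);
            }
            elsif (!@keys || $wanted{$key}) {
                push @found, $ann;
            }
        }
    }
    return @found;
}

my $inner = ToyCollection->new;
$inner->add_Annotation(name => 'inner-name')->add_Annotation(confidence => '0.9');
my $outer = ToyCollection->new;
$outer->add_Annotation(name => 'node1')->add_Annotation(property => $inner);

print join(', ', get_nested_annotations($outer, 'name')), "\n";
```

Note that this version does flatten what it finds in nested collections;
keeping the nesting would need a different return structure.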


On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is
already recursive)

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From hlapp at duke.edu  Mon Aug 11 23:54:43 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 11 Aug 2008 23:54:43 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4C67FDB.5282%mirhan@indiana.edu>
References: <C4C67FDB.5282%mirhan@indiana.edu>
Message-ID: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's  
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is  
already recursive)

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From mrphysh at juno.com  Tue Aug 12 10:30:36 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Tue, 12 Aug 2008 14:30:36 GMT
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
Message-ID: <20080812.083036.25924.0@webmail02.vgs.untd.com>

I am studying bioperl and making progress.  I have been struggling with the database retrieval from 
on-line databases.  This is an example................

#!/usr/bin/perl -w
 
use Bio::Perl;

 
$seq_object = get_sequence('swiss',"ROA1_HUMAN");
 
write_sequence(">roa1.fasta",'fasta',$seq_object);

exit;

This script gives

Can't locate IO/String.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at ee_bpo.pl line 12.
BEGIN failed--compilation aborted at ee_bpo.pl line 12.

I have chased around with the paths in @INC, using 'use lib'.

This is an install problem.  The original installation was with perl Makefile.PL.  I reinstalled over the old one with cpan, with commands like this:
cpan> o conf prerequisites_policy follow
cpan> i /bioperl/
cpan> install Bundle::BioPerl
cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
This all seemed to proceed smoothly.

this guy did not produce an error.
use Bio::Perl;

I am almost thinking that the problem is with the perl. But regular ftp through perl works:

use Net::FTP;#I found this in usr/share/perl/5.8.8/Net
As a perl command this module seems to work.

I looked in the archives and found nothing.  I think I have done my homework.  any ideas?

I run Ubuntu on a pentium III (and love it).  the version of Ubuntu is new.  the Perl (and MySQL) came with the OS: perl 5.8.8

John Brigham in Denver.




From jay at jays.net  Tue Aug 12 11:08:59 2008
From: jay at jays.net (Jay Hannah)
Date: Tue, 12 Aug 2008 10:08:59 -0500
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
In-Reply-To: <20080812.083036.25924.0@webmail02.vgs.untd.com>
References: <20080812.083036.25924.0@webmail02.vgs.untd.com>
Message-ID: <CA21F550-DF97-41EE-BE8C-CBDFE67B0148@jays.net>

On Aug 12, 2008, at 2:30 PM, mrphysh at juno.com wrote:
> Can't locate IO/String.pm in @INC
...
> cpan> install Bundle::BioPerl
> cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> This all seemed to proceed smoothly

bioperl-1.2.1 is very old. Apparently Bundle::BioPerl is out of date?

Here's lots of info about installing BioPerl:

    http://www.bioperl.org/wiki/Getting_BioPerl

I recommend using bioperl-live directly from SVN, but I'm sort of a  
rebel like that.  :)

Alternately, you could try just doing a

    cpan> install IO::String
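
If you want to see what else might be missing before running the script
again, something like this works as a quick check. It is only a sketch:
missing_modules() is a made-up helper, and the module list is just the two
names from this thread.

```perl
use strict;
use warnings;

# Return the names of the given modules that cannot be loaded from @INC.
sub missing_modules {
    my @missing;
    for my $module (@_) {
        (my $file = "$module.pm") =~ s{::}{/}g;    # IO::String -> IO/String.pm
        push @missing, $module unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(qw(IO::String Bio::Perl));
print @missing ? "missing: @missing\n" : "all modules found\n";
```

Anything it reports as missing can then be installed from the cpan shell
in the same way.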

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From heikki at sanbi.ac.za  Thu Aug 14 09:14:48 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 14 Aug 2008 15:14:48 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
Message-ID: <200808141514.49124.heikki@sanbi.ac.za>

A generic method for retrieving nodes from a Bio::Tree::TreeI objects is 
Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id' 
attribute unless a field is given. I can retrieve nodes based on internal id 
like this:

   $tree->find_node(-internal_id => $internal_id);

I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that retrieves by id. 
However, the POD documentation claims that it retrieves by internal id.

What needs to be done?

A. Fix the doc to speak about id
B. Fix the code to retrieve by internal_id
C. Fix the doc and create findnode_by_internal_id()
D. Remove findnode_by_id() as redundant and confusing
E. Deprecate findnode_by_id() as redundant and confusing

There are no tests for findnode_by_id() which to me tilts selection to E and A 
for now.
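
For discussion, the deprecate-and-delegate option could look roughly like
the sketch below. ToyTree is a made-up stand-in, not Bio::Tree code: its
find_node only imitates the behaviour described above (search the 'id'
field unless another field is named), nodes are plain hashes, and the
warning text is invented.

```perl
use strict;
use warnings;

{
    package ToyTree;    # hypothetical stand-in for a TreeI implementation
    use Carp qw(carp);

    sub new {
        my ($class, @nodes) = @_;
        return bless { nodes => [@nodes] }, $class;
    }

    # find_node($value) searches 'id'; find_node(-field => $value)
    # searches the named field.
    sub find_node {
        my ($self, @args) = @_;
        my ($field, $value) = @args == 1 ? ('id', $args[0]) : @args;
        $field =~ s/^-//;
        return grep { defined $_->{$field} && $_->{$field} eq $value }
               @{ $self->{nodes} };
    }

    # Deprecated: kept only so existing callers continue to work.
    sub findnode_by_id {
        my ($self, $id) = @_;
        carp "findnode_by_id() is deprecated; use find_node(-id => ...) instead";
        return $self->find_node(-id => $id);
    }
}

my $tree = ToyTree->new(
    { id => 'A', internal_id => 1 },
    { id => 'B', internal_id => 2 },
);
my ($by_internal) = $tree->find_node(-internal_id => 2);
print "internal_id 2 has id $by_internal->{id}\n";
```

The old method keeps working but tells callers where to go, and the POD
can then describe it as a deprecated alias that searches by id.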

Any other opinions?

	-Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From hlapp at gmx.net  Thu Aug 14 18:28:20 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 14 Aug 2008 18:28:20 -0400
Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API?
In-Reply-To: <48A448DD.4000206@psb.ugent.be>
References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com>
	<48A448DD.4000206@psb.ugent.be>
Message-ID: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>

Hi Erick,

how did you determine that go-perl is specific to GO? I've found it to  
work quite well for any kind of OBO-formatted ontology.

Also, you note that BioPerl doesn't have the ability to write in  
certain formats, and to intersect and "unify" (would you mind  
explaining what you mean by that?) ontologies. It seems that your  
implementation of RDF etc export isn't really reusable or modular in  
any way, but I'd love to bring the intersection function over to  
BioPerl (BTW when you decided to roll your own ontology API, did you  
get the impression that BioPerl isn't receptive to you adding to it?).  
Would you mind pointing me to the place in the code where I would find
that? I can't seem to find it.

	-hilmar

On Aug 14, 2008, at 11:01 AM, Erick Antezana wrote:

> Hi Arne,
>
> if you plan to work with PERL, you might take a look at ONTO-PERL :
>
> http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn042
> http://search.cpan.org/dist/ONTO-PERL/
> http://search.cpan.org/src/EASR/ONTO-PERL-1.13/doc/example00.html
>
> ONTO-PERL has been used intensively to build the Cell Cycle Ontology.
>
> cheers,
> Erick
>
> Arne Muller wrote:
>> Dear All,
>>
>> I'm new to this list and don't know much about ontologies in general
>> (I worked a bit with GO some time ago).
>>
>> Let me explain my problem: We have several related vocabularies
>> (non-hierarchical and redundant because of different spellings etc
>> ...) to describe organs and tissues in our department, and we need to
>> map each of these vocabs to all of our other legacy vocabs that
>> describe similar concepts. We'd like to use the adult mouse anatomy
>> ontology and modify/extend it with additional terms (if necessary),
>> synonyms and dbXrefs. Most of our vocabs should be mapped as dbXrefs
>> to existing terms in the MA ontology. The problem is that different
>> units in our department use slightly different vocabularies to describe
>> samples, and we now need to link these different systems (always the same
>> old story ... ;-).
>>
>> For the moment I'm not planning to turn our messy legacy vocabs into
>> OBO formatted ontologies and to map them via cross products and the OBO
>> relation ontology - though this might be the most proper way to do
>> ... (comments are welcome).
>>
>> I'll have to write an "easy to use" tool that allows our data curator
>> to easily map the legacy vocabs as dbXrefs of terms in the MA
>> ontology. The question is, how am I gonna do this? I've a fairly good
>> idea of how my software (java webapp) should look like, but are there
>> any APIs and implementations of the OBO model as well as a DB schema
>> and mappings between the model and the schema?
>>
>> I've had a look into the OLS from the EBI that seems to be fairly
>> simple (which is good ;-) and that uses the oboedit.jar somewhere at
>> the back-end. I've also found something like an obo api on
>> http://wiki.geneontology.org/index.php/OBO-Edit:_Getting_the_Source_Code#.28Optional.29_Getting_the_OBO_API_from_Subclipse
>> but so far I've not found any documentation or examples on how to get
>> started.
>>
>> I'd be happy to hear how developers and bioinformatics people use obo
>> in their own tools (I better ask before going DIY ...).
>>
>>   thanks a lot for your comments and help
>>   +kind regards,
>>
>>   Arne
>>
>
> -- 
> ==================================================================
> Erick Antezana                    http://www.cellcycleontology.org
> PhD student
> Tel:+32 (0)9 331 38 24                        fax:+32 (0)9 3313809
> VIB Department of Plant Systems Biology, Ghent University
> Technologiepark 927, 9052 Gent, BELGIUM
> erant at psb.ugent.be                  http://www.psb.ugent.be/~erant
> ==================================================================
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mjanis at chem.ucla.edu  Thu Aug 14 19:37:05 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:37:05 -0700
Subject: [Bioperl-l] Code to contribute
Message-ID: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>

Hi,

 
I've had some perl code lying around for what seems like forever and I'd
like to contribute it to bioperl, if such facilities don't already exist in
bioperl.  The code implements shuffling (DNA or RNA) keeping the
dinucleotide composition (and codon usage) intact through an Eulerian path
approach as described in Altschul and Erikson (1985).  The code seeds the
Eulerian paths by keeping the first and last nucleotide invariant in the
shuffle - which has minimal detrimental effects to the purpose of the
algorithm, in my experience.
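
A minimal sketch of the idea in plain perl, for illustration only. It is
not the code offered above: the names dinuc_shuffle and _all_reach are
made up for this example, it preserves dinucleotide counts only (codon
usage is not handled), and it picks the "last exit" edges by simple
rejection sampling, which is an assumption about how one might do it.

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# True if following the chosen exit edge from every base ends at $last.
sub _all_reach {
    my ($last, $exit) = @_;
    for my $start (keys %$exit) {
        my %seen;
        my $base = $start;
        while ($base ne $last) {
            return 0 if $seen{$base}++;    # walked into a cycle
            $base = $exit->{$base};
        }
    }
    return 1;
}

# Shuffle $seq while keeping every dinucleotide count, plus the first
# and last base, unchanged.
sub dinuc_shuffle {
    my ($seq) = @_;
    return $seq if length($seq) < 3;
    my @s    = split //, $seq;
    my $last = $s[-1];

    # $next{X} lists every base that follows X in the input.
    my %next;
    push @{ $next{ $s[$_] } }, $s[ $_ + 1 ] for 0 .. $#s - 1;

    # For each base except the final one, pick the edge to be used last;
    # retry until those edges all lead to the final base.
    my %exit;
    do {
        %exit = ();
        for my $base (grep { $_ ne $last } keys %next) {
            my $succ = $next{$base};
            $exit{$base} = $succ->[ int rand @$succ ];
        }
    } until _all_reach($last, \%exit);

    # Shuffle each successor list, keeping the chosen exit edge at the end.
    my %order;
    for my $base (keys %next) {
        my @succ = @{ $next{$base} };
        if (exists $exit{$base}) {
            my ($i) = grep { $succ[$_] eq $exit{$base} } 0 .. $#succ;
            splice @succ, $i, 1;
            @succ = (shuffle(@succ), $exit{$base});
        }
        else {
            @succ = shuffle(@succ);
        }
        $order{$base} = \@succ;
    }

    # Walk the edges in their new order, starting from the first base.
    my @out = ($s[0]);
    push @out, shift @{ $order{ $out[-1] } } for 1 .. $#s;
    return join '', @out;
}
```

Calling dinuc_shuffle('ACGTACGGTTAACC') returns a string of the same
length that starts with A, ends with C and has the same dinucleotide
counts; the exact result varies from run to run.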

 
A quick search on the bioperl website shows that there is a mutation.pls
script, and facilities for using Sean Eddy's SQUID C library, which
implements the same function (I wrote this particular function before I knew
how to use C).  As such, it's probably not as elegant as Sean Eddy's
implementation, but it works - and it's entirely in perl.

The bioperl developer pages suggest a post to the mailing list as the best
place to start contributing to bioperl.  Is this a useful function to add to
the project?

Best Regards,

Michael

-------------------------------
Michael Janis
mjanis at chem.ucla.edu
-------------------------------

 
From rvos at interchange.ubc.ca  Thu Aug 14 19:51:43 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Thu, 14 Aug 2008 16:51:43 -0700
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>
	<2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
Message-ID: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>

Sounds exciting! I bet the general advice you'll get is to i) check
out the latest code from svn ii) see which bioperl objects/interfaces
(e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
iii) write a class that performs the algorithm as some sort of
analysis factory taking the sequence object (or ideally object
interface) as an input iv) run that class by the mailing list v) check
it into svn.


--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com




From mjanis at chem.ucla.edu  Thu Aug 14 19:55:04 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:55:04 -0700
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>
	<2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
Message-ID: <008701c8fe69$2cee6020$86cb2060$@ucla.edu>

Thanks, Rutger, I'll do exactly that!  (give me a few days)


Best Regards,

Michael

-------------------------------
Michael Janis
mjanis at chem.ucla.edu
-------------------------------




From Russell.Smithies at agresearch.co.nz  Thu Aug 14 21:17:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 15 Aug 2008 13:17:23 +1200
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>

You forgot 2 points,

vi) write documentation/examples, and  vii) write tests

;-)


Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Friday, 15 August 2008 11:52 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Fwd: Code to contribute
> 
> Sounds exciting! I bet the general advice you'll get is to i) check
> out the latest code from svn ii) see which bioperl objects/interfaces
> (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
> iii) write a class that performs the algorithm as some sort of
> analysis factory taking the sequence object (or ideally object
> interface) as an input iv) run that class by the mailing list v) check
> it into svn.
> 


From mirhan at indiana.edu  Fri Aug 15 02:11:46 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Fri, 15 Aug 2008 02:11:46 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4C6831C.5287%mirhan@indiana.edu>
Message-ID: <C4CA9662.52BD%mirhan@indiana.edu>


Hi,
I've renamed get_deep_Annotations() to get_nested_Annotations().
It takes the arguments -keys and -recursive, and behaves exactly like get_Annotations() when -recursive is not set (tested by replacing the get_Annotations() calls in Annotation.t).
I made it a new function instead of modifying get_Annotations() because I wasn't sure how to modify that in a backwards-compatible way.
I thought of adding the function to the AnnotationCollectionI interface, but get_all_Annotations() seems to be missing from the interface as well, so I decided to ask whether it should be added to the interface at all.
Isn't it possible for certain implementations of an interface to have functions that are specific to that implementation?
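
To illustrate the intended behaviour, here is a small sketch in plain perl.
It does not use the BioPerl classes: the data model (a collection as a hash
of key => [values], where a hash reference among the values is a nested
collection), the function name get_nested and the option names keys and
recursive are all assumptions made for this example.

```perl
use strict;
use warnings;

# Return the values stored under the requested keys. With recursive set,
# also descend into every nested collection, passing the same keys along.
sub get_nested {
    my ($coll, %opt) = @_;
    my @keys  = $opt{keys} ? @{ $opt{keys} } : sort keys %$coll;
    my @found = map { @{ $coll->{$_} || [] } } @keys;

    if ($opt{recursive}) {
        for my $key (sort keys %$coll) {
            for my $child (grep { ref $_ eq 'HASH' } @{ $coll->{$key} }) {
                push @found, get_nested($child, %opt);
            }
        }
    }
    return @found;
}

my %collection = (
    comment => ['top'],
    clade   => [ { comment => ['inner'], date => ['2008'] } ],
);

# Without recursive only the top level is searched: ('top')
my @flat = get_nested(\%collection, keys => ['comment']);

# With recursive the nested collection is searched too: ('top', 'inner')
my @deep = get_nested(\%collection, keys => ['comment'], recursive => 1);
```

Without the recursive option the function only looks at the top-level
collection, which is the property described above for the real method.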

Mira


On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:


Oh yes,
I meant get_Annotations,
I want a get_Annotations that is recursive and passes the keys to the recursive calls.


On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is
already recursive)

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at illinois.edu  Fri Aug 15 09:59:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Aug 2008 08:59:42 -0500
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>
Message-ID: <BF55B9C8-B43A-404A-8085-4C8200B9B4AA@illinois.edu>

Agreed!

We're hoping to move to a more structured core after 1.6 which will
require decent documentation and tests for inclusion.  My feeling is
one should feel free to add code samples to relevant pages/sections in the
BioPerl wiki, or write up your own HOWTO (it's not terribly hard to
do, and it adds to your karma).

chris

On Aug 14, 2008, at 8:17 PM, Smithies, Russell wrote:

> You forgot 2 points,
>
> vi) write documentation/examples, and  vii) write tests
>
> ;-)
>
>
>
> Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos
>> Sent: Friday, 15 August 2008 11:52 a.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Fwd: Code to contribute
>>
>> Sounds exciting! I bet the general advice you'll get is to i) check
>> out the latest code from svn ii) see which bioperl objects/interfaces
>> (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
>> iii) write a class that performs the algorithm as some sort of
>> analysis factory taking the sequence object (or ideally object
>> interface) as an input iv) run that class by the mailing list v) check
>> it into svn.
>>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From cjfields at illinois.edu  Fri Aug 15 10:12:10 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Aug 2008 09:12:10 -0500
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4CA9662.52BD%mirhan@indiana.edu>
References: <C4CA9662.52BD%mirhan@indiana.edu>
Message-ID: <3371D481-1416-4976-9846-83CF83395039@illinois.edu>

The method get_all_annotation_keys() is present in  
AnnotationCollectionI but not get_all_Annotations(), though I doubt it  
is set up for recursive retrieval (something that might be worth  
testing).  I don't have a problem adding this in.  Hilmar, thoughts?

chris

On Aug 15, 2008, at 1:11 AM, Han, Mira wrote:

>
> Hi,
> I've fixed the get_deep_Annotations() to get_nested_Annotations()
> It has arguments -keys and -recursive,
> And behaves exactly like get_Annotations() when recursive is not set  
> (tested by replacing the get_Annotations() in the Annotation.t)
> I made it a new function instead of modifying get_Annotations()  
> because I wasn't sure how to modify it to be backwards compatible.
> I thought of adding the function to the interface  
> AnnotationCollectionI,
> But it seemed like get_all_Annotations() was missing from the  
> interface as well,
> So decided to ask if it should be added to the interface at all.
> Isn't it possible that certain implementations of the interface has  
> functions that are only specific to that implementation?
>
> Mira
>
>
> On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:
>
>
> Oh yes,
> I meant get_Annotations,
> I want a get_Annotations that is recursive and passes the keys to  
> the recursive calls.
>
>
>
> On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
>
> Hi Mira -
>
> On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:
>
>> Yes it is true that it's similar to get_all_Annotations, it's
>> basically a recursive version of it.
>
>
> I suppose you mean get_Annotations(), right? (get_all_Annotations() is
> already recursive)
>
>        -hilmar
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
> ===========================================================

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From jorvis at gmail.com  Fri Aug 15 15:45:23 2008
From: jorvis at gmail.com (Joshua Orvis)
Date: Fri, 15 Aug 2008 15:45:23 -0400
Subject: [Bioperl-l] help creating de novo GFF3
Message-ID: <a58187640808151245o48af6311hb8afee40341b7ef2@mail.gmail.com>

I don't have a lot of experience with Bioperl and have used it mostly for
simple format conversions or parsing Genbank files.  I need to create a
quick script to create GFF3 and decided to give bioperl a try again instead
of just printing the columns myself but have had a few problems.  My
apologies for the narrative here but I know it can sometimes be informative
to hear 'how' a user arrived at a problem rather than just knowing the
problem itself.

Is there a documented explicit mapping between the GFF3 columns and the
predefined tags (ID, Name, etc.) and their Bioperl object attribute
equivalents?  Is it preferable to create Bio::SeqFeature::Generic objects
and pass them to Bio::Tools::GFF->write_feature or rather to create
Bio::SeqFeature::Annotated and pass them to Bio::FeatureIO::gff ?   I may be
overlooking it, but a simple tutorial showing how to create and define a new
sequence object, attach annotations to it and dump in GFF format seems to be
missing.  This seems like a basic thing to do - most of the documentation I
find is about converting between formats rather than creating new
annotation.

Here are some of the problems I (a typical naive user?) ran into when
adventuring with bioperl here.   My first attempt resulted in the string
"SEQ" as column 0 in all my GFF output.  I thought that maybe this was
because my features weren't 'attached' to a sequence, so I created a
Bio::Seq::RichSeq object and tried both (separately):

$seq->add_SeqFeature( $feat );

and

$feat->attach_seq( $seq );

Neither changed the first column of output.  Looking at the docs.bioperl.org
methods for Bio::SeqFeature::Generic I found the seq_id attribute, which
came with the warning: "This attribute should *not* be used in GFF dumping"
- but since it's the only thing I did that worked, I used it anyway.

Next I wanted to have ID tags within my last column.  I first tried setting
all the relevant attributes I could see on my features (id, primary_tag,
display_name, display_id, etc.) but none of these caused ID=? to be
created.  Next, I tried something like this:


my $feat = new Bio::SeqFeature::Annotated (
        -start        => $start,
        -end          => $end,
        -strand       => $strand,
        -primary      => 'gene',
        -seq_id       => $asmbl_id,  ## this works but is discouraged
        -tag          => { ID => $transcript->{pub_locus},
                           product_name => $transcript->{com_name},
                           ec_number => $transcript->{'ec#'},
                           gene_symbol => $transcript->{gene_sym}
                         }
);

My hopes that passing it via the -tag option would do the trick failed, as
it created a line like this instead:

10263   .       gene    58512   56983   .       +       .       iD=AN9220.4;

Notice the 'ID' -> 'iD' transformation (without any command-line warnings).

I'm still stuck on this one (Parent would be next) but overall guidance or
pointers to a tutorial/documentation I'm overlooking would be great.
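
For comparison, this is roughly what "just printing the columns myself"
would look like in plain perl. It is only a sketch and does not use
BioPerl: the function names gff3_escape and gff3_line and the hash keys
are made up for this example. A GFF3 line has nine tab-separated columns
(seqid, source, type, start, end, score, strand, phase, attributes), and
the format expects start to be less than or equal to end.

```perl
use strict;
use warnings;

# Percent-encode the characters GFF3 reserves inside attribute values.
sub gff3_escape {
    my ($text) = @_;
    $text =~ s/([\t\n\r%;=&,])/sprintf('%%%02X', ord $1)/ge;
    return $text;
}

# Build one GFF3 line from named fields; missing columns become '.'.
sub gff3_line {
    my (%f) = @_;
    my %attr = %{ $f{attributes} || {} };

    # Put the predefined tags first, then everything else alphabetically.
    my @lead = ('ID', 'Name', 'Parent');
    my %lead = map { $_ => 1 } @lead;
    my @rest = grep { !$lead{$_} } keys %attr;
    my @tags = ((grep { exists $attr{$_} } @lead), sort @rest);

    my $col9 = join ';', map { "$_=" . gff3_escape($attr{$_}) } @tags;
    return join "\t",
        $f{seqid}, $f{source} // '.', $f{type}, $f{start}, $f{end},
        $f{score} // '.', $f{strand} // '.', $f{phase} // '.',
        $col9 || '.';
}

print gff3_line(
    seqid      => '10263',
    type       => 'gene',
    start      => 56983,
    end        => 58512,
    strand     => '+',
    attributes => { ID => 'AN9220.4', product_name => 'a;b' },
), "\n";
```

The tag names are written out exactly as given, so ID stays ID, and a
child feature would simply carry Parent => 'AN9220.4' in its attributes.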

JO


From jason at bioperl.org  Fri Aug 15 19:00:04 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Aug 2008 16:00:04 -0700
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
In-Reply-To: <200808141514.49124.heikki@sanbi.ac.za>
References: <200808141514.49124.heikki@sanbi.ac.za>
Message-ID: <A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>

I think D.

There should probably only be one find_node function.

findnode_by_id was written by Ramiro for the re-rooting code and I  
guess it wasn't checked to reduce unneeded functions.  I don't have  
any problems removing/deprecating it but will need to update the code  
that depends on it to use find_node properly.
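
As a rough illustration of option D, the old name can stay as a thin
wrapper that warns and hands over to find_node. The sketch below uses a
toy class, not the Bio::Tree code; ToyTree, its node hashes and the
warning text are assumptions made for this example.

```perl
use strict;
use warnings;

{
    # Toy stand-in for a tree class; this is NOT the Bio::Tree code.
    package ToyTree;

    sub new {
        my ($class, @nodes) = @_;
        return bless { nodes => [@nodes] }, $class;
    }

    # One argument searches the 'id' field; (-field => value) searches
    # the named field.
    sub find_node {
        my ($self, @args) = @_;
        my ($field, $value) = @args == 1 ? (-id => $args[0]) : @args;
        $field =~ s/^-//;
        return grep { defined $_->{$field} && $_->{$field} eq $value }
               @{ $self->{nodes} };
    }

    # Deprecated alias kept for old callers (option D).
    sub findnode_by_id {
        my ($self, $id) = @_;
        warn "findnode_by_id() is deprecated, use find_node() instead\n";
        return $self->find_node(-id => $id);
    }
}

my $tree = ToyTree->new(
    { id => 'A', internal_id => 1 },
    { id => 'B', internal_id => 2 },
);
my ($by_id)       = $tree->find_node('B');
my ($by_internal) = $tree->find_node(-internal_id => 2);
my ($old_style)   = $tree->findnode_by_id('B');    # warns, same node
```

Callers of the old name keep working while the warning points them at
find_node, and the wrapper can be removed in a later release.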

-jason
On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote:

> A generic method for retrieving nodes from a Bio::Tree::TreeI object is
> Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id'
> attribute unless a field is given. I can retrieve nodes based on internal id
> like this:
>
>   $tree->find_node(-internal_id => $internal_id);
>
> I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that retrieves by id.
> However, the POD documentation claims that it retrieves by internal id.
>
> What needs to be done?
>
> A. Fix the doc to speak about id
> B. Fix the code to retrieve by internal_id
> C. Fix the doc and create findnode_by_internal_id()
> C. Remove findnode_by_id() as redundant and confusing
> D. Deprecate findnode_by_id() as redundant and confusing
>
> There are no tests for findnode_by_id() which to me tilts selection to D and A
> for now.
>
> Any other opinions?
>
> 	-Heikki
>
>
> -- 
> ______ _/      _/_____________________________________________________
>      _/      _/
>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>   _/  _/  _/  SANBI, South African National Bioinformatics Institute
>  _/  _/  _/  University of Western Cape, South Africa
>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Jason Stajich
jason at bioperl.org


From schmidtc at udel.edu  Fri Aug 15 19:42:44 2008
From: schmidtc at udel.edu (Carl Schmidt)
Date: Fri, 15 Aug 2008 19:42:44 -0400
Subject: [Bioperl-l] lazy symbol binding
Message-ID: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu>

When I attempt  bp_load_gff.pl
I get the following error:

dyld: lazy symbol binding failed: Symbol not found: _mysql_init
   Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
   Expected in: dynamic lookup

dyld: Symbol not found: _mysql_init
   Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
   Expected in: dynamic lookup

Trace/BPT trap

Any suggestions?  I apologize if this is the wrong place for posting  
such a question.

Thanks
Carl

Carl J. Schmidt
Department of Animal & Food Sciences
University of Delaware
Newark, DE 19716
schmidtc at udel.edu
http://copland.udel.edu/~schmidtc


From rvos at interchange.ubc.ca  Fri Aug 15 20:11:48 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Fri, 15 Aug 2008 17:11:48 -0700
Subject: [Bioperl-l] Fwd:  lazy symbol binding
In-Reply-To: <2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com>
References: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu>
	<2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com>
Message-ID: <2bb9b24a0808151711m5eba5500k468a0effc711f3c@mail.gmail.com>

---------- Forwarded message ----------
From: Rutger Vos <rutgeraldo at gmail.com>
Date: Fri, Aug 15, 2008 at 5:11 PM
Subject: Re: [Bioperl-l] lazy symbol binding
To: Carl Schmidt <schmidtc at udel.edu>


This looks like a faulty install of the combo of DBI/DBD::mysql

Specifically, the perl driver (DBD::mysql) for the mysql database
builds a dynamic library on installation, which the DBI interface
tries to load dynamically, but fails.

Unfortunately, this is by no means bioperl related. Try a reinstall as
per the instructions at:

http://search.cpan.org/~capttofu/DBD-mysql-4.008/lib/DBD/mysql/INSTALL.pod

On Fri, Aug 15, 2008 at 4:42 PM, Carl Schmidt <schmidtc at udel.edu> wrote:
> When I attempt  bp_load_gff.pl
> I get the following error:
>
> dyld: lazy symbol binding failed: Symbol not found: _mysql_init
>  Referenced from:
> /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
>  Expected in: dynamic lookup
>
> dyld: Symbol not found: _mysql_init
>  Referenced from:
> /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
>  Expected in: dynamic lookup
>
> Trace/BPT trap
>
> Any suggestions?  I apologize if this is the wrong place for posting such a
> question.
>
> Thanks
> Carl
>
> Carl J. Schmidt
> Department of Animal & Food Sciences
> University of Delaware
> Newark, DE 19716
> schmidtc at udel.edu
> http://copland.udel.edu/~schmidtc
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com




From hlapp at duke.edu  Sat Aug 16 13:43:46 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Sat, 16 Aug 2008 13:43:46 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <3371D481-1416-4976-9846-83CF83395039@illinois.edu>
References: <C4CA9662.52BD%mirhan@indiana.edu>
	<3371D481-1416-4976-9846-83CF83395039@illinois.edu>
Message-ID: <D096DA37-D36A-4A3E-A6C3-98254CAC6B8B@duke.edu>


On Aug 15, 2008, at 10:12 AM, Chris Fields wrote:

> The method get_all_annotation_keys() is present in  
> AnnotationCollectionI but not get_all_Annotations()

Interesting. I wonder whether that was a result of the forward and  
reverse surgeries done to the Annotation* interfaces. (I'm off-line as  
I write this so can't check.)

> , though I doubt it is set up for recursive retrieval (something  
> that might be worth testing). I don't have a problem adding this  
> in.  Hilmar, thoughts?

get_all_Annotations() has always been recursive (just as
get_all_SeqFeatures() is for SeqI). However, its recursive behavior is
different from the behavior that Mira wants.

Specifically, if get_all_Annotations() finds a nested collection under
a matching tag, it will consider the entire nested collection a
match, and return the recursively flattened-out annotation objects it
contains. What Mira needs (if I am understanding her implementation
correctly) is to recursively retrieve annotations whose own tag matches
the query key or set of keys. I.e., a nested collection would be
searched for matching tags even if the tag of the collection itself
does not match, and if it does match, only those contained
annotations that have matching tags would be returned.
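
To make the difference concrete, here is a rough sketch in plain Perl. It is
only a toy model and not the Bio::Annotation::Collection code: a collection is
assumed to be a hash ref of tag => list of values, a value is either a string
(a leaf annotation) or another such hash ref (a nested collection), and the
sub names get_flattened() and get_nested() are made up for illustration.

```perl
use strict;
use warnings;

# Toy model only (an assumption, not BioPerl's data structures):
# a collection is { tag => [ values ] }, where a value is either a plain
# string (leaf annotation) or another collection hash ref.

# Return every leaf below a collection, ignoring tags entirely.
sub flatten_all {
    my ($coll) = @_;
    my @leaves;
    for my $tag (sort keys %$coll) {
        for my $value (@{ $coll->{$tag} }) {
            push @leaves, ref($value) ? flatten_all($value) : $value;
        }
    }
    return @leaves;
}

# Behaviour 1: a nested collection under a matching tag matches as a
# whole and is flattened out recursively.
sub get_flattened {
    my ($coll, @keys) = @_;
    my @found;
    for my $tag (@keys) {
        for my $value (@{ $coll->{$tag} || [] }) {
            push @found, ref($value) ? flatten_all($value) : $value;
        }
    }
    return @found;
}

# Behaviour 2: every level is searched, and only leaves whose own tag
# matches one of the keys are returned.
sub get_nested {
    my ($coll, @keys) = @_;
    my %want = map { ($_ => 1) } @keys;
    my @found;
    for my $tag (sort keys %$coll) {
        for my $value (@{ $coll->{$tag} }) {
            if (ref $value) {
                push @found, get_nested($value, @keys);
            }
            elsif ($want{$tag}) {
                push @found, $value;
            }
        }
    }
    return @found;
}

my $collection = {
    comment => ['top-level comment'],
    nested  => [ { comment => ['inner comment'], dblink => ['inner link'] } ],
};

# Whole nested collection is returned, flattened.
print join(', ', get_flattened($collection, 'nested')), "\n";
# Only 'comment' leaves, at any depth.
print join(', ', get_nested($collection, 'comment')), "\n";
```

With the first behaviour, asking for 'nested' hands back everything inside the
nested collection; with the second, asking for 'comment' picks out the comment
leaves at every level and skips the dblink.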

I'm not sure whether it's better to fold both behaviors into one  
method which has an optional argument to control which one is desired,  
or to have two methods. I'm leaning towards having two methods,  
because support for an added optional argument in external  
implementations of the interface is hard to test for, as opposed to  
testing for the presence/absence of a new method. On the other hand,  
if the existing method wasn't even on the interface to begin with  
(which I'm not yet convinced about) then that shouldn't really be an  
issue.

It seems Chris you are also for adding a second method (and putting  
get_all_Annotations() (back) into the interface)? Does anyone else  
have thoughts or preferences on this?

	-hilmar

>
>
> chris
>
> On Aug 15, 2008, at 1:11 AM, Han, Mira wrote:
>
>>
>> Hi,
>> I've fixed the get_deep_Annotations() to get_nested_Annotations()
>> It has arguments -keys and -recursive,
>> And behaves exactly like get_Annotations() when recursive is not  
>> set (tested by replacing the get_Annotations() in the Annotation.t)
>> I made it a new function instead of modifying get_Annotations()  
>> because I wasn't sure how to modify it to be backwards compatible.
>> I thought of adding the function to the interface  
>> AnnotationCollectionI,
>> But it seemed like get_all_Annotations() was missing from the  
>> interface as well,
>> So decided to ask if it should be added to the interface at all.
>> Isn't it possible that certain implementations of the interface has  
>> functions that are only specific to that implementation?
>>
>> Mira
>>
>>
>> On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:
>>
>>
>> Oh yes,
>> I meant get_Annotations,
>> I want a get_Annotations that is recursive and passes the keys to  
>> the recursive calls.
>>
>>
>>
>> On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
>>
>> Hi Mira -
>>
>> On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:
>>
>>> Yes it is true that it's similar to get_all_Annotations, it's
>>> basically a recursive version of it.
>>
>>
>> I suppose you mean get_Annotations(), right? (get_all_Annotations()  
>> is
>> already recursive)
>>
>>       -hilmar
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
>> ===========================================================
>>
>>
>>
>>
>>
>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From heikki at sanbi.ac.za  Sun Aug 17 03:02:31 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 17 Aug 2008 09:02:31 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
In-Reply-To: <A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
References: <200808141514.49124.heikki@sanbi.ac.za>
	<A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
Message-ID: <200808170902.32485.heikki@sanbi.ac.za>


Done.

findnode_by_id() was not and is not used anywhere in BioPerl core code.

Thanks for input,

   -Heikki

On Saturday 16 August 2008 01:00:04 Jason Stajich wrote:
> I think D.
>
> There should probably only be one find_node function.
>
> findnode_by_id was written by Ramiro for the re-rooting code and I
> guess it wasn't checked to reduce unneeded functions.  I don't have
> any problems removing/deprecating it but will need to update the code
> that depends on it to use find_node properly.
>
> -jason
>
> On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote:
> > A generic method for retrieving nodes from a Bio::Tree::TreeI
> > objects is
> > Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the
> > 'id'
> > attribute unless a field is given. I can retrieve nodes based on
> > internal id
> > like this:
> >
> >   $tree->find_node(-internal_id => $internal_id);
> >
> > I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that
> > retrieves by id.
> > However, the POD documentation claims that it retrieves by internal
> > id.
> >
> > What needs to be done?
> >
> > A. Fix the doc to speak about id
> > B. Fix to code to retrieve by internal_id
> > C. Fix the doc and create findnode_by_internal_id()
> > C. Remove findnode_by_id() as redundant and confusing
> > D. Deprecate findnode_by_id() as redundant and confusing
> >
> > There are no tests for findnode_by_id() which to me tilts selection
> > to D and A
> > for now.
> >
> > Any other opinions?
> >
> > 	-Heikki
> >
> >
> > --
> > ______ _/      _/_____________________________________________________
> >      _/      _/
> >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >  _/  _/  _/  University of Western Cape, South Africa
> >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason at bioperl.org
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From mike.thon at gmail.com  Mon Aug 18 01:01:43 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Mon, 18 Aug 2008 07:01:43 +0200
Subject: [Bioperl-l] Build.PL options?
Message-ID: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>

Hi all - I am trying to write a port of bioperl 1.5.2 to enable its  
installation with the macports system (www.macports.org).  I'm not too  
familiar with the Build.PL build system - is there any way to disable  
the dependency resolution that the build script does (i.e. without  
patching the script)?
Thanks
Mike


From David.Messina at sbc.su.se  Mon Aug 18 03:37:53 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 18 Aug 2008 09:37:53 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
Message-ID: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>

Hi Mike,

Great to hear you're planning to add BioPerl to MacPorts. Thanks!

I'm afraid I don't know how to disable the dependency resolution, but while
you're waiting for others to chime in, here are a couple of things you
might try if you haven't already:

- The Build.PL for BioPerl is configured to automatically write out a
Makefile.PL for you. If you're more familiar with Makefile.PL, then you
could work with that.

Module::Build tries to maintain a certain level of cross-compatibility with
ExtUtils::MakeMaker; how do you disable dependency resolution with the
Makefile.PL system?


- The Build.PL API is pretty flexible, and it's easy to change just about
any behavior by passing parameters to Build.PL on the command line. See


http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/API.pod

and


http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/Cookbook.pm


I'm curious -- it seems to me that a major purpose of Build.PL and
Makefile.PL is specifying dependencies (and installing them where
necessary). Does MacPorts override that and do its own dependency-checking
for Perl modules?


Dave


From neetisomaiya at gmail.com  Mon Aug 18 07:45:38 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 18 Aug 2008 17:15:38 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
Message-ID: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>

Hi,

I am fetching data from the ent gene file of KEGG which is available here :
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent

I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
gene names and pathways in which they participate. I am getting the gene
names fine. But this method

"for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
}"

doesn't seem to be working. I am not able to get the data of the pathways in
which the gene is involved.

Can someone please suggest how I can get the pathway data of genes from the
KEGG ent file??

Thanks.

-- 
-Neeti
Even my blood says, B positive


From neetisomaiya at gmail.com  Mon Aug 18 08:13:08 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 18 Aug 2008 17:43:08 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
Message-ID: <764978cf0808180513k75388671if72ca7913f6e7fc2@mail.gmail.com>

Hi,

I am fetching data from the ent gene file of KEGG which is available here :
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent

I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
gene names and pathways in which they participate. I am getting the gene
names fine. But this method

"for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
}"

doesn't seem to be working. I am not able to get the data of the pathways in
which the gene is involved.

Can someone please suggest how I can get the pathway data of genes from the
KEGG ent file??

Thanks.

-- 
-Neeti
Even my blood says, B positive


From johnsonm at gmail.com  Mon Aug 18 09:26:26 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 08:26:26 -0500
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
Message-ID: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>

On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya <neetisomaiya at gmail.com> wrote:

> I am fetching data from the ent gene file of KEGG which is available here :
> ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent
>
> I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
> gene names and pathways in which they participate. I am getting the gene
> names fine. But this method
>
> "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
> }"
>
> does'nt seem to be working. I am not able to get the data of the pathways in
> which the gene is involved.
>
> Can someone please suggest how I can get the pathway data of genes from the
> KEGG ent file??

What exactly do you mean by "doesn't seem to be working" and what
version of BioPerl are you using?  The code below seems to function as
expected with BioPerl 1.5.2, producing output like this:

hsa04612  Antigen processing and presentation
hsa01430  Cell Communication
hsa04020  Calcium signaling pathway
hsa04080  Neuroactive ligand-receptor interaction
hsa04540  Gap junction
...
...
...

#!/wherever/bin/perl

use strict;
use warnings;

use Bio::SeqIO;


my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]);

while (my $seq = $seqio->next_seq()) {

    foreach my $pathway ($seq->annotation->get_Annotations('pathway')) {

        ## $pathway should be a Bio::Annotation::Comment
        print $pathway->text(), "\n";

    }

}


From johnsonm at gmail.com  Mon Aug 18 09:29:38 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 08:29:38 -0500
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
	<ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
Message-ID: <ebf5eb170808180629j7813a2d7w2a9b238bd455b962@mail.gmail.com>

On Mon, Aug 18, 2008 at 8:26 AM, Mark Johnson <johnsonm at gmail.com> wrote:

> What exactly do you mean by "doesn't seem to be working" and what
> version of BioPerl are you using?  The code below seems to function as
> expected with BioPerl 1.5.2, producing output like this:

Note that I downloaded
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent and
provided the path to it as the argument to the script.


From mike.thon at gmail.com  Mon Aug 18 15:48:02 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Mon, 18 Aug 2008 21:48:02 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
	<628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>
Message-ID: <5BE575B1-67E1-41B1-AB2F-728558377DDA@gmail.com>


On Aug 18, 2008, at 9:37 AM, Dave Messina wrote:

>
> I'm curious -- it seems to me that a major purpose of Build.PL and  
> Makefile.PL is specifying dependencies (and installing them where  
> necessary). Does MacPorts override that and do its own dependency- 
> checking for Perl modules?
>
Hi Dave - Thanks for the links. I will check them out.  MacPorts can
resolve dependencies that are specified for each package, much like
rpm and other Linux packaging systems, so it's probably better to
disable the dependency resolution in the BioPerl build script and let
MacPorts handle them.  It looks like I can patch the Build.PL script
pretty easily, unless I can find a better way.

Mike


From johnsonm at gmail.com  Mon Aug 18 16:53:48 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 15:53:48 -0500
Subject: [Bioperl-l] Bio::Annotation issues with BioSQL
Message-ID: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>

    I'm presently refactoring an in-house protein annotation pipeline
and converting it to use BioSQL as a data store.  I've noticed some
slightly screwy behavior with regard to how some of the
Bio::Annotation classes are handled:

-Instances of Bio::Annotation::SimpleValue and
Bio::Annotation::StructuredValue attached to the annotation collection
for a sequence feature (Bio::SeqFeature::Generic) are converted to
tags/values on the feature.
-Instances of Bio::Annotation::DBLink with attached comments lose the comment.

    I'm storing and retrieving things thusly:

my $dbadp = Bio::DB::BioDB->new(
                                -database => 'biosql',
                                -user     => $user,
                                -pass     => $pass,
                                -dbname   => $ora_instance,
                                -driver   => 'Oracle',
                               );

my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

my $seq = Bio::Seq->new(
                        -id                         => 'DEBUG001',
                        -accession_number => 'DBG001',
                        -desc                     => 'Debug Sequence',
                        -seq                      => 'GATTACA',
                        -namespace           => 'DEBUG',
                       );

my $feature = Bio::SeqFeature::Generic->new(
                                            -seq_id           => 'DEBUG001',
                                            -display_name => 'FEAT0001',
                                            -primary          => 'debug',
                                            -source           => 'test',
                                            -start              => 3,
                                            -end               => 5,
                                            -strand           => 1,
                                           );

my $dblink = Bio::Annotation::DBLink->new(
                                          -database   => 'FAKE001',
                                          -primary_id => 'FK1234567890',
                                          -comment    => 'This is a fake comment',
                                         );

$feature->annotation->add_Annotation('ANNO0001', $dblink);
$seq->add_SeqFeature($feature);

my $pseq = $dbadp->create_persistent($seq);

$pseq->store();
$adp->commit();


my $dbadp = Bio::DB::BioDB->new(
                                                    ...
                                                   );

my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

my $query = Bio::DB::Query::BioQuery->new();

$query->datacollections([
                         "Bio::PrimarySeqI s",
                        ]);

$query->where(["s.display_id like 'DEBUG%'"]);

my $result = $adp->find_by_query($query);

while (my $seq = $result->next_object()) {

    my @features = $seq->get_SeqFeatures();

    foreach my $feature (@features) {

        ## Contents of Bio::Annotation::SimpleValue and
        ## Bio::Annotation::StructuredValue have migrated to tag/value
        ## pairs on $feature and are missing from $annotation_collection.
        ##
        ## Comments have gone missing from Bio::Annotation::DBLink,
        ## but DBLinks are otherwise intact and present.
        my $annotation_collection = $feature->annotation();
        ...
        ...

    }

}

    Is bioperl-db / BioSQL trying to tell me that I shouldn't be using
Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue?  Is
there even a place in the BioSQL schema for a comment to be attached
to a DBLink?


From neetisomaiya at gmail.com  Tue Aug 19 00:31:28 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 19 Aug 2008 10:01:28 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
	<ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
Message-ID: <764978cf0808182131p620a2dedu40b651be50be5b3c@mail.gmail.com>

Thanks a lot for the reply. It was a problem of the bioperl version.

~Neeti.

On Mon, Aug 18, 2008 at 6:56 PM, Mark Johnson <johnsonm at gmail.com> wrote:

> On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya <neetisomaiya at gmail.com>
> wrote:
>
> > I am fetching data from the ent gene file of KEGG which is available here
> :
> > ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent
> >
> > I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
> > gene names and pathways in which they participate. I am getting the gene
> > names fine. But this method
> >
> > "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
> > }"
> >
> > does'nt seem to be working. I am not able to get the data of the pathways
> in
> > which the gene is involved.
> >
> > Can someone please suggest how I can get the pathway data of genes from
> the
> > KEGG ent file??
>
> What exactly do you mean by "doesn't seem to be working" and what
> version of BioPerl are you using?  The code below seems to function as
> expected with BioPerl 1.5.2, producing output like this:
>
> hsa04612  Antigen processing and presentation
> hsa01430  Cell Communication
> hsa04020  Calcium signaling pathway
> hsa04080  Neuroactive ligand-receptor interaction
> hsa04540  Gap junction
> ...
> ...
> ...
>
> #!/wherever/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::SeqIO;
>
>
> my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]);
>
> while (my $seq = $seqio->next_seq()) {
>
>    foreach my $pathway ($seq->annotation->get_Annotations('pathway')) {
>
>        ## $pathway should be a Bio::Annotation::Comment
>        print $pathway->text(), "\n";
>
>    }
>
> }
>


-- 
-Neeti
Even my blood says, B positive


From wgallin at ualberta.ca  Tue Aug 19 02:25:27 2008
From: wgallin at ualberta.ca (Warren Gallin)
Date: Tue, 19 Aug 2008 00:25:27 -0600
Subject: [Bioperl-l] EUtilities help
Message-ID: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>

Hi,

	Is there a cogent document on using Bio::DB::EUtilities with Bioperl  
1.5.2 around somewhere?

	We upgraded and now my scripts are broken when invoking it.

	Any pointers appreciated.

Thanks,

Warren Gallin


From David.Messina at sbc.su.se  Tue Aug 19 03:30:26 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 19 Aug 2008 09:30:26 +0200
Subject: [Bioperl-l] EUtilities help
In-Reply-To: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
References: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
Message-ID: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>

Hi Warren,

Are you upgrading to 1.5.2 or downgrading from bioperl-live?

If the former, you might consider going all the way to bioperl-live, whose
EUtilities support is improved significantly and documented extensively
here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

If the latter, I don't believe there is, but Chris Fields will know for sure
and will probably chime in.


Dave

To get bioperl-live
as a tarball: http://www.bioperl.org/DIST/nightly_builds/
or via Subversion: http://www.bioperl.org/wiki/Using_Subversion


From bix at sendu.me.uk  Tue Aug 19 04:34:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Aug 2008 09:34:50 +0100
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
Message-ID: <48AA85AA.7010405@sendu.me.uk>

Michael Thon wrote:
> Hi all - I am trying to write a port of bioperl 1.5.2 to enable its 
> installation with the macports system (www.macports.org).  I'm not too 
> familiar with the Build.PL build system - is there any way to disable 
> the dependency resolution that the build script does (i.e. without 
> patching the script)?

How else will you be doing the dependency resolution? If your system 
just installs all dependencies itself beforehand, then dependency 
resolution won't be invoked in Build.PL :)

Otherwise, don't run Build.PL at all (why do you need to?), or have your 
system answer default to all questions:

echo | perl Build.PL
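
A rough sketch of why the pipe works, assuming the prompts treat an empty
answer (or end of input) as "take the default". prompt_with_default() below is
a made-up helper for illustration, not the Module::Build code:

```perl
use strict;
use warnings;

# Hypothetical helper (not Module::Build's implementation): ask a question
# on the given input handle and fall back to the default when the answer
# is an empty line or the input has reached EOF.
sub prompt_with_default {
    my ($in, $question, $default) = @_;
    print "$question [$default] ";
    my $answer = <$in>;
    $answer = '' unless defined $answer;    # EOF: nothing was typed
    chomp $answer;
    return length($answer) ? $answer : $default;
}

# Simulate `echo |`: the input holds one empty line, then EOF.
open my $piped, '<', \"\n" or die "cannot open in-memory input: $!";
my $first  = prompt_with_default($piped, 'Install optional module?', 'n');
my $second = prompt_with_default($piped, 'Run network tests?',       'n');
print "\nfirst=$first second=$second\n";
```

The first question consumes the empty line and the second one hits EOF, so
both end up with the default.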


From mike.thon at gmail.com  Tue Aug 19 11:16:03 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Tue, 19 Aug 2008 17:16:03 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <48AA85AA.7010405@sendu.me.uk>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
	<48AA85AA.7010405@sendu.me.uk>
Message-ID: <E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>

> How else will you be doing the dependency resolution? If your system  
> just installs all dependencies itself beforehand, then dependency  
> resolution won't be invoked in Build.PL :)
> Otherwise, don't run Build.PL at all (why do you need to?),
Good point.  Now, why didn't I think of that? :)
Does Build.PL do anything other than copy Perl modules (and install  
dependencies)?
> or have your system answer default to all questions:
> echo | perl Build.PL
>


From downloadondemand at gmail.com  Thu Aug 14 19:06:04 2008
From: downloadondemand at gmail.com (N)
Date: Fri, 15 Aug 2008 02:06:04 +0300
Subject: [Bioperl-l] HOWTO:Graphics/BLAST output
Message-ID: <923c9ce30808141606k61d9cc23nb18e55dec5112ac4@mail.gmail.com>

Hello again!
I followed HOWTO:Graphics and ran into a problem.
While parsing BLAST output, one part of the file clearly causes trouble. The
problem is that this hit has two HSPs, but the second one is in the
"wrong" strand orientation (Plus/Plus vs. Plus/Minus). How can I tell
BioPerl to use only HSPs oriented the same way as the best-scoring HSP? Or better,
although not related to this mailing list: how can I tell BLAST to put this
second HSP into a separate hit? What am I doing/thinking wrong?
I attached a small PNG showing the problem. The second HSP is in white, but it
makes no biological sense, does it?
Thanks.


BLASTN 2.2.18 [Mar-02-2008]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|145226209
         (471 letters)

Database: ../database/1000
           24 sequences; 19,482 total letters

Searching..................................................done


                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|145226176 hypothetical_protein                                     599   e-174
gi|145226174 hypothetical_protein                                      30   0.007
gi|145226175 ATP-dependent_exoDNAse_(exonuclease_V)_alpha_subuni...    26   0.11
gi|145226190 hypothetical_protein                                      24   0.43
gi|145226195 protein_of_unknown_function_DUF1526                       22   1.7
gi|145226193 hypothetical_protein                                      22   1.7
gi|145226187 aminoglycoside_nucleotidyltransferase                     22   1.7
gi|145226181 diguanylate_cyclase                                       22   1.7
gi|145226179 hypothetical_protein                                      22   1.7
gi|145226178 hypothetical_protein                                      22   1.7
gi|145226177 putative_methyl-accepting_chemotaxis_sensory_transd...    22   1.7

>gi|145226176 hypothetical_protein
          Length = 477

 Score =  599 bits (302), Expect = e-174
 Identities = 428/470 (91%)
 Strand = Plus / Plus


Query: 1   atgaatgcagacgtgtcagtcaaccagtggaatccgttagaggtagccgctgaggcgaca 60
           |||||||| ||||||||||||||||||||||||||||||||||| |||||||||||||||
Sbjct: 1   atgaatgcggacgtgtcagtcaaccagtggaatccgttagaggtggccgctgaggcgaca 60


Query: 61  atcgctgccgccacagccgcgctggtgtgggaaggcccagacagctacggggtgctggaa 120
           || ||||||||||| |||||  |||||||||||||||||||||||||||||||| ||||
Sbjct: 61  attgctgccgccacggccgcattggtgtgggaaggcccagacagctacggggtgttggag 120


Query: 121 cgggtcgccggggccacagcgaaaggcatagcaacagctcggatagccgccgaaatcatg 180
           ||||| ||||||||||||||||||||||| ||||||||||||||| ||||||||||||||
Sbjct: 121 cgggtagccggggccacagcgaaaggcatggcaacagctcggataaccgccgaaatcatg 180


Query: 181 gctgacgtcaccacctcagttcagttcactgcggccaccgaacatgcgcgcggcggcgct 240
           ||||||||||||||||||||||||||||||||||||  |||| |||||||||||||||||
Sbjct: 181 gctgacgtcaccacctcagttcagttcactgcggccgacgaagatgcgcgcggcggcgct 240


Query: 241 gtagcgggacttccggggtggctggcgccgcggtgggcggcgtccgtgcgtgccgcactg 300
           |||||||| ||||||||||||||||||||||||||||||||||||||||| | |||||||
Sbjct: 241 gtagcggggcttccggggtggctggcgccgcggtgggcggcgtccgtgcgcggcgcactg 300


Query: 301 gacgaactcgaagccgccgggcggcccggctacgccatggtcaaggcgatcacctggcct 360
           ||||| ||||||||||||||||| |||||| ||  |||| | || ||    ||  ||||
Sbjct: 301 gacgagctcgaagccgccgggcgccccggcgacatcatgatgaaagcccggacacggccg 360


Query: 361 gccttgcgcagcgtcgcggggtggacccaagacgggccgctgcaaacatggcagacggct 420
           ||  |||||||||| |||| ||||||||||||||| |||||||| |||||||||||||||
Sbjct: 361 gcactgcgcagcgtggcggtgtggacccaagacggaccgctgcagacatggcagacggct 420


Query: 421 ctaattgtgagcgaagcacggactgctctggctcaccgcgtaggcgtctg 470
           || || |||  |||||||||||||||||||||||||||||||||||||||
Sbjct: 421 ctgatcgtggacgaagcacggactgctctggctcaccgcgtaggcgtctg 470


 Score = 22.3 bits (11), Expect = 1.7
 Identities = 11/11 (100%)
 Strand = Plus / Minus


Query: 124 gtcgccggggc 134
           |||||||||||
Sbjct: 333 gtcgccggggc 323


The rest of output truncated...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tst.png
Type: image/png
Size: 1668 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080815/24a3f555/attachment-0002.png>
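
One way to keep only the HSPs that share the orientation of the
best-scoring HSP is sketched below. It is a minimal sketch, not a
confirmed answer from the list: the helper name
hsps_matching_best_strand is hypothetical, and it assumes each HSP
object provides bits() and strand('query') / strand('hit'), as the
HSP objects returned by Bio::SearchIO do.

```perl
use strict;
use warnings;

# Keep only the HSPs whose query/hit strand orientation matches that of
# the best-scoring HSP. Each element of @hsps must provide bits() and
# strand($which); this helper name is hypothetical, not a BioPerl method.
sub hsps_matching_best_strand {
    my (@hsps) = @_;
    return () unless @hsps;

    # Best HSP = highest bit score.
    my ($best) = sort { $b->bits <=> $a->bits } @hsps;

    # Plus/Plus gives 1 * 1 = 1, Plus/Minus gives 1 * -1 = -1.
    my $orientation = $best->strand('query') * $best->strand('hit');

    return grep {
        ( $_->strand('query') * $_->strand('hit') ) == $orientation
    } @hsps;
}

# Possible use inside the HOWTO's hit loop (requires BioPerl, not run here):
#   my @hsps;
#   while ( my $hsp = $hit->next_hsp ) { push @hsps, $hsp }
#   $feature->add_sub_SeqFeature( $_, 'EXPAND' )
#       for hsps_matching_best_strand(@hsps);
```

With the hit shown above, this would keep the 599-bit Plus/Plus HSP
and drop the 22.3-bit Plus/Minus one.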

From UKaraoz at lbl.gov  Thu Aug 14 20:03:51 2008
From: UKaraoz at lbl.gov (Ulas Karaoz)
Date: Thu, 14 Aug 2008 17:03:51 -0700
Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for
	blasttable
Message-ID: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>

Hi,

I found out that the save_output routine in RemoteBlast.pm doesn't  
save the output properly into a file when the Blast output is a  
hittable(blasttable). Might the reason be the fact that the tabular  
output has a line that starts with a # while the parser is looking for  
a line starting with just BLASTN, as in the section pasted below:

if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i ||
			 $l =~/^RPS-BLAST\s*.+$/i ) {
			$seentop=1;
}
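
The suspected mismatch can be checked in isolation. The sketch below
wraps the test quoted above in a function and compares it with a
hypothetical relaxed variant that tolerates a leading "# ". Both
function names and the sample header lines are assumptions made for
illustration; this is not the actual RemoteBlast.pm code path.

```perl
use strict;
use warnings;

# The header test quoted above, wrapped in a function for illustration.
sub is_report_top_original {
    my ($l) = @_;
    return ( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i
          || $l =~ /^RPS-BLAST\s*.+$/i ) ? 1 : 0;
}

# Hypothetical relaxed variant: also accepts an optional leading "#",
# as found at the top of tabular output.
sub is_report_top_relaxed {
    my ($l) = @_;
    return ( $l =~ /^(?:#\s*)?(?:T?BLAST[NPX])\s*.+$/i
          || $l =~ /^(?:#\s*)?RPS-BLAST\s*.+$/i ) ? 1 : 0;
}

# Sample header lines (assumed, not copied from a real tabular report).
for my $line ( 'BLASTN 2.2.18 [Mar-02-2008]', '# BLASTN 2.2.18 [Mar-02-2008]' ) {
    printf "%-32s original=%d relaxed=%d\n", $line,
        is_report_top_original($line), is_report_top_relaxed($line);
}
```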


From erant at psb.ugent.be  Fri Aug 15 08:25:59 2008
From: erant at psb.ugent.be (Erick Antezana)
Date: Fri, 15 Aug 2008 14:25:59 +0200
Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API?
In-Reply-To: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>
References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com>
	<48A448DD.4000206@psb.ugent.be>
	<1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>
Message-ID: <48A575D7.7010709@psb.ugent.be>

Hi Hilmar,

Hilmar Lapp wrote:
> Hi Erick,
>
> how did you determine that go-perl is specific to GO? I've found it to 
> work quite well for any kind of OBO-formatted ontology.
We used go-perl during the gestation (2005) of the ontologies we 
built and handled, in particular while dealing with GO (as it was 
originally conceived for that purpose). It is extremely useful indeed. 
However, as our team needed new extensions, new modules were 
developed on top of the existing ones, and that code effectively 
became a new development branch, which ended up as something called 
onto-perl. Onto-perl, as you might have realized, is strongly influenced 
by go-perl. We also had some communication with 
Chris Mungall at that time, and he indicated he might drop further 
development of go-perl. Some time later he told me he would 
continue, which is good since many systems are based on it. But by then 
we already had a fairly independent package that had proved useful.
>
> Also, you note that BioPerl doesn't have the ability to write in 
> certain formats, and to intersect and "unify" (would you mind 
> explaining what you mean by that?) ontologies.
While working with several OBO ontologies, we needed to have them (or 
parts of them) merged, intersected, or joined (= get one ontology = unify). 
It can of course be a bit subjective, since you can "unify" ontologies 
based on different features, approaches, etc. While building 
application ontologies (such as CCO), you may have to identify 
identical terms coming from different ontologies and keep only one in 
your integrated resource.
> It seems that your implementation of RDF etc export isn't really 
> reusable or modular in any way, 
The exports (RDF, OWL, ...) are part of the Ontology module's 
functionality. Have you had any particular problems while exporting an 
ontology? Please let us know so that it can be fixed or improved. 
That module is currently undergoing a major overhaul (not 
released yet) to accommodate a large set of "ontologized" resources in 
an RDF repository. I would also be interested in discussing it so 
that we could improve it.
> but I'd love to bring the intersection function over to BioPerl 
Excellent! I think this is an appropriate time to make the various 
developments converge so that the user community can have a standard 
set of tools. We are interested in following up on these discussions.
> (BTW when you decided to roll your own ontology API, did you get the 
> impression that BioPerl isn't receptive to you adding to it?).
As I mentioned, we originally took go-perl since it offered much more 
functionality than BioPerl's Bio::Ontology::*.
> Would you mind pointing me to the place in the code where I would find 
> that, as I can't seem to find it.
I forwarded your request to the developer who might give you more 
details about it.
>
>     -hilmar
>


cheers,
Erick
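
To make the "intersect" and "unify" operations concrete, here is a
minimal sketch on plain Perl hashes. It does not use the onto-perl or
BioPerl APIs: each ontology is modelled as a hash of term ID => term
name, the function names and the EX: identifiers are hypothetical, and
treating two terms as identical when their IDs match is only one of
the possible criteria mentioned above.

```perl
use strict;
use warnings;

# Each ontology is modelled as a hash reference: term ID => term name.

# Terms present in both ontologies; names are taken from the first one.
sub intersect_terms {
    my ( $onto_a, $onto_b ) = @_;
    return +{
        map  { ( $_ => $onto_a->{$_} ) }
        grep { exists $onto_b->{$_} } keys %$onto_a
    };
}

# All terms from both ontologies; for a shared ID only one entry is
# kept, and the entry from the first ontology wins.
sub unify_terms {
    my ( $onto_a, $onto_b ) = @_;
    return +{ %$onto_b, %$onto_a };
}

my %first  = ( 'EX:0001' => 'alpha', 'EX:0002' => 'beta' );
my %second = ( 'EX:0002' => 'beta (other source)', 'EX:0003' => 'gamma' );

my $common  = intersect_terms( \%first, \%second );
my $unified = unify_terms( \%first, \%second );
print "shared:  @{[ sort keys %$common ]}\n";
print "unified: @{[ sort keys %$unified ]}\n";
```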


From bix at sendu.me.uk  Tue Aug 19 11:56:32 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Aug 2008 16:56:32 +0100
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>	<48AA85AA.7010405@sendu.me.uk>
	<E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>
Message-ID: <48AAED30.1090103@sendu.me.uk>

Michael Thon wrote:
>> How else will you be doing the dependency resolution? If your system 
>> just installs all dependencies itself beforehand, then dependency 
>> resolution won't be invoked in Build.PL :)
>> Otherwise, don't run Build.PL at all (why do you need to?),
>
> Good point.  Now, why didn't I think of that? :)
> Does Build.PL do anything other than copy Perl modules (and install 
> dependencies)?

It generates the 'Build' script, which provides a whole host of functions:

./Build help

The only one other than 'install' that might be relevant to you is

./Build test

to run the test suite, if testing is part of your own system. There are 
other standard ways to run the tests though, that don't need you to 
create the Build script.

Somewhere along the line, probably during the install function, it also 
creates man files and other forms of documentation iirc, and installs 
those too.

If you handle dependencies, installation and testing (or don't do 
testing) yourself, and don't care about man files (perldoc is good 
enough?) you can ignore Build.PL.
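
As one example of running the tests without generating the Build
script, the sketch below drives the test files directly with
TAP::Harness. The wrapper name run_tests_directly and the lib/ and
t/*.t layout are assumptions for illustration; whether a given
distribution's tests need extra setup that Build would otherwise
provide is not checked here.

```perl
use strict;
use warnings;
use TAP::Harness;

# Run the given test scripts directly, without a Build script.
# 'lib' is added to the include path of each test; verbosity -1 keeps
# the per-test output quiet. Returns true if every test passed.
sub run_tests_directly {
    my (@test_files) = @_;
    my $harness = TAP::Harness->new( { lib => ['lib'], verbosity => -1 } );
    my $aggregate = $harness->runtests(@test_files);
    return $aggregate->all_passed;
}

# Possible use from the top of an unpacked distribution (not run here):
#   my $ok = run_tests_directly( glob 't/*.t' );
#   print $ok ? "all tests passed\n" : "some tests failed\n";
```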


From xxq.t.xu at gmail.com  Tue Aug 19 12:57:41 2008
From: xxq.t.xu at gmail.com (XQ Xu)
Date: Tue, 19 Aug 2008 09:57:41 -0700
Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined primer
Message-ID: <3fde82050808190957y271aa52eh30e39a438cc8a8e3@mail.gmail.com>

Hi all,
I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need to
use Primer3 to calculate the Tm of some pre-defined primers; however, there
is no direct way to calculate Tm with Primer3. I have to call Primer3,
supply a pre-defined primer, a template, etc., let it run and hope that
Primer3 finds a pair of primers for me, then open the output and find out
what the Tm is for my pre-defined primer. Am I missing a function that can
do this quickly for me?
I know there's another module (Bio::SeqFeature::Primer) that can do this
quickly, but its Tm is calculated with different parameters; therefore it's
not good to use it while I use Primer3 to design primers.
Any input?
Thanks!
-Tony


From cjfields at illinois.edu  Tue Aug 19 13:39:03 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 19 Aug 2008 12:39:03 -0500
Subject: [Bioperl-l] EUtilities help
In-Reply-To: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>
References: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
	<628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>
Message-ID: <E67F3EE3-7D3B-497A-926F-101A06D34956@illinois.edu>

As mentioned in previous posts and in the original POD, the original  
Bio::DB::EUtilities was experimental (unstable API).  It is deprecated  
in favor of the bioperl-live interface, which splits the user agent  
and parameter handling (Bio::DB::EUtilities, EUtilParameters) from the  
parsers (Bio::Tools::EUtilities).  Unfortunately, the original design  
was too rushed so any XML changes broke the tools; I basically had to  
start from the ground up again.  Any changes to eutil output should  
now be easier to deal with (famous last words).

I am still planning on adding a few things to it (including tests for  
the parser and user agent) but it shouldn't change substantially from  
what is in the cookbook.

chris

On Aug 19, 2008, at 2:30 AM, Dave Messina wrote:

> Hi Warren,
>
> Are you upgrading to 1.5.2 or downgrading from bioperl-live?
>
> If the former, you might consider going all the way to bioperl-live,  
> whose
> EUtilities support is improved significantly and documented  
> extensively
> here:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
>
> If the latter, I don't believe there is, but Chris Fields will know  
> for sure
> and will probably chime in.
>
>
> Dave
>
> To get bioperl-live
> as a tarball: http://www.bioperl.org/DIST/nightly_builds/
> or via Subversion: http://www.bioperl.org/wiki/Using_Subversion
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From cjfields at illinois.edu  Tue Aug 19 14:00:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 19 Aug 2008 13:00:42 -0500
Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for
	blasttable
In-Reply-To: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>
References: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>
Message-ID: <4B8B416E-2DA7-4B1E-9DE2-E1301B467637@illinois.edu>

Saving tabular BLAST to a file is working for me using bioperl-live.   
NCBI recently changed tabular BLAST output which broke parsing (there  
is an extra column now, can't remember what), but it is now fixed.

chris

On Aug 14, 2008, at 7:03 PM, Ulas Karaoz wrote:

> Hi,
>
> I found out that the save_output routine in RemoteBlast.pm doesn't  
> save the output properly into a file when the Blast output is a  
> hittable(blasttable). Might the reason be the fact that the tabular  
> output has a line that starts with a # while the parser is looking  
> for a line starting with just BLASTN, as in the section pasted below:
>
> if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i ||
> 			 $l =~/^RPS-BLAST\s*.+$/i ) {
> 			$seentop=1;
> }

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Tue Aug 19 13:56:42 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 19 Aug 2008 13:56:42 -0400
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
Message-ID: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>


On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote:

>    I'm presently refactoring an in-house protein annotation pipeline
> and converting it to use BioSQL as a data store.  I've noticed some
> slightly screwy behavior with regard to how some of the
> Bio::Annotation classes are handled:
>
> -Instances of Bio::Annotation::SimpleValue and
> Bio::Annotation::StructuredValue attached to the annotation collection
> for a sequence feature (Bio::SeqFeature::Generic) are converted to
> tags/values on the feature.
>
> -Instances of Bio::Annotation::DBLink with attached comments lose  
> the comment.
> [...]
> $query->where(["s.display_id like DEBUG%'"]);

There's a single quote missing here, but I'm assuming that's a result  
of copy/paste editing?

> [...]
>    Is bioperl-db / BioSQL trying to tell me that I shouldn't be using
> Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue?

Your example code doesn't show where you are getting the  
B::A::StructuredValue object from. If you didn't create that  
yourself, it would be good to know what you did to end up with that.  
Chris Fields has written B::A::Tagtree, which would be the way forward.  
If you created the object yourself, can you take a look at that and  
see whether that class wouldn't serve your purpose as well or even  
better?

In order to be stored in BioSQL structured (hierarchical, nested)  
annotation is flattened into a string representation, because BioSQL  
can't store nested annotation collections natively. Right now if I am  
not mistaken upon retrieval this is not converted back into a  
B::A::Tagtree object but rather left flat. This is being worked on  
though, we've just discussed some issues connected with that.

I could make B::A::StructuredValue work the same way, but I'm not sure  
what it provides that B::A::Tagtree doesn't. The latter uses  
Data::Stag under the hood, which is much cleaner, and more extensible  
in the future.

As for SimpleValue annotation versus tag/value annotation for  
seqfeatures, yes right now these are treated interchangeably for the  
purposes of BioSQL and Bioperl-db. You can do this easily too on your  
end by using Bio::SeqFeature::AnnotationAdaptor.

> Is there even a place in the BioSQL schema for a comment to be  
> attached
> to a DBLink?

No, there isn't. I thought it was, but it turns out that this isn't yet  
listed among the desirable extensions to BioSQL from 1.1.x onwards, as  
documented on the wiki:

http://www.biosql.org/wiki/Enhancement_Requests

I'll add it (but feel free to do so yourself, especially if you have  
other enhancements).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From johnsonm at gmail.com  Wed Aug 20 14:43:25 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 20 Aug 2008 13:43:25 -0500
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
	<2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
Message-ID: <ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>

On Tue, Aug 19, 2008 at 12:56 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote:
> There's a single quote missing here, but I'm assuming that's a result of
> copy/paste editing?

Yes, I was a bit sloppy with the example.

> Your example code doesn't contain an example for where you are getting the
> B::A::StructuredValue object from. If you didn't create that yourself, it
> would be good to know what you did to end up with that. Chris Fields has
> written B::A::Tagtree which would be way forward, and if you created the
> object yourself, can you take a look at that and see whether that class
> wouldn't serve your purpose as well or even better?

I created the B::A::StructuredValue myself.  I'm using it to store the
output from PSORTb, which gives a cellular localization and a score
for a protein sequence (gene), which I'm trying to keep paired
together, if possible.  I'll take a look at B::A::Tagtree, that's
probably a better fit.

> In order to be stored in BioSQL structured (hierarchical, nested) annotation
> is flattened into a string representation, because BioSQL can't store nested
> annotation collections natively. Right now if I am not mistaken upon
> retrieval this is not converted back into a B::A::Tagtree object but rather
> left flat. This is being worked on though, we've just discussed some issues
> connected with that.

The data I have isn't really deeply nested.  I just like to keep
related annotation in one object, if possible.

> I could make B::A::StructuredValue work the same way, but I'm not sure what
> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the
> hood, which is much cleaner, and more extensible in the future.

Perhaps B::A::StructuredValue should be deprecated?

> As for SimpleValue annotation versus tag/value annotation for seqfeatures,
> yes right now these are treated interchangeably for the purposes of BioSQL
> and Bioperl-db. You can do this easily too on your end by using
> Bio::SeqFeature::AnnotationAdaptor.

I'll check out the AnnotationAdaptor, but I'll probably just end up using
seqfeature tags/values.  They're functionally equivalent to
B::A::SimpleValue.

>> Is there even a place in the BioSQL schema for a comment to be attached
>> to a DBLink?
>
> No there isn't. I thought it is but it turns out that this isn't yet one of
> the desirable extensions to BioSQL from 1.1.x onwards, as documented on the
> wiki:
>
> http://www.biosql.org/wiki/Enhancement_Requests
>
> I'll add it (but feel free to do so yourself, especially if you have other
> enhancements).

I'll take a look at the wiki....I'll file that as a feature request if
I get there before you do it.


From cjfields at illinois.edu  Wed Aug 20 16:25:55 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 20 Aug 2008 15:25:55 -0500
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
	<2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
	<ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>
Message-ID: <9872D07D-61AB-4F0A-A477-35AA87ABF72E@illinois.edu>


On Aug 20, 2008, at 1:43 PM, Mark Johnson wrote:

> ...
>
>> I could make B::A::StructuredValue work the same way, but I'm not  
>> sure what
>> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag  
>> under the
>> hood, which is much cleaner, and more extensible in the future.
>
> Perhaps B::A::StructuredValue should be deprecated?

Probably.  The only place it was used in core was SeqIO::swiss (and  
now that uses Tagtree in bioperl-live).

Let me know if you have any problems with Bio::Annotation::Tagtree.  I  
am planning on doing some more work with it soon.

chris


From cjfields at illinois.edu  Thu Aug 21 10:26:22 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Aug 2008 09:26:22 -0500
Subject: [Bioperl-l] Annotations issue (GenBank)
Message-ID: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>

I'm working on a GenBank patch and noticed a few cases where  
annotations are being stored as all uppercase strings (CONTIG, WGS,  
etc).  I'm planning on converting these to lowercase (e.g. 'wgs',  
'contig') for consistency with other annotation tag values.  I'm  
making sure output is consistent as well.

These are used fairly infrequently so I don't think it should cause  
problems, but just in case, does anyone have a problem with this change?

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign
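
Code that has to work both before and after this change could look the
tag up under either spelling. The sketch below is only an illustration:
get_annotations_any_case is a hypothetical helper, and it assumes the
collection object provides get_Annotations($tag), as BioPerl annotation
collections do.

```perl
use strict;
use warnings;

# Fetch annotations by tag, trying the lowercase name first and the
# uppercase name second. $collection must provide get_Annotations($tag).
# The helper name is hypothetical, not part of BioPerl.
sub get_annotations_any_case {
    my ( $collection, $tag ) = @_;
    for my $key ( lc $tag, uc $tag ) {
        my @found = $collection->get_Annotations($key);
        return @found if @found;
    }
    return ();
}

# Possible use with a sequence object (requires BioPerl, not run here):
#   my @wgs = get_annotations_any_case( $seq->annotation, 'wgs' );
```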


From awitney at sgul.ac.uk  Thu Aug 21 10:41:10 2008
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 21 Aug 2008 15:41:10 +0100
Subject: [Bioperl-l] adding HSP information to BLAST output graphic
	(Bio::Graphics)
Message-ID: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>


Hi,

I am going through the Bio::Graphics HOWTO on the wiki.

Looking at render_blast4.pl, the description text describes the whole  
hit and is set for the whole track, but I would like to be able to add  
HSP information, such as the identity matches, onto the picture. This is  
stored in the $hsp object. How would I go about adding that to the  
picture?

the relevant piece of code is:

       my $track = $panel->add_track(
                                     -glyph       => 'graded_segments',
                                     -label       => 1,
                                     -connector   => 'dashed',
                                     -bgcolor     => 'blue',
                                     -font2color  => 'red',
                                     -sort_order  => 'high_score',
                                     -description => sub {
                                       my $feature = shift;
                                       return unless $feature->has_tag('description');
                                       my ($description) = $feature->each_tag_value('description');
                                       my $score = $feature->score;
                                       "$description, score=$score";
                           #            "score=$score";
                                      },
                                    );

       next unless $hit->significance < 1E-20;

       my $feature = Bio::SeqFeature::Generic->new(
                                                   -score        => $hit->raw_score,
                                                   -display_name => $hit->name,
                                                   -tag          => {
                                                                      description => $hit->description
                                                                    },
                                                  );

       while( my $hsp = $hit->next_hsp )
         {
          $feature->add_sub_SeqFeature($hsp,'EXPAND');
         }

       $track->add_feature($feature);

thanks for any help

adam


From cjfields at illinois.edu  Thu Aug 21 12:01:11 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Aug 2008 11:01:11 -0500
Subject: [Bioperl-l] Annotations issue (GenBank)
In-Reply-To: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
References: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
Message-ID: <0ACE20F0-43E5-4E12-9437-094871766083@illinois.edu>

I went ahead and committed this; if there are any disagreements about  
it I can back it out or modify as needed.

chris

On Aug 21, 2008, at 9:26 AM, Chris Fields wrote:

> I'm working on a GenBank patch and noticed a few cases where  
> annotations are being stored as all uppercase strings (CONTIG, WGS,  
> etc).  I'm planning on converting these to lowercase (e.g. 'wgs',  
> 'contig') for consistency with other annotation tag values.  I'm  
> making sure output is consistent as well.
>
> These are used fairly infrequently so I don't think it should cause  
> problems, but just in case, does anyone have a problem with this  
> change?
>
> chris


From mshafiullah at mail.unomaha.edu  Thu Aug 21 15:35:54 2008
From: mshafiullah at mail.unomaha.edu (Mohammad Shafiullah)
Date: Thu, 21 Aug 2008 14:35:54 -0500
Subject: [Bioperl-l] bioperl-network test error
Message-ID: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>

To whom it may concern: 

Encountered the following error while running ./Build test on 
bioperl-network-1.5.2_100

Can't stat scripts: No such file or directory
 at /usr/share/perl5/Module/Build/Base.pm line 3836
t/Edge...........ok 
t/Graph-MD5......1/19 Not an ARRAY reference at 
/usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
 All 19 subtests passed 
t/Graph-Seq......1/16 Not an ARRAY reference at 
/usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
 All 16 subtests passed 
t/IO_dip_tab.....ok 
t/IO_psi.........ok 
t/Interaction....ok 
t/Node...........ok 
t/ProteinNet.....ok 

Test Summary Report
-------------------
t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
  Non-zero exit status: 9
t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
  Non-zero exit status: 255
Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33 cusr 
0.37 csys =  1.72 CPU)
Result: FAIL
Failed 2/8 test programs. 0/292 subtests failed.

Please advise on the issue. 

Sincerely,

- Mohammad


From bosborne11 at verizon.net  Thu Aug 21 16:43:35 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 21 Aug 2008 16:43:35 -0400
Subject: [Bioperl-l] bioperl-network test error
In-Reply-To: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
References: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
Message-ID: <83EB2D5B-37F2-4163-8A14-F93ECE740197@verizon.net>

Mohammad,

Try replacing the ModuleBuildBioperl.pm file that you have with the  
one that's attached. Then do this inside the bioperl-network-1.5.2_100  
directory:

./Build clean
perl Build.PL
./Build test

What do you see? Also, what's your version of Perl?


Brian O.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ModuleBuildBioperl.pm
Type: text/x-perl-script
Size: 40360 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080821/17d19dd5/attachment-0002.bin>


On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote:

> To whom it may concern:
>
> Encountered the following error while running ./Build test on
> bioperl-network-1.5.2_100
>
> Can't stat scripts: No such file or directory
> at /usr/share/perl5/Module/Build/Base.pm line 3836
> t/Edge...........ok
> t/Graph-MD5......1/19 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
> All 19 subtests passed
> t/Graph-Seq......1/16 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
> All 16 subtests passed
> t/IO_dip_tab.....ok
> t/IO_psi.........ok
> t/Interaction....ok
> t/Node...........ok
> t/ProteinNet.....ok
>
> Test Summary Report
> -------------------
> t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
>  Non-zero exit status: 9
> t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
>  Non-zero exit status: 255
> Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33  
> cusr
> 0.37 csys =  1.72 CPU)
> Result: FAIL
> Failed 2/8 test programs. 0/292 subtests failed.
>
> Please advise on the issue.
>
> Sincerely,
>
> - Mohammad


From bosborne11 at verizon.net  Thu Aug 21 17:57:17 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 21 Aug 2008 17:57:17 -0400
Subject: [Bioperl-l] bioperl-network test error
In-Reply-To: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
References: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
Message-ID: <AC71DAAF-C324-4D6E-8D56-EE19B1957A27@verizon.net>

Mohammad,

Take a look at this:

http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2007-06/msg00365.html

It discusses that Heap::Elem error when using Graph.

Brian O.


On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote:

> To whom it may concern:
>
> Encountered the following error while running ./Build test on
> bioperl-network-1.5.2_100
>
> Can't stat scripts: No such file or directory
> at /usr/share/perl5/Module/Build/Base.pm line 3836
> t/Edge...........ok
> t/Graph-MD5......1/19 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
> All 19 subtests passed
> t/Graph-Seq......1/16 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
> All 16 subtests passed
> t/IO_dip_tab.....ok
> t/IO_psi.........ok
> t/Interaction....ok
> t/Node...........ok
> t/ProteinNet.....ok
>
> Test Summary Report
> -------------------
> t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
>  Non-zero exit status: 9
> t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
>  Non-zero exit status: 255
> Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33  
> cusr
> 0.37 csys =  1.72 CPU)
> Result: FAIL
> Failed 2/8 test programs. 0/292 subtests failed.
>
> Please advise on the issue.
>
> Sincerely,
>
> - Mohammad


From dominic at bioinf.uni-leipzig.de  Fri Aug 22 09:35:21 2008
From: dominic at bioinf.uni-leipzig.de (Dominic Rose)
Date: Fri, 22 Aug 2008 15:35:21 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
Message-ID: <48AEC099.20106@bioinf.uni-leipzig.de>

Hi,

just a short suggestion to improve the code:

in function

sub _build_nt_matrix()

one finds the following lines:

         my $ti_index = $NucleotideIndexes{$ti};
         my $tj_index = $NucleotideIndexes{$tj};

         if( ! defined $ti_index ) {
             print "ti_index not defined for $ti\n";
             next;
         }

However, it should be possible to stop/silence the printing of that 
error message. Many alignments contain N's, which causes many, many
"ti_index not defined for N" messages. That should be avoidable.

Thanks,
Dominic

-- 
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig
WWW    http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax:   +49 341 97-16679


From heikki at sanbi.ac.za  Wed Aug 27 02:23:39 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 27 Aug 2008 08:23:39 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
In-Reply-To: <48AEC099.20106@bioinf.uni-leipzig.de>
References: <48AEC099.20106@bioinf.uni-leipzig.de>
Message-ID: <200808270823.39852.heikki@sanbi.ac.za>

Dominic,

You are absolutely right. 

I've changed 'print' into '$self->warn' in the SVN. Now it is possible to set 
$object->verbose(-1) to silence the warning or, if deemed necessary, set 
$object->verbose(2) and catch the error with an eval statement.
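
The verbosity convention described above can be sketched in plain Perl. This is only an illustration under stated assumptions: Sketch::Verbose is a hypothetical stand-in class, not BioPerl code, and it imitates just the three behaviours mentioned here (-1 silent, default warns, 2 turns the warning into an exception that eval can catch).

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT BioPerl code: it only imitates the
# verbose/warn convention described in the message.
package Sketch::Verbose;

sub new { my ($class) = @_; return bless { verbose => 0 }, $class }

# Getter/setter for the verbosity level.
sub verbose {
    my ($self, $level) = @_;
    $self->{verbose} = $level if defined $level;
    return $self->{verbose};
}

# -1: stay silent, 2 or more: raise an exception, otherwise: plain warning.
sub warn {
    my ($self, $msg) = @_;
    my $v = $self->verbose;
    return if $v < 0;
    die "EXCEPTION: $msg\n" if $v >= 2;
    CORE::warn "WARNING: $msg\n";
    return;
}

package main;

my $obj = Sketch::Verbose->new;

$obj->verbose(-1);
$obj->warn("ti_index not defined for N");   # silenced

$obj->verbose(2);
my $ok = eval { $obj->warn("ti_index not defined for N"); 1 };
print "caught: $@" unless $ok;
```

With the real module the calls would be made on the DNAStatistics object itself; the stand-in only shows why the change from print to warn makes the message controllable.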

Thanks for reporting this,

    -Heikki

On Friday 22 August 2008 15:35:21 Dominic Rose wrote:
> Hi,
>
> just a short suggestion to improve the code:
>
> in function
>
> sub _build_nt_matrix()
>
> one finds the following lines:
>
>          my $ti_index = $NucleotideIndexes{$ti};
>          my $tj_index = $NucleotideIndexes{$tj};
>
>          if( ! defined $ti_index ) {
>              print "ti_index not defined for $ti\n";
>              next;
>          }
>
> However, it should be possible to stop/silence the printing of that
> error message. Many alignments contain N's what causes many many
> "ti_index not defined for N" messages. That should be avoidable.
>
> Thanks,
> Dominic

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From giles.weaver at googlemail.com  Wed Aug 27 06:39:19 2008
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Wed, 27 Aug 2008 11:39:19 +0100
Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined
	primer
In-Reply-To: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
References: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
Message-ID: <1d06cd5d0808270339p4432f91bo72f48364696caf5a@mail.gmail.com>

Hi Tony,

It isn't well documented, but Primer3 includes a program called oligotm,
which is used to calculate the Tm of short sequences (up to 32bp). You can
run it directly by typing something like "oligotm ACGTACGTACGTACGT" in the
terminal. Just typing oligotm will give you the options.

If you are using Linux, these snippets of code may help you call oligotm
from within a perl script:

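use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

sub _run_oligotm
{
    my ($class, $sequence) = @_;

    # list form: the sequence is passed as an argument, not through a shell
    my @cmd = ('oligotm', '-tp', 1, '-sc', 1, $sequence);
    my ($wtr, $rdr);
    my $err = gensym;    # open3 needs a real glob to keep STDERR separate
    my $pid = open3($wtr, $rdr, $err, @cmd);
    close $wtr;

    my ($tm, $errors) = ('', '');
    while (<$rdr>) { $tm .= $_ }
    while (<$err>) { $errors .= $_ }
    waitpid($pid, 0);    # reap the child process
    chomp $tm;
    return ($tm, $errors);
}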

You'll need to put this in a package or edit out the $class bit for it to
work.

This is my first post to this list. I'm receiving the digest so replying to
posts is a bit of a faff. Can anyone recommend a better way of replying to
posts than replying to the digest, editing it and pasting the subject into
the subject field?

Giles Weaver
Unilever R&D


>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 19 Aug 2008 09:57:41 -0700
> From: "XQ Xu" <xxq.t.xu at gmail.com>
> Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined
>        primer
> To: bioperl-l at lists.open-bio.org
> Message-ID:
>        <3fde82050808190957y271aa52eh30e39a438cc8a8e3 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all,
> I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need use
> Primer3 to calculate Tm for some pre-defined primers; however there is no
> direct way to calculate Tm with Primer3. I have to call Primer3 and supply
> a
> pre-defined primer, a template, etc to let it run and hopefully Primer3
> finds a pair of primers for me, then I have to open the output and find out
> what the Tm is for my pre-defined primer. Do I miss any function that can
> do
> this quickly for me?
> I know there's another module (Bio::SeqFeature::Primer) can do this
> quickly, but the Tm is calculated with different parameters; therefore it's
> not good to use it while I use Primer3 to design primers.
> Any input?
> Thanks!
> -Tony
>


From awitney at sgul.ac.uk  Wed Aug 27 12:21:12 2008
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 27 Aug 2008 17:21:12 +0100
Subject: [Bioperl-l] adding HSP information to BLAST output graphic
	(Bio::Graphics)
In-Reply-To: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>
References: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>
Message-ID: <8939216F-7AFC-40BB-A416-8508E3E5D871@sgul.ac.uk>


after some digging around myself, this seems to do the trick, although  
i don't know if it will always work, as i am having to add a separate  
sort function.

On 21 Aug 2008, at 15:41, Adam Witney wrote:

>
> Hi,
>
> I am going through the Bio::Graphics HOWTO on the wiki.
>
> Looking at render_blast4.pl, the description text describes the  
> whole hit and is set for the whole track, but i would like to be  
> able to add HSP information such as the identity matches onto the  
> picture, this is stored in the $hsp object. How would i go about  
> adding that to the picture?
>
> the relevant piece of code is:
>
>      my $track = $panel->add_track(
>                                    -glyph       => 'graded_segments',
>                                    -label       => 1,
>                                    -connector   => 'dashed',
>                                    -bgcolor     => 'blue',
>                                    -font2color  => 'red',
>                                    -sort_order  => 'high_score',

                                     -part_labels => sub {
                                       my ($feature,undef,$partno) = @_;
                                       my @features = sort_features($feature->get_SeqFeatures());
                                       return $features[$partno]->num_identical.'/'.$features[$partno]->length
                                              .' (score='.$features[$partno]->score.')' if $features[$partno];
                                      },

>                                    -description => sub {
>                                      my $feature = shift;
>                                      return unless $feature->has_tag('description');
>                                      my ($description) = $feature->each_tag_value('description');
>                                      my $score = $feature->score;
>                                      "$description, score=$score";
>                          #            "score=$score";
>                                     },
>                                   );


sub sort_features {
   my @array = @_;

   if(@array < 2){return @array}

   my @sorted = sort {$a->start <=> $b->start} @array;

   return @sorted;
}


is this the best way to achieve this?

thanks

adam


From mauricio at open-bio.org  Thu Aug 28 13:43:04 2008
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 28 Aug 2008 12:43:04 -0500
Subject: [Bioperl-l] Pdoc updates
Message-ID: <48B6E3A8.3050305@open-bio.org>

For those who use the online Pdoc documentation (doc.bioperl.org), this 
is to let you know that the CvsWeb links at the top of any module page 
now link to the proper place in the SVN web interface.

Cheers,
Mauricio.


From jaudall at gmail.com  Fri Aug 29 02:46:20 2008
From: jaudall at gmail.com (Joshua Udall)
Date: Thu, 28 Aug 2008 23:46:20 -0700
Subject: [Bioperl-l] DB_File and assembly IO
Message-ID: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>

Bioperl -

I'm trying to read/parse a single cap3 ace file with several thousand
contigs.  I get a DB_File error at Contig247.  Here's the error:

------------- EXCEPTION -------------
MSG: Unable to tie DB_File handle
STACK Bio::SeqFeature::Collection::new
/Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
STACK Bio::Assembly::Contig::new
/Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
STACK Bio::Assembly::IO::ace::next_assembly
/Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
STACK toplevel /Users/jaudall/bin/read_ace.pl:214
-------------------------------------

Looking at the Collection::new, the error is on the middle line:

  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
      $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die "Cannot open file: $!\n";
  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
  return $self;

If I uncomment the $! die statement that I inserted, I get this:

'Cannot open file tree: Too many open files'

Apparently the Collection constructor is creating a new index file for each
one and the handles for each are sticking around?  That confuses me because
reading more about the Collection.pm and DB_File, it appeared to me that no
files were written by default (as I'm doing), rather the Collection objects
are all stored in memory.  I'm pretty sure the error is not a permission
error, and if it is not the open file-handles, what else should I look for?


If I 'warn' the error instead of throwing it, I get:

Can't call method "get_dup" on an undefined value at
/Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360

This kind of makes sense because the index appears not to be created and it
can't look stuff up in an undefined tied hash.  I'm stuck.

Thanks for any help and suggestions.

OSX, perl 5.8.8, bioperl-live (svn last week)

-- 
Joshua Udall
Assistant Professor
295 WIDB
Plant and Wildlife Science Dept.
Brigham Young University
Provo, UT 84602
801-422-9307
Fax: 801-422-0008
USA


From florent.angly at gmail.com  Fri Aug 29 04:40:25 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Fri, 29 Aug 2008 18:40:25 +1000
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
Message-ID: <48B7B5F9.1050608@gmail.com>

Hi Joshua,

I don't know the specifics of DB_File, but the 'Cannot open file tree: 
Too many open files' is pretty explicit.
If you're on Unix/Linux you can check the files that are open by your 
program by typing:
    lsof | grep name_of_program
There is probably a filehandle that is not closed somewhere in your code 
or the BioPerl code.
Best,

Florent


Joshua Udall wrote:
> Bioperl -
>
> I'm trying to read/parse a single cap3 ace file with several thousand
> contigs.  I get a DB_File error at Contig247.  Here's the error:
>
> ------------- EXCEPTION -------------
> MSG: Unable to tie DB_File handle
> STACK Bio::SeqFeature::Collection::new
> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
> STACK Bio::Assembly::Contig::new
> /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
> STACK Bio::Assembly::IO::ace::next_assembly
> /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
> STACK toplevel /Users/jaudall/bin/read_ace.pl:214
> -------------------------------------
>
> Looking at the Collection::new, the error is on the middle line:
>
>   $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
> $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die "Cannot open
> file: $!\n" ;
>   $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
>   return $self;
>
> If I uncomment out the $! die statement that I inserted, I get this:
>
> 'Cannot open file tree: Too many open files'
>
> Apparently the Collection constructor is creating a new index file for each
> one and the handles for each are sticking around?  That confuses me because
> reading more about the Collection.pm and DB_File, it appeared to me that no
> files were written by default (as I'm doing), rather the Collection objects
> are all stored in memory.  I'm pretty sure the error is not a permission
> error, and if it is not the open file-handles, what else should I look for?
>
>
> If I 'warn' the error instead of throwing it, I get:
>
> Can't call method "get_dup" on an undefined value at
> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360
>
> This kind of makes sense because the index appears not be be created and it
> can't look stuff up in an undefined tied hash.  I'm stuck.
>
> Thanks for any help and suggestions.
>
> OSX, perl 5.8.8, bioperl-live (svn last week)
>
>   


From cjfields at illinois.edu  Fri Aug 29 10:30:49 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Aug 2008 09:30:49 -0500
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <48B7B5F9.1050608@gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
	<48B7B5F9.1050608@gmail.com>
Message-ID: <5717CE96-EC24-46D9-A922-88702B1647A1@illinois.edu>

This is a known problem with Bio::Assembly and stems from having a  
DB_File tied (opened) for each Bio::Assembly::Contig (via a retained  
Bio::SeqFeature::Collection).  You can extend the number of open  
filehandles on UNIX'y flavors using ulimit (see following link), but  
I'm not sure about Win32.

http://bugzilla.open-bio.org/show_bug.cgi?id=2320

The general bug is reproducible using the following simple script.  If
needed, adjust the range end in the for loop to exceed the ulimit (shown by
'ulimit -n'); Mac OS X 10.5 is set to 2560.

---------------------------
use Bio::Assembly::Contig;

my @contigs;

push @contigs, Bio::Assembly::Contig->new() for (1..10000);
---------------------------
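
The descriptor limit can also be checked from inside Perl. The following is a minimal sketch using only the core POSIX module; would_exceed_limit and its default allowance of 16 already-open handles are illustrative assumptions, not part of BioPerl.

```perl
use strict;
use warnings;
use POSIX qw(sysconf _SC_OPEN_MAX);

# Per-process limit on open file descriptors, as reported by the OS
# (undef if the system cannot report it).
my $max_open = sysconf(_SC_OPEN_MAX);

# Hypothetical helper: true if keeping $n_objects alive, each holding one
# file handle, would exceed $limit. $reserved is a rough allowance for
# STDIN, STDOUT, STDERR and anything else the script already has open.
sub would_exceed_limit {
    my ($n_objects, $limit, $reserved) = @_;
    $reserved = 16 unless defined $reserved;
    return $n_objects + $reserved > $limit;
}

if (defined $max_open) {
    print "descriptor limit: $max_open\n";
    print "10000 retained contigs would exceed it\n"
        if would_exceed_limit(10_000, $max_open);
}
```

A check like this only predicts the failure; the handles are still held for as long as each contig object stays alive.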

I'll open a bug report on this for tracking (for release 1.7, along  
with any other Bio::Assembly issues).  That doesn't mean it won't get  
fixed sooner, just that we aren't under pressure with the next  
release, which already has a full plate.  IMO, I don't think there  
needs to be one SF::Collection per contig; one instance should do
for the entire assembly, using the same SF::Collection passed in to  
each contig and distinguishing the contig using the SeqFeature  
seq_id.  It would also be nice if we could change that to also allow  
other SeqFeature::CollectionI (i.e. Bio::DB::SeqFeature::Store and the  
like, for instance).

chris

On Aug 29, 2008, at 3:40 AM, Florent Angly wrote:

> Hi Joshua,
>
> I don't know the specifics of DB_File, but the 'Cannot open file  
> tree: Too many open files' is pretty explicit.
> If you're on Unix/Linux you can check the files that are open by  
> your program by typing:
>   lsof | grep name_of_program
> There is probably a filehandle that in not closed somewhere in your  
> code or the BioPerl code.
> Best,
>
> Florent
>
>
>
> Joshua Udall wrote:
>> Bioperl -
>>
>> I'm trying to read/parse a single cap3 ace file with several thousand
>> contigs.  I get a DB_File error at Contig247.  Here's the error:
>>
>> ------------- EXCEPTION -------------
>> MSG: Unable to tie DB_File handle
>> STACK Bio::SeqFeature::Collection::new
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
>> STACK Bio::Assembly::Contig::new
>> /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
>> STACK Bio::Assembly::IO::ace::next_assembly
>> /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
>> STACK toplevel /Users/jaudall/bin/read_ace.pl:214
>> -------------------------------------
>>
>> Looking at the Collection::new, the error is on the middle line:
>>
>>  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
>> $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die  
>> "Cannot open
>> file: $!\n" ;
>>  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
>>  return $self;
>>
>> If I uncomment out the $! die statement that I inserted, I get this:
>>
>> 'Cannot open file tree: Too many open files'
>>
>> Apparently the Collection constructor is creating a new index file  
>> for each
>> one and the handles for each are sticking around?  That confuses me  
>> because
>> reading more about the Collection.pm and DB_File, it appeared to me  
>> that no
>> files were written by default (as I'm doing), rather the Collection  
>> objects
>> are all stored in memory.  I'm pretty sure the error is not a  
>> permission
>> error, and if it is not the open file-handles, what else should I  
>> look for?
>>
>>
>> If I 'warn' the error instead of throwing it, I get:
>>
>> Can't call method "get_dup" on an undefined value at
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm  
>> line 360
>>
>> This kind of makes sense because the index appears not be be  
>> created and it
>> can't look stuff up in an undefined tied hash.  I'm stuck.
>>
>> Thanks for any help and suggestions.
>>
>> OSX, perl 5.8.8, bioperl-live (svn last week)
>>


From milan.gilic at st.t-com.hr  Sat Aug 23 17:15:39 2008
From: milan.gilic at st.t-com.hr (Milan)
Date: Sat, 23 Aug 2008 23:15:39 +0200
Subject: [Bioperl-l]  Count or weight matrix in bioperl?
Message-ID: <48B07DFB.3040201@st.t-com.hr>


From dominic at bioinf.uni-leipzig.de  Mon Aug 25 06:40:42 2008
From: dominic at bioinf.uni-leipzig.de (Dominic Rose)
Date: Mon, 25 Aug 2008 12:40:42 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
Message-ID: <48B28C2A.8050803@bioinf.uni-leipzig.de>

Hi,

just a short suggestion to improve the code:

in function

sub _build_nt_matrix()

one finds the following lines:

         my $ti_index = $NucleotideIndexes{$ti};
         my $tj_index = $NucleotideIndexes{$tj};

         if( ! defined $ti_index ) {
             print "ti_index not defined for $ti\n";
             next;
         }

However, it should be possible to avoid the printing of that error 
message. Many alignments contain N's, which causes a great many
"ti_index not defined for N" messages. It should be possible to switch 
that message off.

Thanks,
Dominic

-- 
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig
WWW    http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax:   +49 341 97-16679


From brunovecchi at yahoo.com.ar  Fri Aug  1 00:16:16 2008
From: brunovecchi at yahoo.com.ar (Bruno Vecchi)
Date: Fri, 01 Aug 2008 01:16:16 -0300
Subject: [Bioperl-l] Bio::Biblio doesn't find articles [SOLVED]
Message-ID: <48928E10.7090903@yahoo.com.ar>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080801/f000f7b8/attachment-0003.html>

From Kevin.Clancy at invitrogen.com  Fri Aug  1 18:30:30 2008
From: Kevin.Clancy at invitrogen.com (Clancy, Kevin)
Date: Fri, 1 Aug 2008 15:30:30 -0700
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
Message-ID: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>

Hi Folks
I am using the windows version of Bioperl 1.5.2_100. I recently was
compiling a tool using ActiveState's PerlApp which included Bioperl
modules. I received an error for the Bio::SeqIO module, which was
calling for the Bio::SeqIO::staden::read method(?) on line 312 - 314 of
the Bio::SeqIO.pm module. I don't appear to have a copy of the staden
module under the Bio::SeqIO directory and it doesn't appear to be
present in the current BioPerl trunk. I simply commented this out of my
SeqIO.pm file to perform my build and its all running normally. Was this
simply a reference to a non existent module or am I missing something?
Thank you for your help.
kevin

Kevin Clancy, PhD
Senior Scientist, Informatic Sciences
Invitrogen Corp
Carlsbad, CA 92008
Phone: (768) 268 8356
Email: kevin.clancy at invitrogen.com 


From jason at bioperl.org  Sat Aug  2 08:58:05 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Aug 2008 07:58:05 -0500
Subject: [Bioperl-l] Inframe stop codon
In-Reply-To: <516747.39380.qm@web36405.mail.mud.yahoo.com>
References: <516747.39380.qm@web36405.mail.mud.yahoo.com>
Message-ID: <F3DAE939-82AC-4EB8-8667-F6612478477E@bioperl.org>

[regarding PAML analyses]

You would need to translate the cDNA sequence and identify where the  
stop codon is, then remove that codon or remove that sequence from  
your bulk analyses.  It depends on why you think the stop codon is in
the sequence: mis-annotation, a pseudogene, or something else?  If
this is a small percentage of a lot of sequences I would probably
just skip these. If this is the terminal stop codon being
included in the sequences, you just need to remove the last codon
from each sequence before providing it to PAML. The Seq HOWTO has
many examples of how to manipulate a sequence object with substr,
trunc, as well as just the simple seq() method that gives you the
sequence as a string, which you can manipulate, then update the
sequence object afterwards. For example:
my $str = $seq->seq;
# remove the last codon from this cDNA sequence
substr($str, -3, 3,'');
$seq->seq($str);

Alternatively you can use trunc to truncate the sequence
my $trunc = $seq->trunc(1,$seq->length -3);
$seq = $trunc;


You can translate the sequence with the $seq->translate command, then  
test for presence of a stop codon (This is exactly the code that is  
running in the pairwise_kaks script that is in the scripts/utilities/  
directory).  If you have a stop codon you need to figure out whether it
is at the end of the sequence or not.  If it is the terminal codon,
you can just lop off the last codon on all your sequences, but if it  
is internal, you need to decide what you want to do with this sequence.
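
The terminal-versus-internal check can be sketched on the plain sequence string, without translating first. This is a minimal sketch under stated assumptions: the standard genetic code, reading frame 1, and a sequence length that is a multiple of three. inframe_stops and classify_stops are hypothetical helper names, not BioPerl methods.

```perl
use strict;
use warnings;

# Stop codons of the standard genetic code; other codes need another set.
my %STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Return the 0-based codon indexes of every in-frame stop (frame 1).
sub inframe_stops {
    my ($cdna) = @_;
    my @stops;
    for (my $i = 0; $i + 3 <= length $cdna; $i += 3) {
        push @stops, $i / 3 if $STOP{ uc substr($cdna, $i, 3) };
    }
    return @stops;
}

# Classify a sequence as 'clean' (no stop), 'terminal' (only the last
# codon is a stop) or 'internal' (a stop occurs before the last codon).
sub classify_stops {
    my ($cdna) = @_;
    my @stops = inframe_stops($cdna);
    return 'clean' unless @stops;
    my $last = int(length($cdna) / 3) - 1;
    return (@stops == 1 && $stops[0] == $last) ? 'terminal' : 'internal';
}

my $cdna = 'ATGGCCTGA';
if (classify_stops($cdna) eq 'terminal') {
    substr($cdna, -3, 3, '');   # same trimming step as shown earlier
}
print "$cdna\n";
```

Sequences classified as 'internal' are the ones that need a decision by hand, as described above.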

If there are multiple stop codons, I am not sure it is appropriate to  
run PAML here, unless you are interested in some sort of pseudo-rate  
calculation that has many of the codons omitted.  Otherwise you may  
just want to calculate a DNA substitution rate for the sequences to  
make comparison.

I suggest working a single file by hand to get the appropriate steps  
down and then coding it up will be easier.

I am sure folks on the list can help too so it is important to post  
to the mailing list - I don't see any messages from you on the list  
about this query.

-jason
On Aug 2, 2008, at 5:42 AM, Tannistha wrote:

>
> Hi Jason,
>
> Please suggest me how to filter the inframe stop codons,  
> aa_to_dna_aln returns the sequence with in-frame stop codons.
> I have posted my query along with the input files to the forum.
>
> Thanks for your earlier advice, runmode =0 is working for me.
>
> Look forward to your reply
>
> Best Regards
> Tannistha
>
>
> Dr. Tannistha Nandi
> email: tannistha3 at yahoo.com
>
>
>


From David.Messina at sbc.su.se  Sun Aug  3 15:10:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 3 Aug 2008 21:10:18 +0200
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
References: <Acj0JjSSmx+wYhbUS0GAoF0WhwWjzg==>
	<28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
Message-ID: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>

Hi Kevin,

The staden module is an oddball one, to be sure.

A search on the BioPerl website turns up this FAQ entry:
http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F

Also the Windows install page
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

says:

> Some external programs such as Staden <http://www.bioperl.org/wiki/Staden> and
> the EMBOSS <http://www.bioperl.org/wiki/EMBOSS> suite of programs can only
> be installed on Windows by using Cygwin <http://www.cygwin.com/> and its gcc
> C compiler <http://gcc.gnu.org/> (see Bioperl in Cygwin, below)
>


In any case, the staden module (and associated external libraries) is used
only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr
binary formats. So your edit shouldn't cause you any problems otherwise.

Dave


From cjfields at uiuc.edu  Sun Aug  3 16:20:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 15:20:52 -0500
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
References: <Acj0JjSSmx+wYhbUS0GAoF0WhwWjzg==>
	<28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
	<628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
Message-ID: <BA800345-8498-46CF-AD94-E8FD9098DEBB@uiuc.edu>

This seems to be a problem with PerlApp and eval{}; judging by a quick  
Google search this isn't the only module affected.  The line in  
question is wrapped in an eval{} to check for the availability of  
Bio::SeqIO::staden::read (but not die on it).

BTW, the eval was moved into the relevant plugin modules post-1.5.2,  
so the eval{} is checked when the module is loaded dynamically (i.e.  
when a format requiring it is passed in).  It was causing other issues  
with ActivePerl installations and was redundant, so it was removed.

http://bugzilla.open-bio.org/show_bug.cgi?id=2295

chris

On Aug 3, 2008, at 2:10 PM, Dave Messina wrote:

> Hi Kevin,
>
> The staden module is a oddball one, to be sure.
>
> A search on the BioPerl website turns up this FAQ entry:
> http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F
>
> Also the Windows install page
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> says:
>
>> Some external programs such as Staden <http://www.bioperl.org/wiki/Staden 
>> > and
>> the EMBOSS <http://www.bioperl.org/wiki/EMBOSS> suite of programs  
>> can only
>> be installed on Windows by using Cygwin <http://www.cygwin.com/>  
>> and its gcc
>> C compiler <http://gcc.gnu.org/> (see Bioperl in Cygwin, below)
>>
>
>
> In any case, the staden module (and associated external libraries)  
> is used
> only if you are trying to read the scf, abi, alf, pln, exp, ctf, or  
> ztr
> binary formats. So your edit shouldn't cause you any problems  
> otherwise.
>
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From btemperton at googlemail.com  Sat Aug  2 16:05:37 2008
From: btemperton at googlemail.com (Benbo)
Date: Sat, 2 Aug 2008 13:05:37 -0700 (PDT)
Subject: [Bioperl-l]  Finding possible primers regex
Message-ID: <18792782.post@talk.nabble.com>


Hi there, 
I'm trying to write a perl script to scan an aligned multiple entry fasta
file and find possible primers. So far I've produced a string which contains
bases which match all sequences and * where they don't match e.g.
1) TTAGCCTAA
2) TTAGCAGAA
3) TTACCCTAA

would give TTA*C**AA.

I want to parse this string and pull out all sequences which are 18-21 bp in
length and have no more than 4 * in them.

So far, I've got this:

while($fragment_match =~ /([GTAC*]{18,21})/g){
print "$1\n";
}

hoping to match all fragments 18-21 characters in length. However even that
doesn't work as it has essentially chunked it into 21 char blocks, rather
than what I hoped for of
0-18
0-19
0-20
0-21
1-19
1-20
1-21
1-22

etc.

Can anyone let me know if this is already possible in BioPerl, or how one
would go about it with regex. Sadly I'm fairly new to perl and getting to
grips with BioPerl, so please treat me gently :).

Many thanks,

Ben




From cjfields at uiuc.edu  Mon Aug  4 00:08:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 23:08:51 -0500
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <18792782.post@talk.nabble.com>
References: <18792782.post@talk.nabble.com>
Message-ID: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>

On Aug 2, 2008, at 3:05 PM, Benbo wrote:

>
> Hi there,
> I'm trying to write a perl script to scan an aligned multiple entry  
> fasta
> file and find possible primers. So far I've produced a string which  
> contains
> bases which match all sequences and * where they don't match e.g.
> 1) TTAGCCTAA
> 2) TTAGCAGAA
> 3) TTACCCTAA
>
> would give TTA*C**AA.
>
> I want to parse this string and pull out all sequences which are  
> 18-21 bp in
> length and have no more than 4 * in them.
>
> So far, I've got this:
>
> while($fragment_match =~ /([GTAC*]{18,21})/g){
> print "$1\n";
> }
>
> hoping to match all fragments 18-21 characters in length. However  
> even that
> doesn't work as it has essentially chunked it into 21 char blocks,  
> rather
> than what I hoped for of
> 0-18
> 0-19
> 0-20
> 0-21
> 1-19
> 1-20
> 1-21
> 1-22
>
> etc.
>
> Can anyone let me know if this is already possible in BioPerl, or  
> how one
> would go about it with regex. Sadly I'm fairly new to perl and  
> getting to
> grips with BioPerl, so please treat me gently :).
>
> Many thanks,
>
> Ben

There is a trick to this which is discussed more extensively in
'Mastering Regular Expressions'.  Essentially you have to embed code
into the regex and trick the parser into backtracking using a negative
lookahead.  The match itself fails (i.e. no match is returned), but
the embedded code is executed for each match attempt.

The following script is a slight modification of one I used. It
checks the consensus string from the input alignment (in aligned FASTA
format here), extracts the alignment slice using that match, then spits
the alignment out to STDOUT in clustalw format.  Note that it looks for
'?' rather than '*' as the mismatch character in the consensus string.
This should work for perl 5.8 and up, but it's only been tested on
perl 5.10.  You should be able to adapt this to fit what you want.

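use strict;
use warnings;
use Bio::AlignIO;

my $file = shift;   # assumed here: alignment file name from the command line

my $in = Bio::AlignIO->new(-file => $file,
                            -format => 'fasta');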
my $out = Bio::AlignIO->new(-fh => \*STDOUT,
                            -format => 'clustalw');

while (my $aln = $in->next_aln) {
     my $c = $aln->consensus_string(100);
     my @matches;
     $c =~ m/
         ([GTAC?]{18,21})
         (?{my $match = check_match($1);
            push @matches, [$match,
                            pos(),
                            length($match)]
               if defined $match;})
         (?!)
         /xig;
     for my $match (@matches) {
         my ($hit, $st, $end) = ($match->[0],
                                 $match->[1] - $match->[2] + 1,
                                 $match->[1]);
         my $newaln = $aln->slice($st, $end);
         $out->write_aln($newaln);
     }
}

sub check_match {
     my $match = shift;
     return unless $match;
     my $ct = $match =~ tr/?/?/;
     return $match if $ct <= 4;
}
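
For comparison, the same overlapping windows can be enumerated without embedded regex code, using two nested loops and substr. This is a minimal sketch in plain Perl; candidate_windows is a hypothetical helper name, and the wildcard character is passed in so that either '*' or '?' can be used.

```perl
use strict;
use warnings;

# Enumerate every window of $min..$max characters and keep those with at
# most $max_wild wildcard characters. Returns [start, window] pairs with
# 0-based start offsets, ordered by start and then by length.
sub candidate_windows {
    my ($consensus, $min, $max, $max_wild, $wild) = @_;
    my @hits;
    my $n = length $consensus;
    for my $start (0 .. $n - $min) {
        for my $len ($min .. $max) {
            last if $start + $len > $n;
            my $window = substr($consensus, $start, $len);
            my $count = () = $window =~ /\Q$wild\E/g;   # count wildcards
            push @hits, [$start, $window] if $count <= $max_wild;
        }
    }
    return @hits;
}

# Small demonstration on the 9-character example from the question,
# with short windows so the output stays readable.
for my $hit (candidate_windows('TTA*C**AA', 3, 4, 1, '*')) {
    printf "%d\t%s\n", @$hit;
}
```

For the real task the call would use a minimum of 18, a maximum of 21 and at most 4 wildcards. The loops visit every start and length explicitly, which is the enumeration the plain /g match skips because it resumes after the end of each match.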


chris


From heikki at sanbi.ac.za  Mon Aug  4 02:42:57 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 4 Aug 2008 08:42:57 +0200
Subject: [Bioperl-l] Bio::Coordinate::Pair
In-Reply-To: <C4B4F0DB.498%pchanpa@emory.edu>
References: <C4B4F0DB.498%pchanpa@emory.edu>
Message-ID: <200808040842.57466.heikki@sanbi.ac.za>

Prashanth,

Your example coordinates do not do the conversion but more or less report the 
locations of your features in some third coordinates.

The way to think about coordinate pairs is to treat them as HSPs. You tell the
pair object which segments match in the pair of sequences.


The synopsis in Bio::Coordinate::Pair class file gives the following example:


use Bio::Location::Simple;
use Bio::Coordinate::Pair;

my $match1 = Bio::Location::Simple->new
    (-seq_id => 'propeptide', -start => 21, -end => 40, -strand=>1 );
my $match2 = Bio::Location::Simple->new
    (-seq_id => 'peptide', -start => 1, -end => 20, -strand=>1 );
my $pair = Bio::Coordinate::Pair->new(-in => $match1,
				      -out => $match2
    );
# location to match
$pos = Bio::Location::Simple->new
    (-start => 25, -end => 25, -strand=> -1 );

$res = $pair->map($pos);
print $res->match->start; # 5


In other words, region 21-40 in the propeptide matches locations 1-20 in the 
final peptide. Therefore conversion from 25 gives 5:

     signalp        21  25             40
--------------------|---|--------------|
                    1   5  pep         20
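
The arithmetic behind the diagram can be sketched in a few lines of plain Perl. This is a hypothetical helper written for illustration, not how Bio::Coordinate::Pair is implemented; it assumes a single gapless match on the same strand, and the name map_position is made up.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: map a position through one
# gapless pair of matching segments on the same strand.
sub map_position {
    my ($pos, $in_start, $in_end, $out_start) = @_;
    # positions outside the matched input segment have no counterpart
    return if $pos < $in_start || $pos > $in_end;
    return $pos - $in_start + $out_start;
}

# propeptide 21-40 matches peptide 1-20, so 25 maps to 25 - 21 + 1
print map_position(25, 21, 40, 1), "\n";    # prints 5
```

In this sketch a position such as 500, checked against an input segment of 1000-2000, falls outside the match and nothing is returned for it.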

I hope this clarifies it.

The advantage of using these objects over manual conversion is that the code 
has been debugged (avoiding the all-too-easy +/-1 errors) and that they can be 
chained together.

Yours,

      -Heikki


On Tuesday 29 July 2008 22:07:55 Prashanth Athri wrote:
> Dear Professor Lehvaslaiho:
>
> I had a quick question about the module- Bio::Coordinate::Pair
>
> The BioPerl tutorial has the following example:
>
> $input_coordinates = Bio::Location::Simple->new
> (-seq_id => 'propeptide', -start => 1000, -end => 2000, -strand=>1 );
>
> $output_coordinates = Bio::Location::Simple->new
> (-seq_id => 'peptide', -start => 1100, -end => 2100, -strand=>1 );
>
> $pair = Bio::Coordinate::Pair->new
> (-in => $input_coordinates , -out => $output_coordinates );
>
> $pos = Bio::Location::Simple->new (-start => 500, -end => 500 );
>
> $res = $pair->map($pos);
> $converted_start = $res->start;
>
> The way I understand it, $converted_start should return "1600". But when I
> run this snippet, it returns "500". Could you please let me know how
> $pair->map($pos) is processed?
>
> I appreciate your time and thanks in advance.
>
> Regards,
> Prashanth


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From lengjingmao at gmail.com  Tue Aug  5 03:36:23 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 15:36:23 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>
Message-ID: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>

Hi there,

I have a dataset of about 200 sequences, some of which are identical. Is there a BioPerl module that can remove the identical sequences?

Thanks a lot.
yours,
shaohua
----- Original Message ----- 
From: "Benbo" <btemperton at googlemail.com>
To: <Bioperl-l at lists.open-bio.org>
Sent: Sunday, August 03, 2008 4:05 AM
Subject: [Bioperl-l] Finding possible primers regex


> 
> Hi there, 
> I'm trying to write a perl script to scan an aligned multiple entry fasta
> file and find possible primers. So far I've produced a string which contains
> bases which match all sequences and * where they don't match e.g.
> 1) TTAGCCTAA
> 2) TTAGCAGAA
> 3) TTACCCTAA
> 
> would give TTA*C**AA.
> 
> I want to parse this string and pull out all sequences which are 18-21 bp in
> length and have no more than 4 * in them.
> 
> So far, I've got this:
> 
> while($fragment_match =~ /([GTAC*]{18,21})/g){
> print "$1\n";
> }
> 
> hoping to match all fragments 18-21 characters in length. However even that
> doesn't work as it has essentially chunked it into 21 char blocks, rather
> than what I hoped for of
> 0-18
> 0-19
> 0-20
> 0-21
> 1-19
> 1-20
> 1-21
> 1-22
> 
> etc.
> 
> Can anyone let me know if this is already possible in BioPerl, or how one
> would go about it with regex. Sadly I'm fairly new to perl and getting to
> grips with BioPerl, so please treat me gently :).
> 
> Many thanks,
> 
> Ben
> 
> 
> 
> -- 
> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bernd.web at gmail.com  Tue Aug  5 05:49:55 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 5 Aug 2008 11:49:55 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
References: <18792782.post@talk.nabble.com>
	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
Message-ID: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>

Hi,

There is a BioPerl Utility script doing this.
See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.

" scripts/utilities/bp_nrdb.PLS
    Make a non-redundant database based on sequence, not id. Requires
Digest::MD5."

Alternatively, you can make a hash using the sequences as keys.
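
A minimal sketch of the hash approach, assuming the records are already in memory as [id, sequence] pairs (with BioPerl they would typically be read with Bio::SeqIO). The helper name unique_by_sequence is made up for this example, and ignoring case is a choice made here, not something the suggestion requires.

```perl
use strict;
use warnings;

# Keep the first record seen for each distinct sequence string.
# Each record is [ $id, $sequence ]; the comparison ignores case.
sub unique_by_sequence {
    my @records = @_;
    my %seen;
    return grep { !$seen{ uc $_->[1] }++ } @records;
}

my @records = (
    [ seq1 => 'ATGCCGTA' ],
    [ seq2 => 'atgccgta' ],   # same as seq1 apart from case
    [ seq3 => 'ATGCCGTT' ],
);
print "$_->[0]\n" for unique_by_sequence(@records);   # prints seq1 and seq3
```

For a few hundred sequences the full sequence string is fine as a hash key; the MD5 digest used by bp_nrdb.PLS mainly helps keep the keys small for large datasets.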


Regards,
Bernd

On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com> wrote:
> Hi, there ,
>
> I have a sequence dataset which contains about 200 sequences. there are some identical sequences in this. is there any bioperl modules  which can remove those identical sequences?
>
> thanks a lot.
> yours,
> shaohua


From diriano at uni-potsdam.de  Tue Aug  5 06:28:58 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riano Pachon)
Date: Tue, 05 Aug 2008 12:28:58 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
Message-ID: <48982B6A.4050304@uni-potsdam.de>

Hi all,

Or you might try a non-bioperl solution that works pretty well, check:

http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86

Best,

Diego

Bernd Web wrote:
> Hi,
> 
> There is a BioPerl Utility script doing this.
> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.
> 
> " scripts/utilities/bp_nrdb.PLS
>     Make a non-redundant database based on sequence, not id. Requires
> Digest::MD5."
> 
> Alternatively, you can make a hash using the sequences as keys.
> 
> 
> Regards,
> Bernd
> 
> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com> wrote:
>> Hi, there ,
>>
>> I have a sequence dataset which contains about 200 sequences. there are some identical sequences in this. is there any bioperl modules  which can remove those identical sequences?
>>
>> thanks a lot.
>> yours,
>> shaohua


-- 
___________________________________
Diego Mauricio Riaño Pachón
Biologist - PhD student
AG Mueller-Roeber
Institute for Biochemistry and Biology
University of Potsdam

Address: Karl-Liebknecht-Str. 24-25
	 Haus 20
	 14476 Golm
	 Germany

Tel:	 +49 331 977 2809
Fax:	 +49 331 977 2512

web:	http://www.geocities.com/dmrp.geo


From cjfields at uiuc.edu  Tue Aug  5 11:19:54 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Aug 2008 10:19:54 -0500
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <48982B6A.4050304@uni-potsdam.de>
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
	<48982B6A.4050304@uni-potsdam.de>
Message-ID: <4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>

Here are two links which go into detail (the last is a specific  
implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego
>
> Bernd Web wrote:
>> Hi,
>> There is a BioPerl Utility script doing this.
>> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities  
>> header.
>> " scripts/utilities/bp_nrdb.PLS
>>    Make a non-redundant database based on sequence, not id. Requires
>> Digest::MD5."
>> Alternatively, you can make a hash using the sequences as keys.
>> Regards,
>> Bernd
>> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com>  
>> wrote:
>>> Hi, there ,
>>>
>>> I have a sequence dataset which contains about 200 sequences.  
>>> there are some identical sequences in this. is there any bioperl  
>>> modules  which can remove those identical sequences?
>>>
>>> thanks a lot.
>>> yours,
>>> shaohua

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign



From lengjingmao at gmail.com  Tue Aug  5 11:24:22 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 23:24:22 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
	<48982B6A.4050304@uni-potsdam.de>
	<4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>
Message-ID: <3A95AD6D18A749F3B73C135CCC8E7C90@6B2F7FFC298C46F>

hi,
thanks a lot for the help!
cheers,
shaohua


From martin.senger at gmail.com  Tue Aug  5 22:53:07 2008
From: martin.senger at gmail.com (Martin Senger)
Date: Wed, 6 Aug 2008 10:53:07 +0800
Subject: [Bioperl-l] Bio::Biblio doesn't find articles
Message-ID: <4d93f07c0808051953k4cb7511cg5ec4cd93f53cfd0f@mail.gmail.com>

I am afraid that the server that serves the MEDLINE database to the
Bio::Biblio module (using the SOAP protocol), and that is running at EBI,
may not be fully supported. I no longer work at EBI and I have
stopped monitoring their servers. I am still their collaborator, but
unfortunately I am no longer involved in the MEDLINE tools.

I would be happy to continue maintaining the Bio::Biblio module, but it
relies on a server that I no longer control.

Cheers,
Martin

-- 
Martin Senger
email: martin.senger at gmail.com,m.senger at cgiar.org
skype: martinsenger


From Russell.Smithies at agresearch.co.nz  Wed Aug  6 17:20:04 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 7 Aug 2008 09:20:04 +1200
Subject: [Bioperl-l] not BioPerl
Message-ID: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>

Has anyone taken a look at the new Perl interface to the NCBI C++
Toolkit?
Unfortunately, I can't even get their examples working as I'm behind a
firewall and documentation on setting proxy stuff is virtually
non-existent  :-(


Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Wed Aug  6 17:33:27 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 6 Aug 2008 16:33:27 -0500
Subject: [Bioperl-l] not BioPerl
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
Message-ID: <E4A73CE7-9288-4245-99D5-90F488698D6C@illinois.edu>

Looks like they're binary releases for 32- and 64-bit linux (quite  
large, at 25 MB).  Would be nice to have the C++ bindings for other  
OS's (my guess is this was set up via swig).

I have access to a linux cluster, so I may give this a try soon.

chris

On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:

> Has anyone taken a look at the new Perl interface to the NCBI C++
> Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existent  :-(

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From vinaykmittal at gatech.edu  Wed Aug  6 16:56:22 2008
From: vinaykmittal at gatech.edu (Mittal, Vinay K)
Date: Wed, 6 Aug 2008 16:56:22 -0400 (EDT)
Subject: [Bioperl-l] Error installing Biopel Module
Message-ID: <469631287.3995201218056182383.JavaMail.root@mail5.gatech.edu>

Hi,

I just installed ActivePerl 5.10.0 and was trying to install the BioPerl modules. While installing BioPerl through the package manager (ppm), I am getting the following errors:

ppm install failed: Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core

I don't know what the problem is. I have never used the BioPerl modules before.

Thanks.


-- 
--------
Vinay Kumar Mittal
MS,Bioinformatics
Georgia Institute of Technology


From rfrancis at ichr.uwa.edu.au  Wed Aug  6 21:11:28 2008
From: rfrancis at ichr.uwa.edu.au (Richard Francis)
Date: Thu, 07 Aug 2008 09:11:28 +0800
Subject: [Bioperl-l] AlignIO::clustalw match_line query
Message-ID: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>

Dear List,

I wonder if you can help.

I'm having trouble finding out on which criteria the conserved and
semi-conserved substitution decisions for a match line produced from the
match_line function in AlignIO are based.

I note that match_line produces the same output as an alignment match
line would from ClustalW and indeed is used in the AlignIO::clustalw
module, but are the substitution decisions based on the same Venn
diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color
ie are they faithful to the generation of the match line from within
ClustalW itself?

I need to know this as part of a paper I'm writing so I would really
appreciate your help with this.

Kind regards and thanks in advance,

Richard Francis


From jason at bioperl.org  Wed Aug  6 22:26:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Aug 2008 19:26:06 -0700
Subject: [Bioperl-l] AlignIO::clustalw match_line query
In-Reply-To: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
References: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
Message-ID: <A7F598FF-C34F-466E-A73B-D714872201C8@bioperl.org>

Implemented independently, but it was based on what the clustalw  
documentation says.
The main code is in the match_line function in Bio::SimpleAlign.

See the CONSERVATION_GROUPS hash, excerpted below. A 'strong' (":")  
on the match line is coded where all the residues seen in a column  
belong to one group, e.g. only 'S', 'T', or 'A'.

            'strong' => [ qw(
				STA
				NEQK
				NHQK
				NDEQ
				QHRK
				MILV
				MILF
				HY
				FYW )],
             'weak' => [ qw(
				CSA
				ATV
				SAG
				STNK
				STPA
				SGND
				SNDEQK
				NDEQHK
				NEQHRK
				FVLIM
				HFY )],);
}

It was checked against ClustalW output by hand when it was
implemented. If you know of any inconsistencies, let us know.
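
As a rough illustration of how these groups translate into a match line, here is a minimal pure-Perl sketch. It is not the Bio::SimpleAlign code: the names match_line_sketch and in_some_group are made up for this example, the gap handling is a simplifying assumption, and the symbols follow the ClustalW convention ('*' identical, ':' strong group, '.' weak group, ' ' otherwise).

```perl
use strict;
use warnings;

# Conservation groups as listed above.
my %CONSERVATION_GROUPS = (
    strong => [qw(STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW)],
    weak   => [qw(CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY)],
);

# True if all residues in @$residues occur together in at least one group.
sub in_some_group {
    my ($residues, $groups) = @_;
    for my $group (@$groups) {
        # grep counts residues missing from this group; zero means a fit
        return 1 unless grep { index($group, $_) < 0 } @$residues;
    }
    return 0;
}

# Build a match line for equal-length aligned protein strings (hypothetical helper).
sub match_line_sketch {
    my @seqs = map { uc $_ } @_;
    my $line = '';
    for my $i (0 .. length($seqs[0]) - 1) {
        my %seen;
        $seen{ substr($_, $i, 1) } = 1 for @seqs;
        my @residues = keys %seen;
        if (grep { $_ eq '-' || $_ eq '.' } @residues) {
            $line .= ' ';    # assumption: any gap in the column means no symbol
        }
        elsif (@residues == 1) {
            $line .= '*';    # fully conserved column
        }
        elsif (in_some_group(\@residues, $CONSERVATION_GROUPS{strong})) {
            $line .= ':';
        }
        elsif (in_some_group(\@residues, $CONSERVATION_GROUPS{weak})) {
            $line .= '.';
        }
        else {
            $line .= ' ';
        }
    }
    return $line;
}

print match_line_sketch('SAW', 'TAW', 'AAC'), "|\n";
```

The strong groups are tested before the weak ones, so a column such as S/T/A is reported as ':' even though it also fits the weak group STPA.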

-jason
On Aug 6, 2008, at 6:11 PM, Richard Francis wrote:

> Dear List,
>
> I wonder if you can help.
>
> I'm having trouble finding out on which criteria the conserved and
> semi-conserved substitution decisions for a match line produced  
> from the
> match_line function in AlignIO are based.
>
> I note that match_line produces the same output as an alignment match
> line would from ClustalW and indeed is used in the AlignIO::clustalw
> module, but are the substitution decisions based on the same Venn
> diagram at http://www.ebi.ac.uk/Tools/clustalw2/ 
> clustalw_help.html#color
> ie are they faithful to the generation of the match line from within
> ClustalW itself?
>
> I need to know this as part of a paper I'm writing so I would really
> appreciate your help with this.
>
> Kind regards and thanks in advance,
>
> Richard Francis
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From betts at embl.de  Thu Aug  7 08:42:59 2008
From: betts at embl.de (Matthew Betts)
Date: Thu, 7 Aug 2008 14:42:59 +0200 (CEST)
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
Message-ID: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>


Hi,

Has anyone tried to draw secondary structure with Bio::Graphics, i.e. two
different types of glyph with different colours on the same track?

I could use a hash reference to get the different glyph types (it would be
nice if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
segments to get the different colours, but I can't see how to do both at
the same time.

Any example code or suggestions on how I could implement it would be 
great.

Thanks,

Matthew

-- 
Matthew Betts PhD, Russell Group (Structural Bioinformatics)
EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
phone: +49 (0)6221 387 8305; mailto:betts at embl.de


From cain.cshl at gmail.com  Thu Aug  7 10:08:39 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 7 Aug 2008 10:08:39 -0400
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
In-Reply-To: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
References: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
Message-ID: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>

Hi Matthew,

I don't have any code examples, but people have used GBrowse for
protein secondary structure, which uses Bio::Graphics underneath the
hood.

If you want to put more than one glyph and/or more than one color in a
track, it is fairly easy.  You just need to provide a callback for
each option when you create the track, like this:

  $panel->add_track($features_array_ref,
                    -glyph       => sub {
                        # code to set the glyph according to
                        # the attributes of the feature
                    },
                    -bgcolor     => sub {
                        # code to set the color
                    },
                    -fgcolor     => 'black',
                 ...etc...
  );
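
To make the callback idea concrete, here is a small sketch of what the two callbacks might look like for secondary structure. The feature names 'strand' and 'helix', the colours, and the MockFeature stand-in class are assumptions for illustration; 'arrow' and 'generic' are existing Bio::Graphics glyph names, but which glyphs suit your data is your call. The callbacks are written as named subs so they can be tried without a panel.

```perl
use strict;
use warnings;

# Each callback receives the feature being drawn as its first argument.
sub glyph_for {
    my ($feature) = @_;
    return $feature->display_name eq 'strand' ? 'arrow' : 'generic';
}

sub color_for {
    my ($feature) = @_;
    return $feature->display_name eq 'strand' ? 'cyan' : 'magenta';
}

# Minimal stand-in for a feature object (hypothetical, only for trying
# the callbacks outside Bio::Graphics).
{
    package MockFeature;
    sub new { my ($class, $name) = @_; return bless { name => $name }, $class }
    sub display_name { return $_[0]{name} }
}

my @glyphs = map { glyph_for(MockFeature->new($_)) } qw(strand helix);
my @colors = map { color_for(MockFeature->new($_)) } qw(strand helix);
print "@glyphs / @colors\n";

# With a real panel, the same subs would be passed by reference:
#   $panel->add_track($features_array_ref,
#                     -glyph   => \&glyph_for,
#                     -bgcolor => \&color_for);
```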

For more information, see the biographics howto:

  http://www.bioperl.org/wiki/HOWTO:Graphics

Scott


On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts <betts at embl.de> wrote:
>
> Hi,
>
> Has any one tried to draw secondary structure with Bio::Graphics? i.e. two
> different types of glyph with different colours on the same track.
>
> Could use a hash reference to get the different glyph types (would be nice
> if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
> segments to get the different colours, but I can't see how to do both at
> the same time.
>
> Any example code or suggestions on how I could implement it would be
> great.
>
> Thanks,
>
> Matthew
>
> --
> Matthew Betts PhD, Russell Group (Structural Bioinformatics)
> EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
> phone: +49 (0)6221 387 8305; mailto:betts at embl.de


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


From betts at embl.de  Thu Aug  7 12:27:28 2008
From: betts at embl.de (Matthew Betts)
Date: Thu, 7 Aug 2008 18:27:28 +0200 (CEST)
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
In-Reply-To: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
References: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
	<536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0808071823310.8169@pc-russell13>


Hi Scott,

Thanks for that, was a great help - I didn't realise I could use a code 
ref for anything other than the glyph name.

I'm now doing this:

$panel->add_track(
    '-bgcolor' => sub {
        my ($feature) = @_;
        $feature->display_name eq 'strand' ? 'cyan' : 'magenta';
    },
    '-strand_arrow' => sub {
        my ($feature) = @_;
        $feature->display_name eq 'strand' ? 1 : 0;
    },
);

Matthew


On Thu, 7 Aug 2008, Scott Cain wrote:

> Hi Matthew,
> 
> I don't have any code examples, but people have used GBrowse for
> protein secondary structure, which uses Bio::Graphics underneath the
> hood.
> 
> If you want to put more than one glyph and/or more than one color in a
> track, it is fairly easy.  You just need to provide a callback for
> each option when you create the track, like this:
> 
>   $panel->add_track($features_array_ref,
>                     -glyph       => sub { #code to set the glyph
> according the attributes of the feature  },
>                     -bgcolor     => sub { #code to set the color },
>                     -fgcolor     => 'black',
>                  ...etc...
>   );
> 
> For more information, see the biographics howto:
> 
>   http://www.bioperl.org/wiki/HOWTO:Graphics
> 
> Scott
> 
> 
> 
> On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts <betts at embl.de> wrote:
> >
> > Hi,
> >
> > Has any one tried to draw secondary structure with Bio::Graphics? i.e. two
> > different types of glyph with different colours on the same track.
> >
> > Could use a hash reference to get the different glyph types (would be nice
> > if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
> > segments to get the different colours, but I can't see how to do both at
> > the same time.
> >
> > Any example code or suggestions on how I could implement it would be
> > great.
> >
> > Thanks,
> >
> > Matthew
> >
> > --
> > Matthew Betts PhD, Russell Group (Structural Bioinformatics)
> > EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
> > phone: +49 (0)6221 387 8305; mailto:betts at embl.de
> 
> 
> 
> 


From jay at jays.net  Thu Aug  7 12:32:29 2008
From: jay at jays.net (Jay Hannah)
Date: Thu, 07 Aug 2008 11:32:29 -0500
Subject: [Bioperl-l] not BioPerl
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
Message-ID: <489B239D.8060305@jays.net>

Smithies, Russell wrote:
> Has anyone taken a look at the new Perl interface to the NCBI C++ Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> nonexistent  :-(
>   

Do people actually use the NCBI C++ Toolkit for things outside of NCBI?
What for?

I tried to leverage it a year or so ago for an Entrez/sequence/search 
project and got nowhere.

j


From jcherry at ncbi.nlm.nih.gov  Thu Aug  7 13:06:28 2008
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 7 Aug 2008 13:06:28 -0400 (EDT)
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
Message-ID: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>

For those who may be wondering what this is about, a Perl interface to the 
NCBI C++ Toolkit is available at 
ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/.  The C++ Toolkit is 
the main code base that we develop and use at NCBI.  It includes many 
things that may be of interest to BioPerl users, such as sequence analysis 
algorithms, means for interacting with NCBI databases, and facilities for 
reading, writing, and manipulating NCBI data model objects (usually 
defined by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and 
readable from ASN.1 and XML).

Russell, I think you can make things work from behind a firewall by 
setting some environment variables: set CONN_FIREWALL to 1, possibly set 
CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT 
as appropriate.  Please email me if you can't get things to work.  I'll 
see that decent instructions for this are included in the next release.
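
For example, the variables could be set from inside the Perl script itself, before the toolkit's module is loaded. This is only a sketch: the proxy host and port below are placeholders, and it assumes the toolkit reads these settings from the process environment when it connects.

```perl
use strict;
use warnings;

# Set the connection variables at compile time so they are in place
# before any toolkit module is loaded further down the script.
BEGIN {
    my %conn_settings = (
        CONN_FIREWALL        => 1,
        CONN_STATELESS       => 1,                    # possibly needed
        CONN_HTTP_PROXY_HOST => 'proxy.example.org',  # placeholder host
        CONN_HTTP_PROXY_PORT => 8080,                 # placeholder port
    );
    @ENV{ keys %conn_settings } = values %conn_settings;
}

# Load the toolkit's Perl module after this point.
print "$_=$ENV{$_}\n" for sort grep { /^CONN_/ } keys %ENV;
```

Exporting the same variables in the shell before starting perl should have the same effect.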

Josh Cherry


On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:

> Has anyone taken a look at the new Perl interface to the NCBI C++
> Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> nonexistent  :-(
>
>
> Russell Smithies


From tristan.lefebure at gmail.com  Thu Aug  7 13:35:24 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 7 Aug 2008 13:35:24 -0400
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
Message-ID: <200808071335.24668.tristan.lefebure@gmail.com>

Hi list,

I'm using a script very similar to bp_taxonomy2tree.pl distributed with 
BioPerl (with the only difference that I'm using taxids instead of taxon 
names). Basically, the script generates a taxonomic tree given a list of 
taxids using the NCBI taxonomy db. For each taxon, it generates a taxon 
object, and then merges this object into a tree object that keeps growing. It 
runs very well with a small number of taxa, but with many taxa (>1000), it is 
very very very slow (about a week for 3000 taxa).

The slowness is due to the  function merge_lineage (line 65), which merges the 
existing tree object with a new taxon object. I guess that the difficulty 
with a big tree (i.e. more than 1000 leaves) is to find the nodes in common 
between the tree and the new taxon object...
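
To illustrate what I mean (this is a guess at the cost, not the actual merge_lineage code): if finding the shared nodes means walking the tree for every new taxon, the work grows with the size of the tree each time, whereas an index keyed by node id makes each check a single hash lookup. A toy sketch, with made-up node names and a hypothetical merge_lineage_sketch function:

```perl
use strict;
use warnings;

# Toy tree stored as: node id => parent id (the root has an undef parent).
# The hash doubles as an index, so "is this node already in the tree?"
# is one lookup instead of a walk over all nodes.
my %parent_of;

# Merge one lineage (ids listed root first) into the tree.
# Returns the number of nodes that were newly added.
sub merge_lineage_sketch {
    my ($tree, @lineage) = @_;
    my $added = 0;
    my $parent;
    for my $id (@lineage) {
        if (!exists $tree->{$id}) {
            $tree->{$id} = $parent;
            $added++;
        }
        $parent = $id;
    }
    return $added;
}

my $first  = merge_lineage_sketch(\%parent_of, qw(root cladeA genus1 species1));
my $second = merge_lineage_sketch(\%parent_of, qw(root cladeA genus2 species2));
print "added $first, then $second\n";
```

With this layout each merge costs roughly the depth of the lineage, regardless of how many taxa are already in the tree.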

Would you have any idea of how to get around the problem? Should I look under 
the hood of merge_lineage to try to improve it for large trees?

Thanks!

Version: bioperl-1.5.2_102
OS: GNU/Linux

-Tristan


From cjfields at illinois.edu  Thu Aug  7 13:38:53 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Aug 2008 12:38:53 -0500
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
In-Reply-To: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
References: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
Message-ID: <E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>

Josh,

Thanks for the update.  I saw that these are only binaries for linux  
32/64-bit.  Are there plans to either support other OS's (OS X, Win,  
etc) or to maybe make a release with the XS-bindings so users can work  
towards that?  With additional support I can see this easily fitting  
into several spots in BioPerl, but otherwise I'm unsure.

chris

On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote:

> For those who may be wondering what this is about, a Perl interface  
> to the NCBI C++ Toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/ 
> .  The C++ Toolkit is the main code base that we develop and use at  
> NCBI.  It includes many things that may be of interest to BioPerl  
> users, such as sequence analysis algorithms, means for interacting  
> with NCBI databases, and facilities for reading, writing, and  
> manipulating NCBI data model objects (usually defined by ASN.1  
> specifications; writeable as ASN.1, XML, and JSON, and readable from  
> ASN.1 and XML).
>
> Russell, I think you can make things work from behind a firewall by  
> setting some environment variables: set CONN_FIREWALL to 1, possibly  
> set CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and  
> CONN_HTTP_PROXY_PORT as appropriate.  Please email me if you can't  
> get things to work.  I'll see that decent instructions for this are  
> included in the next release.
>
> Josh Cherry
>
>
> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:
>
>> Has anyone taken a look at the new Perl interface to the NCBI C++
>> Toolkit?
>> Unfortunately, I can't even get their examples working as I'm  
>> behind a
>> firewall and documentation on setting proxy stuff is virtually
>> nonexistent  :-(
>>
>>
>> Russell Smithies
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From jcherry at ncbi.nlm.nih.gov  Thu Aug  7 14:04:17 2008
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 7 Aug 2008 14:04:17 -0400 (EDT)
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
In-Reply-To: <E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>
References: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
	<E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>
Message-ID: <Pine.LNX.4.64.0808071359430.13868@widget0.be-md.ncbi.nlm.nih.gov>

Chris,

Support for other OS's is definitely a possibility, depending on community 
feedback (how useful are the wrappers in general, and how much demand is 
there for them on other platforms?).  I wish I could magically make them 
available for Windows and OS X, but there are some technical issues to 
work out.

Josh


On Thu, 7 Aug 2008, Chris Fields wrote:

> Josh,
>
> Thanks for the update.  I saw that these are only binaries for linux 
> 32/64-bit.  Are there plans to either support other OS's (OS X, Win, etc) or 
> to maybe make a release with the XS-bindings so users can work towards that? 
> With additional support I can see this easily fitting into several spots in 
> BioPerl, but otherwise I'm unsure.
>
> chris
>
> On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote:
>
>> For those who may be wondering what this is about, a Perl interface to the 
>> NCBI C++ Toolkit is available at 
>> ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/.  The C++ Toolkit is 
>> the main code base that we develop and use at NCBI.  It includes many 
>> things that may be of interest to BioPerl users, such as sequence analysis 
>> algorithms, means for interacting with NCBI databases, and facilities for 
>> reading, writing, and manipulating NCBI data model objects (usually defined 
>> by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and readable 
>> from ASN.1 and XML).
>> 
>> Russell, I think you can make things work from behind a firewall by setting 
>> some environment variables: set CONN_FIREWALL to 1, possibly set 
>> CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT 
>> as appropriate.  Please email me if you can't get things to work.  I'll see 
>> that decent instructions for this are included in the next release.
>> 
>> Josh Cherry
>> 
>> 
>> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:
>> 
>>> Has anyone taken a look at the new Perl interface to the NCBI C++
>>> Toolkit?
>>> Unfortunately, I can't even get their examples working as I'm behind a
>>> firewall and documentation on setting proxy stuff is virtually
>>> nonexistent  :-(
>>> 
>>> 
>>> Russell Smithies
>> 
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>


From bix at sendu.me.uk  Thu Aug  7 18:20:29 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Aug 2008 23:20:29 +0100
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
 large trees
In-Reply-To: <200808071335.24668.tristan.lefebure@gmail.com>
References: <200808071335.24668.tristan.lefebure@gmail.com>
Message-ID: <489B752D.2080209@sendu.me.uk>

Tristan Lefebure wrote:
> I'm using a script very similar to bp_taxonomy2tree.pl distributed with 
> BioPerl (with the only difference that I'm using taxids instead of taxon 
> names). Basically, the script generates a taxonomic tree given a list of 
> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon 
> object, and then merge this object to a tree object that keeps growing. It 
> runs very well with a small number of taxa, but with many taxa (>1000), it is 
> very very very slow (about a week for 3000 taxa).
> 
> The slowness is due to the  function merge_lineage (line 65), which merges the 
> existing tree object with a new taxon object. I guess that the difficulty 
> with a big tree (i.e. more than 1000 leaf) is to find the nodes in common 
> between the tree and the new taxon object...
> 
> Would you have any idea of how to get around the problem? Should I look under 
> the hood of merge_lineage to try to improve it for large trees?

Yes, please do. It might have been me that wrote that, in which case I 
didn't do anything fancy or consider the above problem.


From cjfields at illinois.edu  Thu Aug  7 20:42:16 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Aug 2008 19:42:16 -0500
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
In-Reply-To: <489B752D.2080209@sendu.me.uk>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
Message-ID: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>


On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:

> Tristan Lefebure wrote:
>> I'm using a script very similar to bp_taxonomy2tree.pl distributed  
>> with BioPerl (with the only difference that I'm using taxids  
>> instead of taxon names). Basically, the script generates a  
>> taxonomic tree given a list of taxids using the NCBI taxonomy db.  
>> For each taxon, it generates a taxon object, and then merge this  
>> object to a tree object that keeps growing. It runs very well with  
>> a small number of taxa, but with many taxa (>1000), it is very very  
>> very slow (about a week for 3000 taxa).
>> The slowness is due to the  function merge_lineage (line 65), which  
>> merges the existing tree object with a new taxon object. I guess  
>> that the difficulty with a big tree (i.e. more than 1000 leaf) is  
>> to find the nodes in common between the tree and the new taxon  
>> object...
>> Would you have any idea of how to get around the problem? Should I  
>> look under the hood of merge_lineage to try to improve it for large  
>> trees?
>
> Yes, please do. It might have been me that wrote that, in which case  
> I didn't do anything fancy or consider the above problem.

Actually I thought that was fixed; wasn't some caching added for that  
script at one point?

chris


From bix at sendu.me.uk  Fri Aug  8 03:50:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 08 Aug 2008 08:50:50 +0100
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
 large trees
In-Reply-To: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
	<7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
Message-ID: <489BFADA.1060308@sendu.me.uk>

Chris Fields wrote:
> 
> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
> 
>> Tristan Lefebure wrote:
>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed 
>>> with BioPerl (with the only difference that I'm using taxids instead 
>>> of taxon names). Basically, the script generates a taxonomic tree 
>>> given a list of taxids using the NCBI taxonomy db. For each taxon, it 
>>> generates a taxon object, and then merge this object to a tree object 
>>> that keeps growing. It runs very well with a small number of taxa, 
>>> but with many taxa (>1000), it is very very very slow (about a week 
>>> for 3000 taxa).
>>> The slowness is due to the  function merge_lineage (line 65), which 
>>> merges the existing tree object with a new taxon object. I guess that 
>>> the difficulty with a big tree (i.e. more than 1000 leaf) is to find 
>>> the nodes in common between the tree and the new taxon object...
>>> Would you have any idea of how to get around the problem? Should I 
>>> look under the hood of merge_lineage to try to improve it for large 
>>> trees?
>>
>> Yes, please do. It might have been me that wrote that, in which case I 
>> didn't do anything fancy or consider the above problem.
> 
> Actually I thought that was fixed;

Oh yeah. Looks like I did something related to 'speedup for 
merge_lineage()' on the 18th Dec 2006. Tristan, check out 
Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.


From tristan.lefebure at gmail.com  Fri Aug  8 12:02:32 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Fri, 8 Aug 2008 12:02:32 -0400
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
In-Reply-To: <489BFADA.1060308@sendu.me.uk>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
	<7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
	<489BFADA.1060308@sendu.me.uk>
Message-ID: <d31f7c40808080902o70fbf78dub50c9ba660355ba0@mail.gmail.com>

Yes indeed, with the svn code it took 10 minutes (compared to one week!)
Thanks, -Tristan

On Fri, Aug 8, 2008 at 3:50 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Chris Fields wrote:
>
>>
>> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
>>
>>  Tristan Lefebure wrote:
>>>
>>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed with
>>>> BioPerl (with the only difference that I'm using taxids instead of taxon
>>>> names). Basically, the script generates a taxonomic tree given a list of
>>>> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon
>>>> object, and then merge this object to a tree object that keeps growing. It
>>>> runs very well with a small number of taxa, but with many taxa (>1000), it
>>>> is very very very slow (about a week for 3000 taxa).
>>>> The slowness is due to the  function merge_lineage (line 65), which
>>>> merges the existing tree object with a new taxon object. I guess that the
>>>> difficulty with a big tree (i.e. more than 1000 leaf) is to find the nodes
>>>> in common between the tree and the new taxon object...
>>>> Would you have any idea of how to get around the problem? Should I look
>>>> under the hood of merge_lineage to try to improve it for large trees?
>>>>
>>>
>>> Yes, please do. It might have been me that wrote that, in which case I
>>> didn't do anything fancy or consider the above problem.
>>>
>>
>> Actually I thought that was fixed;
>>
>
> Oh yeah. Looks like I did something related to 'speedup for
> merge_lineage()' on the 18th Dec 2006. Tristan, checkout
> Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
>


From rvos at interchange.ubc.ca  Fri Aug  8 19:59:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Fri, 8 Aug 2008 16:59:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>

Hi,

while going through a large genbank file
(ftp://ftp.ncbi.nlm.nih.gov/genbank/gbpri21.seq.gz) I ran into malloc
errors. Just for the record (I doubt this does anyone any good), I
got:

perl(391) malloc: *** vm_allocate(size=8421376) failed (error code=3)
perl(391) malloc: *** error: can't allocate region
perl(391) malloc: *** set a breakpoint in szone_error to debug
Out of memory!

What I was trying to do is go through the file, and only write out
those seq objects that aren't human, and that have CDS features, i.e.:

################################################

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $dir = shift @ARGV; # the directory with *.gz files
my $out = shift @ARGV; # the directory to write to...
mkdir $out if not -d $out; # ...which may need to be created

opendir my $dirhandle, $dir or die $!;
for my $archive ( readdir $dirhandle ) {
    next if $archive !~ /\.gz$/;
    my $file = $archive;
    $file =~ s/\.gz$//;

    # external call to the gunzip utility,
    # such that we keep the archive
    system( "gunzip -c \"${dir}/${archive}\" > \"${dir}/${file}\"" ) == 0
        or die "gunzip failed for ${archive}: $?";

    # object that parses genbank files,
    # returns Bio::Seq objects
    my $reader = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => "${dir}/${file}"
    );

    # object that receives Bio::Seq objects,
    # writes genbank files
    my $writer = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => ">${out}/${file}",
    );
    while ( my $seq = $reader->next_seq ) {
        my $name = $seq->species->binomial;
        if ( $name ne 'Homo sapiens' ) {

            # search for coding sequences among the features
            my $HasCDS = 0;
            FEATURE: for my $f ( $seq->get_SeqFeatures ) {
                if ( $f->primary_tag eq 'CDS' ) {
                    $HasCDS++;
                    last FEATURE;
                }
            }

            # write the sequence to file
            if ( $HasCDS ) {
                $writer->write_seq( $seq );
            }
        }
    }

    # delete the extracted, unfiltered file
    unlink "${dir}/${file}";
}

################################################

Okay, so it runs out of memory. Can I do something to fix that? Should
I flush on either of the I/O objects after each $seq? Could there be
memory leaks in the Bio::Seq objects? Should I $seq->DESTROY them
explicitly or something like that?
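
For context on the DESTROY idea: Perl frees an object when its reference count drops to zero, and calling DESTROY by hand does not release the memory. One common way objects stay alive is a reference cycle, e.g. a parent holding a child that points back at its parent. Whether that is what happens here is only a guess on my part; the sketch below just shows the mechanism with a made-up Node class, and how Scalar::Util::weaken breaks such a cycle.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

my $destroyed = 0;    # counts DESTROY calls

{
    package Node;     # hypothetical class, not a BioPerl one
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
    sub DESTROY { $destroyed++ }
}

# Build a parent/child pair that reference each other, optionally
# weakening the child's back-reference to the parent.
sub build_pair {
    my ($use_weak_ref) = @_;
    my $parent = Node->new( name => 'parent' );
    my $child  = Node->new( name => 'child', parent => $parent );
    $parent->{child} = $child;
    weaken( $child->{parent} ) if $use_weak_ref;
    return;
}

build_pair(0);    # strong cycle: neither object is freed on scope exit
my $freed_with_cycle = $destroyed;

build_pair(1);    # weak back-reference: both objects are freed
my $freed_with_weaken = $destroyed - $freed_with_cycle;

print "freed with cycle: $freed_with_cycle, with weaken: $freed_with_weaken\n";
```

If a loop creates a cycle like the first case for every record, memory grows for the lifetime of the program no matter what is done with $seq afterwards.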

Thanks,

Rutger

-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


From David.Messina at sbc.su.se  Sat Aug  9 07:04:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Aug 2008 13:04:04 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
Message-ID: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>

Hi Rutger,
I ran your script on the same genbank file and, while I did not run out of
memory, I did see what appears to be a memory leak. Even when I manually
undef'd the reader and writer object every 1000 records, memory usage
continued to grow.

I can't quite figure out what's going on, though.

If I run a different program using SeqIO (the simple sequence converter from
the SeqIO HOWTO) on the same input file, I don't see this same runaway
growth.

Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the
sequence converter holds steady at about 12MB of real memory, whereas on
5.10 it grows, albeit slowly, for as long as the program is executing. When
I killed it about 20% of the way through the file, it was up to  about 44MB
of real memory.

Anyone else have a chance to look at this?


Dave


From rvos at interchange.ubc.ca  Sat Aug  9 07:36:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 04:36:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
Message-ID: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>

Hi Dave,

thanks for the reply. The memory usage is in fact much more atrocious
than just 44 MB: I'm actually looping over all 36 such archives (the
genbank primates), and on my macbook it steadily increases to over 1 GB
of memory. What seemed to be helping somewhat is to call
$reader->flush and $writer->flush after each seq, at least to the
extent that I make it through that one file, but last time I tried I
didn't get much further: the whole terminal process died shortly after
instead. I seem to vaguely recall that even if perl free()'s memory,
that doesn't necessarily mean that the memory is returned to the OS
for the runtime of the program - depending on the OS and perl version.
What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.

Rutger

On Sat, Aug 9, 2008 at 4:04 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> Hi Rutger,
> I ran your script on the same genbank file and, while I did not run out of
> memory, I did see what appears to be a memory leak. Even when I manually
> undef'd the reader and writer object every 1000 records, memory usage
> continued to grow.
>
> I can't quite figure out what's going on, though.
> If I run a different program using SeqIO (the simple sequence converter from
> the SeqIO HOWTO) on the same input file, I don't see this same runaway
> growth.
> Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the
> sequence converter holds steady at about 12MB of real memory, whereas on
> 5.10 it grows, albeit slowly, for as long as the program is executing. When
> I killed it about 20% of the way through the file, it was up to  about 44MB
> of real memory.
> Anyone else have a chance to look at this?
>
> Dave
>


-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


From David.Messina at sbc.su.se  Sat Aug  9 08:58:56 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Aug 2008 14:58:56 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
Message-ID: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>

>
>  I seem to vaguely recall that even if perl free()'s memory that doesn't
> necessarily mean that the memory is returned to the OS for the runtime of
> the program


I believe that's correct.


> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>

perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.


Dave


From cjfields at illinois.edu  Sat Aug  9 09:56:19 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Aug 2008 08:56:19 -0500
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
Message-ID: <57147D88-ABE6-44E0-8D76-790B0C735438@illinois.edu>

There is definitely a memory leak.  I can confirm it on OSX 10.4/10.5  
using bioperl-live.

I'll try looking into it this weekend, but I can't promise when it'll  
be fixed; my laptop is on the fritz.

chris

On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:

>>
>> I seem to vaguely recall that even if perl free()'s memory that  
>> doesn't
>> necessarily mean that the memory is returned to the OS for the  
>> runtime of
>> the program
>
>
> I believe that's correct.
>
>
>
>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>
>
> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>
>
> Dave


From cjfields at illinois.edu  Sat Aug  9 10:15:23 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Aug 2008 09:15:23 -0500
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
Message-ID: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>

Forgot to mention, maybe we can file this as a bug?  It's a pretty  
serious one but it should be easy to narrow down; the change had to be  
introduced fairly recently.

chris

On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:

>>
>> I seem to vaguely recall that even if perl free()'s memory that  
>> doesn't
>> necessarily mean that the memory is returned to the OS for the  
>> runtime of
>> the program
>
>
> I believe that's correct.
>
>
>
>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>
>
> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>
>
> Dave

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat Aug  9 12:00:46 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 9 Aug 2008 12:00:46 -0400
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
	<9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
Message-ID: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>

This smells of circular references somewhere. I think the first point  
I would go looking is the species storing - does the problem go away  
if you turn that off? Maybe the version of weaken() is at play here?
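
For anyone not familiar with the pattern, here is a minimal sketch of how
a reference cycle keeps objects alive and how weaken() breaks it. The
Demo::Node class and the _tree/_rootnode keys are made up for
illustration; this is not the actual Bio::Species or Bio::Tree code, and
which link should be weakened in the real classes is a separate question.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

# Hypothetical stand-in class; it only counts how many objects get destroyed.
my $destroyed = 0;
{
    package Demo::Node;
    sub new { my ($class, %args) = @_; return bless {%args}, $class }
    sub DESTROY { $destroyed++ }
}

sub build_pair {
    my ($use_weaken) = @_;
    my $tree = Demo::Node->new(name => 'tree');
    my $root = Demo::Node->new(name => 'root');
    $tree->{_rootnode} = $root;    # tree -> root
    $root->{_tree}     = $tree;    # root -> tree: a reference cycle
    weaken($root->{_tree}) if $use_weaken;
    return;                        # both lexicals go out of scope here
}

build_pair(0);
my $leaked_run = $destroyed;       # the cycle keeps both objects alive

$destroyed = 0;
build_pair(1);
my $weakened_run = $destroyed;     # both objects are freed

print "destroyed without weaken: $leaked_run\n";
print "destroyed with weaken:    $weakened_run\n";
```

In a loop over many records, the first case is the kind of thing that
would show up as steady memory growth.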

	-hilmar

On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:

> Forgot to mention, maybe we can file this as a bug?  It's a pretty  
> serious one but it should be easy to narrow down; the change had to  
> be introduced fairly recently.
>
> chris
>
> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>
>>>
>>> I seem to vaguely recall that even if perl free()'s memory that  
>>> doesn't
>>> necessarily mean that the memory is returned to the OS for the  
>>> runtime of
>>> the program
>>
>>
>> I believe that's correct.
>>
>>
>>
>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>
>>
>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>
>>
>> Dave
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Aug  9 12:07:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 9 Aug 2008 12:07:30 -0400
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>
References: <18792782.post@talk.nabble.com>
	<33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>
Message-ID: <591AE8EB-4D45-4859-A93E-EA9BF01CA9C6@gmx.net>

This looks like a neat trick. Do you think it's worth including as a  
SimpleAlign method (obviously w/o the printing to STDOUT)? I can  
imagine that a lot of people might appreciate it.

	-hilmar

On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:

> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>
>>
>> Hi there,
>> I'm trying to write a perl script to scan an aligned multiple entry  
>> fasta
>> file and find possible primers. So far I've produced a string which  
>> contains
>> bases which match all sequences and * where they don't match e.g.
>> 1) TTAGCCTAA
>> 2) TTAGCAGAA
>> 3) TTACCCTAA
>>
>> would give TTA*C**AA.
>>
>> I want to parse this string and pull out all sequences which are  
>> 18-21 bp in
>> length and have no more than 4 * in them.
>>
>> So far, I've got this:
>>
>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>> print "$1\n";
>> }
>>
>> hoping to match all fragments 18-21 characters in length. However  
>> even that
>> doesn't work as it has essentially chunked it into 21 char blocks,  
>> rather
>> than what I hoped for of
>> 0-18
>> 0-19
>> 0-20
>> 0-21
>> 1-19
>> 1-20
>> 1-21
>> 1-22
>>
>> etc.
>>
>> Can anyone let me know if this is already possible in BioPerl, or  
>> how one
>> would go about it with regex. Sadly I'm fairly new to perl and  
>> getting to
>> grips with BioPerl, so please treat me gently :).
>>
>> Many thanks,
>>
>> Ben
>
> There is a trick to this which is discussed more extensively in  
> 'Mastering Regular Expressions'.  Essentially you have to embed code  
> into the regex and trick the parser into backtracking using a  
> negative lookahead.  The match itself fails (i.e. no match is  
> returned), but the embedded code is executed for each match attempt,
>
> The following script is a slight modification of one I used which  
> checks the consensus string from the input alignment (in aligned  
> FASTA format here), extracts the alignment slice using that match,  
> then spit the alignment out to STDOUT in clustalw format.  This  
> should work for perl 5.8 and up, but it's only been tested on perl  
> 5.10.  You should be able to use this to fit what you want.
>
> my $in = Bio::AlignIO->new(-file => $file,
>                           -format => 'fasta');
> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>                           -format => 'clustalw');
>
> while (my $aln = $in->next_aln) {
>    my $c = $aln->consensus_string(100);
>    my @matches;
>    $c =~ m/
>        ([GTAC?]{18,21})
>        (?{my $match = check_match($1);
>           push @matches, [$match,
>                           pos(),
>                           length($match)]
>              if defined $match;})
>        (?!)
>        /xig;
>    for my $match (@matches) {
>        my ($hit, $st, $end) = ($match->[0],
>                                $match->[1] - $match->[2] + 1,
>                                $match->[1]);
>        my $newaln = $aln->slice($st, $end);
>        $out->write_aln($newaln);
>    }
> }
>
> sub check_match {
>    my $match = shift;
>    return unless $match;
>    my $ct = $match =~ tr/?/?/;
>    return $match if $ct <= 4;
> }
>
>
> chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From rvos at interchange.ubc.ca  Sat Aug  9 13:47:33 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 10:47:33 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
	<9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
	<897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>
Message-ID: <2bb9b24a0808091047t46a6bfa8r7e11a3a1537180@mail.gmail.com>

I am sure my version of weaken() works as advertised. Is there a way
to turn off species storing from outside the code base or do you mean
I go and start commenting bits out in Bio::SeqIO::genbank (or
wherever)?

On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
> This smells of circular references somewhere. I think the first point I
> would go looking is the species storing - does the problem go away if you
> turn that off? Maybe the version of weaken() is at play here?
>
>        -hilmar
>
> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>
>> Forgot to mention, maybe we can file this as a bug?  It's a pretty serious
>> one but it should be easy to narrow down; the change had to be introduced
>> fairly recently.
>>
>> chris
>>
>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>>
>>>>
>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't
>>>> necessarily mean that the memory is returned to the OS for the runtime
>>>> of
>>>> the program
>>>
>>>
>>> I believe that's correct.
>>>
>>>
>>>
>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>>
>>>
>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>>
>>>
>>> Dave
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Marie-Claude Hofmann
>> College of Veterinary Medicine
>> University of Illinois Urbana-Champaign
>>
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>


-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


From hartzell at alerce.com  Sat Aug  9 14:33:51 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 9 Aug 2008 11:33:51 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
Message-ID: <18589.58127.57270.352974@almost.alerce.com>


I'm pretty sure that this fixes the problem:

g.

Index: Bio/Species.pm
===================================================================
--- Bio/Species.pm	(revision 14791)
+++ Bio/Species.pm	(working copy)
@@ -340,6 +340,7 @@
 		}
 		
 		$self->{_species} = $species;
+		weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
     }
 	return $self->{_species};
 }


From cjfields at illinois.edu  Sat Aug  9 15:08:27 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Sat,  9 Aug 2008 14:08:27 -0500 (CDT)
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <20080809140827.BHN28056@expms6.cites.uiuc.edu>

I'm pretty sure it's not due to a particular version of weaken(), though it
does sound like a circular reference issue.  I have tried this with perl
5.8.6, 5.8.8, and 5.10 (all Mac OS, either 10.4 or 10.5); all show the same
memory leak.

You can try using SeqBuilder to get rid of the Bio::Species object.  I'll give
that a try when I can.  Unfortunately my laptop is now with the local Apple
geniuses awaiting a motherboard, so I can't get to it right away (I'll give it
a try on my wife's laptop).

chris

---- Original message ----
>Date: Sat, 9 Aug 2008 10:47:33 -0700
>From: "Rutger Vos" <rvos at interchange.ubc.ca>  
>Subject: Re: [Bioperl-l] malloc errors while using Bio::SeqIO?  
>To: "Hilmar Lapp" <hlapp at gmx.net>
>Cc: Chris Fields <cjfields at illinois.edu>, bioperl list <bioperl-l at lists.open-bio.org>
>
>I am sure my version of weaken() works as advertised. Is there a way
>to turn off species storing from outside the code base or do you mean
>I go and start commenting bits out in Bio::SeqIO::genbank (or
>wherever)?
>
>On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> This smells of circular references somewhere. I think the first point I
>> would go looking is the species storing - does the problem go away if you
>> turn that off? Maybe the version of weaken() is at play here?
>>
>>        -hilmar
>>
>> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>>
>>> Forgot to mention, maybe we can file this as a bug?  It's a pretty serious
>>> one but it should be easy to narrow down; the change had to be introduced
>>> fairly recently.
>>>
>>> chris
>>>
>>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>>>
>>>>>
>>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't
>>>>> necessarily mean that the memory is returned to the OS for the runtime
>>>>> of
>>>>> the program
>>>>
>>>>
>>>> I believe that's correct.
>>>>
>>>>
>>>>
>>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>>>
>>>>
>>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>>>
>>>>
>>>> Dave
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Marie-Claude Hofmann
>>> College of Veterinary Medicine
>>> University of Illinois Urbana-Champaign
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>
>
>
>-- 
>Dr. Rutger A. Vos
>Department of zoology
>University of British Columbia
>http://www.nexml.org
>http://rutgervos.blogspot.com


From hartzell at alerce.com  Sat Aug  9 20:17:52 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 9 Aug 2008 17:17:52 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <18589.58127.57270.352974@almost.alerce.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
Message-ID: <18590.13232.892714.952555@almost.alerce.com>


George Hartzell writes:
 > 
 > I'm pretty sure that this fixes the problem:
 > 
 > g.
 > 
 > Index: Bio/Species.pm
 > ===================================================================
 > --- Bio/Species.pm	(revision 14791)
 > +++ Bio/Species.pm	(working copy)
 > @@ -340,6 +340,7 @@
 >  		}
 >  		
 >  		$self->{_species} = $species;
 > +		weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
 >      }
 >  	return $self->{_species};
 >  }

Actually, it's a bit clearer with the weaken moved up in the block so
that it's closer to where the new tree is allocated.

Chris suggested that I go ahead and commit it.

g.


From David.Messina at sbc.su.se  Sun Aug 10 05:57:07 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 10 Aug 2008 11:57:07 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <18590.13232.892714.952555@almost.alerce.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
	<18590.13232.892714.952555@almost.alerce.com>
Message-ID: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>

Nice, George -- holds steady at about 32MB now.
Much better. :)


Dave


From hartzell at alerce.com  Sun Aug 10 12:51:39 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 10 Aug 2008 09:51:39 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
	<18590.13232.892714.952555@almost.alerce.com>
	<628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
Message-ID: <18591.7323.244987.436383@almost.alerce.com>

Dave Messina writes:
 > Nice, George -- holds steady at about 32MB now.
 > Much better. :)

Good to hear.

Bonus points go to rvos@ for providing such a nice clean bug report and
test case; it made running it down much more appealing.

g.


From valiente at lsi.upc.edu  Mon Aug 11 04:09:37 2008
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 11 Aug 2008 11:09:37 +0300
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
References: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
Message-ID: <B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>

Despite the speedup for merge_lineage, the get_lca method still runs
very slowly on a large number of nodes (say, 1500 nodes), and it does
not rely on merge_lineage. In the get_lca method, all the lineages
are first collected in @paths in order to later find their $lca,
while it might be faster to process each $path as soon as it is
obtained with the get_lineage_nodes method. Any other ideas on how to
speed up the get_lca method? Thanks,
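
To make the "process each path as it arrives" idea concrete, here is a
rough sketch in plain Perl on toy data. It is not the actual get_lca
code: incremental_lca and the iterator are hypothetical names, and each
lineage is assumed to be a root-first list that includes the node itself.

```perl
use strict;
use warnings;

# Keep only the running common prefix of the lineages seen so far,
# instead of storing every path and comparing them afterwards.
sub incremental_lca {
    my ($next_lineage) = @_;      # iterator: returns an array ref, or undef when done
    my @common;
    my $first = 1;
    while (defined(my $path = $next_lineage->())) {
        if ($first) { @common = @$path; $first = 0; next; }
        my $i = 0;
        $i++ while $i < @common && $i < @$path && $common[$i] eq $path->[$i];
        $#common = $i - 1;        # truncate to the shared prefix
        last if !@common;         # nothing in common: no point reading further
    }
    return @common ? $common[-1] : undef;
}

# Toy lineages standing in for what get_lineage_nodes would provide.
my @lineages = (
    [qw(root A B D)],
    [qw(root A B E)],
    [qw(root A C)],
);
my $idx = 0;
my $lca = incremental_lca(sub { $idx < @lineages ? $lineages[$idx++] : undef });
print "LCA: $lca\n";
```

Memory use is then bounded by the longest lineage rather than by the
number of nodes, and the loop can stop early once the prefix is empty.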

Gabriel


From lmanchon at univ-montp2.fr  Mon Aug 11 12:32:20 2008
From: lmanchon at univ-montp2.fr (Laurent Manchon)
Date: Mon, 11 Aug 2008 18:32:20 +0200
Subject: [Bioperl-l] protein pattern scan
Message-ID: <5.0.2.1.2.20080811182952.00bebff0@pop.univ-montp2.fr>

Hi,

do you know if it's possible to search protein motif in a multifasta 
protein file
using bioperl to return the motif, the position and the name of the 
corresponding sequence ?

thank you for your help.


+---------------------------------------------+
  Laurent Manchon
  Email: lmanchon at univ-montp2.fr
+---------------------------------------------+


From cjfields at illinois.edu  Mon Aug 11 13:32:05 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 12:32:05 -0500 (CDT)
Subject: [Bioperl-l] protein pattern scan
Message-ID: <20080811123205.BHO45474@expms6.cites.uiuc.edu>

This is covered in the FAQ:

http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F
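
For the simple case where the motif can be written as a regular
expression, a minimal plain-Perl sketch follows. It is not taken from
the FAQ; the toy sequences, the pattern, and the inline FASTA parsing
are assumptions for illustration. With BioPerl you would normally read
the file with Bio::SeqIO and use each sequence's seq() and display_id()
instead of the hand-rolled parser.

```perl
use strict;
use warnings;

# Toy multi-FASTA input (made-up sequences).
my $fasta = <<'END';
>seq1 toy protein
MKTAYIAKQRNGSAAL
>seq2 another toy protein
MNVTLLNASGG
END

# PROSITE-style N-{P}-[ST]-{P} written as a Perl regex.
my $motif = qr/N[^P][ST][^P]/;

# Minimal FASTA parser: id is the first word after '>'.
my (%seq, @order, $id);
for my $line (split /\n/, $fasta) {
    if ($line =~ /^>(\S+)/) { $id = $1; push @order, $id; $seq{$id} = ''; }
    elsif (defined $id)     { $line =~ s/\s+//g; $seq{$id} .= $line; }
}

my @hits;
for my $name (@order) {
    # The zero-width lookahead lets overlapping occurrences be reported too.
    while ($seq{$name} =~ /(?=($motif))/g) {
        push @hits, [$name, pos($seq{$name}) + 1, $1];   # 1-based start
    }
}
printf "%s\t%d\t%s\n", @$_ for @hits;
```

Each row gives the sequence name, the 1-based start position, and the
matched motif.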

chris

---- Original message ----
>Date: Mon, 11 Aug 2008 18:32:20 +0200
>From: Laurent Manchon <lmanchon at univ-montp2.fr>  
>Subject: [Bioperl-l] protein pattern scan  
>To: bioperl-l at lists.open-bio.org
>
>Hi,
>
>do you know if it's possible to search protein motif in a multifasta 
>protein file
>using bioperl to return the motif, the position and the name of the 
>corresponding sequence ?
>
>thank you for your help.
>
>
>+---------------------------------------------+
>  Laurent Manchon
>  Email: lmanchon at univ-montp2.fr
>+---------------------------------------------+


From bix at sendu.me.uk  Mon Aug 11 13:44:37 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Aug 2008 18:44:37 +0100
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To: <B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>
References: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
	<B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>
Message-ID: <48A07A85.6050601@sendu.me.uk>

Gabriel Valiente wrote:
> Despite the speedup for merge_lineage, the get_lca method still runs 
> very slow on a large number of nodes (say, 1500 nodes) and it does not 
> rely on merge_lineage. In the get_lca method, all the lineages are first 
> collected in @paths in order to later find their $lca, while it might be 
> faster to process each $path as soon as it is obtained with the 
> get_lineage_nodes method.

If you try that idea out and it works, please do commit it. I've no 
further suggestions atm, but I haven't had a chance to look at it to 
remind myself what happens.


From cjfields at illinois.edu  Mon Aug 11 15:50:38 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 14:50:38 -0500 (CDT)
Subject: [Bioperl-l] Finding possible primers regex
Message-ID: <20080811145038.BHO59267@expms6.cites.uiuc.edu>

When I can, I could try writing a method which accepts a regex or a
Bio::Tools::SeqPattern and returns either an AlignIO stream or an array of
SimpleAlign instances (the former could be attached to a temp file for
iteration).  Any preference?

chris

---- Original message ----
>Date: Sat, 9 Aug 2008 12:07:30 -0400
>From: Hilmar Lapp <hlapp at gmx.net>  
>Subject: Re: [Bioperl-l] Finding possible primers regex  
>To: Chris Fields <cjfields at uiuc.edu>
>Cc: Benbo <btemperton at googlemail.com>, Bioperl-l at lists.open-bio.org
>
>This looks like a neat trick. Do you think it's worth including as a  
>SimpleAlign method (obviously w/o the printing to STDOUT)? I can  
>imagine that a lot of people might appreciate it.
>
>	-hilmar
>
>On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:
>
>> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>>
>>>
>>> Hi there,
>>> I'm trying to write a perl script to scan an aligned multiple entry  
>>> fasta
>>> file and find possible primers. So far I've produced a string which  
>>> contains
>>> bases which match all sequences and * where they don't match e.g.
>>> 1) TTAGCCTAA
>>> 2) TTAGCAGAA
>>> 3) TTACCCTAA
>>>
>>> would give TTA*C**AA.
>>>
>>> I want to parse this string and pull out all sequences which are  
>>> 18-21 bp in
>>> length and have no more than 4 * in them.
>>>
>>> So far, I've got this:
>>>
>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>> print "$1\n";
>>> }
>>>
>>> hoping to match all fragments 18-21 characters in length. However  
>>> even that
>>> doesn't work as it has essentially chunked it into 21 char blocks,  
>>> rather
>>> than what I hoped for of
>>> 0-18
>>> 0-19
>>> 0-20
>>> 0-21
>>> 1-19
>>> 1-20
>>> 1-21
>>> 1-22
>>>
>>> etc.
>>>
>>> Can anyone let me know if this is already possible in BioPerl, or  
>>> how one
>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>> getting to
>>> grips with BioPerl, so please treat me gently :).
>>>
>>> Many thanks,
>>>
>>> Ben
>>
>> There is a trick to this which is discussed more extensively in  
>> 'Mastering Regular Expressions'.  Essentially you have to embed code  
>> into the regex and trick the parser into backtracking using a  
>> negative lookahead.  The match itself fails (i.e. no match is  
>> returned), but the embedded code is executed for each match attempt,
>>
>> The following script is a slight modification of one I used which  
>> checks the consensus string from the input alignment (in aligned  
>> FASTA format here), extracts the alignment slice using that match,  
>> then spit the alignment out to STDOUT in clustalw format.  This  
>> should work for perl 5.8 and up, but it's only been tested on perl  
>> 5.10.  You should be able to use this to fit what you want.
>>
>> my $in = Bio::AlignIO->new(-file => $file,
>>                           -format => 'fasta');
>> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>>                           -format => 'clustalw');
>>
>> while (my $aln = $in->next_aln) {
>>    my $c = $aln->consensus_string(100);
>>    my @matches;
>>    $c =~ m/
>>        ([GTAC?]{18,21})
>>        (?{my $match = check_match($1);
>>           push @matches, [$match,
>>                           pos(),
>>                           length($match)]
>>              if defined $match;})
>>        (?!)
>>        /xig;
>>    for my $match (@matches) {
>>        my ($hit, $st, $end) = ($match->[0],
>>                                $match->[1] - $match->[2] + 1,
>>                                $match->[1]);
>>        my $newaln = $aln->slice($st, $end);
>>        $out->write_aln($newaln);
>>    }
>> }
>>
>> sub check_match {
>>    my $match = shift;
>>    return unless $match;
>>    my $ct = $match =~ tr/?/?/;
>>    return $match if $ct <= 4;
>> }
>>
>>
>> chris
>>
>
>-- 
>===========================================================
>: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>===========================================================
>
>
>


From hlapp at gmx.net  Mon Aug 11 22:35:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Aug 2008 22:35:13 -0400
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <20080811145038.BHO59267@expms6.cites.uiuc.edu>
References: <20080811145038.BHO59267@expms6.cites.uiuc.edu>
Message-ID: <B79B827A-E66C-417D-A5E8-B4FE96791089@gmx.net>

Actually, now that you ask I'm wondering whether one wouldn't  
sometimes want to retain the relationship between the match and the  
resulting spliced alignment? If so, neither AlignIO nor array would  
accomplish that, right?

Other than that I myself don't have a strong preference either way. I  
suppose AlignIO stream is somewhat more extensible, since as you say  
it could be coupled to a file if the resulting set of alignments is  
really large.
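
One way to keep the match and its coordinates together would be to
return plain records and let the caller slice the alignment as needed.
A rough sketch on a bare consensus string, using simple nested loops
rather than the embedded-code regex from the script; candidate_windows
and the record keys are made-up names, not an existing SimpleAlign
method:

```perl
use strict;
use warnings;

# Enumerate every window of $min..$max characters containing at most
# $max_wild wildcard characters; keep the string and its 1-based coordinates.
sub candidate_windows {
    my ($consensus, $min, $max, $max_wild, $wild) = @_;
    my @found;
    my $len = length $consensus;
    for my $start (0 .. $len - $min) {
        for my $size ($min .. $max) {
            last if $start + $size > $len;
            my $window = substr $consensus, $start, $size;
            my $count  = () = $window =~ /\Q$wild\E/g;
            last if $count > $max_wild;   # longer windows cannot have fewer
            push @found, {
                match => $window,
                start => $start + 1,
                end   => $start + $size,
            };
        }
    }
    return @found;
}

# Small toy consensus; real use would be 18..21 with at most 4 wildcards.
my @hits = candidate_windows('TTA?C??AAG', 4, 5, 1, '?');
print "$_->{start}-$_->{end}\t$_->{match}\n" for @hits;
```

Each record's start and end could then be passed to slice() on the
alignment, so the match and the resulting sub-alignment stay paired.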

	-hilmar

On Aug 11, 2008, at 3:50 PM, Christopher Fields wrote:

> When I can I could try generating a method which accepts a regex/ 
> Bio::Tools::SeqPattern and returns an AlignIO stream or array of  
> SimpleAlign instances (the former could be attached to a temp file  
> for iteration).  Any preference?
>
> chris
>
> ---- Original message ----
>> Date: Sat, 9 Aug 2008 12:07:30 -0400
>> From: Hilmar Lapp <hlapp at gmx.net>
>> Subject: Re: [Bioperl-l] Finding possible primers regex
>> To: Chris Fields <cjfields at uiuc.edu>
>> Cc: Benbo <btemperton at googlemail.com>, Bioperl-l at lists.open-bio.org
>>
>> This looks like a neat trick. Do you think it's worth including as a
>> SimpleAlign method (obviously w/o the printing to STDOUT)? I can
>> imagine that a lot of people might appreciate it.
>>
>> 	-hilmar
>>
>> On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:
>>
>>> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>>>
>>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple entry
>>>> fasta
>>>> file and find possible primers. So far I've produced a string which
>>>> contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char blocks,
>>>> rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>
>>> There is a trick to this which is discussed more extensively in
>>> 'Mastering Regular Expressions'.  Essentially you have to embed code
>>> into the regex and trick the parser into backtracking using a
>>> negative lookahead.  The match itself fails (i.e. no match is
>>> returned), but the embedded code is executed for each match attempt,
>>>
>>> The following script is a slight modification of one I used which
>>> checks the consensus string from the input alignment (in aligned
>>> FASTA format here), extracts the alignment slice using that match,
>>> then spit the alignment out to STDOUT in clustalw format.  This
>>> should work for perl 5.8 and up, but it's only been tested on perl
>>> 5.10.  You should be able to use this to fit what you want.
>>>
>>> my $in = Bio::AlignIO->new(-file => $file,
>>>                          -format => 'fasta');
>>> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>>>                          -format => 'clustalw');
>>>
>>> while (my $aln = $in->next_aln) {
>>>   my $c = $aln->consensus_string(100);
>>>   my @matches;
>>>   $c =~ m/
>>>       ([GTAC?]{18,21})
>>>       (?{my $match = check_match($1);
>>>          push @matches, [$match,
>>>                          pos(),
>>>                          length($match)]
>>>             if defined $match;})
>>>       (?!)
>>>       /xig;
>>>   for my $match (@matches) {
>>>       my ($hit, $st, $end) = ($match->[0],
>>>                               $match->[1] - $match->[2] + 1,
>>>                               $match->[1]);
>>>       my $newaln = $aln->slice($st, $end);
>>>       $out->write_aln($newaln);
>>>   }
>>> }
>>>
>>> sub check_match {
>>>   my $match = shift;
>>>   return unless $match;
>>>   my $ct = $match =~ tr/?/?/;
>>>   return $match if $ct <= 4;
>>> }
>>>
>>>
>>> chris
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mirhan at indiana.edu  Mon Aug 11 23:46:35 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Mon, 11 Aug 2008 23:46:35 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <D8FD887C-86A9-4E53-9A59-10FDE295C067@duke.edu>
Message-ID: <C4C67FDB.5282%mirhan@indiana.edu>


Hi,
Yes, it is true that it's similar to get_all_Annotations; it's basically
a recursive version of it.
I wanted to provide a method to get at nested annotations without going
through the whole "if it isa collection, make a recursive call" routine
every time, because most of the XML elements are implemented as nested
annotation collections attached to the nodes.
(I am contemplating using tagtrees instead of nested annotation
collections in the future, but as of now Annotation::tagtrees is
documented as a temporary implementation, so I passed on that option.)
I forgot about the interface part.
At least for my purposes, I think it's a good function to have in the
interface.
I agree that adding a recursive option to get_all_Annotations would be
better.

Mira


On 8/11/08 11:28 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:33 AM, Han, Mira wrote:

> Added get_deep_Annotations in Annotation::Collection
> in order to get annotations that are within nested collections.


I hope I'm not contradicting Chris here, but we will probably want to
think about this a bit more. Your implementation won't work as it is
assuming an interface function that isn't defined on the interface
(both get_deep_Annotations() and _deep_annotation_helper()). Also, it
does nearly the same as get_all_Annotations(), and passing on the keys
to nested collections should maybe simply be an option to that method.
Alternatively, one could add an option -recurse to get_Annotation.

The other difference you note is that your method does not flatten the
nested annotations, but unless I am missing something your
implementation does flatten annotations from nested collections.

So even if we need a separate method for this, something like
get_nested_Annotations() would probably be a more appropriate name,
and if we do need a separate method, it should be compelling enough to
add it to the interface too (as otherwise your code will only work
with certain implementation classes).

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From mirhan at indiana.edu  Tue Aug 12 00:00:28 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Tue, 12 Aug 2008 00:00:28 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>
Message-ID: <C4C6831C.5287%mirhan@indiana.edu>


Oh yes,
I meant get_Annotations,
I want a get_Annotations that is recursive and passes the keys to the recursive calls.


On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is
already recursive)

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From hlapp at duke.edu  Mon Aug 11 23:54:43 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 11 Aug 2008 23:54:43 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4C67FDB.5282%mirhan@indiana.edu>
References: <C4C67FDB.5282%mirhan@indiana.edu>
Message-ID: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's  
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is  
already recursive)

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From mrphysh at juno.com  Tue Aug 12 10:30:36 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Tue, 12 Aug 2008 14:30:36 GMT
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
Message-ID: <20080812.083036.25924.0@webmail02.vgs.untd.com>

I am studying bioperl and making progress.  I have been struggling with the database retrieval from 
on-line databases.  This is an example................

#!/usr/bin/perl -w
 
use Bio::Perl;

 
$seq_object = get_sequence('swiss',"ROA1_HUMAN");
 
write_sequence(">roa1.fasta",'fasta',$seq_object);

exit;

This script gives

Can't locate IO/String.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at ee_bpo.pl line 12.
BEGIN failed--compilation aborted at ee_bpo.pl line 12.

I have chased around with the paths in @INC, using "use lib".

This is an install problem.  The original installation was with perl Makefile.PL.  I reinstalled over the old with cpan.  Stuff like this:
cpan> o conf prerequisites_policy follow
cpan> i /bioperl/
cpan> install Bundle::BioPerl
cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
This all seemed to proceed smoothly.

this guy did not produce an error.
use Bio::Perl;

I am almost thinking that the problem is with the perl. But regular ftp through perl works:

use Net::FTP;#I found this in usr/share/perl/5.8.8/Net
As a perl command this module seems to work.

I looked in the archives and found nothing.  I think I have done my homework.  any ideas?

I run Ubuntu on a pentium III (and love it).  the version of Ubuntu is new.  the Perl (and MySQL) came with the OS: perl 5.8.8

John Brigham in Denver.




From jay at jays.net  Tue Aug 12 11:08:59 2008
From: jay at jays.net (Jay Hannah)
Date: Tue, 12 Aug 2008 10:08:59 -0500
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
In-Reply-To: <20080812.083036.25924.0@webmail02.vgs.untd.com>
References: <20080812.083036.25924.0@webmail02.vgs.untd.com>
Message-ID: <CA21F550-DF97-41EE-BE8C-CBDFE67B0148@jays.net>

On Aug 12, 2008, at 2:30 PM, mrphysh at juno.com wrote:
> Can't locate IO/String.pm in @INC
...
> cpan> install Bundle::BioPerl
> cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> This all seemed to proceed smoothly

bioperl-1.2.1 is very old. Apparently Bundle::BioPerl is out of date?

Here's lots of info about installing BioPerl:

    http://www.bioperl.org/wiki/Getting_BioPerl

I recommend using bioperl-live directly from SVN, but I'm sort of a  
rebel like that.  :)

Alternately, you could try just doing a

    cpan> install IO::String
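
A quick way to see which modules Perl can actually load is to try each one in turn. What follows is only a minimal sketch and was not part of the original exchange; module_available() is a helper name made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): true if $module loads from @INC.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # IO::String -> IO/String.pm
    return eval { require $file; 1 } ? 1 : 0;
}

# Report on the two modules involved in the error above.
for my $module (qw(IO::String Bio::Perl)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'loads fine' : 'cannot be loaded';
}
```

If IO::String is reported as not loadable while Bio::Perl is fine, installing IO::String alone is probably enough to get past the error.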

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From heikki at sanbi.ac.za  Thu Aug 14 09:14:48 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 14 Aug 2008 15:14:48 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
Message-ID: <200808141514.49124.heikki@sanbi.ac.za>

A generic method for retrieving nodes from Bio::Tree::TreeI objects is 
Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id' 
attribute unless a field is given. I can retrieve nodes based on internal id 
like this:

   $tree->find_node(-internal_id => $internal_id);

I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that retrieves by id. 
However, the POD documentation claims that it retrieves by internal id.

What needs to be done?

A. Fix the doc to speak about id
B. Fix the code to retrieve by internal_id
C. Fix the doc and create findnode_by_internal_id()
C. Remove findnode_by_id() as redundant and confusing
D. Deprecate findnode_by_id() as redundant and confusing

There are no tests for findnode_by_id() which to me tilts selection to D and A 
for now.
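
To make the "deprecate" option concrete, here is a rough sketch of a redundant lookup method turned into a thin, warning wrapper around the generic one. Sketch::Tree is a made-up miniature class, not Bio::Tree code, and its find_node() only imitates the behaviour described above (search 'id' by default, or a named field when one is given).

```perl
use strict;
use warnings;
use Carp ();

# Hypothetical miniature tree class, for illustration only (not Bio::Tree code).
package Sketch::Tree;

sub new {
    my ($class, @nodes) = @_;
    return bless { nodes => [@nodes] }, $class;
}

# Generic lookup: find_node($value) searches the 'id' field, while
# find_node(-field => $value) searches the named field.
sub find_node {
    my ($self, $field, $value) = @_;
    ($field, $value) = ('id', $field) unless defined $value;
    $field =~ s/^-//;
    return grep { defined $_->{$field} && $_->{$field} eq $value }
        @{ $self->{nodes} };
}

# Deprecated wrapper: warn on every call, then delegate to find_node().
sub findnode_by_id {
    my ($self, $id) = @_;
    Carp::carp('findnode_by_id() is deprecated, use find_node(-id => ...)');
    return $self->find_node(-id => $id);
}

package main;

my $tree = Sketch::Tree->new(
    { id => 'A', internal_id => 1 },
    { id => 'B', internal_id => 2 },
);

my ($by_id)       = $tree->find_node('B');
my ($by_internal) = $tree->find_node(-internal_id => 1);
print "$by_id->{id} $by_internal->{id}\n";
```

Keeping the old name as a wrapper means existing callers continue to work while being pointed at the generic method.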

Any other opinions?

	-Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From hlapp at gmx.net  Thu Aug 14 18:28:20 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 14 Aug 2008 18:28:20 -0400
Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API?
In-Reply-To: <48A448DD.4000206@psb.ugent.be>
References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com>
	<48A448DD.4000206@psb.ugent.be>
Message-ID: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>

Hi Erick,

how did you determine that go-perl is specific to GO? I've found it to  
work quite well for any kind of OBO-formatted ontology.

Also, you note that BioPerl doesn't have the ability to write in  
certain formats, and to intersect and "unify" (would you mind  
explaining what you mean by that?) ontologies. It seems that your  
implementation of RDF etc export isn't really reusable or modular in  
any way, but I'd love to bring the intersection function over to  
BioPerl (BTW when you decided to roll your own ontology API, did you  
get the impression that BioPerl isn't receptive to you adding to it?).  
Would you mind pointing me to the place in the code where I would find  
that, as I can't seem to find it.

	-hilmar

On Aug 14, 2008, at 11:01 AM, Erick Antezana wrote:

> Hi Arne,
>
> if you plan to work with PERL, you might take a look at ONTO-PERL :
>
> http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn042
> http://search.cpan.org/dist/ONTO-PERL/
> http://search.cpan.org/src/EASR/ONTO-PERL-1.13/doc/example00.html
>
> ONTO-PERL has been used intensively to build the Cell Cycle Ontology.
>
> cheers,
> Erick
>
> Arne Muller wrote:
>> Dear All,
>>
>> I'm new to this list and don't know much about ontologies in general
>> (I worked a bit with GO some time ago).
>>
>> Let me explain my problem: We have several related vocabularies
>> (non-hierarchical and redundant because of different spellings etc
>> ...) to describe organs and tissues in our department, and we need to
>> map each of these vocabs to all of our other legacy vocabs that
>> describe similar concepts. We'd like to use the adult mouse anatomy
>> ontology and modify/extend it with additional terms (if necessary),
>> synonyms and dbXrefs. Most of our vocabs should be mapped as dbXrefs
>> to existing terms in the MA ontology. The goal is that different  
>> units
>> in our department use slightly different vocabulary to describe
>> samples, and we now need link these different system (always the same
>> old story ... ;-).
>>
>> For the moment I'm not planning to turn our messy legacy vocabs into
>> OBO formated ontologies and to map them via cross products and the  
>> OBO
>> relation ontology - though this might be the most  proper way to do
>> ... (comments are welcome).
>>
>> I'll have to write an "easy to use" tool that allows our data curator
>> to easily map the legacy vocabs as dbXrefs of terms in the MA
>> ontology. The question is, how am I gonna do this? I've a fairly good
>> idea of how my software (java webapp) should look like, but are there
>> any APIs and implementations of the OBO model as well as a DB schema
>> and mappings between the model and the schema?
>>
>> I've had a look into the OLS from the EBI that seems to be fairly
>> simple (which is good ;-) and that uses the oboedit.jar somewhere at
>> the back-end. I've also found something like an obo api on
>> http://wiki.geneontology.org/index.php/OBO-Edit:_Getting_the_Source_Code#.28Optional.29_Getting_the_OBO_API_from_Subclipse
>> but so far I've not found any documentation nor examples on how to  
>> get
>> started.
>>
>> I'd be happy to hear how developers and bioinformatics people use obo
>> in their own tools (I better ask before going DIY ...).
>>
>>   thanks a lot for your comments and help
>>   +kind regards,
>>
>>   Arne
>>
>>
>
> -- 
> ==================================================================
> Erick Antezana                    http://www.cellcycleontology.org
> PhD student
> Tel:+32 (0)9 331 38 24                        fax:+32 (0)9 3313809
> VIB Department of Plant Systems Biology, Ghent University
> Technologiepark 927, 9052 Gent, BELGIUM
> erant at psb.ugent.be                  http://www.psb.ugent.be/~erant
> ==================================================================
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mjanis at chem.ucla.edu  Thu Aug 14 19:37:05 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:37:05 -0700
Subject: [Bioperl-l] Code to contribute
Message-ID: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>

Hi,

 
I've had some perl code lying around for what seems like forever and I'd
like to contribute it to bioperl, if such facilities don't already exist in
bioperl.  The code implements shuffling (DNA or RNA) keeping the
dinucleotide composition (and codon usage) intact through an Eulerian path
approach as described in Altschul and Erickson (1985).  The code seeds the
Eulerian paths by keeping the first and last nucleotide invariant in the
shuffle - which has minimal detrimental effects to the purpose of the
algorithm, in my experience.
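
For readers unfamiliar with the approach, a minimal pure-Perl sketch of a dinucleotide-preserving shuffle follows. It is not the code offered above, which was not posted; the function names are made up, and the sketch only assumes the general Eulerian-walk idea (pick a random "last edge" out of every symbol except the final one, retry until those edges all lead to the final symbol, shuffle the remaining edges, then walk from the first symbol).

```perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Count overlapping dinucleotides, e.g. "ACGA" -> AC, CG, GA.
sub dinucleotide_counts {
    my ($seq) = @_;
    my %count;
    $count{ substr $seq, $_, 2 }++ for 0 .. length($seq) - 2;
    return \%count;
}

# Shuffle $seq so that every dinucleotide count, plus the first and last
# symbol, is unchanged (Eulerian walk with randomly chosen last edges).
sub dinucleotide_shuffle {
    my ($seq) = @_;
    my @s = split //, $seq;
    return $seq if @s < 3;
    my $final = $s[-1];

    # One edge per adjacent pair: $s[$i] -> $s[$i+1].
    my %edges;
    push @{ $edges{ $s[$_] } }, $s[ $_ + 1 ] for 0 .. $#s - 1;

    # Choose a random last edge for every symbol except the final one, and
    # retry until following those edges always leads to the final symbol.
    my %last_edge;
    ATTEMPT: while (1) {
        %last_edge = ();
        for my $v (grep { $_ ne $final } keys %edges) {
            my $out = $edges{$v};
            $last_edge{$v} = $out->[ int rand @$out ];
        }
        for my $v (keys %last_edge) {
            my %seen;
            for (my $cur = $v; $cur ne $final; $cur = $last_edge{$cur}) {
                next ATTEMPT if $seen{$cur}++;    # cycle: try again
            }
        }
        last ATTEMPT;
    }

    # Per symbol: shuffle the other outgoing edges, keep the chosen one last.
    for my $v (keys %edges) {
        my @out = @{ $edges{$v} };
        if (exists $last_edge{$v}) {
            my ($i) = grep { $out[$_] eq $last_edge{$v} } 0 .. $#out;
            splice @out, $i, 1;
            @out = (shuffle(@out), $last_edge{$v});
        }
        else {
            @out = shuffle(@out);
        }
        $edges{$v} = \@out;
    }

    # Walk the edges in order, starting from the first symbol.
    my $cur      = $s[0];
    my $shuffled = $cur;
    while (@{ $edges{$cur} || [] }) {
        $cur = shift @{ $edges{$cur} };
        $shuffled .= $cur;
    }
    return $shuffled;
}

my $original = 'ACGTTGCAACGGTTAACCGGATATCG';
my $shuffled = dinucleotide_shuffle($original);
print "$original\n$shuffled\n";
```

Comparing dinucleotide_counts() of the input and the output is a simple sanity check that the composition really is preserved.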

 
A quick search on the bioperl website shows that there is a mutation.pls
script, and facilities for using Sean Eddy's SQUID C library, which
implements the same function (I wrote this particular function before I knew
how to use C).  As such, it's probably not as elegant as Sean Eddy's
implementation, but it works - and it's entirely in perl.

 
The bioperl developer pages suggest a post to the mailing list as the best
place to start contributing to bioperl.  Is this a useful function to add to
the project?

 
Best Regards,

 
Michael

 
-------------------------------

Michael Janis

mjanis at chem.ucla.edu

-------------------------------

 
From rvos at interchange.ubc.ca  Thu Aug 14 19:51:43 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Thu, 14 Aug 2008 16:51:43 -0700
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>
	<2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
Message-ID: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>

Sounds exciting! I bet the general advice you'll get is to i) check
out the latest code from svn ii) see which bioperl objects/interfaces
(e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
iii) write a class that performs the algorithm as some sort of
analysis factory taking the sequence object (or ideally object
interface) as an input iv) run that class by the mailing list v) check
it into svn.

On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis <mjanis at chem.ucla.edu> wrote:
> Hi,
>
>
>
> I've had some perl code lying around for what seems like forever and I'd
> like to contribute it to bioperl, if such facilities don't already exist in
> bioperl.  The code implements shuffling (DNA or RNA) keeping the
> dinucleotide composition (and codon usage) intact through a Eularian path
> approach as described in Altschul and Erickson (1985).  The code seeds the
> Eularian paths by keeping the first and last nucleotide invariant in the
> shuffle - which has minimal detrimental effects to the purpose of the
> algorithm, in my experience.
>
>
>
> A quick search on the bioperl website shows that there is a mutation.pls
> script, and facilities for using Sean Eddy's SQUID C library, which
> implements the same function (I wrote this particular function before I knew
> how to use C).  As such, it's probably not as elegant as Sean Eddy's
> implementation, but it works - and it's entirely in perl.
>
>
>
> The bioperl developer pages suggest a post to the mailing list as the best
> place to start contributing to bioperl.  Is this a useful function to add to
> the project?
>
>
>
> Best Regards,
>
>
>
> Michael
>
>
>
> -------------------------------
>
> Michael Janis
>
>  <mailto:mjanis at chem.ucla.edu> mjanis at chem.ucla.edu
>
> -------------------------------
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com




From mjanis at chem.ucla.edu  Thu Aug 14 19:55:04 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:55:04 -0700
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>
	<2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
Message-ID: <008701c8fe69$2cee6020$86cb2060$@ucla.edu>

Thanks, Rutger, I'll do exactly that!  (give me a few days)


Best Regards,

Michael

-------------------------------
Michael Janis
mjanis at chem.ucla.edu
-------------------------------


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos
Sent: Thursday, August 14, 2008 4:52 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Fwd: Code to contribute

Sounds exciting! I bet the general advice you'll get is to i) check
out the latest code from svn ii) see which bioperl objects/interfaces
(e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
iii) write a class that performs the algorithm as some sort of
analysis factory taking the sequence object (or ideally object
interface) as an input iv) run that class by the mailing list v) check
it into svn.

On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis <mjanis at chem.ucla.edu> wrote:
> Hi,
>
>
>
> I've had some perl code lying around for what seems like forever and I'd
> like to contribute it to bioperl, if such facilities don't already exist
in
> bioperl.  The code implements shuffling (DNA or RNA) keeping the
> dinucleotide composition (and codon usage) intact through a Eularian path
> approach as described in Altschul and Erickson (1985).  The code seeds the
> Eularian paths by keeping the first and last nucleotide invariant in the
> shuffle - which has minimal detrimental effects to the purpose of the
> algorithm, in my experience.
>
>
>
> A quick search on the bioperl website shows that there is a mutation.pls
> script, and facilities for using Sean Eddy's SQUID C library, which
> implements the same function (I wrote this particular function before I
knew
> how to use C).  As such, it's probably not as elegant as Sean Eddy's
> implementation, but it works - and it's entirely in perl.
>
>
>
> The bioperl developer pages suggest a post to the mailing list as the best
> place to start contributing to bioperl.  Is this a useful function to add
to
> the project?
>
>
>
> Best Regards,
>
>
>
> Michael
>
>
>
> -------------------------------
>
> Michael Janis
>
>  <mailto:mjanis at chem.ucla.edu> mjanis at chem.ucla.edu
>
> -------------------------------
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com
_______________________________________________
Bioperl-l mailing list
Bioperl-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/bioperl-l


From Russell.Smithies at agresearch.co.nz  Thu Aug 14 21:17:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 15 Aug 2008 13:17:23 +1200
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>

You forgot 2 points,

vi) write documentation/examples, and  vii) write tests

;-)


Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-
> bio.org] On Behalf Of Rutger Vos
> Sent: Friday, 15 August 2008 11:52 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Fwd: Code to contribute
> 
> Sounds exciting! I bet the general advice you'll get is to i) check
> out the latest code from svn ii) see which bioperl objects/interfaces
> (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
> iii) write a class that performs the algorithm as some sort of
> analysis factory taking the sequence object (or ideally object
> interface) as an input iv) run that class by the mailing list v) check
> it into svn.
> 
> On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis <mjanis at chem.ucla.edu>
wrote:
> > Hi,
> >
> >
> >
> > I've had some perl code lying around for what seems like forever and
I'd
> > like to contribute it to bioperl, if such facilities don't already
exist in
> > bioperl.  The code implements shuffling (DNA or RNA) keeping the
> > dinucleotide composition (and codon usage) intact through a Eularian
path
> > approach as described in Altschul and Erickson (1985).  The code
seeds the
> > Eularian paths by keeping the first and last nucleotide invariant in
the
> > shuffle - which has minimal detrimental effects to the purpose of
the
> > algorithm, in my experience.
> >
> >
> >
> > A quick search on the bioperl website shows that there is a
mutation.pls
> > script, and facilities for using Sean Eddy's SQUID C library, which
> > implements the same function (I wrote this particular function
before I knew
> > how to use C).  As such, it's probably not as elegant as Sean Eddy's
> > implementation, but it works - and it's entirely in perl.
> >
> >
> >
> > The bioperl developer pages suggest a post to the mailing list as
the best
> > place to start contributing to bioperl.  Is this a useful function
to add to
> > the project?
> >
> >
> >
> > Best Regards,
> >
> >
> >
> > Michael
> >
> >
> >
> > -------------------------------
> >
> > Michael Janis
> >
> >  <mailto:mjanis at chem.ucla.edu> mjanis at chem.ucla.edu
> >
> > -------------------------------
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 
> 
> --
> Dr. Rutger A. Vos
> Department of zoology
> University of British Columbia
> http://www.nexml.org
> http://rutgervos.blogspot.com
> 
> 
> 
> --
> Dr. Rutger A. Vos
> Department of zoology
> University of British Columbia
> http://www.nexml.org
> http://rutgervos.blogspot.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From mirhan at indiana.edu  Fri Aug 15 02:11:46 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Fri, 15 Aug 2008 02:11:46 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4C6831C.5287%mirhan@indiana.edu>
Message-ID: <C4CA9662.52BD%mirhan@indiana.edu>


Hi,
I've renamed get_deep_Annotations() to get_nested_Annotations().
It takes the arguments -keys and -recursive,
and behaves exactly like get_Annotations() when -recursive is not set (tested by replacing the get_Annotations() calls in Annotation.t).
I made it a new function instead of modifying get_Annotations() because I wasn't sure how to modify that in a backwards-compatible way.
I thought of adding the function to the interface AnnotationCollectionI,
but it seemed like get_all_Annotations() was missing from the interface as well,
so I decided to ask whether it should be added to the interface at all.
Isn't it possible for certain implementations of the interface to have functions that are specific to that implementation?
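
To illustrate the intended behaviour of the two arguments, here is a rough sketch. Sketch::Collection is a made-up stand-in written for this example; it is not the committed BioPerl code and only assumes the semantics described above (the keys are passed down into nested collections when -recursive is set).

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical stand-in for a nested annotation collection (not BioPerl code).
package Sketch::Collection;

sub new {
    my ($class) = @_;
    return bless { store => {} }, $class;
}

sub add_Annotation {
    my ($self, $key, $value) = @_;
    push @{ $self->{store}{$key} }, $value;
    return;
}

# Non-recursive lookup: entries stored directly under the given keys,
# or under every key when none is given.
sub get_Annotations {
    my ($self, @keys) = @_;
    @keys = sort keys %{ $self->{store} } unless @keys;
    return map { @{ $self->{store}{$_} || [] } } @keys;
}

# Same lookup, but with -recursive set the keys are also passed down
# into every nested collection.
sub get_nested_Annotations {
    my ($self, %args) = @_;
    my @found = $self->get_Annotations(@{ $args{'-keys'} || [] });
    return @found unless $args{'-recursive'};
    for my $child ($self->get_Annotations) {
        next unless Scalar::Util::blessed($child) && $child->isa(__PACKAGE__);
        push @found, $child->get_nested_Annotations(%args);
    }
    return @found;
}

package main;

my $clade = Sketch::Collection->new;
$clade->add_Annotation(name => 'inner clade');

my $root = Sketch::Collection->new;
$root->add_Annotation(name  => 'outer clade');
$root->add_Annotation(clade => $clade);

my @flat   = $root->get_nested_Annotations(-keys => ['name']);
my @nested = $root->get_nested_Annotations(-keys => ['name'], -recursive => 1);
print scalar(@flat), " vs ", scalar(@nested), "\n";
```

Without -recursive only the top-level 'name' entry comes back; with it, the 'name' entry inside the nested collection is returned as well.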

Mira


On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:


Oh yes,
I meant get_Annotations,
I want a get_Annotations that is recursive and passes the keys to the recursive calls.


On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is
already recursive)

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at illinois.edu  Fri Aug 15 09:59:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Aug 2008 08:59:42 -0500
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>
Message-ID: <BF55B9C8-B43A-404A-8085-4C8200B9B4AA@illinois.edu>

Agreed!

We're hoping to move to a more structured core after 1.6, which will  
require decent documentation and tests for inclusion.  My feeling is  
that one should feel free to add code samples to relevant pages/sections in the  
BioPerl wiki, or write up your own HOWTO (it's not terribly hard to  
do, and it adds to your karma).

chris

On Aug 14, 2008, at 8:17 PM, Smithies, Russell wrote:

> You forgot 2 points,
>
> vi) write documentation/examples, and  vii) write tests
>
> ;-)
>
>
>
> Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-
>> bio.org] On Behalf Of Rutger Vos
>> Sent: Friday, 15 August 2008 11:52 a.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Fwd: Code to contribute
>>
>> Sounds exciting! I bet the general advice you'll get is to i) check
>> out the latest code from svn ii) see which bioperl objects/interfaces
>> (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
>> iii) write a class that performs the algorithm as some sort of
>> analysis factory taking the sequence object (or ideally object
>> interface) as an input iv) run that class by the mailing list v)  
>> check
>> it into svn.
>>
>> On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis <mjanis at chem.ucla.edu>
> wrote:
>>> Hi,
>>>
>>>
>>>
>>> I've had some perl code lying around for what seems like forever and
> I'd
>>> like to contribute it to bioperl, if such facilities don't already
> exist in
>>> bioperl.  The code implements shuffling (DNA or RNA) keeping the
>>> dinucleotide composition (and codon usage) intact through a Eularian
> path
>>> approach as described in Altschul and Erickson (1985).  The code
> seeds the
>>> Eularian paths by keeping the first and last nucleotide invariant in
> the
>>> shuffle - which has minimal detrimental effects to the purpose of
> the
>>> algorithm, in my experience.
>>>
>>>
>>>
>>> A quick search on the bioperl website shows that there is a
> mutation.pls
>>> script, and facilities for using Sean Eddy's SQUID C library, which
>>> implements the same function (I wrote this particular function
> before I knew
>>> how to use C).  As such, it's probably not as elegant as Sean Eddy's
>>> implementation, but it works - and it's entirely in perl.
>>>
>>>
>>>
>>> The bioperl developer pages suggest a post to the mailing list as
> the best
>>> place to start contributing to bioperl.  Is this a useful function
> to add to
>>> the project?
>>>
>>>
>>>
>>> Best Regards,
>>>
>>>
>>>
>>> Michael
>>>
>>>
>>>
>>> -------------------------------
>>>
>>> Michael Janis
>>>
>>> <mailto:mjanis at chem.ucla.edu> mjanis at chem.ucla.edu
>>>
>>> -------------------------------
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>> --
>> Dr. Rutger A. Vos
>> Department of zoology
>> University of British Columbia
>> http://www.nexml.org
>> http://rutgervos.blogspot.com
>>
>>
>>
>> --
>> Dr. Rutger A. Vos
>> Department of zoology
>> University of British Columbia
>> http://www.nexml.org
>> http://rutgervos.blogspot.com
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From cjfields at illinois.edu  Fri Aug 15 10:12:10 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Aug 2008 09:12:10 -0500
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4CA9662.52BD%mirhan@indiana.edu>
References: <C4CA9662.52BD%mirhan@indiana.edu>
Message-ID: <3371D481-1416-4976-9846-83CF83395039@illinois.edu>

The method get_all_annotation_keys() is present in  
AnnotationCollectionI but not get_all_Annotations(), though I doubt it  
is set up for recursive retrieval (something that might be worth  
testing).  I don't have a problem adding this in.  Hilmar, thoughts?

chris

On Aug 15, 2008, at 1:11 AM, Han, Mira wrote:

>
> Hi,
> I've renamed get_deep_Annotations() to get_nested_Annotations().
> It takes the arguments -keys and -recursive, and behaves exactly like
> get_Annotations() when -recursive is not set (tested by replacing
> get_Annotations() with it in Annotation.t).
> I made it a new function instead of modifying get_Annotations() because
> I wasn't sure how to modify it in a backwards-compatible way.
> I thought of adding the function to the AnnotationCollectionI interface,
> but get_all_Annotations() seemed to be missing from the interface as well,
> so I decided to ask whether it should be added to the interface at all.
> Isn't it possible for certain implementations of the interface to have
> functions that are specific to that implementation?
>
> Mira
>
>
> On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:
>
>
> Oh yes,
> I meant get_Annotations,
> I want a get_Annotations that is recursive and passes the keys to  
> the recursive calls.
>
>
>
> On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
>
> Hi Mira -
>
> On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:
>
>> Yes it is true that it's similar to get_all_Annotations, it's
>> basically a recursive version of it.
>
>
> I suppose you mean get_Annotations(), right? (get_all_Annotations() is
> already recursive)
>
>        -hilmar
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
> ===========================================================
>
>
>
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From jorvis at gmail.com  Fri Aug 15 15:45:23 2008
From: jorvis at gmail.com (Joshua Orvis)
Date: Fri, 15 Aug 2008 15:45:23 -0400
Subject: [Bioperl-l] help creating de novo GFF3
Message-ID: <a58187640808151245o48af6311hb8afee40341b7ef2@mail.gmail.com>

I don't have a lot of experience with Bioperl and have used it mostly for
simple format conversions or parsing Genbank files.  I need to create a
quick script to create GFF3 and decided to give bioperl a try again instead
of just printing the columns myself but have had a few problems.  My
apologies for the narrative here but I know it can sometimes be informative
to hear 'how' a user arrived at a problem rather than just knowing the
problem itself.

Is there a documented explicit mapping between the GFF3 columns and the
predefined tags (ID, Name, etc.) and their Bioperl object attribute
equivalents?  Is it preferable to create Bio::SeqFeature::Generic objects
and pass them to Bio::Tools::GFF->write_feature or rather to create
Bio::SeqFeature::Annotated and pass them to Bio::FeatureIO::gff ?   I may be
overlooking it, but a simple tutorial showing how to create and define a new
sequence object, attach annotations to it and dump in GFF format seems to be
missing.  This seems like a basic thing to do - most of the documentation I
find is about converting between formats rather than creating new
annotation.

Here are some of the problems I (a typical naive user?) ran into when
adventuring with bioperl here.   My first attempt resulted in the string
"SEQ" as column 0 in all my GFF output.  I thought that maybe this was
because my features weren't 'attached' to a sequence, so I created a
Bio::Seq::RichSeq object and tried both (separately):

$seq->add_SeqFeature( $feat );

and

$feat->attach_seq( $seq );

Neither changed the first column of output.  Looking at the docs.bioperl.org
methods for Bio::SeqFeature::Generic I found the seq_id attribute, which
came with the warning: "This attribute should *not* be used in GFF dumping"
- but since it's the only thing I did that worked, I used it anyway.

Next I wanted to have ID tags within my last column.  I first tried setting
all the relevant attributes I could see on my features (id, primary_tag,
display_name, display_id, etc.) but none of these caused ID=? to be
created.  Next, I tried something like this:


my $feat = new Bio::SeqFeature::Annotated (
        -start        => $start,
        -end          => $end,
        -strand       => $strand,
        -primary      => 'gene',
        -seq_id       => $asmbl_id,  ## this works but is discouraged
        -tag          => { ID => $transcript->{pub_locus},
                           product_name => $transcript->{com_name},
                           ec_number => $transcript->{'ec#'},
                           gene_symbol => $transcript->{gene_sym}
                         }
);

My hopes that passing it via the -tag option would do the trick failed, as
it created a line like this instead:

10263   .       gene    58512   56983   .       +       .       iD=AN9220.4;

Notice the 'ID' -> 'iD' transformation (without any command-line warnings).

I'm still stuck on this one (Parent would be next) but overall guidance or
pointers to a tutorial/documentation I'm overlooking would be great.
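
For comparison, the "just printing the columns myself" route might look
roughly like the sketch below. It is plain Perl with no BioPerl calls, and
the names gff3_escape, gff3_line and dot are made up for illustration, as is
the product_name value. It assumes the usual GFF3 rules: nine tab-separated
columns, start <= end, '.' for empty columns, and percent-encoding of
reserved characters in column 9. The sample line above has start > end, so
the sketch swaps the two coordinates.

```perl
use strict;
use warnings;

# Return '.' for an undefined column value, as GFF3 expects.
sub dot { return defined $_[0] ? $_[0] : '.' }

# Percent-encode the characters that GFF3 reserves inside column 9.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build one GFF3 line from a plain hash of column values.
sub gff3_line {
    my (%f) = @_;

    # GFF3 requires start <= end, so order the two coordinates.
    my ($start, $end) = sort { $a <=> $b } @f{qw(start end)};

    # Emit ID first (with its capitalisation intact), then the other tags
    # in alphabetical order so the output is stable.
    my %attr = %{ $f{attributes} || {} };
    my @rest = grep { $_ ne 'ID' } keys %attr;
    my @tags = ((exists $attr{ID} ? 'ID' : ()), sort @rest);
    my $col9 = join ';', map { "$_=" . gff3_escape($attr{$_}) } @tags;

    return join "\t",
        $f{seq_id}, dot($f{source}), $f{type}, $start, $end,
        dot($f{score}), dot($f{strand}), dot($f{phase}),
        (length $col9 ? $col9 : '.');
}

print "##gff-version 3\n";
print gff3_line(
    seq_id     => '10263',
    type       => 'gene',
    start      => 58512,
    end        => 56983,
    strand     => '+',
    attributes => { ID => 'AN9220.4', product_name => 'hypothetical protein' },
), "\n";
```

A Parent tag could be passed through the same attributes hash; nothing here
checks that the parent feature exists.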

JO


From jason at bioperl.org  Fri Aug 15 19:00:04 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Aug 2008 16:00:04 -0700
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
In-Reply-To: <200808141514.49124.heikki@sanbi.ac.za>
References: <200808141514.49124.heikki@sanbi.ac.za>
Message-ID: <A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>

I think D.

There should probably only be one find_node function.

findnode_by_id was written by Ramiro for the re-rooting code and I  
guess it wasn't checked to reduce unneeded functions.  I don't have  
any problems removing/deprecating it but will need to update the code  
that depends on it to use find_node properly.
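
As a rough illustration of the "only one find_node function" idea, here is
a toy sketch in plain Perl. It is not the Bio::Tree::TreeFunctionsI code:
the nodes are plain hashes, and this find_node is a stand-in that only
mimics the calling convention described in the quoted message below (search
'id' by default, or a named field when one is given).

```perl
use strict;
use warnings;

# Toy nodes: plain hashes standing in for tree node objects.
my @nodes = (
    { id => 'A', internal_id => 1 },
    { id => 'B', internal_id => 2 },
);

# One finder for every field:
#   find_node($nodes, $value)           searches the 'id' field
#   find_node($nodes, -field => $value) searches the named field
sub find_node {
    my ($nodes, @args) = @_;
    my ($field, $value) = @args == 1 ? ('id', $args[0]) : @args;
    $field =~ s/^-//;    # accept BioPerl-style -field names
    return grep { defined $_->{$field} && $_->{$field} eq $value } @$nodes;
}

my ($by_id)       = find_node(\@nodes, 'B');
my ($by_internal) = find_node(\@nodes, -internal_id => 1);
print "$by_id->{internal_id} $by_internal->{id}\n";
```

With a single entry point like this, a separate findnode_by_id() adds
nothing that the default call does not already cover.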

-jason
On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote:

> A generic method for retrieving nodes from Bio::Tree::TreeI objects is
> Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id'
> attribute unless a field is given. I can retrieve nodes based on internal id
> like this:
>
>   $tree->find_node(-internal_id => $internal_id);
>
> I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that retrieves by id.
> However, the POD documentation claims that it retrieves by internal id.
>
> What needs to be done?
>
> A. Fix the doc to speak about id
> B. Fix the code to retrieve by internal_id
> C. Fix the doc and create findnode_by_internal_id()
> C. Remove findnode_by_id() as redundant and confusing
> D. Deprecate findnode_by_id() as redundant and confusing
>
> There are no tests for findnode_by_id(), which to me tilts the selection to
> D and A for now.
>
> Any other opinions?
>
> 	-Heikki
>
>
> -- 
> ______ _/      _/_____________________________________________________
>      _/      _/
>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>   _/  _/  _/  SANBI, South African National Bioinformatics Institute
>  _/  _/  _/  University of Western Cape, South Africa
>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________

Jason Stajich
jason at bioperl.org


From schmidtc at udel.edu  Fri Aug 15 19:42:44 2008
From: schmidtc at udel.edu (Carl Schmidt)
Date: Fri, 15 Aug 2008 19:42:44 -0400
Subject: [Bioperl-l] lazy symbol binding
Message-ID: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu>

When I attempt  bp_load_gff.pl
I get the following error:

dyld: lazy symbol binding failed: Symbol not found: _mysql_init
   Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
   Expected in: dynamic lookup

dyld: Symbol not found: _mysql_init
   Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
   Expected in: dynamic lookup

Trace/BPT trap

Any suggestions?  I apologize if this is the wrong place for posting  
such a question.

Thanks
Carl

Carl J. Schmidt
Department of Animal & Food Sciences
University of Delaware
Newark, DE 19716
schmidtc at udel.edu
http://copland.udel.edu/~schmidtc


From rvos at interchange.ubc.ca  Fri Aug 15 20:11:48 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Fri, 15 Aug 2008 17:11:48 -0700
Subject: [Bioperl-l] Fwd:  lazy symbol binding
In-Reply-To: <2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com>
References: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu>
	<2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com>
Message-ID: <2bb9b24a0808151711m5eba5500k468a0effc711f3c@mail.gmail.com>

---------- Forwarded message ----------
From: Rutger Vos <rutgeraldo at gmail.com>
Date: Fri, Aug 15, 2008 at 5:11 PM
Subject: Re: [Bioperl-l] lazy symbol binding
To: Carl Schmidt <schmidtc at udel.edu>


This looks like a faulty install of the combo of DBI/DBD::mysql

Specifically, the perl driver (DBD::mysql) for the mysql database
builds a dynamic library on installation, which the DBI interface
tries to load dynamically, but fails.

Unfortunately, this is by no means bioperl related. Try a reinstall as
per the instructions at:

http://search.cpan.org/~capttofu/DBD-mysql-4.008/lib/DBD/mysql/INSTALL.pod

On Fri, Aug 15, 2008 at 4:42 PM, Carl Schmidt <schmidtc at udel.edu> wrote:
> When I attempt  bp_load_gff.pl
> I get the following error:
>
> dyld: lazy symbol binding failed: Symbol not found: _mysql_init
>  Referenced from:
> /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
>  Expected in: dynamic lookup
>
> dyld: Symbol not found: _mysql_init
>  Referenced from:
> /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
>  Expected in: dynamic lookup
>
> Trace/BPT trap
>
> Any suggestions?  I apologize if this is the wrong place for posting such a
> question.
>
> Thanks
> Carl
>
> Carl J. Schmidt
> Department of Animal & Food Sciences
> University of Delaware
> Newark, DE 19716
> schmidtc at udel.edu
> http://copland.udel.edu/~schmidtc
>


--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com




From hlapp at duke.edu  Sat Aug 16 13:43:46 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Sat, 16 Aug 2008 13:43:46 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <3371D481-1416-4976-9846-83CF83395039@illinois.edu>
References: <C4CA9662.52BD%mirhan@indiana.edu>
	<3371D481-1416-4976-9846-83CF83395039@illinois.edu>
Message-ID: <D096DA37-D36A-4A3E-A6C3-98254CAC6B8B@duke.edu>


On Aug 15, 2008, at 10:12 AM, Chris Fields wrote:

> The method get_all_annotation_keys() is present in  
> AnnotationCollectionI but not get_all_Annotations()

Interesting. I wonder whether that was a result of the forward and  
reverse surgeries done to the Annotation* interfaces. (I'm off-line as  
I write this so can't check.)

> , though I doubt it is set up for recursive retrieval (something  
> that might be worth testing). I don't have a problem adding this  
> in.  Hilmar, thoughts?

get_all_Annotations() has always been recursive (much as
get_all_SeqFeatures() is for SeqI). However, its recursive behavior is
different from the behavior that Mira wants.

Specifically, if get_all_Annotations() finds a nested collection under
a matching tag, it will consider the entire nested collection a
match and return the recursively flattened-out annotation objects it
contains. What Mira needs (if I am understanding her implementation
correctly) is to recursively retrieve annotations whose tag matches
the query key or set of keys. I.e., a nested collection would be
searched for matching tags even if the tag of the collection itself
does not match, and if it does match, only those contained
annotations that have matching tags would be returned.

I'm not sure whether it's better to fold both behaviors into one  
method which has an optional argument to control which one is desired,  
or to have two methods. I'm leaning towards having two methods,  
because support for an added optional argument in external  
implementations of the interface is hard to test for, as opposed to  
testing for the presence/absence of a new method. On the other hand,  
if the existing method wasn't even on the interface to begin with  
(which I'm not yet convinced about) then that shouldn't really be an  
issue.

It seems, Chris, that you are also in favor of adding a second method (and
putting get_all_Annotations() (back) into the interface)? Does anyone else
have thoughts or preferences on this?

	-hilmar

>
>
> chris
>
> On Aug 15, 2008, at 1:11 AM, Han, Mira wrote:
>
>>
>> Hi,
>> I've renamed get_deep_Annotations() to get_nested_Annotations().
>> It takes the arguments -keys and -recursive, and behaves exactly like
>> get_Annotations() when -recursive is not set (tested by replacing
>> get_Annotations() with it in Annotation.t).
>> I made it a new function instead of modifying get_Annotations() because
>> I wasn't sure how to modify it in a backwards-compatible way.
>> I thought of adding the function to the AnnotationCollectionI interface,
>> but get_all_Annotations() seemed to be missing from the interface as well,
>> so I decided to ask whether it should be added to the interface at all.
>> Isn't it possible for certain implementations of the interface to have
>> functions that are specific to that implementation?
>>
>> Mira
>>
>>
>> On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:
>>
>>
>> Oh yes,
>> I meant get_Annotations,
>> I want a get_Annotations that is recursive and passes the keys to  
>> the recursive calls.
>>
>>
>>
>> On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
>>
>> Hi Mira -
>>
>> On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:
>>
>>> Yes it is true that it's similar to get_all_Annotations, it's
>>> basically a recursive version of it.
>>
>>
>> I suppose you mean get_Annotations(), right? (get_all_Annotations()  
>> is
>> already recursive)
>>
>>       -hilmar
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
>> ===========================================================
>>
>>
>>
>>
>>
>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From heikki at sanbi.ac.za  Sun Aug 17 03:02:31 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 17 Aug 2008 09:02:31 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
In-Reply-To: <A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
References: <200808141514.49124.heikki@sanbi.ac.za>
	<A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
Message-ID: <200808170902.32485.heikki@sanbi.ac.za>


Done.

findnode_by_id() was not and is not used anywhere in BioPerl core code.

Thanks for input,

   -Heikki

On Saturday 16 August 2008 01:00:04 Jason Stajich wrote:
> I think D.
>
> There should probably only be one find_node function.
>
> findnode_by_id was written by Ramiro for the re-rooting code and I
> guess it wasn't checked to reduce unneeded functions.  I don't have
> any problems removing/deprecating it but will need to update the code
> that depends on it to use find_node properly.
>
> -jason
>
> On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote:
> > A generic method for retrieving nodes from Bio::Tree::TreeI objects is
> > Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id'
> > attribute unless a field is given. I can retrieve nodes based on internal
> > id like this:
> >
> >   $tree->find_node(-internal_id => $internal_id);
> >
> > I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that retrieves by
> > id. However, the POD documentation claims that it retrieves by internal id.
> >
> > What needs to be done?
> >
> > A. Fix the doc to speak about id
> > B. Fix the code to retrieve by internal_id
> > C. Fix the doc and create findnode_by_internal_id()
> > C. Remove findnode_by_id() as redundant and confusing
> > D. Deprecate findnode_by_id() as redundant and confusing
> >
> > There are no tests for findnode_by_id(), which to me tilts the selection
> > to D and A for now.
> >
> > Any other opinions?
> >
> > 	-Heikki
> >
> >
> > --
> > ______ _/      _/_____________________________________________________
> >      _/      _/
> >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >  _/  _/  _/  University of Western Cape, South Africa
> >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
>
> Jason Stajich
> jason at bioperl.org
>
>
>


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From mike.thon at gmail.com  Mon Aug 18 01:01:43 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Mon, 18 Aug 2008 07:01:43 +0200
Subject: [Bioperl-l] Build.PL options?
Message-ID: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>

Hi all - I am trying to write a port of bioperl 1.5.2 to enable its  
installation with the macports system (www.macports.org).  I'm not too  
familiar with the Build.PL build system - is there any way to disable  
the dependency resolution that the build script does (i.e. without  
patching the script)?
Thanks
Mike


From David.Messina at sbc.su.se  Mon Aug 18 03:37:53 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 18 Aug 2008 09:37:53 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
Message-ID: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>

Hi Mike,

Great to hear you're planning to add BioPerl to MacPorts. Thanks!

I'm afraid I don't know how to disable the dependency resolution, but while
you're waiting for others to chime in, here are a couple of things you
might try if you haven't already:

- The Build.PL for BioPerl is configured to automatically write out a
Makefile.PL for you. If you're more familiar with Makefile.PL, then you
could work with that.

Module::Build tries to maintain a certain level of cross-compatibility with
ExtUtils::MakeMaker; how do you disable dependency resolution with the
Makefile.PL system?


- The Build.PL API is pretty flexible, and it's easy to change just about
any behavior by passing parameters to Build.PL on the command line. See


http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/API.pod

and


http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/Cookbook.pm


I'm curious -- it seems to me that a major purpose of Build.PL and
Makefile.PL is specifying dependencies (and installing them where
necessary). Does MacPorts override that and do its own dependency-checking
for Perl modules?


Dave


From neetisomaiya at gmail.com  Mon Aug 18 07:45:38 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 18 Aug 2008 17:15:38 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
Message-ID: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>

Hi,

I am fetching data from the ent gene file of KEGG which is available here :
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent

I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
gene names and pathways in which they participate. I am getting the gene
names fine. But this method

"for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
}"

doesn't seem to be working. I am not able to get the data of the pathways in
which the gene is involved.

Can someone please suggest how I can get the pathway data of genes from the
KEGG ent file??

Thanks.

-- 
-Neeti
Even my blood says, B positive


From neetisomaiya at gmail.com  Mon Aug 18 08:13:08 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 18 Aug 2008 17:43:08 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
Message-ID: <764978cf0808180513k75388671if72ca7913f6e7fc2@mail.gmail.com>

Hi,

I am fetching data from the ent gene file of KEGG which is available here :
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent

I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
gene names and pathways in which they participate. I am getting the gene
names fine. But this method

"for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
}"

doesn't seem to be working. I am not able to get the data of the pathways in
which the gene is involved.

Can someone please suggest how I can get the pathway data of genes from the
KEGG ent file??

Thanks.

-- 
-Neeti
Even my blood says, B positive


From johnsonm at gmail.com  Mon Aug 18 09:26:26 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 08:26:26 -0500
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
Message-ID: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>

On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya <neetisomaiya at gmail.com> wrote:

> I am fetching data from the ent gene file of KEGG which is available here :
> ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent
>
> I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
> gene names and pathways in which they participate. I am getting the gene
> names fine. But this method
>
> "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
> }"
>
> doesn't seem to be working. I am not able to get the data of the pathways in
> which the gene is involved.
>
> Can someone please suggest how I can get the pathway data of genes from the
> KEGG ent file??

What exactly do you mean by "doesn't seem to be working" and what
version of BioPerl are you using?  The code below seems to function as
expected with BioPerl 1.5.2, producing output like this:

hsa04612  Antigen processing and presentation
hsa01430  Cell Communication
hsa04020  Calcium signaling pathway
hsa04080  Neuroactive ligand-receptor interaction
hsa04540  Gap junction
...
...
...

#!/wherever/bin/perl

use strict;
use warnings;

use Bio::SeqIO;


my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]);

while (my $seq = $seqio->next_seq()) {

    foreach my $pathway ($seq->annotation->get_Annotations('pathway')) {

        ## $pathway should be a Bio::Annotation::Comment
        print $pathway->text(), "\n";

    }

}


From johnsonm at gmail.com  Mon Aug 18 09:29:38 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 08:29:38 -0500
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
	<ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
Message-ID: <ebf5eb170808180629j7813a2d7w2a9b238bd455b962@mail.gmail.com>

On Mon, Aug 18, 2008 at 8:26 AM, Mark Johnson <johnsonm at gmail.com> wrote:

> What exactly do you mean by "doesn't seem to be working" and what
> version of BioPerl are you using?  The code below seems to function as
> expected with BioPerl 1.5.2, producing output like this:

Note that I downloaded
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent and
provided the path to it as the argument to the script.


From mike.thon at gmail.com  Mon Aug 18 15:48:02 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Mon, 18 Aug 2008 21:48:02 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
	<628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>
Message-ID: <5BE575B1-67E1-41B1-AB2F-728558377DDA@gmail.com>


On Aug 18, 2008, at 9:37 AM, Dave Messina wrote:

>
> I'm curious -- it seems to me that a major purpose of Build.PL and
> Makefile.PL is specifying dependencies (and installing them where
> necessary). Does MacPorts override that and do its own
> dependency-checking for Perl modules?
>
Hi Dave - Thanks for the links - I will check them out.  MacPorts can
resolve dependencies that are specified for each package, much like
rpm and other Linux packaging systems, so it's probably better to
disable the dependency resolution in the bioperl build script and let
MacPorts handle them.  It looks like I can patch the Build.PL script
pretty easily, unless I can find a better way.

Mike


From johnsonm at gmail.com  Mon Aug 18 16:53:48 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 15:53:48 -0500
Subject: [Bioperl-l] Bio::Annotation issues with BioSQL
Message-ID: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>

    I'm presently refactoring an in-house protein annotation pipeline
and converting it to use BioSQL as a data store.  I've noticed some
slightly screwy behavior with regard to how some of the
Bio::Annotation classes are handled:

-Instances of Bio::Annotation::SimpleValue and
Bio::Annotation::StructuredValue attached to the annotation collection
for a sequence feature (Bio::SeqFeature::Generic) are converted to
tags/values on the feature.
-Instances of Bio::Annotation::DBLink with attached comments lose the comment.

    I'm storing and retrieving things thusly:

my $dbadp = Bio::DB::BioDB->new(
                                                   -database => 'biosql',
                                                   -user        => $user',
                                                   -pass       => $pass,
                                                   -dbname  => $ora_instance,
                                                   -driver      => 'Oracle'
                            );

my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

my $seq = Bio::Seq->new(
                        -id                         => 'DEBUG001',
                        -accession_number => 'DBG001',
                        -desc                     => 'Debug Sequence',
                        -seq                      => 'GATTACA',
                        -namespace           => 'DEBUG',
                       );

my $feature = Bio::SeqFeature::Generic->new(
                                            -seq_id           => 'DEBUG001',
                                            -display_name => 'FEAT0001',
                                            -primary          => 'debug',
                                            -source           => 'test',
                                            -start              => 3,
                                            -end               => 5,
                                            -strand           => 1,
                                           );

my $dblink = Bio::Annotation::DBLink->new(
                                          -database   => 'FAKE001',
                                          -primary_id => 'FK1234567890',
                                          -comment    => 'This is a fake comment',
                                         );

$feature->annotation->add_Annotation('ANNO0001', $dblink);
$seq->add_SeqFeature($feature);

my $pseq = $dbadp->create_persistent($seq);

$pseq->store();
$adp->commit();


my $dbadp = Bio::DB::BioDB->new(
                                                    ...
                                                   );

my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

my $query = Bio::DB::Query::BioQuery->new();

$query->datacollections([
                         "Bio::PrimarySeqI s",
                        ]);

$query->where(["s.display_id like DEBUG%'"]);

my $result = $adp->find_by_query($query);

while (my $seq = $result->next_object()) {

    my @features = $seq->get_SeqFeatures();

    foreach my $feature (@features) {

        ## Contents of Bio::Annotation::SimpleValue and
        ## Bio::Annotation::StructuredValue have migrated to tag/value
        ## pairs on $feature and are missing from $annotation_collection.
        ##
        ## Comments have gone missing from Bio::Annotation::DBLink,
        ## but DBLinks are otherwise intact and present.
        my $annotation_collection = $feature->annotation();
        ...
        ...

    }

}

    Is bioperl-db / BioSQL trying to tell me that I shouldn't be using
Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue?  Is
there even a place in the BioSQL schema for a comment to be attached
to a DBLink?


From neetisomaiya at gmail.com  Tue Aug 19 00:31:28 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 19 Aug 2008 10:01:28 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
	<ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
Message-ID: <764978cf0808182131p620a2dedu40b651be50be5b3c@mail.gmail.com>

Thanks a lot for the reply. It was a problem of the bioperl version.

~Neeti.

On Mon, Aug 18, 2008 at 6:56 PM, Mark Johnson <johnsonm at gmail.com> wrote:

> On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya <neetisomaiya at gmail.com>
> wrote:
>
> > I am fetching data from the ent gene file of KEGG which is available here
> :
> > ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent
> >
> > I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
> > gene names and pathways in which they participate. I am getting the gene
> > names fine. But this method
> >
> > "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
> > }"
> >
> > doesn't seem to be working. I am not able to get the data of the pathways
> in
> > which the gene is involved.
> >
> > Can someone please suggest how I can get the pathway data of genes from
> the
> > KEGG ent file??
>
> What exactly do you mean by "doesn't seem to be working" and what
> version of BioPerl are you using?  The code below seems to function as
> expected with BioPerl 1.5.2, producing output like this:
>
> hsa04612  Antigen processing and presentation
> hsa01430  Cell Communication
> hsa04020  Calcium signaling pathway
> hsa04080  Neuroactive ligand-receptor interaction
> hsa04540  Gap junction
> ...
> ...
> ...
>
> #!/wherever/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::SeqIO;
>
>
> my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]);
>
> while (my $seq = $seqio->next_seq()) {
>
>    foreach my $pathway ($seq->annotation->get_Annotations('pathway')) {
>
>        ## $pathway should be a Bio::Annotation::Comment
>        print $pathway->text(), "\n";
>
>    }
>
> }
>


-- 
-Neeti
Even my blood says, B positive


From wgallin at ualberta.ca  Tue Aug 19 02:25:27 2008
From: wgallin at ualberta.ca (Warren Gallin)
Date: Tue, 19 Aug 2008 00:25:27 -0600
Subject: [Bioperl-l] EUtilities help
Message-ID: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>

Hi,

	Is there a cogent document on using Bio::DB::EUtilities with Bioperl  
1.5.2 around somewhere?

	We upgraded and now my scripts are broken when invoking it.

	Any pointers appreciated.

Thanks,

Warren Gallin


From David.Messina at sbc.su.se  Tue Aug 19 03:30:26 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 19 Aug 2008 09:30:26 +0200
Subject: [Bioperl-l] EUtilities help
In-Reply-To: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
References: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
Message-ID: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>

Hi Warren,

Are you upgrading to 1.5.2 or downgrading from bioperl-live?

If the former, you might consider going all the way to bioperl-live, whose
EUtilities support is improved significantly and documented extensively
here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

If the latter, I don't believe there is, but Chris Fields will know for sure
and will probably chime in.


Dave

To get bioperl-live
as a tarball: http://www.bioperl.org/DIST/nightly_builds/
or via Subversion: http://www.bioperl.org/wiki/Using_Subversion


From bix at sendu.me.uk  Tue Aug 19 04:34:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Aug 2008 09:34:50 +0100
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
Message-ID: <48AA85AA.7010405@sendu.me.uk>

Michael Thon wrote:
> Hi all - I am trying to write a port of bioperl 1.5.2 to enable its 
> installation with the macports system (www.macports.org).  I'm not too 
> familiar with the Build.PL build system - is there any way to disable 
> the dependency resolution that the build script does (i.e. without 
> patching the script)?

How else will you be doing the dependency resolution? If your system 
just installs all dependencies itself beforehand, then dependency 
resolution won't be invoked in Build.PL :)

Otherwise, don't run Build.PL at all (why do you need to?), or have your 
system answer default to all questions:

echo | perl Build.PL


From mike.thon at gmail.com  Tue Aug 19 11:16:03 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Tue, 19 Aug 2008 17:16:03 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <48AA85AA.7010405@sendu.me.uk>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
	<48AA85AA.7010405@sendu.me.uk>
Message-ID: <E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>

> How else will you be doing the dependency resolution? If your system  
> just installs all dependencies itself beforehand, then dependency  
> resolution won't be invoked in Build.PL :)
> Otherwise, don't run Build.PL at all (why do you need to?),
Good point.  Now, why didn't I think of that? :)
Does Build.PL do anything other than copy Perl modules (and install  
dependencies)?
> or have your system answer default to all questions:
> echo | perl Build.PL
>


From downloadondemand at gmail.com  Thu Aug 14 19:06:04 2008
From: downloadondemand at gmail.com (N)
Date: Fri, 15 Aug 2008 02:06:04 +0300
Subject: [Bioperl-l] HOWTO:Graphics/BLAST output
Message-ID: <923c9ce30808141606k61d9cc23nb18e55dec5112ac4@mail.gmail.com>

Hello again!
I followed HOWTO:Graphics and ran into a problem.
While parsing BLAST output, I found one part of the file that is clearly the
offending one for me. The problem is that this hit has two HSPs, but the
second one is in the "wrong" strand orientation (Plus/Plus vs. Plus/Minus).
How can I tell bioperl to use only the HSPs oriented the same way as the
best-scoring HSP? Or better, although not related to this mailing list: how
can I tell BLAST to put this second HSP into a separate hit? What am I
doing/thinking wrong?
I attached a small PNG showing the problem. The second HSP is in white, but
it makes no biological sense, does it?
Thanks.


BLASTN 2.2.18 [Mar-02-2008]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|145226209
         (471 letters)

Database: ../database/1000
           24 sequences; 19,482 total letters

Searching..................................................done


                                                                 Score    E
Sequences producing significant alignments:                      (bits) Value

gi|145226176 hypothetical_protein                                     599   e-174
gi|145226174 hypothetical_protein                                      30   0.007
gi|145226175 ATP-dependent_exoDNAse_(exonuclease_V)_alpha_subuni...    26   0.11
gi|145226190 hypothetical_protein                                      24   0.43
gi|145226195 protein_of_unknown_function_DUF1526                       22   1.7
gi|145226193 hypothetical_protein                                      22   1.7
gi|145226187 aminoglycoside_nucleotidyltransferase                     22   1.7
gi|145226181 diguanylate_cyclase                                       22   1.7
gi|145226179 hypothetical_protein                                      22   1.7
gi|145226178 hypothetical_protein                                      22   1.7
gi|145226177 putative_methyl-accepting_chemotaxis_sensory_transd...    22   1.7

>gi|145226176 hypothetical_protein
          Length = 477

 Score =  599 bits (302), Expect = e-174
 Identities = 428/470 (91%)
 Strand = Plus / Plus


Query: 1   atgaatgcagacgtgtcagtcaaccagtggaatccgttagaggtagccgctgaggcgaca 60
           |||||||| ||||||||||||||||||||||||||||||||||| |||||||||||||||
Sbjct: 1   atgaatgcggacgtgtcagtcaaccagtggaatccgttagaggtggccgctgaggcgaca 60


Query: 61  atcgctgccgccacagccgcgctggtgtgggaaggcccagacagctacggggtgctggaa 120
           || ||||||||||| |||||  |||||||||||||||||||||||||||||||| ||||
Sbjct: 61  attgctgccgccacggccgcattggtgtgggaaggcccagacagctacggggtgttggag 120


Query: 121 cgggtcgccggggccacagcgaaaggcatagcaacagctcggatagccgccgaaatcatg 180
           ||||| ||||||||||||||||||||||| ||||||||||||||| ||||||||||||||
Sbjct: 121 cgggtagccggggccacagcgaaaggcatggcaacagctcggataaccgccgaaatcatg 180


Query: 181 gctgacgtcaccacctcagttcagttcactgcggccaccgaacatgcgcgcggcggcgct 240
           ||||||||||||||||||||||||||||||||||||  |||| |||||||||||||||||
Sbjct: 181 gctgacgtcaccacctcagttcagttcactgcggccgacgaagatgcgcgcggcggcgct 240


Query: 241 gtagcgggacttccggggtggctggcgccgcggtgggcggcgtccgtgcgtgccgcactg 300
           |||||||| ||||||||||||||||||||||||||||||||||||||||| | |||||||
Sbjct: 241 gtagcggggcttccggggtggctggcgccgcggtgggcggcgtccgtgcgcggcgcactg 300


Query: 301 gacgaactcgaagccgccgggcggcccggctacgccatggtcaaggcgatcacctggcct 360
           ||||| ||||||||||||||||| |||||| ||  |||| | || ||    ||  ||||
Sbjct: 301 gacgagctcgaagccgccgggcgccccggcgacatcatgatgaaagcccggacacggccg 360


Query: 361 gccttgcgcagcgtcgcggggtggacccaagacgggccgctgcaaacatggcagacggct 420
           ||  |||||||||| |||| ||||||||||||||| |||||||| |||||||||||||||
Sbjct: 361 gcactgcgcagcgtggcggtgtggacccaagacggaccgctgcagacatggcagacggct 420


Query: 421 ctaattgtgagcgaagcacggactgctctggctcaccgcgtaggcgtctg 470
           || || |||  |||||||||||||||||||||||||||||||||||||||
Sbjct: 421 ctgatcgtggacgaagcacggactgctctggctcaccgcgtaggcgtctg 470


 Score = 22.3 bits (11), Expect = 1.7
 Identities = 11/11 (100%)
 Strand = Plus / Minus


Query: 124 gtcgccggggc 134
           |||||||||||
Sbjct: 333 gtcgccggggc 323


The rest of output truncated...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tst.png
Type: image/png
Size: 1668 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080815/24a3f555/attachment-0003.png>

From UKaraoz at lbl.gov  Thu Aug 14 20:03:51 2008
From: UKaraoz at lbl.gov (Ulas Karaoz)
Date: Thu, 14 Aug 2008 17:03:51 -0700
Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for
	blasttable
Message-ID: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>

Hi,

I found out that the save_output routine in RemoteBlast.pm doesn't  
save the output properly into a file when the Blast output is a  
hit table (blasttable). Might the reason be the fact that the tabular
output has a line that starts with a # while the parser is looking for  
a line starting with just BLASTN, as in the section pasted below:

if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i ||
    $l =~ /^RPS-BLAST\s*.+$/i ) {
    $seentop = 1;
}
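
The suspicion above can be checked in isolation. A minimal sketch, assuming
that tabular output begins with a comment line such as "# BLASTN ...": the
sample lines are made up for illustration, and looks_like_report_top is a
hypothetical helper, not part of RemoteBlast.pm. Only the two patterns are
taken from the excerpt.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the two patterns quoted from RemoteBlast.pm.
# Returns 1 if the line would be treated as the top of a report, else 0.
sub looks_like_report_top {
    my ($line) = @_;
    return ( $line =~ /^(?:[T]?BLAST[NPX])\s*.+$/i
          || $line =~ /^RPS-BLAST\s*.+$/i ) ? 1 : 0;
}

# Illustrative first lines: a standard text report header, and the same
# header behind a leading '#', as assumed for tabular output.
my @first_lines = (
    'BLASTN 2.2.18 [Mar-02-2008]',
    '# BLASTN 2.2.18 [Mar-02-2008]',
);

for my $line (@first_lines) {
    printf "%-32s => %s\n", $line,
        looks_like_report_top($line) ? 'match' : 'no match';
}
```

Both patterns are anchored with ^, so a line that starts with '#' cannot
match either of them, and $seentop would never be set for such output.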


From erant at psb.ugent.be  Fri Aug 15 08:25:59 2008
From: erant at psb.ugent.be (Erick Antezana)
Date: Fri, 15 Aug 2008 14:25:59 +0200
Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API?
In-Reply-To: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>
References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com>
	<48A448DD.4000206@psb.ugent.be>
	<1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>
Message-ID: <48A575D7.7010709@psb.ugent.be>

Hi Hilmar,

Hilmar Lapp wrote:
> Hi Erick,
>
> how did you determine that go-perl is specific to GO? I've found it to 
> work quite well for any kind of OBO-formatted ontology.
We used go-perl during the gestation (2005) of the ontologies we
built and handled, in particular while dealing with GO (as it was
originally conceived for that purpose). It is extremely useful indeed.
However, as our team needed new extensions, new modules were
developed on top of the existing ones, and that code effectively
started a new development branch which ended up as something called
onto-perl. Onto-perl, as you might have realized, is strongly influenced
by go-perl. On the other hand, we had some communication with
Chris Mungall at that time, and he indicated he might drop further
development on go-perl. Some time later, however, he told me he would
continue, which is good since many systems are based on it. But by then
we already had a fairly independent package which has proven to be useful.
>
> Also, you note that BioPerl doesn't have the ability to write in 
> certain formats, and to intersect and "unify" (would you mind 
> explaining what you mean by that?) ontologies.
While working with several OBO ontologies, we needed to have them (or
parts of them) merged, intersected, or joined (= get one ontology = unify).
It can of course be a bit subjective, since you can "unify" ontologies
based on different features, approaches, etc., and while building
application ontologies (such as CCO) you may need to identify
identical terms coming from different ontologies and keep only one in
your integrated resource.
> It seems that your implementation of RDF etc export isn't really 
> reusable or modular in any way, 
The exports (RDF, OWL, ...) are part of the Ontology module's
functionality. Have you had any particular problems while exporting an
ontology? Please let us know so that it can be fixed or improved. On the
other hand, that module is currently undergoing a major overhaul (not
released yet) to accommodate a huge set of "ontologized" resources in
an RDF repository. I would also be interested in discussing it so
that we can improve it.
> but I'd love to bring the intersection function over to BioPerl 
Excellent! I think this is an appropriate time to make the various
developments converge so that the user community can have a standard
set of tools. We are interested in following up on these discussions.
> (BTW when you decided to roll your own ontology API, did you get the 
> impression that BioPerl isn't receptive to you adding to it?).
As I mentioned, we originally took go-perl since it offered much more
functionality than BioPerl's Bio::Ontology::*
> Would you mind pointing me to the place in the code where I would find 
> that, as I can't seem to find it.
I forwarded your request to the developer who might give you more 
details about it.
>
>     -hilmar
>


cheers,
Erick


From bix at sendu.me.uk  Tue Aug 19 11:56:32 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Aug 2008 16:56:32 +0100
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>	<48AA85AA.7010405@sendu.me.uk>
	<E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>
Message-ID: <48AAED30.1090103@sendu.me.uk>

Michael Thon wrote:
>> How else will you be doing the dependency resolution? If your system 
>> just installs all dependencies itself beforehand, then dependency 
>> resolution won't be invoked in Build.PL :)
>> Otherwise, don't run Build.PL at all (why do you need to?),
>
> Good point.  Now, why didn't I think of that? :)
> Does Build.PL do anything other than copy Perl modules (and install 
> dependencies)?

It generates the 'Build' script, which provides a whole host of functions:

./Build help

The only one other than 'install' that might be relevant to you is

./Build test

to run the test suite, if testing is part of your own system. There are
other standard ways to run the tests though, that don't need you to
create the Build script.

Somewhere along the line, probably during the install function, it also
creates man files and other forms of documentation iirc, and installs
those too.

If you handle dependencies, installation and testing (or don't do 
testing) yourself, and don't care about man files (perldoc is good 
enough?) you can ignore Build.PL.


From xxq.t.xu at gmail.com  Tue Aug 19 12:57:41 2008
From: xxq.t.xu at gmail.com (XQ Xu)
Date: Tue, 19 Aug 2008 09:57:41 -0700
Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined primer
Message-ID: <3fde82050808190957y271aa52eh30e39a438cc8a8e3@mail.gmail.com>

Hi all,
I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need to use
Primer3 to calculate Tm for some pre-defined primers; however, there is no
direct way to calculate Tm with Primer3. I have to call Primer3 and supply a
pre-defined primer, a template, etc., let it run, and hope that Primer3
finds a pair of primers for me; then I have to open the output and find out
what the Tm is for my pre-defined primer. Am I missing a function that can do
this quickly for me?
I know there's another module (Bio::SeqFeature::Primer) that can do this
quickly, but it calculates the Tm with different parameters, so it isn't
suitable to use alongside primers designed with Primer3.
Any input?
Thanks!
-Tony


From cjfields at illinois.edu  Tue Aug 19 13:39:03 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 19 Aug 2008 12:39:03 -0500
Subject: [Bioperl-l] EUtilities help
In-Reply-To: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>
References: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
	<628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>
Message-ID: <E67F3EE3-7D3B-497A-926F-101A06D34956@illinois.edu>

As mentioned in previous posts and in the original POD, the original  
Bio::DB::EUtilities was experimental (unstable API).  It is deprecated  
in favor of the bioperl-live interface, which splits the user agent  
and parameter handling (Bio::DB::EUtilities, EUtilParameters) from the  
parsers (Bio::Tools::EUtilities).  Unfortunately, the original design  
was too rushed so any XML changes broke the tools; I basically had to  
start from the ground up again.  Any changes to eutil output should  
now be easier to deal with (famous last words).

I am still planning on adding a few things to it (including tests for  
the parser and user agent) but it shouldn't change substantially from  
what is in the cookbook.

chris

On Aug 19, 2008, at 2:30 AM, Dave Messina wrote:

> Hi Warren,
>
> Are you upgrading to 1.5.2 or downgrading from bioperl-live?
>
> If the former, you might consider going all the way to bioperl-live,  
> whose
> EUtilities support is improved significantly and documented  
> extensively
> here:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
>
> If the latter, I don't believe there is, but Chris Fields will know  
> for sure
> and will probably chime in.
>
>
> Dave
>
> To get bioperl-live
> as a tarball: http://www.bioperl.org/DIST/nightly_builds/
> or via Subversion: http://www.bioperl.org/wiki/Using_Subversion
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From cjfields at illinois.edu  Tue Aug 19 14:00:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 19 Aug 2008 13:00:42 -0500
Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for
	blasttable
In-Reply-To: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>
References: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>
Message-ID: <4B8B416E-2DA7-4B1E-9DE2-E1301B467637@illinois.edu>

Saving tabular BLAST to a file is working for me using bioperl-live.   
NCBI recently changed tabular BLAST output which broke parsing (there  
is an extra column now, can't remember what), but it is now fixed.

chris

On Aug 14, 2008, at 7:03 PM, Ulas Karaoz wrote:

> Hi,
>
> I found out that the save_output routine in RemoteBlast.pm doesn't  
> save the output properly into a file when the Blast output is a  
> hittable(blasttable). Might the reason be the fact that the tabular  
> output has a line that starts with a # while the parser is looking  
> for a line starting with just BLASTN, as in the section pasted below:
>
> if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i ||
> 			 $l =~/^RPS-BLAST\s*.+$/i ) {
> 			$seentop=1;
> }

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Tue Aug 19 13:56:42 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 19 Aug 2008 13:56:42 -0400
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
Message-ID: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>


On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote:

>    I'm presently refactoring an in-house protein annotation pipeline
> and converting it to use BioSQL as a data store.  I've noticed some
> slightly screwy behavior with regard to how some of the
> Bio::Annotation classes are handled:
>
> -Instances of Bio::Annotation::SimpleValue and
> Bio::Annotation::StructuredValue attached to the annotation collection
> for a sequence feature (Bio::SeqFeature::Generic) are converted to
> tags/values on the feature.
>
> -Instances of Bio::Annotation::DBLink with attached comments lose
> the comment.
> [...]
> $query->where(["s.display_id like DEBUG%'"]);

There's a single quote missing here, but I'm assuming that's a result  
of copy/paste editing?

> [...]
>    Is bioperl-db / BioSQL trying to tell me that I shouldn't be using
> Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue?

Your example code doesn't contain an example for where you are getting  
the B::A::StructuredValue object from. If you didn't create that  
yourself, it would be good to know what you did to end up with that.  
Chris Fields has written B::A::Tagtree, which would be the way forward, and
if you created the object yourself, can you take a look at that and  
see whether that class wouldn't serve your purpose as well or even  
better?

In order to be stored in BioSQL, structured (hierarchical, nested)
annotation is flattened into a string representation, because BioSQL
can't store nested annotation collections natively. Right now, if I am
not mistaken, upon retrieval this is not converted back into a
B::A::Tagtree object but rather left flat. This is being worked on,
though; we've just discussed some issues connected with that.

I could make B::A::StructuredValue work the same way, but I'm not sure  
what it provides that B::A::Tagtree doesn't. The latter uses  
Data::Stag under the hood, which is much cleaner, and more extensible  
in the future.

As for SimpleValue annotation versus tag/value annotation for  
seqfeatures, yes right now these are treated interchangeably for the  
purposes of BioSQL and Bioperl-db. You can do this easily too on your  
end by using Bio::SeqFeature::AnnotationAdaptor.
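
A minimal sketch of that adaptor, assuming BioPerl is installed; the
feature coordinates, the tag name 'localization' and its value are made up
for illustration. A tag/value pair added to the feature is read back
through the Bio::AnnotationCollectionI interface:

```perl
use strict;
use warnings;

use Bio::SeqFeature::Generic;
use Bio::SeqFeature::AnnotationAdaptor;

# A throwaway feature; coordinates and tag values are illustrative only.
my $feature = Bio::SeqFeature::Generic->new(
    -start   => 3,
    -end     => 5,
    -strand  => 1,
    -primary => 'debug',
);
$feature->add_tag_value('localization', 'Cytoplasmic');

# The adaptor implements Bio::AnnotationCollectionI on top of the feature,
# so tag/value pairs are returned as annotation objects.
my $anncoll = Bio::SeqFeature::AnnotationAdaptor->new(-feature => $feature);

my @anns = $anncoll->get_Annotations('localization');
for my $ann (@anns) {
    print ref($ann), ': ', $ann->value, "\n";
}
```

Code written against the adaptor does not need to care whether a value
was stored as a tag or as a SimpleValue annotation, which is useful when
the two are swapped during a round trip through bioperl-db.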

> Is there even a place in the BioSQL schema for a comment to be  
> attached
> to a DBLink?

No, there isn't. I thought it was already listed as one of the desirable
extensions to BioSQL from 1.1.x onwards, as documented on the wiki, but it
turns out it isn't yet:

http://www.biosql.org/wiki/Enhancement_Requests

I'll add it (but feel free to do so yourself, especially if you have
other enhancements).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From johnsonm at gmail.com  Wed Aug 20 14:43:25 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 20 Aug 2008 13:43:25 -0500
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
	<2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
Message-ID: <ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>

On Tue, Aug 19, 2008 at 12:56 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote:
> There's a single quote missing here, but I'm assuming that's a result of
> copy/paste editing?

Yes, I was a bit sloppy with the example.

> Your example code doesn't contain an example for where you are getting the
> B::A::StructuredValue object from. If you didn't create that yourself, it
> would be good to know what you did to end up with that. Chris Fields has
> written B::A::Tagtree, which would be the way forward, and if you created the
> object yourself, can you take a look at that and see whether that class
> wouldn't serve your purpose as well or even better?

I created the B::A::StructuredValue myself.  I'm using it to store the
output from PSORTb, which gives a cellular localization and a score
for a protein sequence (gene), which I'm trying to keep paired
together, if possible.  I'll take a look at B::A::Tagtree, that's
probably a better fit.

> In order to be stored in BioSQL structured (hierarchical, nested) annotation
> is flattened into a string representation, because BioSQL can't store nested
> annotation collections natively. Right now if I am not mistaken upon
> retrieval this is not converted back into a B::A::Tagtree object but rather
> left flat. This is being worked on though, we've just discussed some issues
> connected with that.

The data I have isn't really deeply nested.  I just like to keep
related annotation in one object, if possible.

> I could make B::A::StructuredValue work the same way, but I'm not sure what
> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the
> hood, which is much cleaner, and more extensible in the future.

Perhaps B::A::StructuredValue should be deprecated?

> As for SimpleValue annotation versus tag/value annotation for seqfeatures,
> yes right now these are treated interchangeably for the purposes of BioSQL
> and Bioperl-db. You can do this easily too on your end by using
> Bio::SeqFeature::AnnotationAdaptor.

I'll check out the AnnotationAdaptor, but I'll probably just end up using
seqfeature tags/values.  They're functionally equivalent to
B::A::SimpleValue.

>> Is there even a place in the BioSQL schema for a comment to be attached
>> to a DBLink?
>
> No there isn't. I thought it is but it turns out that this isn't yet one of
> the desirable extensions to BioSQL from 1.1.x onwards, as documented on the
> wiki:
>
> http://www.biosql.org/wiki/Enhancement_Requests
>
> I'll add it (but feel free to do so yourself, especially if you have other
> enhancements).

I'll take a look at the wiki....I'll file that as a feature request if
I get there before you do it.


From cjfields at illinois.edu  Wed Aug 20 16:25:55 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 20 Aug 2008 15:25:55 -0500
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
	<2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
	<ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>
Message-ID: <9872D07D-61AB-4F0A-A477-35AA87ABF72E@illinois.edu>


On Aug 20, 2008, at 1:43 PM, Mark Johnson wrote:

> ...
>
>> I could make B::A::StructuredValue work the same way, but I'm not  
>> sure what
>> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag  
>> under the
>> hood, which is much cleaner, and more extensible in the future.
>
> Perhaps B::A::StructuredValue should be deprecated?

Probably.  The only place it was used in core was SeqIO::swiss (and  
now that uses Tagtree in bioperl-live).

Let me know if you have any problems with Bio::Annotation::Tagtree.  I  
am planning on doing some more work with it soon.

chris


From cjfields at illinois.edu  Thu Aug 21 10:26:22 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Aug 2008 09:26:22 -0500
Subject: [Bioperl-l] Annotations issue (GenBank)
Message-ID: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>

I'm working on a GenBank patch and noticed a few cases where  
annotations are being stored as all uppercase strings (CONTIG, WGS,  
etc).  I'm planning on converting these to lowercase (e.g. 'wgs',  
'contig') for consistency with other annotation tag values.  I'm  
making sure output is consistent as well.

These are used fairly infrequently so I don't think it should cause  
problems, but just in case, does anyone have a problem with this change?

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From awitney at sgul.ac.uk  Thu Aug 21 10:41:10 2008
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 21 Aug 2008 15:41:10 +0100
Subject: [Bioperl-l] adding HSP information to BLAST output graphic
	(Bio::Graphics)
Message-ID: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>


Hi,

I am going through the Bio::Graphics HOWTO on the wiki.

Looking at render_blast4.pl, the description text describes the whole
hit and is set for the whole track, but I would like to be able to add
HSP information, such as the identity matches, to the picture. This is
stored in the $hsp object. How would I go about adding that to the
picture?

The relevant piece of code is:

       my $track = $panel->add_track(
                                     -glyph       => 'graded_segments',
                                     -label       => 1,
                                     -connector   => 'dashed',
                                     -bgcolor     => 'blue',
                                     -font2color  => 'red',
                                     -sort_order  => 'high_score',
                                     -description => sub {
                                       my $feature = shift;
                                       return unless $feature->has_tag('description');
                                       my ($description) = $feature->each_tag_value('description');
                                       my $score = $feature->score;
                                       "$description, score=$score";
                           #            "score=$score";
                                      },
                                    );

       next unless $hit->significance < 1E-20;

       my $feature = Bio::SeqFeature::Generic->new(
                                                   -score        => $hit->raw_score,
                                                   -display_name => $hit->name,
                                                   -tag          => {
                                                                      description => $hit->description
                                                                    },
                                                  );

       while( my $hsp = $hit->next_hsp )
         {
          $feature->add_sub_SeqFeature($hsp,'EXPAND');
         }

       $track->add_feature($feature);

thanks for any help

adam


From cjfields at illinois.edu  Thu Aug 21 12:01:11 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Aug 2008 11:01:11 -0500
Subject: [Bioperl-l] Annotations issue (GenBank)
In-Reply-To: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
References: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
Message-ID: <0ACE20F0-43E5-4E12-9437-094871766083@illinois.edu>

I went ahead and committed this; if there are any disagreements about  
it I can back it out or modify as needed.

chris

On Aug 21, 2008, at 9:26 AM, Chris Fields wrote:

> I'm working on a GenBank patch and noticed a few cases where  
> annotations are being stored as all uppercase strings (CONTIG, WGS,  
> etc).  I'm planning on converting these to lowercase (e.g. 'wgs',  
> 'contig') for consistency with other annotation tag values.  I'm  
> making sure output is consistent as well.
>
> These are used fairly infrequently so I don't think it should cause  
> problems, but just in case, does anyone have a problem with this  
> change?
>
> chris


From mshafiullah at mail.unomaha.edu  Thu Aug 21 15:35:54 2008
From: mshafiullah at mail.unomaha.edu (Mohammad Shafiullah)
Date: Thu, 21 Aug 2008 14:35:54 -0500
Subject: [Bioperl-l] bioperl-network test error
Message-ID: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>

To whom it may concern: 

Encountered the following error while running ./Build test on 
bioperl-network-1.5.2_100

Can't stat scripts: No such file or directory
 at /usr/share/perl5/Module/Build/Base.pm line 3836
t/Edge...........ok 
t/Graph-MD5......1/19 Not an ARRAY reference at 
/usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
 All 19 subtests passed 
t/Graph-Seq......1/16 Not an ARRAY reference at 
/usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
 All 16 subtests passed 
t/IO_dip_tab.....ok 
t/IO_psi.........ok 
t/Interaction....ok 
t/Node...........ok 
t/ProteinNet.....ok 

Test Summary Report
-------------------
t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
  Non-zero exit status: 9
t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
  Non-zero exit status: 255
Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33 cusr 
0.37 csys =  1.72 CPU)
Result: FAIL
Failed 2/8 test programs. 0/292 subtests failed.

Please advise on the issue. 

Sincerely,

- Mohammad


From bosborne11 at verizon.net  Thu Aug 21 16:43:35 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 21 Aug 2008 16:43:35 -0400
Subject: [Bioperl-l] bioperl-network test error
In-Reply-To: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
References: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
Message-ID: <83EB2D5B-37F2-4163-8A14-F93ECE740197@verizon.net>

Mohammad,

Try replacing the ModuleBuildBioperl.pm file that you have with the  
one that's attached. Then do this inside the bioperl-network-1.5.2_100  
directory:

./Build clean
perl Build.PL
./Build test

What do you see? Also, what's your version of Perl?


Brian O.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ModuleBuildBioperl.pm
Type: text/x-perl-script
Size: 40360 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080821/17d19dd5/attachment-0003.bin>
-------------- next part --------------


On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote:

> To whom it may concern:
>
> Encountered the following error while running ./Build test on
> bioperl-network-1.5.2_100
>
> Can't stat scripts: No such file or directory
> at /usr/share/perl5/Module/Build/Base.pm line 3836
> t/Edge...........ok
> t/Graph-MD5......1/19 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
> All 19 subtests passed
> t/Graph-Seq......1/16 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
> All 16 subtests passed
> t/IO_dip_tab.....ok
> t/IO_psi.........ok
> t/Interaction....ok
> t/Node...........ok
> t/ProteinNet.....ok
>
> Test Summary Report
> -------------------
> t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
>  Non-zero exit status: 9
> t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
>  Non-zero exit status: 255
> Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33  
> cusr
> 0.37 csys =  1.72 CPU)
> Result: FAIL
> Failed 2/8 test programs. 0/292 subtests failed.
>
> Please advise on the issue.
>
> Sincerely,
>
> - Mohammad
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bosborne11 at verizon.net  Thu Aug 21 17:57:17 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 21 Aug 2008 17:57:17 -0400
Subject: [Bioperl-l] bioperl-network test error
In-Reply-To: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
References: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
Message-ID: <AC71DAAF-C324-4D6E-8D56-EE19B1957A27@verizon.net>

Mohammad,

Take a look at this:

http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2007-06/msg00365.html

It discusses that Heap::Elem error when using Graph.

Brian O.


On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote:

> To whom it may concern:
>
> Encountered the following error while running ./Build test on
> bioperl-network-1.5.2_100
>
> Can't stat scripts: No such file or directory
> at /usr/share/perl5/Module/Build/Base.pm line 3836
> t/Edge...........ok
> t/Graph-MD5......1/19 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
> All 19 subtests passed
> t/Graph-Seq......1/16 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
> All 16 subtests passed
> t/IO_dip_tab.....ok
> t/IO_psi.........ok
> t/Interaction....ok
> t/Node...........ok
> t/ProteinNet.....ok
>
> Test Summary Report
> -------------------
> t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
>  Non-zero exit status: 9
> t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
>  Non-zero exit status: 255
> Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33  
> cusr
> 0.37 csys =  1.72 CPU)
> Result: FAIL
> Failed 2/8 test programs. 0/292 subtests failed.
>
> Please advise on the issue.
>
> Sincerely,
>
> - Mohammad
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dominic at bioinf.uni-leipzig.de  Fri Aug 22 09:35:21 2008
From: dominic at bioinf.uni-leipzig.de (Dominic Rose)
Date: Fri, 22 Aug 2008 15:35:21 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
Message-ID: <48AEC099.20106@bioinf.uni-leipzig.de>

Hi,

just a short suggestion to improve the code:

in function

sub _build_nt_matrix()

one finds the following lines:

         my $ti_index = $NucleotideIndexes{$ti};
         my $tj_index = $NucleotideIndexes{$tj};

         if( ! defined $ti_index ) {
             print "ti_index not defined for $ti\n";
             next;
         }

However, it should be possible to stop/silence the printing of that 
error message. Many alignments contain N's, which causes a great many
"ti_index not defined for N" messages. That should be avoidable.

Thanks,
Dominic

-- 
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig
WWW    http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax:   +49 341 97-16679


From heikki at sanbi.ac.za  Wed Aug 27 02:23:39 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 27 Aug 2008 08:23:39 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
In-Reply-To: <48AEC099.20106@bioinf.uni-leipzig.de>
References: <48AEC099.20106@bioinf.uni-leipzig.de>
Message-ID: <200808270823.39852.heikki@sanbi.ac.za>

Dominic,

You are absolutely right. 

I've changed 'print' into '$self->warn' in the SVN. Now it is possible to set 
$object->verbose(-1) to silence the warning or, if deemed necessary, set 
$object->verbose(2) and catch the error with an eval statement.
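
The verbosity convention described above can be sketched with a small
stand-in class. This is only an illustration: Demo::Reporter and its
methods are hypothetical and are not BioPerl code; with a real BioPerl
object the calls would be $object->verbose(-1) and $object->verbose(2)
as written above.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics the verbosity convention;
# it is not part of BioPerl.
package Demo::Reporter;

sub new { my ($class) = @_; return bless { verbose => 0 }, $class }

sub verbose {
    my ($self, $level) = @_;
    $self->{verbose} = $level if defined $level;
    return $self->{verbose};
}

# -1: stay silent, 2: turn the warning into an exception, otherwise: warn.
sub warn {
    my ($self, $msg) = @_;
    my $v = $self->verbose;
    return 0 if $v < 0;
    die "EXCEPTION: $msg\n" if $v >= 2;
    CORE::warn("WARNING: $msg\n");
    return 1;
}

package main;

my $obj = Demo::Reporter->new;

# Silence the message entirely.
$obj->verbose(-1);
my $silenced = $obj->warn("ti_index not defined for N");

# Promote the warning to an exception and catch it with eval.
$obj->verbose(2);
my $ok     = eval { $obj->warn("ti_index not defined for N"); 1 };
my $caught = $ok ? '' : $@;
```

The eval block returns 1 only if no exception was raised, so $caught
holds the message text when the warning has been promoted.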

Thanks for reporting this,

    -Heikki

On Friday 22 August 2008 15:35:21 Dominic Rose wrote:
> Hi,
>
> just a short suggestion to improve the code:
>
> in function
>
> sub _build_nt_matrix()
>
> one finds the following lines:
>
>          my $ti_index = $NucleotideIndexes{$ti};
>          my $tj_index = $NucleotideIndexes{$tj};
>
>          if( ! defined $ti_index ) {
>              print "ti_index not defined for $ti\n";
>              next;
>          }
>
> However, it should be possible to stop/silence the printing of that
> error message. Many alignments contain N's, which causes a great many
> "ti_index not defined for N" messages. That should be avoidable.
>
> Thanks,
> Dominic

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From giles.weaver at googlemail.com  Wed Aug 27 06:39:19 2008
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Wed, 27 Aug 2008 11:39:19 +0100
Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined
	primer
In-Reply-To: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
References: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
Message-ID: <1d06cd5d0808270339p4432f91bo72f48364696caf5a@mail.gmail.com>

Hi Tony,

It isn't well documented, but Primer3 includes a program called oligotm,
which is used to calculate the Tm of short sequences (up to 32bp). You can
run it directly by typing something like "oligotm ACGTACGTACGTACGT" in the
terminal. Just typing oligotm will give you the options.

If you are using Linux, these snippets of code may help you call oligotm
from within a perl script:

use IPC::Open3;

sub _run_oligotm
{
    my ($class, $sequence) = @_;

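    # Pass the command as a list so the sequence never goes through the shell
    my @run = ('oligotm', '-tp', '1', '-sc', '1', $sequence);
    my $pid = open3(\*WTRFH, \*RDRFH, \*ERRFH, @run);
    close (WTRFH);

    my ($tm, $errors) = ('', '');
    while (<RDRFH>) { $tm .= $_;}
    while (<ERRFH>) { $errors .= $_;}
    waitpid($pid, 0);    # reap the child process
    chomp $tm;
    return ($tm, $errors);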
}

You'll need to put this in a package or edit out the $class bit for it to
work.

This is my first post to this list. I'm receiving the digest so replying to
posts is a bit of a faff. Can anyone recommend a better way of replying to
posts than replying to the digest, editing it and pasting the subject into
the subject field?

Giles Weaver
Unilever R&D


>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 19 Aug 2008 09:57:41 -0700
> From: "XQ Xu" <xxq.t.xu at gmail.com>
> Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined
>        primer
> To: bioperl-l at lists.open-bio.org
> Message-ID:
>        <3fde82050808190957y271aa52eh30e39a438cc8a8e3 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all,
> I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need use
> Primer3 to calculate Tm for some pre-defined primers; however there is no
> direct way to calculate Tm with Primer3. I have to call Primer3 and supply
> a
> pre-defined primer, a template, etc to let it run and hopefully Primer3
> finds a pair of primers for me, then I have to open the output and find out
> what the Tm is for my pre-defined primer. Do I miss any function that can
> do
> this quickly for me?
> I know there's another module (Bio::SeqFeature::Primer) can do this
> quickly, but the Tm is calculated with different parameters; therefore it's
> not good to use it while I use Primer3 to design primers.
> Any input?
> Thanks!
> -Tony
>


From awitney at sgul.ac.uk  Wed Aug 27 12:21:12 2008
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 27 Aug 2008 17:21:12 +0100
Subject: [Bioperl-l] adding HSP information to BLAST output graphic
	(Bio::Graphics)
In-Reply-To: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>
References: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>
Message-ID: <8939216F-7AFC-40BB-A416-8508E3E5D871@sgul.ac.uk>


after some digging around myself, this seems to do the trick, although  
i don't know if it will always work, as i am having to add a separate  
sort function.

On 21 Aug 2008, at 15:41, Adam Witney wrote:

>
> Hi,
>
> I am going through the Bio::Graphics HOWTO on the wiki.
>
> Looking at render_blast4.pl, the description text describes the  
> whole hit and is set for the whole track, but i would like to be  
> able to add HSP information such as the identity matches onto the  
> picture, this is stored in the $hsp object. How would i go about  
> adding that to the picture?
>
> the relevant piece of code is:
>
>      my $track = $panel->add_track(
>                                    -glyph       => 'graded_segments',
>                                    -label       => 1,
>                                    -connector   => 'dashed',
>                                    -bgcolor     => 'blue',
>                                    -font2color  => 'red',
>                                    -sort_order  => 'high_score',

                                     -part_labels => sub {
                                       my ($feature,undef,$partno) = @_;
                                       my @features = sort_features($feature->get_SeqFeatures());
                                       return $features[$partno]->num_identical.'/'.$features[$partno]->length
                                              .' (score='.$features[$partno]->score.')' if $features[$partno];
                                      },

>                                    -description => sub {
>                                      my $feature = shift;
>                                      return unless $feature->has_tag('description');
>                                      my ($description) = $feature->each_tag_value('description');
>                                      my $score = $feature->score;
>                                      "$description, score=$score";
>                          #            "score=$score";
>                                     },
>                                   );


sub sort_features {
   my @array = @_;

   if(@array < 2){return @array}

   my @sorted = sort {$a->start <=> $b->start} @array;

   return @sorted;
}


is this the best way to achieve this?

thanks

adam


From mauricio at open-bio.org  Thu Aug 28 13:43:04 2008
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 28 Aug 2008 12:43:04 -0500
Subject: [Bioperl-l] Pdoc updates
Message-ID: <48B6E3A8.3050305@open-bio.org>

For those who use the online Pdoc documentation (doc.bioperl.org), this 
is to let you know that the CvsWeb links at the top of any module page 
now link to the proper place in the SVN web interface.

Cheers,
Mauricio.


From jaudall at gmail.com  Fri Aug 29 02:46:20 2008
From: jaudall at gmail.com (Joshua Udall)
Date: Thu, 28 Aug 2008 23:46:20 -0700
Subject: [Bioperl-l] DB_File and assembly IO
Message-ID: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>

Bioperl -

I'm trying to read/parse a single cap3 ace file with several thousand
contigs.  I get a DB_File error at Contig247.  Here's the error:

------------- EXCEPTION -------------
MSG: Unable to tie DB_File handle
STACK Bio::SeqFeature::Collection::new
/Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
STACK Bio::Assembly::Contig::new
/Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
STACK Bio::Assembly::IO::ace::next_assembly
/Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
STACK toplevel /Users/jaudall/bin/read_ace.pl:214
-------------------------------------

Looking at the Collection::new, the error is on the middle line:

  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
$self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die "Cannot open
file: $!\n" ;
  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
  return $self;

If I uncomment the $! die statement that I inserted, I get this:

'Cannot open file tree: Too many open files'

Apparently the Collection constructor is creating a new index file for each
one and the handles for each are sticking around?  That confuses me because
reading more about the Collection.pm and DB_File, it appeared to me that no
files were written by default (as I'm doing), rather the Collection objects
are all stored in memory.  I'm pretty sure the error is not a permission
error, and if it is not the open file-handles, what else should I look for?


If I 'warn' the error instead of throwing it, I get:

Can't call method "get_dup" on an undefined value at
/Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360

This kind of makes sense because the index appears not to be created and it
can't look stuff up in an undefined tied hash.  I'm stuck.

Thanks for any help and suggestions.

OSX, perl 5.8.8, bioperl-live (svn last week)

-- 
Joshua Udall
Assistant Professor
295 WIDB
Plant and Wildlife Science Dept.
Brigham Young University
Provo, UT 84602
801-422-9307
Fax: 801-422-0008
USA


From florent.angly at gmail.com  Fri Aug 29 04:40:25 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Fri, 29 Aug 2008 18:40:25 +1000
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
Message-ID: <48B7B5F9.1050608@gmail.com>

Hi Joshua,

I don't know the specifics of DB_File, but the 'Cannot open file tree: 
Too many open files' is pretty explicit.
If you're on Unix/Linux you can check the files that are open by your 
program by typing:
    lsof | grep name_of_program
There is probably a filehandle that is not closed somewhere in your code 
or the BioPerl code.
Best,

Florent


Joshua Udall wrote:
> Bioperl -
>
> I'm trying to read/parse a single cap3 ace file with several thousand
> contigs.  I get a DB_File error at Contig247.  Here's the error:
>
> ------------- EXCEPTION -------------
> MSG: Unable to tie DB_File handle
> STACK Bio::SeqFeature::Collection::new
> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
> STACK Bio::Assembly::Contig::new
> /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
> STACK Bio::Assembly::IO::ace::next_assembly
> /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
> STACK toplevel /Users/jaudall/bin/read_ace.pl:214
> -------------------------------------
>
> Looking at the Collection::new, the error is on the middle line:
>
>   $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
> $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die "Cannot open
> file: $!\n" ;
>   $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
>   return $self;
>
> If I uncomment out the $! die statement that I inserted, I get this:
>
> 'Cannot open file tree: Too many open files'
>
> Apparently the Collection constructor is creating a new index file for each
> one and the handles for each are sticking around?  That confuses me because
> reading more about the Collection.pm and DB_File, it appeared to me that no
> files were written by default (as I'm doing), rather the Collection objects
> are all stored in memory.  I'm pretty sure the error is not a permission
> error, and if it is not the open file-handles, what else should I look for?
>
>
> If I 'warn' the error instead of throwing it, I get:
>
> Can't call method "get_dup" on an undefined value at
> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360
>
> This kind of makes sense because the index appears not to be created and it
> can't look stuff up in an undefined tied hash.  I'm stuck.
>
> Thanks for any help and suggestions.
>
> OSX, perl 5.8.8, bioperl-live (svn last week)
>
>   


From cjfields at illinois.edu  Fri Aug 29 10:30:49 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Aug 2008 09:30:49 -0500
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <48B7B5F9.1050608@gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
	<48B7B5F9.1050608@gmail.com>
Message-ID: <5717CE96-EC24-46D9-A922-88702B1647A1@illinois.edu>

This is a known problem with Bio::Assembly and stems from having a  
DB_File tied (opened) for each Bio::Assembly::Contig (via a retained  
Bio::SeqFeature::Collection).  You can extend the number of open  
filehandles on UNIX'y flavors using ulimit (see following link), but  
I'm not sure about Win32.

http://bugzilla.open-bio.org/show_bug.cgi?id=2320

The general bug is reproducible using the following simple script.  If  
needed, adjust the range end in the for loop to exceed the ulimit (shown  
by 'ulimit -n'); Mac OS X 10.5 is set to 2560.

---------------------------
use Bio::Assembly::Contig;

my @contigs;

push @contigs, Bio::Assembly::Contig->new() for (1..10000);
---------------------------

I'll open a bug report on this for tracking (for release 1.7, along  
with any other Bio::Assembly issues).  That doesn't mean it won't get  
fixed sooner, just that we aren't under pressure with the next  
release, which already has a full plate.  IMO, there doesn't need to  
be one SF::Collection per contig; one instance should do for the  
entire assembly, with the same SF::Collection passed in to each contig  
and the contigs distinguished by the SeqFeature seq_id.  It would also  
be nice if we could change that to allow other  
SeqFeature::CollectionI implementations (e.g. Bio::DB::SeqFeature::Store  
and the like).
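
To make the idea concrete, here is a rough sketch of a single shared
store keyed by seq_id. Everything in it is hypothetical: the class
Demo::SharedCollection, its methods, and the plain hashes standing in
for SeqFeature objects are for illustration only and are not the
Bio::SeqFeature::Collection API.

```perl
use strict;
use warnings;

# Hypothetical shared store: one container for the whole assembly,
# with features grouped by the seq_id of the contig they belong to.
package Demo::SharedCollection;

sub new { return bless { by_seq_id => {} }, shift }

# Add any number of features; each is a hash with a seq_id key.
sub add_features {
    my ($self, @features) = @_;
    push @{ $self->{by_seq_id}{ $_->{seq_id} } }, $_ for @features;
    return scalar @features;
}

# Return the features recorded for one contig (empty list if none).
sub features_for {
    my ($self, $seq_id) = @_;
    return @{ $self->{by_seq_id}{$seq_id} || [] };
}

package main;

my $collection = Demo::SharedCollection->new;
$collection->add_features(
    { seq_id => 'Contig1', start => 1,  end => 50  },
    { seq_id => 'Contig2', start => 10, end => 80  },
    { seq_id => 'Contig1', start => 60, end => 120 },
);

my @contig1 = $collection->features_for('Contig1');
my @contig3 = $collection->features_for('Contig3');
```

With this layout each contig would hold a reference to the same
collection plus its own seq_id, so only one backing store (and one
filehandle, if it is tied to a file) is needed per assembly.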

chris

On Aug 29, 2008, at 3:40 AM, Florent Angly wrote:

> Hi Joshua,
>
> I don't know the specifics of DB_File, but the 'Cannot open file  
> tree: Too many open files' is pretty explicit.
> If you're on Unix/Linux you can check the files that are open by  
> your program by typing:
>   lsof | grep name_of_program
> There is probably a filehandle that is not closed somewhere in your  
> code or the BioPerl code.
> Best,
>
> Florent
>
>
>
> Joshua Udall wrote:
>> Bioperl -
>>
>> I'm trying to read/parse a single cap3 ace file with several thousand
>> contigs.  I get a DB_File error at Contig247.  Here's the error:
>>
>> ------------- EXCEPTION -------------
>> MSG: Unable to tie DB_File handle
>> STACK Bio::SeqFeature::Collection::new
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
>> STACK Bio::Assembly::Contig::new
>> /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
>> STACK Bio::Assembly::IO::ace::next_assembly
>> /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
>> STACK toplevel /Users/jaudall/bin/read_ace.pl:214
>> -------------------------------------
>>
>> Looking at the Collection::new, the error is on the middle line:
>>
>>  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
>> $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die  
>> "Cannot open
>> file: $!\n" ;
>>  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
>>  return $self;
>>
>> If I uncomment out the $! die statement that I inserted, I get this:
>>
>> 'Cannot open file tree: Too many open files'
>>
>> Apparently the Collection constructor is creating a new index file  
>> for each
>> one and the handles for each are sticking around?  That confuses me  
>> because
>> reading more about the Collection.pm and DB_File, it appeared to me  
>> that no
>> files were written by default (as I'm doing), rather the Collection  
>> objects
>> are all stored in memory.  I'm pretty sure the error is not a  
>> permission
>> error, and if it is not the open file-handles, what else should I  
>> look for?
>>
>>
>> If I 'warn' the error instead of throwing it, I get:
>>
>> Can't call method "get_dup" on an undefined value at
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm  
>> line 360
>>
>> This kind of makes sense because the index appears not to be  
>> created and it
>> can't look stuff up in an undefined tied hash.  I'm stuck.
>>
>> Thanks for any help and suggestions.
>>
>> OSX, perl 5.8.8, bioperl-live (svn last week)
>>


From milan.gilic at st.t-com.hr  Sat Aug 23 17:15:39 2008
From: milan.gilic at st.t-com.hr (Milan)
Date: Sat, 23 Aug 2008 23:15:39 +0200
Subject: [Bioperl-l]  Count or weight matrix in bioperl?
Message-ID: <48B07DFB.3040201@st.t-com.hr>


From dominic at bioinf.uni-leipzig.de  Mon Aug 25 06:40:42 2008
From: dominic at bioinf.uni-leipzig.de (Dominic Rose)
Date: Mon, 25 Aug 2008 12:40:42 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
Message-ID: <48B28C2A.8050803@bioinf.uni-leipzig.de>

Hi,

just a short suggestion to improve the code:

in function

sub _build_nt_matrix()

one finds the following lines:

         my $ti_index = $NucleotideIndexes{$ti};
         my $tj_index = $NucleotideIndexes{$tj};

         if( ! defined $ti_index ) {
             print "ti_index not defined for $ti\n";
             next;
         }

However, it should be possible to avoid the printing of that error 
message. Many alignments contain N's, which causes a great many
"ti_index not defined for N" messages. It should be possible to switch 
that message off.

Thanks,
Dominic

-- 
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig
WWW    http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax:   +49 341 97-16679


From brunovecchi at yahoo.com.ar  Fri Aug  1 03:15:19 2008
From: brunovecchi at yahoo.com.ar (Bruno Vecchi)
Date: Fri, 01 Aug 2008 00:15:19 -0300
Subject: [Bioperl-l] Bio::Biblio doesn't find articles
Message-ID: <48927FC7.5060103@yahoo.com.ar>

Hi everyone,
I am trying to retrieve bibliographic data using Bio::Biblio, but so far 
I haven't had any luck.

The following code prints zero results whatever the keyword ("atom" in 
this example) you choose. Could anyone please point me to my mistake? 
There are no errors in the output, just no articles found.

# Beginning of script
#!/usr/bin/perl -w
use strict;
use Bio::Biblio;

my $bib_obj = Bio::Biblio->new();

my $biblio_results = $bib_obj->find("atom");
print $biblio_results->get_count;
# End of script

Calling the Bio::Biblio constructor method without parameters sets them 
to default values, which are:

access: soap
location: 'http://www.ebi.ac.uk/openbqs/services/MedlineSRS'

I also tried using the example script at 
http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/scripts/biblio/biblio.PLS 
with the same result. For instance, trying:

biblio.PLS -find Java -find perl

Gave the following output:

Looking for 'Java'...    Found 0
Looking for 'perl'...    Found 0
 
Maybe the URL of the service is out of date?

Thanks a lot in advance!

Bruno.


From brunovecchi at yahoo.com.ar  Fri Aug  1 04:16:16 2008
From: brunovecchi at yahoo.com.ar (Bruno Vecchi)
Date: Fri, 01 Aug 2008 01:16:16 -0300
Subject: [Bioperl-l] Bio::Biblio doesn't find articles [SOLVED]
Message-ID: <48928E10.7090903@yahoo.com.ar>

An HTML attachment was scrubbed...
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080801/f000f7b8/attachment-0004.html>

From Kevin.Clancy at invitrogen.com  Fri Aug  1 22:30:30 2008
From: Kevin.Clancy at invitrogen.com (Clancy, Kevin)
Date: Fri, 1 Aug 2008 15:30:30 -0700
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
Message-ID: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>

Hi Folks
I am using the windows version of Bioperl 1.5.2_100. I recently was
compiling a tool using ActiveState's PerlApp which included Bioperl
modules. I received an error for the Bio::SeqIO module, which was
calling for the Bio::SeqIO::staden::read method(?) on line 312 - 314 of
the Bio::SeqIO.pm module. I don't appear to have a copy of the staden
module under the Bio::SeqIO directory and it doesn't appear to be
present in the current BioPerl trunk. I simply commented this out of my
SeqIO.pm file to perform my build and its all running normally. Was this
simply a reference to a non existent module or am I missing something?
Thank you for your help.
kevin

Kevin Clancy, PhD
Senior Scientist, Informatic Sciences
Invitrogen Corp
Carlsbad, CA 92008
Phone: (768) 268 8356
Email: kevin.clancy at invitrogen.com 


From jason at bioperl.org  Sat Aug  2 12:58:05 2008
From: jason at bioperl.org (Jason Stajich)
Date: Sat, 2 Aug 2008 07:58:05 -0500
Subject: [Bioperl-l] Inframe stop codon
In-Reply-To: <516747.39380.qm@web36405.mail.mud.yahoo.com>
References: <516747.39380.qm@web36405.mail.mud.yahoo.com>
Message-ID: <F3DAE939-82AC-4EB8-8667-F6612478477E@bioperl.org>

[regarding PAML analyses]

You would need to translate the cDNA sequence and identify where the  
stop codon is, then remove that codon or remove that sequence from  
your bulk analyses.  It depends on why you think the stop codon is in  
the sequence - mis-annotation, this is a pseudogene, or what?  If  
this is a small percentage of a lot of sequences I would probably  
just skip these; if this is the terminal stop codon that is being  
included in the sequences, you just need to remove the last codon  
from the sequences before providing them to PAML. The Seq HOWTO has  
many examples of how to manipulate a sequence object with substr,  
trunc, as well as just the simple seq() method that gives you the  
sequence as a string, which you can manipulate, then update the  
sequence object afterwards. As in
my $str = $seq->seq;
# remove the last codon from this cDNA sequence
substr($str, -3, 3,'');
$seq->seq($str);

Alternatively you can use trunc to truncate the sequence
my $trunc = $seq->trunc(1,$seq->length -3);
$seq = $trunc;


You can translate the sequence with the $seq->translate command, then
test for the presence of a stop codon (this is exactly the code that is
running in the pairwise_kaks script that is in the scripts/utilities/
directory).  If you have a stop codon you need to figure out whether it
is at the end of the sequence or not.  If it is the terminal codon,
you can just lop off the last codon on all your sequences, but if it
is internal, you need to decide what you want to do with this sequence.
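
The terminal-versus-internal check described above can be sketched in
plain Perl without translating at all.  This is only an illustration
under stated assumptions: the standard genetic code (stop codons TAA,
TAG, TGA), reading frame starting at the first base, and an ungapped
cDNA string.  The names stop_codon_positions and classify_stops are
made up for this sketch and are not BioPerl methods.

```perl
use strict;
use warnings;

# Stop codons of the standard genetic code (assumption: translation table 1).
my %is_stop = map { $_ => 1 } qw(TAA TAG TGA);

# Return the 0-based codon indices of all in-frame stop codons.
# Reading starts at the first base; a trailing partial codon is ignored.
sub stop_codon_positions {
    my ($cdna) = @_;
    my $dna = uc $cdna;
    $dna =~ tr/U/T/;                       # accept RNA input as well
    my $n_codons = int(length($dna) / 3);
    my @stops;
    for my $i (0 .. $n_codons - 1) {
        push @stops, $i if $is_stop{ substr($dna, 3 * $i, 3) };
    }
    return @stops;
}

# Classify a sequence as 'none', 'terminal' (a single stop in the last
# complete codon) or 'internal' (any stop before the last codon).
sub classify_stops {
    my ($cdna) = @_;
    my @stops = stop_codon_positions($cdna);
    return 'none' unless @stops;
    my $last = int(length($cdna) / 3) - 1;
    return (@stops == 1 && $stops[0] == $last) ? 'terminal' : 'internal';
}

print classify_stops('ATGGCCTAA'), "\n";      # terminal
print classify_stops('ATGTAAGCCTGA'), "\n";   # internal
```

Sequences classified as 'terminal' can have the last codon removed with
substr or trunc as shown earlier; 'internal' ones need a decision by
hand.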

If there are multiple stop codons, I am not sure it is appropriate to  
run PAML here, unless you are interested in some sort of pseudo-rate  
calculation that has many of the codons omitted.  Otherwise you may  
just want to calculate a DNA substitution rate for the sequences to  
make comparison.

I suggest working through a single file by hand to get the appropriate
steps down; coding it up will then be easier.

I am sure folks on the list can help too so it is important to post  
to the mailing list - I don't see any messages from you on the list  
about this query.

-jason
On Aug 2, 2008, at 5:42 AM, Tannistha wrote:

>
> Hi Jason,
>
> Please suggest me how to filter the inframe stop codons,  
> aa_to_dna_aln returns the sequence with in-frame stop codons.
> I have posted my query along with the input files to the forum.
>
> Thanks for your earlier advice, runmode =0 is working for me.
>
> Look forward to your reply
>
> Best Regards
> Tannistha
>
>
> Dr. Tannistha Nandi
> email: tannistha3 at yahoo.com
>
>
>


From David.Messina at sbc.su.se  Sun Aug  3 19:10:18 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 3 Aug 2008 21:10:18 +0200
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
References: <Acj0JjSSmx+wYhbUS0GAoF0WhwWjzg==>
	<28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
Message-ID: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>

Hi Kevin,

The staden module is an oddball one, to be sure.

A search on the BioPerl website turns up this FAQ entry:
http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F

Also the Windows install page
http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows

says:

> Some external programs such as Staden <http://www.bioperl.org/wiki/Staden> and
> the EMBOSS <http://www.bioperl.org/wiki/EMBOSS> suite of programs can only
> be installed on Windows by using Cygwin <http://www.cygwin.com/> and its gcc
> C compiler <http://gcc.gnu.org/> (see Bioperl in Cygwin, below)
>


In any case, the staden module (and associated external libraries) is used
only if you are trying to read the scf, abi, alf, pln, exp, ctf, or ztr
binary formats. So your edit shouldn't cause you any problems otherwise.

Dave


From cjfields at uiuc.edu  Sun Aug  3 20:20:52 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 15:20:52 -0500
Subject: [Bioperl-l] Reference to a staden module under Bio::SeqIO.pm
In-Reply-To: <628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
References: <Acj0JjSSmx+wYhbUS0GAoF0WhwWjzg==>
	<28813B71732ED64A83348116D27A1A9A0251ACA3@CBD01EXCMBX01.ads.invitrogen.net>
	<628aabb70808031210u28f46f1fp5f40cd3443134d6c@mail.gmail.com>
Message-ID: <BA800345-8498-46CF-AD94-E8FD9098DEBB@uiuc.edu>

This seems to be a problem with PerlApp and eval{}; judging by a quick  
Google search this isn't the only module affected.  The line in  
question is wrapped in an eval{} to check for the availability of  
Bio::SeqIO::staden::read (but not die on it).

BTW, the eval was moved into the relevant plugin modules post-1.5.2,  
so the eval{} is checked when the module is loaded dynamically (i.e.  
when a format requiring it is passed in).  It was causing other issues  
with ActivePerl installations and was redundant, so it was removed.
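
For readers unfamiliar with the idiom, an availability check wrapped in
eval{} looks roughly like the sketch below.  This is not the actual
Bio::SeqIO code; the helper name module_available is invented for the
example, and only the module name Bio::SeqIO::staden::read comes from
this thread.

```perl
use strict;
use warnings;

# Return 1 if the named module can be loaded, 0 otherwise.
# The block form of eval traps the exception that require throws when
# the file is missing, so the caller never dies.
sub module_available {
    my ($module) = @_;
    (my $path = "$module.pm") =~ s{::}{/}g;   # Foo::Bar -> Foo/Bar.pm
    return eval { require $path; 1 } ? 1 : 0;
}

for my $module ('List::Util', 'Bio::SeqIO::staden::read') {
    printf "%-26s %s\n", $module,
        module_available($module) ? 'available' : 'not installed';
}
```

A packager that scans source for require statements may still flag the
module as a hard dependency, which would be consistent with the PerlApp
behaviour reported here.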

http://bugzilla.open-bio.org/show_bug.cgi?id=2295

chris

On Aug 3, 2008, at 2:10 PM, Dave Messina wrote:

> Hi Kevin,
>
> The staden module is a oddball one, to be sure.
>
> A search on the BioPerl website turns up this FAQ entry:
> http://www.bioperl.org/wiki/FAQ#bioperl-ext_won.27t_compile_the_staden_IO_lib_part_-_what_do_I_do.3F
>
> Also the Windows install page
> http://www.bioperl.org/wiki/Installing_Bioperl_on_Windows
>
> says:
>
>> Some external programs such as Staden <http://www.bioperl.org/wiki/Staden 
>> > and
>> the EMBOSS <http://www.bioperl.org/wiki/EMBOSS> suite of programs  
>> can only
>> be installed on Windows by using Cygwin <http://www.cygwin.com/>  
>> and its gcc
>> C compiler <http://gcc.gnu.org/> (see Bioperl in Cygwin, below)
>>
>
>
> In any case, the staden module (and associated external libraries)  
> is used
> only if you are trying to read the scf, abi, alf, pln, exp, ctf, or  
> ztr
> binary formats. So your edit shouldn't cause you any problems  
> otherwise.
>
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From btemperton at googlemail.com  Sat Aug  2 20:05:37 2008
From: btemperton at googlemail.com (Benbo)
Date: Sat, 2 Aug 2008 13:05:37 -0700 (PDT)
Subject: [Bioperl-l]  Finding possible primers regex
Message-ID: <18792782.post@talk.nabble.com>


Hi there, 
I'm trying to write a perl script to scan an aligned multiple entry fasta
file and find possible primers. So far I've produced a string which contains
bases which match all sequences and * where they don't match e.g.
1) TTAGCCTAA
2) TTAGCAGAA
3) TTACCCTAA

would give TTA*C**AA.

I want to parse this string and pull out all sequences which are 18-21 bp in
length and have no more than 4 * in them.

So far, I've got this:

while($fragment_match =~ /([GTAC*]{18,21})/g){
print "$1\n";
}

hoping to match all fragments 18-21 characters in length. However even that
doesn't work as it has essentially chunked it into 21 char blocks, rather
than what I hoped for of
0-18
0-19
0-20
0-21
1-19
1-20
1-21
1-22

etc.

Can anyone let me know if this is already possible in BioPerl, or how one
would go about it with regex. Sadly I'm fairly new to perl and getting to
grips with BioPerl, so please treat me gently :).

Many thanks,

Ben


-- 
View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.


From cjfields at uiuc.edu  Mon Aug  4 04:08:51 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Sun, 3 Aug 2008 23:08:51 -0500
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <18792782.post@talk.nabble.com>
References: <18792782.post@talk.nabble.com>
Message-ID: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>

On Aug 2, 2008, at 3:05 PM, Benbo wrote:

>
> Hi there,
> I'm trying to write a perl script to scan an aligned multiple entry  
> fasta
> file and find possible primers. So far I've produced a string which  
> contains
> bases which match all sequences and * where they don't match e.g.
> 1) TTAGCCTAA
> 2) TTAGCAGAA
> 3) TTACCCTAA
>
> would give TTA*C**AA.
>
> I want to parse this string and pull out all sequences which are  
> 18-21 bp in
> length and have no more than 4 * in them.
>
> So far, I've got this:
>
> while($fragment_match =~ /([GTAC*]{18,21})/g){
> print "$1\n";
> }
>
> hoping to match all fragments 18-21 characters in length. However  
> even that
> doesn't work as it has essentially chunked it into 21 char blocks,  
> rather
> than what I hoped for of
> 0-18
> 0-19
> 0-20
> 0-21
> 1-19
> 1-20
> 1-21
> 1-22
>
> etc.
>
> Can anyone let me know if this is already possible in BioPerl, or  
> how one
> would go about it with regex. Sadly I'm fairly new to perl and  
> getting to
> grips with BioPerl, so please treat me gently :).
>
> Many thanks,
>
> Ben

There is a trick to this which is discussed more extensively in
'Mastering Regular Expressions'.  Essentially you have to embed code
into the regex and trick the parser into backtracking using a negative
lookahead.  The match itself fails (i.e. no match is returned), but
the embedded code is executed for each match attempt.

The following script is a slight modification of one I used which
checks the consensus string from the input alignment (in aligned FASTA
format here), extracts the alignment slice using that match, then spits
the alignment out to STDOUT in clustalw format.  This should work for
perl 5.8 and up, but it's only been tested on perl 5.10.  You should
be able to adapt this to fit what you want.

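use strict;
use warnings;
use Bio::AlignIO;

# alignment file (aligned FASTA) given on the command line
my $file = shift or die "Usage: $0 alignment.fasta\n";

my $in = Bio::AlignIO->new(-file => $file,
                            -format => 'fasta');
my $out = Bio::AlignIO->new(-fh => \*STDOUT,
                            -format => 'clustalw');

while (my $aln = $in->next_aln) {
     # columns below 100% identity are reported as '?'
     my $c = $aln->consensus_string(100);
     my @matches;
     # (?!) always fails, so the engine backtracks through every
     # candidate substring and the code block records each one
     $c =~ m/
         ([GTAC?]{18,21})
         (?{my $match = check_match($1);
            push @matches, [$match,
                            pos(),
                            length($match)]
               if defined $match;})
         (?!)
         /xig;
     for my $match (@matches) {
         # pos() is the 0-based offset just past the match, which is
         # also the 1-based column of its last character
         my ($hit, $st, $end) = ($match->[0],
                                 $match->[1] - $match->[2] + 1,
                                 $match->[1]);
         my $newaln = $aln->slice($st, $end);
         $out->write_aln($newaln);
     }
}

sub check_match {
     my $match = shift;
     return unless $match;
     # count the mismatch columns
     my $ct = $match =~ tr/?/?/;
     return $match if $ct <= 4;
     # explicit return: otherwise the false (but defined) value of the
     # comparison would be returned and pass the 'defined' test above
     return;
}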
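
If the embedded-code regex feels heavy, the same enumeration can be
sketched with two plain loops over the consensus string.  This is an
alternative illustration, not the script above: the function name
window_candidates and its parameters are made up, and it assumes the
'*' mismatch marker from the original question.

```perl
use strict;
use warnings;

# Enumerate every substring of $str whose length is between $min and
# $max and which contains at most $max_wild '*' characters.
# Returns a list of [0-based start, substring] pairs.
sub window_candidates {
    my ($str, $min, $max, $max_wild) = @_;
    my @hits;
    for my $start (0 .. length($str) - $min) {
        for my $len ($min .. $max) {
            last if $start + $len > length $str;
            my $frag = substr($str, $start, $len);
            my $wild = ($frag =~ tr/*//);      # count mismatch columns
            push @hits, [$start, $frag] if $wild <= $max_wild;
        }
    }
    return @hits;
}

# Small demonstration on the 9-column example from the question,
# using short windows (3-4 columns, at most one '*').
for my $hit (window_candidates('TTA*C**AA', 3, 4, 1)) {
    printf "%d\t%s\n", @$hit;
}
```

For real primers the call would be window_candidates($consensus, 18,
21, 4); the start offsets can then be converted to 1-based columns for
slicing the alignment.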


chris


From heikki at sanbi.ac.za  Mon Aug  4 06:42:57 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Mon, 4 Aug 2008 08:42:57 +0200
Subject: [Bioperl-l] Bio::Coordinate::Pair
In-Reply-To: <C4B4F0DB.498%pchanpa@emory.edu>
References: <C4B4F0DB.498%pchanpa@emory.edu>
Message-ID: <200808040842.57466.heikki@sanbi.ac.za>

Prashanth,

Your example coordinates do not do the conversion but more or less report the 
locations of your features in some third coordinate system.

The way to think about coordinate pairs is as HSPs: you tell the pair 
object which segments match in the pair of sequences.


The synopsis in Bio::Coordinate::Pair class file gives the following example:


use Bio::Location::Simple;
use Bio::Coordinate::Pair;

my $match1 = Bio::Location::Simple->new
    (-seq_id => 'propeptide', -start => 21, -end => 40, -strand=>1 );
my $match2 = Bio::Location::Simple->new
    (-seq_id => 'peptide', -start => 1, -end => 20, -strand=>1 );
my $pair = Bio::Coordinate::Pair->new(-in => $match1,
				      -out => $match2
    );
# location to match
my $pos = Bio::Location::Simple->new
    (-start => 25, -end => 25, -strand=> -1 );

my $res = $pair->map($pos);
print $res->match->start; # 5


In other words, region 21-40 in the propeptide matches locations 1-20 in the 
final peptide. Therefore conversion from 25 gives 5:

     signalp        21  25             40
--------------------|---|--------------|
                    1   5  pep         20

I hope this clarifies it.

The advantage of using these objects over manual conversion is that the code 
has been debugged (no all-too-easy off-by-one errors) and that they can be 
chained together.

Yours,

      -Heikki


On Tuesday 29 July 2008 22:07:55 Prashanth Athri wrote:
> Dear Professor Lehvaslaiho:
>
> I had a quick question about the module- Bio::Coordinate::Pair
>
> The BioPerl tutorial has the following example:
>
> $input_coordinates = Bio::Location::Simple->new
> (-seq_id => 'propeptide', -start => 1000, -end => 2000, -strand=>1 );
>
> $output_coordinates = Bio::Location::Simple->new
> (-seq_id => 'peptide', -start => 1100, -end => 2100, -strand=>1 );
>
> $pair = Bio::Coordinate::Pair->new
> (-in => $input_coordinates , -out => $output_coordinates );
>
> $pos = Bio::Location::Simple->new (-start => 500, -end => 500 );
>
> $res = $pair->map($pos);
> $converted_start = $res->start;
>
> The way I understand it, $converted_start should return "1600". But when I
> run this snippet, it returns "500". Could you please let me know how
> $pair->map($pos) is processed?
>
> I appreciate your time and thanks in advance.
>
> Regards,
> Prashanth


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From lengjingmao at gmail.com  Tue Aug  5 07:36:23 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 15:36:23 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>
Message-ID: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>

Hi there,

I have a sequence dataset which contains about 200 sequences, some of which are identical. Is there a BioPerl module which can remove those identical sequences?

thanks a lot. 
yours,
shaohua


From bernd.web at gmail.com  Tue Aug  5 09:49:55 2008
From: bernd.web at gmail.com (Bernd Web)
Date: Tue, 5 Aug 2008 11:49:55 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
References: <18792782.post@talk.nabble.com>
	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
Message-ID: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>

Hi,

There is a BioPerl Utility script doing this.
See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.

" scripts/utilities/bp_nrdb.PLS
    Make a non-redundant database based on sequence, not id. Requires
Digest::MD5."

Alternatively, you can make a hash using the sequences as keys.
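
A minimal sketch of the hash approach, assuming the records are already
in memory as [id, sequence] pairs (with BioPerl they would come from
Bio::SeqIO instead).  The function name unique_by_sequence is made up
for this example, and treating upper and lower case as identical is an
assumption you may not want.

```perl
use strict;
use warnings;

# Keep the first record seen for each distinct sequence string.
# Each record is an [id, sequence] pair; comparison ignores case.
sub unique_by_sequence {
    my @records = @_;
    my %seen;
    return grep { !$seen{ uc $_->[1] }++ } @records;
}

my @records = (
    [ 'seq1', 'ATGGCC' ],
    [ 'seq2', 'TTGACA' ],
    [ 'seq3', 'atggcc' ],   # same as seq1 apart from case
);

print join(',', map { $_->[0] } unique_by_sequence(@records)), "\n";
# prints: seq1,seq2
```

For very long sequences the hash key can be a digest of the sequence
rather than the sequence itself, which is presumably why the script
above requires Digest::MD5.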


Regards,
Bernd

On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com> wrote:
> Hi, there ,
>
> I have a sequence dataset which contains about 200 sequences. there are some identical sequences in this. is there any bioperl modules  which can remove those identical sequences?
>
> thanks a lot.
> yours,
> shaohua
> ----- Original Message -----
> From: "Benbo" <btemperton at googlemail.com>
> To: <Bioperl-l at lists.open-bio.org>
> Sent: Sunday, August 03, 2008 4:05 AM
> Subject: [Bioperl-l] Finding possible primers regex
>
>
>>
>> Hi there,
>> I'm trying to write a perl script to scan an aligned multiple entry fasta
>> file and find possible primers. So far I've produced a string which contains
>> bases which match all sequences and * where they don't match e.g.
>> 1) TTAGCCTAA
>> 2) TTAGCAGAA
>> 3) TTACCCTAA
>>
>> would give TTA*C**AA.
>>
>> I want to parse this string and pull out all sequences which are 18-21 bp in
>> length and have no more than 4 * in them.
>>
>> So far, I've got this:
>>
>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>> print "$1\n";
>> }
>>
>> hoping to match all fragments 18-21 characters in length. However even that
>> doesn't work as it has essentially chunked it into 21 char blocks, rather
>> than what I hoped for of
>> 0-18
>> 0-19
>> 0-20
>> 0-21
>> 1-19
>> 1-20
>> 1-21
>> 1-22
>>
>> etc.
>>
>> Can anyone let me know if this is already possible in BioPerl, or how one
>> would go about it with regex. Sadly I'm fairly new to perl and getting to
>> grips with BioPerl, so please treat me gently :).
>>
>> Many thanks,
>>
>> Ben
>>
>>
>>
>> --
>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


From diriano at uni-potsdam.de  Tue Aug  5 10:28:58 2008
From: diriano at uni-potsdam.de (Diego Mauricio Riano Pachon)
Date: Tue, 05 Aug 2008 12:28:58 +0200
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
Message-ID: <48982B6A.4050304@uni-potsdam.de>

Hi all,

Or you might try a non-bioperl solution that works pretty well, check:

http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86

Best,

Diego

Bernd Web wrote:
> Hi,
> 
> There is a BioPerl Utility script doing this.
> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities header.
> 
> " scripts/utilities/bp_nrdb.PLS
>     Make a non-redundant database based on sequence, not id. Requires
> Digest::MD5."
> 
> Alternatively, you can make a hash using the sequences as keys.
> 
> 
> Regards,
> Bernd
> 
> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com> wrote:
>> Hi, there ,
>>
>> I have a sequence dataset which contains about 200 sequences. there are some identical sequences in this. is there any bioperl modules  which can remove those identical sequences?
>>
>> thanks a lot.
>> yours,
>> shaohua
>> ----- Original Message -----
>> From: "Benbo" <btemperton at googlemail.com>
>> To: <Bioperl-l at lists.open-bio.org>
>> Sent: Sunday, August 03, 2008 4:05 AM
>> Subject: [Bioperl-l] Finding possible primers regex
>>
>>
>>> Hi there,
>>> I'm trying to write a perl script to scan an aligned multiple entry fasta
>>> file and find possible primers. So far I've produced a string which contains
>>> bases which match all sequences and * where they don't match e.g.
>>> 1) TTAGCCTAA
>>> 2) TTAGCAGAA
>>> 3) TTACCCTAA
>>>
>>> would give TTA*C**AA.
>>>
>>> I want to parse this string and pull out all sequences which are 18-21 bp in
>>> length and have no more than 4 * in them.
>>>
>>> So far, I've got this:
>>>
>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>> print "$1\n";
>>> }
>>>
>>> hoping to match all fragments 18-21 characters in length. However even that
>>> doesn't work as it has essentially chunked it into 21 char blocks, rather
>>> than what I hoped for of
>>> 0-18
>>> 0-19
>>> 0-20
>>> 0-21
>>> 1-19
>>> 1-20
>>> 1-21
>>> 1-22
>>>
>>> etc.
>>>
>>> Can anyone let me know if this is already possible in BioPerl, or how one
>>> would go about it with regex. Sadly I'm fairly new to perl and getting to
>>> grips with BioPerl, so please treat me gently :).
>>>
>>> Many thanks,
>>>
>>> Ben
>>>
>>>
>>>
>>> --
>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 


-- 
___________________________________
Diego Mauricio Riaño Pachón
Biologist - PhD student
AG Mueller-Roeber
Institute for Biochemistry and Biology
University of Potsdam

Address: Karl-Liebknecht-Str. 24-25
	 Haus 20
	 14476 Golm
	 Germany

Tel:	 +49 331 977 2809
Fax:	 +49 331 977 2512

web:	http://www.geocities.com/dmrp.geo


From cjfields at uiuc.edu  Tue Aug  5 15:19:54 2008
From: cjfields at uiuc.edu (Chris Fields)
Date: Tue, 5 Aug 2008 10:19:54 -0500
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
In-Reply-To: <48982B6A.4050304@uni-potsdam.de>
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
	<48982B6A.4050304@uni-potsdam.de>
Message-ID: <4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>

Here are two links which go into detail (the last is a specific  
implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego
>
> Bernd Web wrote:
>> Hi,
>> There is a BioPerl Utility script doing this.
>> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities  
>> header.
>> " scripts/utilities/bp_nrdb.PLS
>>    Make a non-redundant database based on sequence, not id. Requires
>> Digest::MD5."
>> Alternatively, you can make a hash using the sequences as keys.
>> Regards,
>> Bernd
>> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com>  
>> wrote:
>>> Hi, there ,
>>>
>>> I have a sequence dataset which contains about 200 sequences.  
>>> there are some identical sequences in this. is there any bioperl  
>>> modules  which can remove those identical sequences?
>>>
>>> thanks a lot.
>>> yours,
>>> shaohua
>>> ----- Original Message -----
>>> From: "Benbo" <btemperton at googlemail.com>
>>> To: <Bioperl-l at lists.open-bio.org>
>>> Sent: Sunday, August 03, 2008 4:05 AM
>>> Subject: [Bioperl-l] Finding possible primers regex
>>>
>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple  
>>>> entry fasta
>>>> file and find possible primers. So far I've produced a string  
>>>> which contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are  
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However  
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char  
>>>> blocks, rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or  
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> -- 
> ___________________________________
> Diego Mauricio Riaño Pachón
> Biologist - PhD student
> AG Mueller-Roeber
> Institute for Biochemistry and Biology
> University of Potsdam
>
> Address: Karl-Liebknecht-Str. 24-25
> 	 Haus 20
> 	 14476 Golm
> 	 Germany
>
> Tel:	 +49 331 977 2809
> Fax:	 +49 331 977 2512
>
> web:	http://www.geocities.com/dmrp.geo
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From lengjingmao at gmail.com  Tue Aug  5 15:24:22 2008
From: lengjingmao at gmail.com (Shaohua Fan)
Date: Tue, 5 Aug 2008 23:24:22 +0800
Subject: [Bioperl-l] how to remove indentical sequences from a dataset
References: <18792782.post@talk.nabble.com>	<79F0046F95254BE9B57DCC387671D908@6B2F7FFC298C46F>
	<716af09c0808050249p723b27c5uc84416663e1474bc@mail.gmail.com>
	<48982B6A.4050304@uni-potsdam.de>
	<4DDBF772-170A-414A-9468-A2607498F3E2@uiuc.edu>
Message-ID: <3A95AD6D18A749F3B73C135CCC8E7C90@6B2F7FFC298C46F>

hi,
thanks a lot for the help!
cheers,
shaohua
----- Original Message ----- 
From: "Chris Fields" <cjfields at uiuc.edu>
To: "Diego Mauricio Riano Pachon" <diriano at uni-potsdam.de>
Cc: "Bernd Web" <bernd.web at gmail.com>; <Bioperl-l at lists.open-bio.org>; "Shaohua Fan" <lengjingmao at gmail.com>
Sent: Tuesday, August 05, 2008 11:19 PM
Subject: Re: [Bioperl-l] how to remove indentical sequences from a dataset


Here are two links which go into detail (the last is a specific  
implementation):

http://en.wikipedia.org/wiki/Sequence_clustering
http://www.bioinformatics.org/cd-hit/

chris

On Aug 5, 2008, at 5:28 AM, Diego Mauricio Riano Pachon wrote:

> Hi all,
>
> Or you might try a non-bioperl solution that works pretty well, check:
>
> http://blast.wustl.edu/pub/nrdb/executables/nrdb.linux-x86
>
> Best,
>
> Diego
>
> Bernd Web wrote:
>> Hi,
>> There is a BioPerl Utility script doing this.
>> See http://www.bioperl.org/wiki/Bioperl_scripts under the Utilities  
>> header.
>> " scripts/utilities/bp_nrdb.PLS
>>    Make a non-redundant database based on sequence, not id. Requires
>> Digest::MD5."
>> Alternatively, you can make a hash using the sequences as keys.
>> Regards,
>> Bernd
>> On Tue, Aug 5, 2008 at 9:36 AM, Shaohua Fan <lengjingmao at gmail.com>  
>> wrote:
>>> Hi, there ,
>>>
>>> I have a sequence dataset which contains about 200 sequences.  
>>> there are some identical sequences in this. is there any bioperl  
>>> modules  which can remove those identical sequences?
>>>
>>> thanks a lot.
>>> yours,
>>> shaohua
>>> ----- Original Message -----
>>> From: "Benbo" <btemperton at googlemail.com>
>>> To: <Bioperl-l at lists.open-bio.org>
>>> Sent: Sunday, August 03, 2008 4:05 AM
>>> Subject: [Bioperl-l] Finding possible primers regex
>>>
>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple  
>>>> entry fasta
>>>> file and find possible primers. So far I've produced a string  
>>>> which contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are  
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However  
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char  
>>>> blocks, rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or  
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context: http://www.nabble.com/Finding-possible-primers-regex-tp18792782p18792782.html
>>>> Sent from the Perl - Bioperl-L mailing list archive at Nabble.com.
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
> -- 
> ___________________________________
> Diego Mauricio Riaño Pachón
> Biologist - PhD student
> AG Mueller-Roeber
> Institute for Biochemistry and Biology
> University of Potsdam
>
> Address: Karl-Liebknecht-Str. 24-25
> Haus 20
> 14476 Golm
> Germany
>
> Tel: +49 331 977 2809
> Fax: +49 331 977 2512
>
> web: http://www.geocities.com/dmrp.geo
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
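
A minimal sketch of the "hash using the sequences as keys" idea quoted
above. It is plain Perl so it runs without BioPerl; the record layout
([id, sequence] pairs) and the helper name unique_by_sequence are
assumptions made for illustration. With BioPerl, the id and sequence
would typically come from Bio::SeqIO objects via display_id and seq.

```perl
use strict;
use warnings;

# Keep the first record seen for each distinct sequence string.
# Each record is a two-element array ref: [ id, sequence ].
# Comparison is case-insensitive (an assumption; drop uc to change that).
sub unique_by_sequence {
    my @records = @_;
    my %seen;
    return grep { !$seen{ uc $_->[1] }++ } @records;
}

# Hypothetical example data.
my @records = (
    [ seq1 => 'ATGCATGC' ],
    [ seq2 => 'ATGCATGC' ],   # exact duplicate of seq1
    [ seq3 => 'TTGACCA'  ],
    [ seq4 => 'atgcatgc' ],   # same as seq1 apart from case
);

my @unique = unique_by_sequence(@records);
print ">$_->[0]\n$_->[1]\n" for @unique;
```

This only removes exact duplicates. For near-identical sequences, the
clustering tools linked above (cd-hit, nrdb) are the better fit.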

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign
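
On the primer question quoted earlier in this message: a /g match
resumes after the end of the previous match, and {18,21} is greedy, so
the loop yields non-overlapping 21-character chunks. One way to get
every overlapping window is to step through the string with substr and
count the '*' characters with tr. The sketch below assumes that
approach; the helper name candidate_windows is hypothetical, and the
small limits in the demo call stand in for the 18, 21 and 4 of the
original question.

```perl
use strict;
use warnings;

# Return [start, window] pairs for every window of $min_len..$max_len
# characters that contains at most $max_stars '*' characters.
sub candidate_windows {
    my ($consensus, $min_len, $max_len, $max_stars) = @_;
    my @hits;
    for my $start (0 .. length($consensus) - $min_len) {
        for my $len ($min_len .. $max_len) {
            # Longer windows would run past the end of the string.
            last if $start + $len > length $consensus;
            my $window = substr $consensus, $start, $len;
            # Skip windows holding anything other than bases or '*'.
            next unless $window =~ /\A[GTAC*]+\z/;
            # tr with an empty replacement list only counts matches.
            my $stars = ($window =~ tr/*//);
            push @hits, [ $start, $window ] if $stars <= $max_stars;
        }
    }
    return @hits;
}

# Demo on the nine-base example string with reduced limits.
# For real primers: candidate_windows($fragment_match, 18, 21, 4).
my @hits = candidate_windows('TTA*C**AA', 4, 5, 1);
printf "%d-%d %s\n", $_->[0], $_->[0] + length($_->[1]), $_->[1] for @hits;
```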


From martin.senger at gmail.com  Wed Aug  6 02:53:07 2008
From: martin.senger at gmail.com (Martin Senger)
Date: Wed, 6 Aug 2008 10:53:07 +0800
Subject: [Bioperl-l] Bio::Biblio doesn't find articles
Message-ID: <4d93f07c0808051953k4cb7511cg5ec4cd93f53cfd0f@mail.gmail.com>

I am afraid that the server that serves the MEDLINE database to the
Bio::Biblio module (using the SOAP protocol), and that is running at EBI,
may no longer be fully supported. I am not working at EBI anymore and I
have stopped monitoring their servers. I am still their collaborator, but
unfortunately I am no longer involved in the MEDLINE tools.

I would be happy to continue to maintain the Bio::Biblio module, but it
relies on a server that I no longer control.

Cheers,
Martin

-- 
Martin Senger
email: martin.senger at gmail.com,m.senger at cgiar.org
skype: martinsenger


From Russell.Smithies at agresearch.co.nz  Wed Aug  6 21:20:04 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Thu, 7 Aug 2008 09:20:04 +1200
Subject: [Bioperl-l] not BioPerl
Message-ID: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>

Has anyone taken a look at the new Perl interface to the NCBI C++
Toolkit?
Unfortunately, I can't even get their examples working as I'm behind a
firewall and documentation on setting proxy stuff is virtually
non-existent  :-(


Russell Smithies 

Bioinformatics Applications Developer 
T +64 3 489 9085 
E  russell.smithies at agresearch.co.nz 

Invermay  Research Centre 
Puddle Alley, 
Mosgiel, 
New Zealand 
T  +64 3 489 3809   
F  +64 3 489 9174  
www.agresearch.co.nz 


=======================================================================
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.
=======================================================================


From cjfields at illinois.edu  Wed Aug  6 21:33:27 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 6 Aug 2008 16:33:27 -0500
Subject: [Bioperl-l] not BioPerl
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
Message-ID: <E4A73CE7-9288-4245-99D5-90F488698D6C@illinois.edu>

Looks like they're binary releases for 32- and 64-bit linux (quite  
large, at 25 MB).  Would be nice to have the C++ bindings for other  
OS's (my guess is this was set up via swig).

I have access to a linux cluster, so I may give this a try soon.

chris

On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:

> Has anyone taken a look at the new Perl interface to the NCBI C++
> Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existant  :-(
>
>
> Russell Smithies
>
> Bioinformatics Applications Developer
> T +64 3 489 9085
> E  russell.smithies at agresearch.co.nz
>
> Invermay  Research Centre
> Puddle Alley,
> Mosgiel,
> New Zealand
> T  +64 3 489 3809
> F  +64 3 489 9174
> www.agresearch.co.nz
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From vinaykmittal at gatech.edu  Wed Aug  6 20:56:22 2008
From: vinaykmittal at gatech.edu (Mittal, Vinay K)
Date: Wed, 6 Aug 2008 16:56:22 -0400 (EDT)
Subject: [Bioperl-l] Error installing Biopel Module
Message-ID: <469631287.3995201218056182383.JavaMail.root@mail5.gatech.edu>

Hi,

I just installed ActivePerl 5.10.0 and was trying to install the BioPerl modules. While installing BioPerl through the package manager (ppm), I get the following errors:

ppm install failed: Can't find any package that provides SOAP::Lite for Bundle-BioPerl-Core
Can't find any package that provides Convert::Binary::C for Bundle-BioPerl-Core

I don't know what the problem is. I have never used Bioperl Modules before.

Thanks.


-- 
--------
Vinay Kumar Mittal
MS,Bioinformatics
Georgia Institute of Technology


From rfrancis at ichr.uwa.edu.au  Thu Aug  7 01:11:28 2008
From: rfrancis at ichr.uwa.edu.au (Richard Francis)
Date: Thu, 07 Aug 2008 09:11:28 +0800
Subject: [Bioperl-l] AlignIO::clustalw match_line query
Message-ID: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>

Dear List,

I wonder if you can help.

I'm having trouble finding out which criteria the conserved and
semi-conserved substitution decisions for a match line produced by the
match_line function in AlignIO are based on.

I note that match_line produces the same output as an alignment match
line would from ClustalW and indeed is used in the AlignIO::clustalw
module, but are the substitution decisions based on the same Venn
diagram at http://www.ebi.ac.uk/Tools/clustalw2/clustalw_help.html#color
ie are they faithful to the generation of the match line from within
ClustalW itself?

I need to know this as part of a paper I'm writing so I would really
appreciate your help with this.

Kind regards and thanks in advance,

Richard Francis
#####################################################################################
This e-mail message has been scanned for Viruses and Content and cleared 
by MailMarshal
#####################################################################################


From jason at bioperl.org  Thu Aug  7 02:26:06 2008
From: jason at bioperl.org (Jason Stajich)
Date: Wed, 6 Aug 2008 19:26:06 -0700
Subject: [Bioperl-l] AlignIO::clustalw match_line query
In-Reply-To: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
References: <1218071488.3074.2.camel@acs-pc-a0966.ichr.uwa.edu.au>
Message-ID: <A7F598FF-C34F-466E-A73B-D714872201C8@bioperl.org>

Implemented independently, but it was based on what the clustalw  
documentation says.
The main code is in the match_line function in Bio::SimpleAlign.

See the CONSERVATION_GROUPS hash, which looks like the listing below. A  
'strong' (":") on the match line is coded where the residues seen in a  
column all fall within one group, for example only 'S', 'T', or 'A'.

    %CONSERVATION_GROUPS = (
        'strong' => [ qw(
            STA
            NEQK
            NHQK
            NDEQ
            QHRK
            MILV
            MILF
            HY
            FYW )],
        'weak' => [ qw(
            CSA
            ATV
            SAG
            STNK
            STPA
            SGND
            SNDEQK
            NDEQHK
            NEQHRK
            FVLIM
            HFY )],
    );

It was checked against clustalw output by hand when it was  
implemented. If you know of any inconsistencies, let us know.
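
To make the grouping concrete, here is a minimal sketch of how a single
alignment column could be classified with these groups. It is not the
Bio::SimpleAlign code; the helper name column_symbol and the treatment
of gaps and unknown characters (always a blank) are assumptions.

```perl
use strict;
use warnings;

# Groups as listed above.
my %CONSERVATION_GROUPS = (
    strong => [qw(STA NEQK NHQK NDEQ QHRK MILV MILF HY FYW)],
    weak   => [qw(CSA ATV SAG STNK STPA SGND SNDEQK NDEQHK NEQHRK FVLIM HFY)],
);

# Return the match-line character for one column of residues:
# '*' identical, ':' within one strong group, '.' within one weak
# group, ' ' otherwise.
sub column_symbol {
    my %seen     = map { uc $_ => 1 } @_;
    my @distinct = keys %seen;
    return ' ' if grep { !/^[A-Z]$/ } @distinct;   # gap or odd character
    return '*' if @distinct == 1;
    for my $level ([ strong => ':' ], [ weak => '.' ]) {
        my ($name, $symbol) = @$level;
        for my $group (@{ $CONSERVATION_GROUPS{$name} }) {
            # The column qualifies if no residue falls outside the group.
            return $symbol unless grep { index($group, $_) < 0 } @distinct;
        }
    }
    return ' ';
}

print "[", column_symbol(@$_), "] @$_\n"
    for [qw(A A A)], [qw(S T A)], [qw(C S A)], [qw(W P)];
```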

-jason
On Aug 6, 2008, at 6:11 PM, Richard Francis wrote:

> Dear List,
>
> I wonder if you can help.
>
> I'm having trouble finding out on which criteria the conserved and
> semi-conserved substitution decisions for a match line produced  
> from the
> match_line function in AlignIO are based.
>
> I note that match_line produces the same output as an alignment match
> line would from ClustalW and indeed is used in the AlignIO::clustalw
> module, but are the substitution decisions based on the same Venn
> diagram at http://www.ebi.ac.uk/Tools/clustalw2/ 
> clustalw_help.html#color
> ie are they faithful to the generation of the match line from within
> ClustalW itself?
>
> I need to know this as part of a paper I'm writing so I would really
> appreciate your help with this.
>
> Kind regards and thanks in advance,
>
> Richard Francis
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From betts at embl.de  Thu Aug  7 12:42:59 2008
From: betts at embl.de (Matthew Betts)
Date: Thu, 7 Aug 2008 14:42:59 +0200 (CEST)
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
Message-ID: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>


Hi,

Has any one tried to draw secondary structure with Bio::Graphics? i.e. two 
different types of glyph with different colours on the same track.

Could use a hash reference to get the different glyph types (would be nice 
if there was a cylinder glyph and a thick arrow glyph), or heterogeneous 
segments to get the different colours, but I can't see how to do both at 
the same time.

Any example code or suggestions on how I could implement it would be 
great.

Thanks,

Matthew

-- 
Matthew Betts PhD, Russell Group (Structural Bioinformatics)
EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
phone: +49 (0)6221 387 8305; mailto:betts at embl.de


From cain.cshl at gmail.com  Thu Aug  7 14:08:39 2008
From: cain.cshl at gmail.com (Scott Cain)
Date: Thu, 7 Aug 2008 10:08:39 -0400
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
In-Reply-To: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
References: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
Message-ID: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>

Hi Matthew,

I don't have any code examples, but people have used GBrowse for
protein secondary structure, which uses Bio::Graphics underneath the
hood.

If you want to put more than one glyph and/or more than one color in a
track, it is fairly easy.  You just need to provide a callback for
each option when you create the track, like this:

  $panel->add_track($features_array_ref,
                    -glyph       => sub {
                        my $feature = shift;
                        # code to set the glyph according to the
                        # attributes of the feature
                    },
                    -bgcolor     => sub {
                        my $feature = shift;
                        # code to set the color
                    },
                    -fgcolor     => 'black',
                    # ...etc...
  );

For more information, see the biographics howto:

  http://www.bioperl.org/wiki/HOWTO:Graphics
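
A minimal sketch of what such callbacks might look like for secondary
structure. Bio::Graphics passes the feature as the first argument to
each callback; everything else here is an assumption for illustration:
the feature names 'strand' and 'helix', the glyph and colour choices,
and the My::MockFeature class, which is only a stand-in so the
callbacks can be run without Bio::Graphics installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature object; real features would come
# from Bio::Graphics::Feature or another Bio::SeqFeatureI class.
package My::MockFeature;
sub new          { my ($class, %args) = @_; return bless {%args}, $class }
sub display_name { return $_[0]{name} }

package main;

# Choose a glyph name from the feature's name.
sub glyph_for {
    my $feature = shift;
    return $feature->display_name eq 'strand' ? 'arrow' : 'generic';
}

# Choose a background colour from the feature's name.
sub colour_for {
    my $feature = shift;
    return $feature->display_name eq 'strand' ? 'cyan' : 'magenta';
}

# With a real panel the callbacks would be passed by reference:
#   $panel->add_track($features_array_ref,
#                     -glyph   => \&glyph_for,
#                     -bgcolor => \&colour_for);

for my $name (qw(strand helix)) {
    my $feature = My::MockFeature->new(name => $name);
    printf "%-6s glyph=%s bgcolor=%s\n",
        $name, glyph_for($feature), colour_for($feature);
}
```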

Scott


On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts <betts at embl.de> wrote:
>
> Hi,
>
> Has any one tried to draw secondary structure with Bio::Graphics? i.e. two
> different types of glyph with different colours on the same track.
>
> Could use a hash reference to get the different glyph types (would be nice
> if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
> segments to get the different colours, but I can't see how to do both at
> the same time.
>
> Any example code or suggestions on how I could implement it would be
> great.
>
> Thanks,
>
> Matthew
>
> --
> Matthew Betts PhD, Russell Group (Structural Bioinformatics)
> EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
> phone: +49 (0)6221 387 8305; mailto:betts at embl.de
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


From betts at embl.de  Thu Aug  7 16:27:28 2008
From: betts at embl.de (Matthew Betts)
Date: Thu, 7 Aug 2008 18:27:28 +0200 (CEST)
Subject: [Bioperl-l] Bio:Graphics for drawing secondary structure
In-Reply-To: <536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
References: <Pine.LNX.4.64.0808071437550.3061@pc-russell13>
	<536f21b00808070708q6180d4fft279078f2a28ac93d@mail.gmail.com>
Message-ID: <Pine.LNX.4.64.0808071823310.8169@pc-russell13>


Hi Scott,

Thanks for that, was a great help - I didn't realise I could use a code 
ref for anything other than the glyph name.

I'm now doing this:

$panel->add_track(
    '-bgcolor' => sub {
        my($feature) = @_;
        $feature->display_name eq 'strand' ? 'cyan' : 'magenta';
    },
    '-strand_arrow' => sub {
        my($feature) = @_;
        $feature->display_name eq 'strand' ? 1 : 0;
    },
);

Matthew


On Thu, 7 Aug 2008, Scott Cain wrote:

> Hi Matthew,
> 
> I don't have any code examples, but people have used GBrowse for
> protein secondary structure, which uses Bio::Graphics underneath the
> hood.
> 
> If you want to put more than one glyph and/or more than one color in a
> track, it is fairly easy.  You just need to provide a callback for
> each option when you create the track, like this:
> 
>   $panel->add_track($features_array_ref,
>                     -glyph       => sub { #code to set the glyph
> according the attributes of the feature  },
>                     -bgcolor     => sub { #code to set the color },
>                     -fgcolor     => 'black',
>                  ...etc...
>   );
> 
> For more information, see the biographics howto:
> 
>   http://www.bioperl.org/wiki/HOWTO:Graphics
> 
> Scott
> 
> 
> 
> On Thu, Aug 7, 2008 at 8:42 AM, Matthew Betts <betts at embl.de> wrote:
> >
> > Hi,
> >
> > Has any one tried to draw secondary structure with Bio::Graphics? i.e. two
> > different types of glyph with different colours on the same track.
> >
> > Could use a hash reference to get the different glyph types (would be nice
> > if there was a cylinder glyph and a thick arrow glyph), or heterogeneous
> > segments to get the different colours, but I can't see how to do both at
> > the same time.
> >
> > Any example code or suggestions on how I could implement it would be
> > great.
> >
> > Thanks,
> >
> > Matthew
> >
> > --
> > Matthew Betts PhD, Russell Group (Structural Bioinformatics)
> > EMBL, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
> > phone: +49 (0)6221 387 8305; mailto:betts at embl.de
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 
> 
> 


From jay at jays.net  Thu Aug  7 16:32:29 2008
From: jay at jays.net (Jay Hannah)
Date: Thu, 07 Aug 2008 11:32:29 -0500
Subject: [Bioperl-l] not BioPerl
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
References: <D5DBA313349A4B458528BE63B387F36C0752343F@imail.agresearch.co.nz>
Message-ID: <489B239D.8060305@jays.net>

Smithies, Russell wrote:
> Has anyone taken a look at the new Perl interface to the NCBI C++ Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existant  :-(
>   

Do people actually use the NCBI C++ Toolkit for things outside of NCBI?  
What?

I tried to leverage it a year or so ago for an Entrez/sequence/search 
project and got nowhere.

j


From jcherry at ncbi.nlm.nih.gov  Thu Aug  7 17:06:28 2008
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 7 Aug 2008 13:06:28 -0400 (EDT)
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
Message-ID: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>

For those who may be wondering what this is about, a Perl interface to the 
NCBI C++ Toolkit is available at 
ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/.  The C++ Toolkit is 
the main code base that we develop and use at NCBI.  It includes many 
things that may be of interest to BioPerl users, such as sequence analysis 
algorithms, means for interacting with NCBI databases, and facilities for 
reading, writing, and manipulating NCBI data model objects (usually 
defined by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and 
readable from ASN.1 and XML).

Russell, I think you can make things work from behind a firewall by 
setting some environment variables: set CONN_FIREWALL to 1, possibly set 
CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT 
as appropriate.  Please email me if you can't get things to work.  I'll 
see that decent instructions for this are included in the next release.
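
As a rough sketch, the variables above can be set from within a Perl
script through %ENV. The variable names are the ones given in this
message; the proxy host and port are placeholders, and it is an
assumption that setting them before the toolkit opens its first
connection is early enough.

```perl
use strict;
use warnings;

# Placeholder values: replace the host and port with your site's proxy.
my %conn_settings = (
    CONN_FIREWALL        => 1,
    CONN_STATELESS       => 1,
    CONN_HTTP_PROXY_HOST => 'proxy.example.org',
    CONN_HTTP_PROXY_PORT => 8080,
);

# Export the settings to this process and any child processes.
$ENV{$_} = $conn_settings{$_} for keys %conn_settings;

print "$_=$ENV{$_}\n" for sort keys %conn_settings;
```

Exporting the same variables in the shell before starting the script
should have the same effect.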

Josh Cherry


On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:

> Has anyone taken a look at the new Perl interface to the NCBI C++
> Toolkit?
> Unfortunately, I can't even get their examples working as I'm behind a
> firewall and documentation on setting proxy stuff is virtually
> non-existant  :-(
>
>
> Russell Smithies


From tristan.lefebure at gmail.com  Thu Aug  7 17:35:24 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Thu, 7 Aug 2008 13:35:24 -0400
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
Message-ID: <200808071335.24668.tristan.lefebure@gmail.com>

Hi list,

I'm using a script very similar to bp_taxonomy2tree.pl distributed with 
BioPerl (with the only difference that I'm using taxids instead of taxon 
names). Basically, the script generates a taxonomic tree given a list of 
taxids using the NCBI taxonomy db. For each taxon, it generates a taxon 
object, and then merge this object to a tree object that keeps growing. It 
runs very well with a small number of taxa, but with many taxa (>1000), it is 
very very very slow (about a week for 3000 taxa).

The slowness is due to the  function merge_lineage (line 65), which merges the 
existing tree object with a new taxon object. I guess that the difficulty 
with a big tree (i.e. more than 1000 leaf) is to find the nodes in common 
between the tree and the new taxon object...

Would you have any idea of how to get around the problem? Should I look under 
the hood of merge_lineage to try to improve it for large trees?

Thanks!

Version: bioperl-1.5.2_102
OS: GNU/Linux

-Tristan


From cjfields at illinois.edu  Thu Aug  7 17:38:53 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Aug 2008 12:38:53 -0500
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
In-Reply-To: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
References: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
Message-ID: <E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>

Josh,

Thanks for the update.  I saw that these are only binaries for linux  
32/64-bit.  Are there plans to either support other OS's (OS X, Win,  
etc) or to maybe make a release with the XS-bindings so users can work  
towards that?  With additional support I can see this easily fitting  
into several spots in BioPerl, but otherwise I'm unsure.

chris

On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote:

> For those who may be wondering what this is about, a Perl interface  
> to the NCBI C++ Toolkit is available at ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/ 
> .  The C++ Toolkit is the main code base that we develop and use at  
> NCBI.  It includes many things that may be of interest to BioPerl  
> users, such as sequence analysis algorithms, means for interacting  
> with NCBI databases, and facilities for reading, writing, and  
> manipulating NCBI data model objects (usually defined by ASN.1  
> specifications; writeable as ASN.1, XML, and JSON, and readable from  
> ASN.1 and XML).
>
> Russell, I think you can make things work from behind a firewall by  
> setting some environment variables: set CONN_FIREWALL to 1, possibly  
> set CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and  
> CONN_HTTP_PROXY_PORT as appropriate.  Please email me if you can't  
> get things to work.  I'll see that decent instructions for this are  
> included in the next release.
>
> Josh Cherry
>
>
> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:
>
>> Has anyone taken a look at the new Perl interface to the NCBI C++
>> Toolkit?
>> Unfortunately, I can't even get their examples working as I'm  
>> behind a
>> firewall and documentation on setting proxy stuff is virtually
>> non-existant  :-(
>>
>>
>> Russell Smithies
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From jcherry at ncbi.nlm.nih.gov  Thu Aug  7 18:04:17 2008
From: jcherry at ncbi.nlm.nih.gov (Josh Cherry)
Date: Thu, 7 Aug 2008 14:04:17 -0400 (EDT)
Subject: [Bioperl-l] NCBI C++ Toolkit wrapper (was: not BioPerl)
In-Reply-To: <E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>
References: <Pine.LNX.4.64.0808071251240.13868@widget0.be-md.ncbi.nlm.nih.gov>
	<E0AF0EC1-DA53-4A08-9969-A3E1F02B53F2@illinois.edu>
Message-ID: <Pine.LNX.4.64.0808071359430.13868@widget0.be-md.ncbi.nlm.nih.gov>

Chris,

Support for other OS's is definitely a possibility, depending on community 
feedback (how useful are the wrappers in general, and how much demand is 
there for them on other platforms?).  I wish I could magically make them 
available for Windows and OS X, but there are some technical issues to 
work out.

Josh


On Thu, 7 Aug 2008, Chris Fields wrote:

> Josh,
>
> Thanks for the update.  I saw that these are only binaries for linux 
> 32/64-bit.  Are there plans to either support other OS's (OS X, Win, etc) or 
> to maybe make a release with the XS-bindings so users can work towards that? 
> With additional support I can see this easily fitting into several spots in 
> BioPerl, but otherwise I'm unsure.
>
> chris
>
> On Aug 7, 2008, at 12:06 PM, Josh Cherry wrote:
>
>> For those who may be wondering what this is about, a Perl interface to the 
>> NCBI C++ Toolkit is available at 
>> ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools++/CURRENT/.  The C++ Toolkit is 
>> the main code base that we develop and use at NCBI.  It includes many 
>> things that may be of interest to BioPerl users, such as sequence analysis 
>> algorithms, means for interacting with NCBI databases, and facilities for 
>> reading, writing, and manipulating NCBI data model objects (usually defined 
>> by ASN.1 specifications; writeable as ASN.1, XML, and JSON, and readable 
>> from ASN.1 and XML).
>> 
>> Russell, I think you can make things work from behind a firewall by setting 
>> some environment variables: set CONN_FIREWALL to 1, possibly set 
>> CONN_STATELESS to 1, and set CONN_HTTP_PROXY_HOST and CONN_HTTP_PROXY_PORT 
>> as appropriate.  Please email me if you can't get things to work.  I'll see 
>> that decent instructions for this are included in the next release.
>> 
>> Josh Cherry
>> 
>> 
>> On Aug 6, 2008, at 4:20 PM, Smithies, Russell wrote:
>> 
>>> Has anyone taken a look at the new Perl interface to the NCBI C++
>>> Toolkit?
>>> Unfortunately, I can't even get their examples working as I'm behind a
>>> firewall and documentation on setting proxy stuff is virtually
>>> non-existant  :-(
>>> 
>>> 
>>> Russell Smithies
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>


From bix at sendu.me.uk  Thu Aug  7 22:20:29 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Thu, 07 Aug 2008 23:20:29 +0100
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
 large trees
In-Reply-To: <200808071335.24668.tristan.lefebure@gmail.com>
References: <200808071335.24668.tristan.lefebure@gmail.com>
Message-ID: <489B752D.2080209@sendu.me.uk>

Tristan Lefebure wrote:
> I'm using a script very similar to bp_taxonomy2tree.pl distributed with 
> BioPerl (with the only difference that I'm using taxids instead of taxon 
> names). Basically, the script generates a taxonomic tree given a list of 
> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon 
> object, and then merge this object to a tree object that keeps growing. It 
> runs very well with a small number of taxa, but with many taxa (>1000), it is 
> very very very slow (about a week for 3000 taxa).
> 
> The slowness is due to the  function merge_lineage (line 65), which merges the 
> existing tree object with a new taxon object. I guess that the difficulty 
> with a big tree (i.e. more than 1000 leaf) is to find the nodes in common 
> between the tree and the new taxon object...
> 
> Would you have any idea of how to get around the problem? Should I look under 
> the hood of merge_lineage to try to improve it for large trees?

Yes, please do. It might have been me that wrote that, in which case I 
didn't do anything fancy or consider the above problem.


From cjfields at illinois.edu  Fri Aug  8 00:42:16 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 7 Aug 2008 19:42:16 -0500
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
In-Reply-To: <489B752D.2080209@sendu.me.uk>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
Message-ID: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>


On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:

> Tristan Lefebure wrote:
>> I'm using a script very similar to bp_taxonomy2tree.pl distributed  
>> with BioPerl (with the only difference that I'm using taxids  
>> instead of taxon names). Basically, the script generates a  
>> taxonomic tree given a list of taxids using the NCBI taxonomy db.  
>> For each taxon, it generates a taxon object, and then merge this  
>> object to a tree object that keeps growing. It runs very well with  
>> a small number of taxa, but with many taxa (>1000), it is very very  
>> very slow (about a week for 3000 taxa).
>> The slowness is due to the  function merge_lineage (line 65), which  
>> merges the existing tree object with a new taxon object. I guess  
>> that the difficulty with a big tree (i.e. more than 1000 leaf) is  
>> to find the nodes in common between the tree and the new taxon  
>> object...
>> Would you have any idea of how to get around the problem? Should I  
>> look under the hood of merge_lineage to try to improve it for large  
>> trees?
>
> Yes, please do. It might have been me that wrote that, in which case  
> I didn't do anything fancy or consider the above problem.

Actually I thought that was fixed; wasn't some caching added for that  
script at one point?

chris


From bix at sendu.me.uk  Fri Aug  8 07:50:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Fri, 08 Aug 2008 08:50:50 +0100
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
 large trees
In-Reply-To: <7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
	<7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
Message-ID: <489BFADA.1060308@sendu.me.uk>

Chris Fields wrote:
> 
> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
> 
>> Tristan Lefebure wrote:
>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed 
>>> with BioPerl (with the only difference that I'm using taxids instead 
>>> of taxon names). Basically, the script generates a taxonomic tree 
>>> given a list of taxids using the NCBI taxonomy db. For each taxon, it 
>>> generates a taxon object, and then merge this object to a tree object 
>>> that keeps growing. It runs very well with a small number of taxa, 
>>> but with many taxa (>1000), it is very very very slow (about a week 
>>> for 3000 taxa).
>>> The slowness is due to the  function merge_lineage (line 65), which 
>>> merges the existing tree object with a new taxon object. I guess that 
>>> the difficulty with a big tree (i.e. more than 1000 leaf) is to find 
>>> the nodes in common between the tree and the new taxon object...
>>> Would you have any idea of how to get around the problem? Should I 
>>> look under the hood of merge_lineage to try to improve it for large 
>>> trees?
>>
>> Yes, please do. It might have been me that wrote that, in which case I 
>> didn't do anything fancy or consider the above problem.
> 
> Actually I thought that was fixed;

Oh yeah. Looks like I did something related to 'speedup for 
merge_lineage()' on the 18th Dec 2006. Tristan, check out 
Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
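
For anyone wondering why a lookup table makes such a difference here, a toy
sketch follows. It uses plain hashes, not the actual TreeFunctionsI code;
merge_lineage_indexed and the node layout are made up for illustration.
Each lineage is merged by walking it once and doing a hash lookup per node,
rather than searching the whole growing tree for shared nodes.

```perl
use strict;
use warnings;

# Toy tree: %index maps node id -> { id, parent, children }.
# Parents and children are stored as ids, so there are no reference cycles.
sub merge_lineage_indexed {
    my ($index, $lineage) = @_;    # $lineage: array ref of ids, root first
    my $parent;
    for my $id (@$lineage) {
        if (!exists $index->{$id}) {
            # New node: register it and hook it under the previous node.
            $index->{$id} = { id => $id, parent => $parent, children => [] };
            push @{ $index->{$parent}{children} }, $id if defined $parent;
        }
        $parent = $id;
    }
    return $index;
}

my %tree;
merge_lineage_indexed(\%tree, [qw(root Eukaryota Metazoa Homo)]);
merge_lineage_indexed(\%tree, [qw(root Eukaryota Metazoa Pan)]);
merge_lineage_indexed(\%tree, [qw(root Eukaryota Fungi Saccharomyces)]);
print join(',', sort @{ $tree{Metazoa}{children} }), "\n";
```

The cost per taxon is then proportional to the depth of its lineage, not to
the number of nodes already in the tree.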


From tristan.lefebure at gmail.com  Fri Aug  8 16:02:32 2008
From: tristan.lefebure at gmail.com (Tristan Lefebure)
Date: Fri, 8 Aug 2008 12:02:32 -0400
Subject: [Bioperl-l] (TreeFunctionsI) merge_lineage method very slow on
	large trees
In-Reply-To: <489BFADA.1060308@sendu.me.uk>
References: <200808071335.24668.tristan.lefebure@gmail.com>
	<489B752D.2080209@sendu.me.uk>
	<7A185A45-A886-4DD0-8BF0-E7CDC6B65F6B@illinois.edu>
	<489BFADA.1060308@sendu.me.uk>
Message-ID: <d31f7c40808080902o70fbf78dub50c9ba660355ba0@mail.gmail.com>

Yes indeed, with the svn code it took 10 minutes (compared to one week!)
Thanks, -Tristan

On Fri, Aug 8, 2008 at 3:50 AM, Sendu Bala <bix at sendu.me.uk> wrote:

> Chris Fields wrote:
>
>>
>> On Aug 7, 2008, at 5:20 PM, Sendu Bala wrote:
>>
>>  Tristan Lefebure wrote:
>>>
>>>> I'm using a script very similar to bp_taxonomy2tree.pl distributed with
>>>> BioPerl (with the only difference that I'm using taxids instead of taxon
>>>> names). Basically, the script generates a taxonomic tree given a list of
>>>> taxids using the NCBI taxonomy db. For each taxon, it generates a taxon
>>>> object, and then merges this object into a tree object that keeps growing. It
>>>> runs very well with a small number of taxa, but with many taxa (>1000), it
>>>> is very very very slow (about a week for 3000 taxa).
>>>> The slowness is due to the  function merge_lineage (line 65), which
>>>> merges the existing tree object with a new taxon object. I guess that the
>>>> difficulty with a big tree (i.e. more than 1000 leaves) is to find the nodes
>>>> in common between the tree and the new taxon object...
>>>> Would you have any idea of how to get around the problem? Should I look
>>>> under the hood of merge_lineage to try to improve it for large trees?
>>>>
>>>
>>> Yes, please do. It might have been me that wrote that, in which case I
>>> didn't do anything fancy or consider the above problem.
>>>
>>
>> Actually I thought that was fixed;
>>
>
> Oh yeah. Looks like I did something related to 'speedup for
> merge_lineage()' on the 18th Dec 2006. Tristan, check out
> Bio/Tree/TreeFunctionsI.pm from svn and see if that solves your problem.
>


From rvos at interchange.ubc.ca  Fri Aug  8 23:59:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Fri, 8 Aug 2008 16:59:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>

Hi,

while going through a large genbank file
(ftp://ftp.ncbi.nlm.nih.gov/genbank/gbpri21.seq.gz) I ran into malloc
errors. Just for the record (I doubt this does anyone any good), I
got:

perl(391) malloc: *** vm_allocate(size=8421376) failed (error code=3)
perl(391) malloc: *** error: can't allocate region
perl(391) malloc: *** set a breakpoint in szone_error to debug
Out of memory!

What I was trying to do is go through the file, and only write out
those seq objects that aren't human, and that have CDS features, i.e.:

################################################

#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $dir = shift @ARGV; # the directory with *.gz files
my $out = shift @ARGV; # the directory to write to...
mkdir $out if not -d $out; # ...which may need to be created

opendir my $dirhandle, $dir or die $!;
for my $archive ( readdir $dirhandle ) {
    next if $archive !~ /\.gz$/;
    my $file = $archive;
    $file =~ s/\.gz$//;

    # external call to the gunzip utility,
    # such that we keep the archive
    system( "gunzip -c \"${dir}/${archive}\" > \"${dir}/${file}\"" );

    # object that parses genbank files,
    # returns Bio::Seq objects
    my $reader = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => "${dir}/${file}"
    );

    # object that receives Bio::Seq objects,
    # writes genbank files
    my $writer = Bio::SeqIO->new(
        '-format' => 'genbank',
        '-file'   => ">${out}/${file}",
    );
    while ( my $seq = $reader->next_seq ) {
        my $name = $seq->species->binomial;
        if ( $name ne 'Homo sapiens' ) {

            # search for coding sequences among the features
            my $HasCDS = 0;
            FEATURE: for my $f ( $seq->get_SeqFeatures ) {
                if ( $f->primary_tag eq 'CDS' ) {
                    $HasCDS++;
                    last FEATURE;
                }
            }

            # write the sequence to file
            if ( $HasCDS ) {
                $writer->write_seq( $seq );
            }
        }
    }

    # delete the extracted, unfiltered file
    unlink "${dir}/${file}";
}

################################################

Okay, so it runs out of memory. Can I do something to fix that? Should
I flush on either of the I/O objects after each $seq? Could there be
memory leaks in the Bio::Seq objects? Should I $seq->DESTROY them
explicitly or something like that?

Thanks,

Rutger

-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


From David.Messina at sbc.su.se  Sat Aug  9 11:04:04 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Aug 2008 13:04:04 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
Message-ID: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>

Hi Rutger,
I ran your script on the same genbank file and, while I did not run out of
memory, I did see what appears to be a memory leak. Even when I manually
undef'd the reader and writer objects every 1000 records, memory usage
continued to grow.

I can't quite figure out what's going on, though.

If I run a different program using SeqIO (the simple sequence converter from
the SeqIO HOWTO) on the same input file, I don't see this same runaway
growth.

Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the
sequence converter holds steady at about 12MB of real memory, whereas on
5.10 it grows, albeit slowly, for as long as the program is executing. When
I killed it about 20% of the way through the file, it was up to  about 44MB
of real memory.

Anyone else have a chance to look at this?


Dave


From rvos at interchange.ubc.ca  Sat Aug  9 11:36:20 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 04:36:20 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
Message-ID: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>

Hi Dave,

thanks for the reply. The memory usage is in fact much more atrocious
than just 44 MB: I'm actually looping over all 36 such archives (the
genbank primates), and on my macbook it steadily increases to over 1 GB
of memory. What seemed to be helping somewhat is to call
$reader->flush and $writer->flush after each seq, at least to the
extent that I make it through that one file, but last time I tried I
didn't get much further: the whole terminal process died shortly after
instead. I seem to vaguely recall that even if perl free()'s memory,
that doesn't necessarily mean that the memory is returned to the OS
for the runtime of the program - depending on the OS and perl version.
What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.

Rutger

On Sat, Aug 9, 2008 at 4:04 AM, Dave Messina <David.Messina at sbc.su.se> wrote:
> Hi Rutger,
> I ran your script on the same genbank file and, while I did not run out of
> memory, I did see what appears to be a memory leak. Even when I manually
> undef'd the reader and writer object every 1000 records, memory usage
> continued to grow.
>
> I can't quite figure out what's going on, though.
> If I run a different program using SeqIO (the simple sequence converter from
> the SeqIO HOWTO) on the same input file, I don't see this same runaway
> growth.
> Also, the problem seems a lot worse on perl 5.10 than on 5.8.8; on 5.8.8 the
> sequence converter holds steady at about 12MB of real memory, whereas on
> 5.10 it grows, albeit slowly, for as long as the program is executing. When
> I killed it about 20% of the way through the file, it was up to  about 44MB
> of real memory.
> Anyone else have a chance to look at this?
>
> Dave
>


-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


From David.Messina at sbc.su.se  Sat Aug  9 12:58:56 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sat, 9 Aug 2008 14:58:56 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
Message-ID: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>

>
>  I seem to vaguely recall that even if perl free()'s memory that doesn't
> necessarily mean that the memory is returned to the OS for the runtime of
> the program


I believe that's correct.


> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>

perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.


Dave


From cjfields at illinois.edu  Sat Aug  9 13:56:19 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Aug 2008 08:56:19 -0500
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
Message-ID: <57147D88-ABE6-44E0-8D76-790B0C735438@illinois.edu>

There is definitely a memory leak.  I can confirm it on OSX 10.4/10.5  
using bioperl-live.

I'll try looking into it this weekend, but I can't promise when it'll  
be fixed; my laptop is on the fritz.

chris

On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:

>>
>> I seem to vaguely recall that even if perl free()'s memory that  
>> doesn't
>> necessarily mean that the memory is returned to the OS for the  
>> runtime of
>> the program
>
>
> I believe that's correct.
>
>
>
>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>
>
> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>
>
> Dave


From cjfields at illinois.edu  Sat Aug  9 14:15:23 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Sat, 9 Aug 2008 09:15:23 -0500
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
Message-ID: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>

Forgot to mention, maybe we can file this as a bug?  It's a pretty  
serious one but it should be easy to narrow down; the change had to be  
introduced fairly recently.

chris

On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:

>>
>> I seem to vaguely recall that even if perl free()'s memory that  
>> doesn't
>> necessarily mean that the memory is returned to the OS for the  
>> runtime of
>> the program
>
>
> I believe that's correct.
>
>
>
>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>
>
> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>
>
> Dave
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Sat Aug  9 16:00:46 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 9 Aug 2008 12:00:46 -0400
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
	<9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
Message-ID: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>

This smells of circular references somewhere. I think the first point  
I would go looking is the species storing - does the problem go away  
if you turn that off? Maybe the version of weaken() is at play here?
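
To make the suspicion concrete, here is a minimal sketch of how a reference
cycle keeps objects alive and how weaken() from Scalar::Util breaks it. The
Node class, the hash keys and destroyed_after_scope are invented for the
example; this is not the actual Bio::Species layout.

```perl
use strict;
use warnings;
use Scalar::Util qw(weaken);

$main::destroyed = 0;

{
    package Node;
    sub new     { my ($class, %args) = @_; return bless {%args}, $class }
    sub DESTROY { $main::destroyed++ }    # count objects as they are freed
}

# Build an owner/child pair that point at each other, let both lexicals go
# out of scope, and report how many objects were actually freed.
sub destroyed_after_scope {
    my ($use_weaken) = @_;
    my $before = $main::destroyed;
    {
        my $owner = Node->new(name => 'owner');
        my $child = Node->new(name => 'child', owner => $owner);
        $owner->{child} = $child;                 # owner <-> child: a cycle
        weaken($child->{owner}) if $use_weaken;   # make the back-reference weak
    }
    return $main::destroyed - $before;
}

printf "without weaken: %d freed\n", destroyed_after_scope(0);
printf "with weaken:    %d freed\n", destroyed_after_scope(1);
```

Without weaken() neither object is freed when the scope ends (0 freed); with
the back-reference weakened, both are (2 freed). In a loop over thousands of
records the first case shows up as steadily growing memory.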

	-hilmar

On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:

> Forgot to mention, maybe we can file this as a bug?  It's a pretty  
> serious one but it should be easy to narrow down; the change had to  
> be introduced fairly recently.
>
> chris
>
> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>
>>>
>>> I seem to vaguely recall that even if perl free()'s memory that  
>>> doesn't
>>> necessarily mean that the memory is returned to the OS for the  
>>> runtime of
>>> the program
>>
>>
>> I believe that's correct.
>>
>>
>>
>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>
>>
>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>
>>
>> Dave
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From hlapp at gmx.net  Sat Aug  9 16:07:30 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Sat, 9 Aug 2008 12:07:30 -0400
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>
References: <18792782.post@talk.nabble.com>
	<33A8975C-2A88-4697-8298-7D16CB03CEAE@uiuc.edu>
Message-ID: <591AE8EB-4D45-4859-A93E-EA9BF01CA9C6@gmx.net>

This looks like a neat trick. Do you think it's worth including as a  
SimpleAlign method (obviously w/o the printing to STDOUT)? I can  
imagine that a lot of people might appreciate it.

	-hilmar

On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:

> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>
>>
>> Hi there,
>> I'm trying to write a perl script to scan an aligned multiple entry  
>> fasta
>> file and find possible primers. So far I've produced a string which  
>> contains
>> bases which match all sequences and * where they don't match e.g.
>> 1) TTAGCCTAA
>> 2) TTAGCAGAA
>> 3) TTACCCTAA
>>
>> would give TTA*C**AA.
>>
>> I want to parse this string and pull out all sequences which are  
>> 18-21 bp in
>> length and have no more than 4 * in them.
>>
>> So far, I've got this:
>>
>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>> print "$1\n";
>> }
>>
>> hoping to match all fragments 18-21 characters in length. However  
>> even that
>> doesn't work as it has essentially chunked it into 21 char blocks,  
>> rather
>> than what I hoped for of
>> 0-18
>> 0-19
>> 0-20
>> 0-21
>> 1-19
>> 1-20
>> 1-21
>> 1-22
>>
>> etc.
>>
>> Can anyone let me know if this is already possible in BioPerl, or  
>> how one
>> would go about it with regex. Sadly I'm fairly new to perl and  
>> getting to
>> grips with BioPerl, so please treat me gently :).
>>
>> Many thanks,
>>
>> Ben
>
> There is a trick to this which is discussed more extensively in  
> 'Mastering Regular Expressions'.  Essentially you have to embed code  
> into the regex and trick the parser into backtracking using a  
> negative lookahead.  The match itself fails (i.e. no match is  
> returned), but the embedded code is executed for each match attempt,
>
> The following script is a slight modification of one I used which  
> checks the consensus string from the input alignment (in aligned  
> FASTA format here), extracts the alignment slice using that match,  
> then spit the alignment out to STDOUT in clustalw format.  This  
> should work for perl 5.8 and up, but it's only been tested on perl  
> 5.10.  You should be able to use this to fit what you want.
>
> my $in = Bio::AlignIO->new(-file => $file,
>                           -format => 'fasta');
> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>                           -format => 'clustalw');
>
> while (my $aln = $in->next_aln) {
>    my $c = $aln->consensus_string(100);
>    my @matches;
>    $c =~ m/
>        ([GTAC?]{18,21})
>        (?{my $match = check_match($1);
>           push @matches, [$match,
>                           pos(),
>                           length($match)]
>              if defined $match;})
>        (?!)
>        /xig;
>    for my $match (@matches) {
>        my ($hit, $st, $end) = ($match->[0],
>                                $match->[1] - $match->[2] + 1,
>                                $match->[1]);
>        my $newaln = $aln->slice($st, $end);
>        $out->write_aln($newaln);
>    }
> }
>
> sub check_match {
>    my $match = shift;
>    return unless $match;
>    my $ct = $match =~ tr/?/?/;
>    return $match if $ct <= 4;
> }
>
>
> chris
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From rvos at interchange.ubc.ca  Sat Aug  9 17:47:33 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Sat, 9 Aug 2008 10:47:33 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<628aabb70808090558j4e820208h6883af0e112d7f55@mail.gmail.com>
	<9DB4A373-B4CF-4207-A631-64951D8DB119@illinois.edu>
	<897A8CAC-EDAF-4F26-B6E3-A8CF0F918A70@gmx.net>
Message-ID: <2bb9b24a0808091047t46a6bfa8r7e11a3a1537180@mail.gmail.com>

I am sure my version of weaken() works as advertised. Is there a way
to turn off species storing from outside the code base or do you mean
I go and start commenting bits out in Bio::SeqIO::genbank (or
wherever)?

On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
> This smells of circular references somewhere. I think the first point I
> would go looking is the species storing - does the problem go away if you
> turn that off? Maybe the version of weaken() is at play here?
>
>        -hilmar
>
> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>
>> Forgot to mention, maybe we can file this as a bug?  It's a pretty serious
>> one but it should be easy to narrow down; the change had to be introduced
>> fairly recently.
>>
>> chris
>>
>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>>
>>>>
>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't
>>>> necessarily mean that the memory is returned to the OS for the runtime
>>>> of
>>>> the program
>>>
>>>
>>> I believe that's correct.
>>>
>>>
>>>
>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>>
>>>
>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>>
>>>
>>> Dave
>>
>> Christopher Fields
>> Postdoctoral Researcher
>> Lab of Dr. Marie-Claude Hofmann
>> College of Veterinary Medicine
>> University of Illinois Urbana-Champaign
>>
>>
>>
>>
>
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>


-- 
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com


From hartzell at alerce.com  Sat Aug  9 18:33:51 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 9 Aug 2008 11:33:51 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
Message-ID: <18589.58127.57270.352974@almost.alerce.com>


I'm pretty sure that this fixes the problem:

g.

Index: Bio/Species.pm
===================================================================
--- Bio/Species.pm	(revision 14791)
+++ Bio/Species.pm	(working copy)
@@ -340,6 +340,7 @@
 		}
 		
 		$self->{_species} = $species;
+		weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
     }
 	return $self->{_species};
 }


From cjfields at illinois.edu  Sat Aug  9 19:08:27 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Sat,  9 Aug 2008 14:08:27 -0500 (CDT)
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
Message-ID: <20080809140827.BHN28056@expms6.cites.uiuc.edu>

I'm pretty sure it's not due to a particular version of weaken(), though it does
sound like a circular reference issue.  I have tried this with perl 5.8.6, 5.8.8,
and 5.10 (all Mac OS, either 10.4 or 10.5); all have the same memory leak
issue.

You can try using SeqBuilder to get rid of the Bio::Species object.  I'll give that a 
try when I can.  Unfortunately my laptop is now with the local Apple geniuses 
awaiting a motherboard, so I can't get to it right away (I'll give it a try on my 
wife's laptop).

chris

---- Original message ----
>Date: Sat, 9 Aug 2008 10:47:33 -0700
>From: "Rutger Vos" <rvos at interchange.ubc.ca>  
>Subject: Re: [Bioperl-l] malloc errors while using Bio::SeqIO?  
>To: "Hilmar Lapp" <hlapp at gmx.net>
>Cc: Chris Fields <cjfields at illinois.edu>, bioperl list <bioperl-l at lists.open-bio.org>
>
>I am sure my version of weaken() works as advertised. Is there a way
>to turn off species storing from outside the code base or do you mean
>I go and start commenting bits out in Bio::SeqIO::genbank (or
>wherever)?
>
>On Sat, Aug 9, 2008 at 9:00 AM, Hilmar Lapp <hlapp at gmx.net> wrote:
>> This smells of circular references somewhere. I think the first point I
>> would go looking is the species storing - does the problem go away if you
>> turn that off? Maybe the version of weaken() is at play here?
>>
>>        -hilmar
>>
>> On Aug 9, 2008, at 10:15 AM, Chris Fields wrote:
>>
>>> Forgot to mention, maybe we can file this as a bug?  It's a pretty serious
>>> one but it should be easy to narrow down; the change had to be introduced
>>> fairly recently.
>>>
>>> chris
>>>
>>> On Aug 9, 2008, at 7:58 AM, Dave Messina wrote:
>>>
>>>>>
>>>>> I seem to vaguely recall that even if perl free()'s memory that doesn't
>>>>> necessarily mean that the memory is returned to the OS for the runtime
>>>>> of
>>>>> the program
>>>>
>>>>
>>>> I believe that's correct.
>>>>
>>>>
>>>>
>>>>> What OS are you on? I'm running perl 5.8.6 on OS X 10.4.11 intel.
>>>>>
>>>>
>>>> perl 5.10 or 5.8.8 on OS X 10.5.4 Intel.
>>>>
>>>>
>>>> Dave
>>>
>>> Christopher Fields
>>> Postdoctoral Researcher
>>> Lab of Dr. Marie-Claude Hofmann
>>> College of Veterinary Medicine
>>> University of Illinois Urbana-Champaign
>>>
>>>
>>>
>>>
>>
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>
>>
>
>
>
>-- 
>Dr. Rutger A. Vos
>Department of zoology
>University of British Columbia
>http://www.nexml.org
>http://rutgervos.blogspot.com


From hartzell at alerce.com  Sun Aug 10 00:17:52 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sat, 9 Aug 2008 17:17:52 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <18589.58127.57270.352974@almost.alerce.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
Message-ID: <18590.13232.892714.952555@almost.alerce.com>


George Hartzell writes:
 > 
 > I'm pretty sure that this fixes the problem:
 > 
 > g.
 > 
 > Index: Bio/Species.pm
 > ===================================================================
 > --- Bio/Species.pm	(revision 14791)
 > +++ Bio/Species.pm	(working copy)
 > @@ -340,6 +340,7 @@
 >  		}
 >  		
 >  		$self->{_species} = $species;
 > +		weaken($self->{tree}->{'_rootnode'}) unless isweak($self->{tree}->{'_rootnode'});
 >      }
 >  	return $self->{_species};
 >  }

Actually, it's a bit clearer with the weaken moved up in the block so
that it's closer to where the new tree is allocated.

Chris suggested that I go ahead and commit it.

g.


From David.Messina at sbc.su.se  Sun Aug 10 09:57:07 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Sun, 10 Aug 2008 11:57:07 +0200
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <18590.13232.892714.952555@almost.alerce.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
	<18590.13232.892714.952555@almost.alerce.com>
Message-ID: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>

Nice, George -- holds steady at about 32MB now.
Much better. :)


Dave


From hartzell at alerce.com  Sun Aug 10 16:51:39 2008
From: hartzell at alerce.com (George Hartzell)
Date: Sun, 10 Aug 2008 09:51:39 -0700
Subject: [Bioperl-l] malloc errors while using Bio::SeqIO?
In-Reply-To: <628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
References: <2bb9b24a0808081659x7364fa66h574717ae519369b7@mail.gmail.com>
	<628aabb70808090404u343055d0had384e29f3408839@mail.gmail.com>
	<2bb9b24a0808090436o70030560l784d6f561f0d13fa@mail.gmail.com>
	<18589.58127.57270.352974@almost.alerce.com>
	<18590.13232.892714.952555@almost.alerce.com>
	<628aabb70808100257o1c905255vf1d3a6b9912e21de@mail.gmail.com>
Message-ID: <18591.7323.244987.436383@almost.alerce.com>

Dave Messina writes:
 > Nice, George -- holds steady at about 32MB now.
 > Much better. :)

Good to hear.

Bonus points go to rvos@ for providing such a nice clean bug report and
test case; it made running it down much more appealing.

g.


From valiente at lsi.upc.edu  Mon Aug 11 08:09:37 2008
From: valiente at lsi.upc.edu (Gabriel Valiente)
Date: Mon, 11 Aug 2008 11:09:37 +0300
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
References: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
Message-ID: <B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>

Despite the speedup for merge_lineage, the get_lca method still runs  
very slowly on a large number of nodes (say, 1500 nodes) and it does  
not rely on merge_lineage. In the get_lca method, all the lineages  
are first collected in @paths in order to later find their $lca,  
while it might be faster to process each $path as soon as it is  
obtained with the get_lineage_nodes method. Any other ideas on how to  
speed up the get_lca method? Thanks,
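
A rough sketch of the "process each path as soon as it is obtained" idea, on
plain arrays of node names rather than Bio::Tree nodes. running_lca and the
iterator interface are assumptions made for the example, not the get_lca
code; in practice the iterator would wrap calls to get_lineage_nodes.

```perl
use strict;
use warnings;

# Each lineage is an array ref of node ids, root first. Only the running
# common prefix is kept, so no list of all paths is ever built.
sub running_lca {
    my ($next_lineage) = @_;    # code ref returning the next lineage or undef
    my $common;
    while (defined(my $path = $next_lineage->())) {
        if (!defined $common) {
            $common = [@$path];    # first lineage: everything is shared so far
            next;
        }
        my $i = 0;
        $i++ while $i < @$common && $i < @$path && $common->[$i] eq $path->[$i];
        $#{$common} = $i - 1;      # truncate to the shared prefix
        last if !@$common;         # nothing left in common, stop early
    }
    return defined $common && @$common ? $common->[-1] : undef;
}

my @lineages = (
    [qw(root Eukaryota Metazoa Chordata Homo)],
    [qw(root Eukaryota Metazoa Chordata Pan)],
    [qw(root Eukaryota Metazoa Arthropoda Drosophila)],
);
my $lca = running_lca(sub { shift @lineages });
print "$lca\n";
```

For the three lineages above this prints Metazoa. The shared prefix can only
shrink, so each new lineage costs at most the length of the current prefix.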

Gabriel


From lmanchon at univ-montp2.fr  Mon Aug 11 16:32:20 2008
From: lmanchon at univ-montp2.fr (Laurent Manchon)
Date: Mon, 11 Aug 2008 18:32:20 +0200
Subject: [Bioperl-l] protein pattern scan
Message-ID: <5.0.2.1.2.20080811182952.00bebff0@pop.univ-montp2.fr>

Hi,

do you know if it's possible to search for a protein motif in a multifasta
protein file using bioperl, returning the motif, its position and the name
of the corresponding sequence?

thank you for your help.


+---------------------------------------------+
  Laurent Manchon
  Email: lmanchon at univ-montp2.fr
+---------------------------------------------+


From cjfields at illinois.edu  Mon Aug 11 17:32:05 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 12:32:05 -0500 (CDT)
Subject: [Bioperl-l] protein pattern scan
Message-ID: <20080811123205.BHO45474@expms6.cites.uiuc.edu>

This is covered in the FAQ:

http://www.bioperl.org/wiki/FAQ#How_do_I_do_motif_searches_with_BioPerl.3F_Can_I_do_.22find_all_sequences_that_are_75.25_identical.22_to_a_given_motif.3F
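
For the simple case where the motif can be written as a Perl regex, a minimal
sketch in plain Perl follows. It deliberately avoids BioPerl so it runs
anywhere; scan_fasta_for_motif, the toy sequences and the example pattern are
made up for illustration. With BioPerl you would read the records through
Bio::SeqIO and apply the same regex to each sequence string.

```perl
use strict;
use warnings;

# Scan FASTA text for a motif given as a Perl regex.
# Returns a list of [sequence name, 1-based start, matched text].
# Overlapping occurrences are not reported.
sub scan_fasta_for_motif {
    my ($fh, $motif) = @_;
    my (@hits, $name, $seq);
    my $flush = sub {
        return unless defined $name;
        while ($seq =~ /($motif)/g) {
            push @hits, [$name, $-[1] + 1, $1];
        }
    };
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>(\S+)/) {
            my $new_name = $1;      # save before the closure runs its own regex
            $flush->();             # finish the previous record
            ($name, $seq) = ($new_name, '');
        }
        elsif (defined $name) {
            $line =~ s/\s+//g;
            $seq .= $line;
        }
    }
    $flush->();                     # finish the last record
    return @hits;
}

my $fasta = ">seq1 test\nMKNGSAANLT\n>seq2\nAAAA\n>seq3\nNPSANASG\n";
open my $fh, '<', \$fasta or die "cannot open in-memory file: $!";
# PROSITE-style N-{P}-[ST]-{P} written as a regex
for my $hit (scan_fasta_for_motif($fh, qr/N[^P][ST][^P]/)) {
    print join("\t", @$hit), "\n";
}
```

On the toy input this reports NGSA at position 3 of seq1 and NASG at position
5 of seq3.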

chris

---- Original message ----
>Date: Mon, 11 Aug 2008 18:32:20 +0200
>From: Laurent Manchon <lmanchon at univ-montp2.fr>  
>Subject: [Bioperl-l] protein pattern scan  
>To: bioperl-l at lists.open-bio.org
>
>Hi,
>
>do you know if it's possible to search protein motif in a multifasta 
>protein file
>using bioperl to return the motif, the position and the name of the 
>corresponding sequence ?
>
>thank you for your help.
>
>
>+---------------------------------------------+
>  Laurent Manchon
>  Email: lmanchon at univ-montp2.fr
>+---------------------------------------------+


From bix at sendu.me.uk  Mon Aug 11 17:44:37 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Mon, 11 Aug 2008 18:44:37 +0100
Subject: [Bioperl-l] get_lca method very slow on many nodes
In-Reply-To: <B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>
References: <mailman.2746.1218387110.2673.bioperl-l@lists.open-bio.org>
	<B7C5D84D-451B-499C-B99B-A1DED3EE130E@lsi.upc.edu>
Message-ID: <48A07A85.6050601@sendu.me.uk>

Gabriel Valiente wrote:
> Despite the speedup for merge_lineage, the get_lca method still runs 
> very slow on a large number of nodes (say, 1500 nodes) and it does not 
> rely on merge_lineage. In the get_lca method, all the lineages are first 
> collected in @paths in order to later find their $lca, while it might be 
> faster to process each $path as soon as it is obtained with the 
> get_lineage_nodes method.

If you try that idea out and it works, please do commit it. I've no 
further suggestions atm, but I haven't had a chance to look at it to 
remind myself what happens.
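
For illustration, the "process each path as soon as it is obtained" idea could look roughly like the sketch below. It is not the actual get_lca code: lineages are modelled as plain arrays of node names (root first) rather than Bio::Tree node objects, and lca_from_stream is a hypothetical name. The point is only that a running common prefix can be trimmed as each lineage arrives, so nothing like @paths has to be held in memory.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl.
# $next_lineage is a code ref that returns one lineage per call
# (array ref of node names, root first) and undef when exhausted.
sub lca_from_stream {
    my ($next_lineage) = @_;
    my @common;            # running common prefix of all lineages seen so far
    my $seen_first = 0;
    while (defined(my $path = $next_lineage->())) {
        if (!$seen_first) {
            @common     = @$path;
            $seen_first = 1;
            next;
        }
        # Count how many leading names still agree with this lineage.
        my $i = 0;
        $i++ while $i < @common && $i < @$path && $common[$i] eq $path->[$i];
        $#common = $i - 1;     # truncate the prefix to the agreeing part
        last unless @common;   # nothing shared, no point reading further
    }
    return @common ? $common[-1] : undef;
}

# Toy data standing in for get_lineage_nodes output.
my @lineages = (
    [qw(root A B C)],
    [qw(root A B D)],
    [qw(root A E)],
);
my $lca = lca_from_stream(sub { shift @lineages });
print "LCA: $lca\n";
```

Whether this is faster in practice depends on where the time really goes (fetching the lineages or comparing them), so it would need profiling against the real method.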


From cjfields at illinois.edu  Mon Aug 11 19:50:38 2008
From: cjfields at illinois.edu (Christopher Fields)
Date: Mon, 11 Aug 2008 14:50:38 -0500 (CDT)
Subject: [Bioperl-l] Finding possible primers regex
Message-ID: <20080811145038.BHO59267@expms6.cites.uiuc.edu>

When I get a chance, I could try writing a method that accepts a regex or Bio::Tools::SeqPattern and returns either an AlignIO stream or an array of SimpleAlign instances (the former could be attached to a temp file for iteration).  Any preference?

chris
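
As a point of comparison for the embedded-code regex quoted below, the same set of overlapping windows can be enumerated with two plain loops. This is only a sketch in core Perl, not the proposed SimpleAlign method: candidate_windows and its arguments are hypothetical, and the coordinates are returned 1-based and inclusive on the assumption that they would be passed to something like slice().

```perl
use strict;
use warnings;

# Hypothetical helper: every window of $min..$max characters that contains
# at most $max_wild copies of the wildcard character $wild.
# Returns a list of [start, end, window] with 1-based inclusive coordinates.
sub candidate_windows {
    my ($consensus, $min, $max, $max_wild, $wild) = @_;
    my @hits;
    my $len = length $consensus;
    for my $start (0 .. $len - $min) {
        for my $size ($min .. $max) {
            last if $start + $size > $len;    # window would run off the end
            my $window     = substr($consensus, $start, $size);
            my $wild_count = () = $window =~ /\Q$wild\E/g;
            push @hits, [$start + 1, $start + $size, $window]
                if $wild_count <= $max_wild;
        }
    }
    return @hits;
}

# Made-up consensus string with '?' at the non-conserved columns.
my $consensus = 'TTAGC?TAACGGATCCGTA?GCTAGG';
printf "%d-%d %s\n", @$_ for candidate_windows($consensus, 18, 21, 4, '?');
```

The loop version does more substr work than the regex, but it avoids relying on code blocks inside the pattern.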

---- Original message ----
>Date: Sat, 9 Aug 2008 12:07:30 -0400
>From: Hilmar Lapp <hlapp at gmx.net>  
>Subject: Re: [Bioperl-l] Finding possible primers regex  
>To: Chris Fields <cjfields at uiuc.edu>
>Cc: Benbo <btemperton at googlemail.com>, Bioperl-l at lists.open-bio.org
>
>This looks like a neat trick. Do you think it's worth including as a  
>SimpleAlign method (obviously w/o the printing to STDOUT)? I can  
>imagine that a lot of people might appreciate it.
>
>	-hilmar
>
>On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:
>
>> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>>
>>>
>>> Hi there,
>>> I'm trying to write a perl script to scan an aligned multiple entry  
>>> fasta
>>> file and find possible primers. So far I've produced a string which  
>>> contains
>>> bases which match all sequences and * where they don't match e.g.
>>> 1) TTAGCCTAA
>>> 2) TTAGCAGAA
>>> 3) TTACCCTAA
>>>
>>> would give TTA*C**AA.
>>>
>>> I want to parse this string and pull out all sequences which are  
>>> 18-21 bp in
>>> length and have no more than 4 * in them.
>>>
>>> So far, I've got this:
>>>
>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>> print "$1\n";
>>> }
>>>
>>> hoping to match all fragments 18-21 characters in length. However  
>>> even that
>>> doesn't work as it has essentially chunked it into 21 char blocks,  
>>> rather
>>> than what I hoped for of
>>> 0-18
>>> 0-19
>>> 0-20
>>> 0-21
>>> 1-19
>>> 1-20
>>> 1-21
>>> 1-22
>>>
>>> etc.
>>>
>>> Can anyone let me know if this is already possible in BioPerl, or  
>>> how one
>>> would go about it with regex. Sadly I'm fairly new to perl and  
>>> getting to
>>> grips with BioPerl, so please treat me gently :).
>>>
>>> Many thanks,
>>>
>>> Ben
>>
>> There is a trick to this which is discussed more extensively in
>> 'Mastering Regular Expressions'.  Essentially you have to embed code
>> into the regex and trick the parser into backtracking using a
>> negative lookahead.  The match itself fails (i.e. no match is
>> returned), but the embedded code is executed for each match attempt.
>>
>> The following script is a slight modification of one I used which
>> checks the consensus string from the input alignment (in aligned
>> FASTA format here), extracts the alignment slice using that match,
>> then spits the alignment out to STDOUT in clustalw format.  This
>> should work for perl 5.8 and up, but it's only been tested on perl
>> 5.10.  You should be able to use this to fit what you want.
>>
>> use Bio::AlignIO;
>>
>> # $file is the path to the aligned FASTA input
>> my $in = Bio::AlignIO->new(-file => $file,
>>                           -format => 'fasta');
>> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>>                           -format => 'clustalw');
>>
>> while (my $aln = $in->next_aln) {
>>    my $c = $aln->consensus_string(100);
>>    my @matches;
>>    $c =~ m/
>>        ([GTAC?]{18,21})
>>        (?{my $match = check_match($1);
>>           push @matches, [$match,
>>                           pos(),
>>                           length($match)]
>>              if defined $match;})
>>        (?!)
>>        /xig;
>>    for my $match (@matches) {
>>        my ($hit, $st, $end) = ($match->[0],
>>                                $match->[1] - $match->[2] + 1,
>>                                $match->[1]);
>>        my $newaln = $aln->slice($st, $end);
>>        $out->write_aln($newaln);
>>    }
>> }
>>
>> sub check_match {
>>    my $match = shift;
>>    return unless $match;
>>    my $ct = $match =~ tr/?/?/;
>>    return $match if $ct <= 4;
>> }
>>
>>
>> chris
>>
>
>-- 
>===========================================================
>: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>===========================================================
>
>
>


From hlapp at gmx.net  Tue Aug 12 02:35:13 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Mon, 11 Aug 2008 22:35:13 -0400
Subject: [Bioperl-l] Finding possible primers regex
In-Reply-To: <20080811145038.BHO59267@expms6.cites.uiuc.edu>
References: <20080811145038.BHO59267@expms6.cites.uiuc.edu>
Message-ID: <B79B827A-E66C-417D-A5E8-B4FE96791089@gmx.net>

Actually, now that you ask I'm wondering whether one wouldn't  
sometimes want to retain the relationship between the match and the  
resulting spliced alignment? If so, neither AlignIO nor array would  
accomplish that, right?

Other than that I myself don't have a strong preference either way. I  
suppose AlignIO stream is somewhat more extensible, since as you say  
it could be coupled to a file if the resulting set of alignments is  
really large.

	-hilmar

On Aug 11, 2008, at 3:50 PM, Christopher Fields wrote:

> When I can I could try generating a method which accepts a regex/ 
> Bio::Tools::SeqPattern and returns an AlignIO stream or array of  
> SimpleAlign instances (the former could be attached to a temp file  
> for iteration).  Any preference?
>
> chris
>
> ---- Original message ----
>> Date: Sat, 9 Aug 2008 12:07:30 -0400
>> From: Hilmar Lapp <hlapp at gmx.net>
>> Subject: Re: [Bioperl-l] Finding possible primers regex
>> To: Chris Fields <cjfields at uiuc.edu>
>> Cc: Benbo <btemperton at googlemail.com>, Bioperl-l at lists.open-bio.org
>>
>> This looks like a neat trick. Do you think it's worth including as a
>> SimpleAlign method (obviously w/o the printing to STDOUT)? I can
>> imagine that a lot of people might appreciate it.
>>
>> 	-hilmar
>>
>> On Aug 4, 2008, at 12:08 AM, Chris Fields wrote:
>>
>>> On Aug 2, 2008, at 3:05 PM, Benbo wrote:
>>>
>>>>
>>>> Hi there,
>>>> I'm trying to write a perl script to scan an aligned multiple entry
>>>> fasta
>>>> file and find possible primers. So far I've produced a string which
>>>> contains
>>>> bases which match all sequences and * where they don't match e.g.
>>>> 1) TTAGCCTAA
>>>> 2) TTAGCAGAA
>>>> 3) TTACCCTAA
>>>>
>>>> would give TTA*C**AA.
>>>>
>>>> I want to parse this string and pull out all sequences which are
>>>> 18-21 bp in
>>>> length and have no more than 4 * in them.
>>>>
>>>> So far, I've got this:
>>>>
>>>> while($fragment_match =~ /([GTAC*]{18,21})/g){
>>>> print "$1\n";
>>>> }
>>>>
>>>> hoping to match all fragments 18-21 characters in length. However
>>>> even that
>>>> doesn't work as it has essentially chunked it into 21 char blocks,
>>>> rather
>>>> than what I hoped for of
>>>> 0-18
>>>> 0-19
>>>> 0-20
>>>> 0-21
>>>> 1-19
>>>> 1-20
>>>> 1-21
>>>> 1-22
>>>>
>>>> etc.
>>>>
>>>> Can anyone let me know if this is already possible in BioPerl, or
>>>> how one
>>>> would go about it with regex. Sadly I'm fairly new to perl and
>>>> getting to
>>>> grips with BioPerl, so please treat me gently :).
>>>>
>>>> Many thanks,
>>>>
>>>> Ben
>>>
>>> There is a trick to this which is discussed more extensively in
>>> 'Mastering Regular Expressions'.  Essentially you have to embed code
>>> into the regex and trick the parser into backtracking using a
>>> negative lookahead.  The match itself fails (i.e. no match is
>>> returned), but the embedded code is executed for each match attempt.
>>>
>>> The following script is a slight modification of one I used which
>>> checks the consensus string from the input alignment (in aligned
>>> FASTA format here), extracts the alignment slice using that match,
>>> then spits the alignment out to STDOUT in clustalw format.  This
>>> should work for perl 5.8 and up, but it's only been tested on perl
>>> 5.10.  You should be able to use this to fit what you want.
>>>
>>> my $in = Bio::AlignIO->new(-file => $file,
>>>                          -format => 'fasta');
>>> my $out = Bio::AlignIO->new(-fh => \*STDOUT,
>>>                          -format => 'clustalw');
>>>
>>> while (my $aln = $in->next_aln) {
>>>   my $c = $aln->consensus_string(100);
>>>   my @matches;
>>>   $c =~ m/
>>>       ([GTAC?]{18,21})
>>>       (?{my $match = check_match($1);
>>>          push @matches, [$match,
>>>                          pos(),
>>>                          length($match)]
>>>             if defined $match;})
>>>       (?!)
>>>       /xig;
>>>   for my $match (@matches) {
>>>       my ($hit, $st, $end) = ($match->[0],
>>>                               $match->[1] - $match->[2] + 1,
>>>                               $match->[1]);
>>>       my $newaln = $aln->slice($st, $end);
>>>       $out->write_aln($newaln);
>>>   }
>>> }
>>>
>>> sub check_match {
>>>   my $match = shift;
>>>   return unless $match;
>>>   my $ct = $match =~ tr/?/?/;
>>>   return $match if $ct <= 4;
>>> }
>>>
>>>
>>> chris
>>>
>>
>> -- 
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
>> ===========================================================
>>
>>
>>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mirhan at indiana.edu  Tue Aug 12 03:46:35 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Mon, 11 Aug 2008 23:46:35 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <D8FD887C-86A9-4E53-9A59-10FDE295C067@duke.edu>
Message-ID: <C4C67FDB.5282%mirhan@indiana.edu>


Hi,
Yes, it is true that it's similar to get_all_Annotations; it's basically a recursive version of it.
I wanted to provide a method to get at nested annotations without going through the whole "if it isa collection, make a recursive call" routine every time, because most of the XML elements are implemented as nested annotation collections on the nodes.
(I am contemplating using tagtrees instead of nested annotation collections in the future, but as of now Annotation::tagtrees is documented as a temporary implementation, so I passed on that option.)
I forgot about the interface part.
At least for my purposes, I think it's a good function to have in the interface.
I agree that adding a recursive option to get_all_Annotations would be better.

Mira


On 8/11/08 11:28 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:33 AM, Han, Mira wrote:

> Added get_deep_Annotations in Annotation::Collection
> in order to get annotations that are within nested collections.


I hope I'm not contradicting Chris here, but we will probably want to
think about this a bit more. Your implementation won't work as it is
assuming an interface function that isn't defined on the interface
(both get_deep_Annotations() and _deep_annotation_helper()). Also, it
does nearly the same as get_all_Annotations(), and passing on the keys
to nested collections should maybe simply be an option to that method.
Alternatively, one could add an option -recurse to get_Annotation.

The other difference you note is that your method does not flatten the
nested annotations, but unless I am missing something your
implementation does flatten annotations from nested collections.

So even if we need a separate method for this, something like
get_nested_Annotations() would probably be a more appropriate name,
and if we do need a separate method, it should be compelling enough to
add it to the interface too (as otherwise your code will only work
with certain implementation classes).

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From mirhan at indiana.edu  Tue Aug 12 04:00:28 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Tue, 12 Aug 2008 00:00:28 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>
Message-ID: <C4C6831C.5287%mirhan@indiana.edu>


Oh yes,
I meant get_Annotations.
I want a get_Annotations that is recursive and passes the keys on to the recursive calls.
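
To make the intended behaviour concrete, here is a toy sketch in core Perl. It does not use Bio::Annotation::Collection: a "collection" is modelled as a hash of key => array of values, where a value may itself be a nested collection, and get_annotations_recursive is a hypothetical name. The requested key is passed down unchanged to every nested collection, and the matching values come back as a flat list.

```perl
use strict;
use warnings;

# Toy model only: a collection is { key => [ values ] }, and any value that
# is a hash ref is treated as a nested collection.
sub get_annotations_recursive {
    my ($collection, $key) = @_;
    my @found;
    for my $k (sort keys %$collection) {
        for my $value (@{ $collection->{$k} }) {
            if (ref $value eq 'HASH') {
                # Nested collection: recurse with the same key.
                push @found, get_annotations_recursive($value, $key);
            }
            elsif ($k eq $key) {
                push @found, $value;
            }
        }
    }
    return @found;
}

# Made-up data loosely shaped like nested phyloXML elements.
my $annotations = {
    name  => ['tree1'],
    clade => [
        { name => ['A'], clade => [ { name => ['B'] } ] },
        { name => ['C'] },
    ],
};
print join(', ', sort(get_annotations_recursive($annotations, 'name'))), "\n";
```

Note that this version flattens the result, which is the behaviour questioned earlier in the thread; keeping the nesting would need a different return structure.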


On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is
already recursive)

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From hlapp at duke.edu  Tue Aug 12 03:54:43 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Mon, 11 Aug 2008 23:54:43 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4C67FDB.5282%mirhan@indiana.edu>
References: <C4C67FDB.5282%mirhan@indiana.edu>
Message-ID: <9E53DAE8-3A8F-4EEC-B2B4-741214907D90@duke.edu>

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's  
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is  
already recursive)

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From mrphysh at juno.com  Tue Aug 12 14:30:36 2008
From: mrphysh at juno.com (mrphysh at juno.com)
Date: Tue, 12 Aug 2008 14:30:36 GMT
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
Message-ID: <20080812.083036.25924.0@webmail02.vgs.untd.com>

I am studying BioPerl and making progress, but I have been struggling with retrieving sequences from 
online databases.  Here is an example:

#!/usr/bin/perl -w
 
use Bio::Perl;

 
$seq_object = get_sequence('swiss',"ROA1_HUMAN");
 
write_sequence(">roa1.fasta",'fasta',$seq_object);

exit;

This script gives

Can't locate IO/String.pm in @INC (@INC contains: /etc/perl /usr/local/lib/perl/5.8.8 /usr/local/share/perl/5.8.8 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.8 /usr/share/perl/5.8 /usr/local/lib/site_perl .) at ee_bpo.pl line 12.
BEGIN failed--compilation aborted at ee_bpo.pl line 12.

I have chased around the paths in @INC, using "use lib".

This is an install problem.  The original installation was done with perl Makefile.PL.  I then reinstalled over the old installation with cpan, using commands like these:
cpan> o conf prerequisites_policy follow
cpan> i /bioperl/
cpan> install Bundle::BioPerl
cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
This all seemed to proceed smoothly.

This line on its own did not produce an error:
use Bio::Perl;

I am almost thinking that the problem is with the perl. But regular ftp through perl works:

use Net::FTP;#I found this in usr/share/perl/5.8.8/Net
As a perl command this module seems to work.

I looked in the archives and found nothing.  I think I have done my homework.  any ideas?

I run Ubuntu on a pentium III (and love it).  the version of Ubuntu is new.  the Perl (and MySQL) came with the OS: perl 5.8.8

John Brigham in Denver.




From jay at jays.net  Tue Aug 12 15:08:59 2008
From: jay at jays.net (Jay Hannah)
Date: Tue, 12 Aug 2008 10:08:59 -0500
Subject: [Bioperl-l] Can't locate IO/String.pm._._..install problem
In-Reply-To: <20080812.083036.25924.0@webmail02.vgs.untd.com>
References: <20080812.083036.25924.0@webmail02.vgs.untd.com>
Message-ID: <CA21F550-DF97-41EE-BE8C-CBDFE67B0148@jays.net>

On Aug 12, 2008, at 2:30 PM, mrphysh at juno.com wrote:
> Can't locate IO/String.pm in @INC
...
> cpan> install Bundle::BioPerl
> cpan> install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> cpan> force install B/BI/BIRNEY/bioperl-1.2.1.tar.gz
> This all seemed to proceed smoothly

bioperl-1.2.1 is very old. Apparently Bundle::BioPerl is out of date?

Here's lots of info about installing BioPerl:

    http://www.bioperl.org/wiki/Getting_BioPerl

I recommend using bioperl-live directly from SVN, but I'm sort of a  
rebel like that.  :)

Alternately, you could try just doing a

    cpan> install IO::String
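
Before and after installing, a quick check like the following shows whether perl can actually see a module in @INC. It is just a sketch using core Perl; module_available is a made-up helper name, and the two module names are the ones from this thread.

```perl
use strict;
use warnings;

# Returns 1 if the named module can be loaded from @INC, 0 otherwise.
sub module_available {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;   # IO::String -> IO/String.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(IO::String Bio::Perl)) {
    printf "%-12s %s\n", $module,
        module_available($module) ? 'found' : 'MISSING from @INC';
}
```

If IO::String is reported missing while Bio::Perl is found, installing IO::String alone should be enough to get past this particular error.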

HTH,

j
http://clab.ist.unomaha.edu/CLAB/index.php/User:Jhannah


From heikki at sanbi.ac.za  Thu Aug 14 13:14:48 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Thu, 14 Aug 2008 15:14:48 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
Message-ID: <200808141514.49124.heikki@sanbi.ac.za>

A generic method for retrieving nodes from a Bio::Tree::TreeI object is 
Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the 'id' 
attribute unless a field is given. I can retrieve nodes based on internal id 
like this:

   $tree->find_node(-internal_id => $internal_id);

I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that retrieves by id. 
However, the POD documentation claims that it retrieves by internal id.

What needs to be done?

A. Fix the doc to speak about id
B. Fix the code to retrieve by internal_id
C. Fix the doc and create findnode_by_internal_id()
D. Remove findnode_by_id() as redundant and confusing
E. Deprecate findnode_by_id() as redundant and confusing

There are no tests for findnode_by_id(), which to me tilts the selection to E and A 
for now.

Any other opinions?

	-Heikki


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From hlapp at gmx.net  Thu Aug 14 22:28:20 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Thu, 14 Aug 2008 18:28:20 -0400
Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API?
In-Reply-To: <48A448DD.4000206@psb.ugent.be>
References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com>
	<48A448DD.4000206@psb.ugent.be>
Message-ID: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>

Hi Erick,

how did you determine that go-perl is specific to GO? I've found it to  
work quite well for any kind of OBO-formatted ontology.

Also, you note that BioPerl doesn't have the ability to write in  
certain formats, and to intersect and "unify" (would you mind  
explaining what you mean by that?) ontologies. It seems that your  
implementation of RDF etc export isn't really reusable or modular in  
any way, but I'd love to bring the intersection function over to  
BioPerl (BTW when you decided to roll your own ontology API, did you  
get the impression that BioPerl isn't receptive to you adding to it?).  
Would you mind pointing me to the place in the code where I would find  
that, as I can't seem to find it.

	-hilmar

On Aug 14, 2008, at 11:01 AM, Erick Antezana wrote:

> Hi Arne,
>
> if you plan to work with PERL, you might take a look at ONTO-PERL :
>
> http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btn042
> http://search.cpan.org/dist/ONTO-PERL/
> http://search.cpan.org/src/EASR/ONTO-PERL-1.13/doc/example00.html
>
> ONTO-PERL has been used intensively to build the Cell Cycle Ontology.
>
> cheers,
> Erick
>
> Arne Muller wrote:
>> Dear All,
>>
>> I'm new to this list and don't know much about ontologies in general
>> (I worked a bit with GO some time ago).
>>
>> Let me explain my problem: We have several related vocabularies
>> (non-hierarchical and redundant because of different spellings etc
>> ...) to describe organs and tissues in our department, and we need to
>> map each of these vocabs to all of our other legacy vocabs that
>> describe similar concepts. We'd like to use the adult mouse anatomy
>> ontology and modify/extend it with additional terms (if necessary),
>> synonyms and dbXrefs. Most of our vocabs should be mapped as dbXrefs
>> to existing terms in the MA ontology. The goal is that different  
>> units
>> in our department use slightly different vocabulary to describe
>> samples, and we now need link these different system (always the same
>> old story ... ;-).
>>
>> For the moment I'm not planning to turn our messy legacy vocabs into
>> OBO formated ontologies and to map them via cross products and the  
>> OBO
>> relation ontology - though this might be the most  proper way to do
>> ... (comments are welcome).
>>
>> I'll have to write an "easy to use" tool that allows our data curator
>> to easily map the legacy vocabs as dbXrefs of terms in the MA
>> ontology. The question is, how am I gonna do this? I've a fairly good
>> idea of how my software (java webapp) should look like, but are there
>> any APIs and implementations of the OBO model as well as a DB schema
>> and mappings between the model and the schema?
>>
>> I've had a look into the OLS from the EBI that seems to be fairly
>> simple (which is good ;-) and that uses the oboedit.jar somewhere at
>> the back-end. I've also found something like an obo api on
>> http://wiki.geneontology.org/index.php/OBO-Edit:_Getting_the_Source_Code#.28Optional.29_Getting_the_OBO_API_from_Subclipse
>> but so far I've not found any documentation nor examples on how to  
>> get
>> started.
>>
>> I'd be happy to hear how developers and bioinformatics people use obo
>> in their own tools (I better ask before going DIY ...).
>>
>>   thanks a lot for your comments and help
>>   +kind regards,
>>
>>   Arne
>>
>> ------------------------------------------------------------------------
>>
>> -------------------------------------------------------------------------
>> This SF.Net email is sponsored by the Moblin Your Move Developer's  
>> challenge
>> Build the coolest Linux based applications with Moblin SDK & win  
>> great prizes
>> Grand prize is a trip for two to an Open Source event anywhere in  
>> the world
>> http://moblin-contest.org/redirect.php?banner_id=100&url=/
>> ------------------------------------------------------------------------
>>
>> _______________________________________________
>> Obo-discuss mailing list
>> Obo-discuss at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/obo-discuss
>>
>
> -- 
> ==================================================================
> Erick Antezana                    http://www.cellcycleontology.org
> PhD student
> Tel:+32 (0)9 331 38 24                        fax:+32 (0)9 3313809
> VIB Department of Plant Systems Biology, Ghent University
> Technologiepark 927, 9052 Gent, BELGIUM
> erant at psb.ugent.be                  http://www.psb.ugent.be/~erant
> ==================================================================
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From mjanis at chem.ucla.edu  Thu Aug 14 23:37:05 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:37:05 -0700
Subject: [Bioperl-l] Code to contribute
Message-ID: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>

Hi,

 
I've had some perl code lying around for what seems like forever and I'd
like to contribute it to bioperl, if such facilities don't already exist in
bioperl.  The code implements shuffling (DNA or RNA) keeping the
dinucleotide composition (and codon usage) intact through an Eulerian path
approach as described in Altschul and Erickson (1985).  The code seeds the
Eulerian paths by keeping the first and last nucleotide invariant in the
shuffle - which has minimal detrimental effects to the purpose of the
algorithm, in my experience.
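
A shuffle of this kind can be sanity-checked by comparing dinucleotide counts before and after. The sketch below is not the contributed shuffling code and does not implement the Altschul and Erickson algorithm; it only shows the invariant such a shuffle is supposed to preserve, using two hypothetical core-Perl helpers.

```perl
use strict;
use warnings;

# Count overlapping dinucleotides in a sequence string.
sub dinucleotide_counts {
    my ($seq) = @_;
    my %count;
    $count{ substr($seq, $_, 2) }++ for 0 .. length($seq) - 2;
    return \%count;
}

# True if two sequences have identical dinucleotide counts (case-insensitive).
sub same_dinucleotide_composition {
    my ($x, $y) = @_;
    my $cx = dinucleotide_counts(uc $x);
    my $cy = dinucleotide_counts(uc $y);
    return 0 unless scalar(keys %$cx) == scalar(keys %$cy);
    for my $pair (keys %$cx) {
        return 0 unless exists $cy->{$pair} && $cy->{$pair} == $cx->{$pair};
    }
    return 1;
}

# AACAA and ACAAA share first base, last base and dinucleotide counts;
# ACACA has the same length but a different dinucleotide composition.
for my $candidate (qw(ACAAA ACACA)) {
    printf "AACAA vs %s: %s\n", $candidate,
        same_dinucleotide_composition('AACAA', $candidate) ? 'same' : 'different';
}
```

A check like this could also serve as the core of a test for the contributed module.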

 
A quick search on the bioperl website shows that there is a mutation.pls
script, and facilities for using Sean Eddy's SQUID C library, which
implements the same function (I wrote this particular function before I knew
how to use C).  As such, it's probably not as elegant as Sean Eddy's
implementation, but it works - and it's entirely in perl.

 
The bioperl developer pages suggest a post to the mailing list as the best
place to start contributing to bioperl.  Is this a useful function to add to
the project?

 
Best Regards,

 
Michael

 
-------------------------------
Michael Janis
mjanis at chem.ucla.edu
-------------------------------

 
From rvos at interchange.ubc.ca  Thu Aug 14 23:51:43 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Thu, 14 Aug 2008 16:51:43 -0700
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>
	<2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
Message-ID: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>

Sounds exciting! I bet the general advice you'll get is to i) check
out the latest code from svn ii) see which bioperl objects/interfaces
(e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
iii) write a class that performs the algorithm as some sort of
analysis factory taking the sequence object (or ideally object
interface) as an input iv) run that class by the mailing list v) check
it into svn.

On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis <mjanis at chem.ucla.edu> wrote:
> Hi,
>
>
>
> I've had some perl code lying around for what seems like forever and I'd
> like to contribute it to bioperl, if such facilities don't already exist in
> bioperl.  The code implements shuffling (DNA or RNA) keeping the
> dinucleotide composition (and codon usage) intact through an Eulerian path
> approach as described in Altschul and Erickson (1985).  The code seeds the
> Eulerian paths by keeping the first and last nucleotide invariant in the
> shuffle - which has minimal detrimental effects to the purpose of the
> algorithm, in my experience.
>
>
>
> A quick search on the bioperl website shows that there is a mutation.pls
> script, and facilities for using Sean Eddy's SQUID C library, which
> implements the same function (I wrote this particular function before I knew
> how to use C).  As such, it's probably not as elegant as Sean Eddy's
> implementation, but it works - and it's entirely in perl.
>
>
>
> The bioperl developer pages suggest a post to the mailing list as the best
> place to start contributing to bioperl.  Is this a useful function to add to
> the project?
>
>
>
> Best Regards,
>
>
>
> Michael
>
>
>
> -------------------------------
>
> Michael Janis
>
>  <mailto:mjanis at chem.ucla.edu> mjanis at chem.ucla.edu
>
> -------------------------------
>
>
>
>


--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com




From mjanis at chem.ucla.edu  Thu Aug 14 23:55:04 2008
From: mjanis at chem.ucla.edu (Michael Janis)
Date: Thu, 14 Aug 2008 16:55:04 -0700
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu>
	<2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
Message-ID: <008701c8fe69$2cee6020$86cb2060$@ucla.edu>

Thanks, Rutger, I'll do exactly that!  (give me a few days)


Best Regards,

Michael

-------------------------------
Michael Janis
mjanis at chem.ucla.edu
-------------------------------


-----Original Message-----
From: bioperl-l-bounces at lists.open-bio.org
[mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos
Sent: Thursday, August 14, 2008 4:52 PM
To: bioperl-l at lists.open-bio.org
Subject: [Bioperl-l] Fwd: Code to contribute

Sounds exciting! I bet the general advice you'll get is to i) check
out the latest code from svn ii) see which bioperl objects/interfaces
(e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
iii) write a class that performs the algorithm as some sort of
analysis factory taking the sequence object (or ideally object
interface) as an input iv) run that class by the mailing list v) check
it into svn.

On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis <mjanis at chem.ucla.edu> wrote:
> Hi,
>
>
>
> I've had some perl code lying around for what seems like forever and I'd
> like to contribute it to bioperl, if such facilities don't already exist in
> bioperl.  The code implements shuffling (DNA or RNA) keeping the
> dinucleotide composition (and codon usage) intact through an Eulerian path
> approach as described in Altschul and Erickson (1985).  The code seeds the
> Eulerian paths by keeping the first and last nucleotide invariant in the
> shuffle - which has minimal detrimental effects to the purpose of the
> algorithm, in my experience.
>
>
>
> A quick search on the bioperl website shows that there is a mutation.pls
> script, and facilities for using Sean Eddy's SQUID C library, which
> implements the same function (I wrote this particular function before I knew
> how to use C).  As such, it's probably not as elegant as Sean Eddy's
> implementation, but it works - and it's entirely in perl.
>
>
>
> The bioperl developer pages suggest a post to the mailing list as the best
> place to start contributing to bioperl.  Is this a useful function to add to
> the project?
>
>
>
> Best Regards,
>
>
>
> Michael
>
>
>
> -------------------------------
>
> Michael Janis
>
>  <mailto:mjanis at chem.ucla.edu> mjanis at chem.ucla.edu
>
> -------------------------------
>
>
>
>


--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com




From Russell.Smithies at agresearch.co.nz  Fri Aug 15 01:17:23 2008
From: Russell.Smithies at agresearch.co.nz (Smithies, Russell)
Date: Fri, 15 Aug 2008 13:17:23 +1200
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
Message-ID: <D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>

You forgot 2 points,

vi) write documentation/examples, and  vii) write tests

;-)
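
As an illustration of point vii, a test for a dinucleotide-preserving shuffle could check the invariants Michael describes instead of a fixed output. This is a minimal sketch that assumes nothing about his actual code: dinuc_counts() is a made-up helper, and the two sequences are hand-picked stand-ins for an input and a shuffled result.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical helper: count overlapping dinucleotides in a sequence.
sub dinuc_counts {
    my ($seq) = @_;
    my %count;
    $count{ substr($seq, $_, 2) }++ for 0 .. length($seq) - 2;
    return \%count;
}

# Stand-ins for an input sequence and the output of a shuffle routine.
my $original = 'ATACA';
my $shuffled = 'ACATA';

is_deeply(dinuc_counts($shuffled), dinuc_counts($original),
    'dinucleotide counts are preserved');
is(substr($shuffled, 0, 1), substr($original, 0, 1),
    'first nucleotide is unchanged');
is(substr($shuffled, -1), substr($original, -1),
    'last nucleotide is unchanged');
```

In a real test file the second sequence would come from the shuffle routine itself, and the same three checks would be repeated over several inputs.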


Russell

> -----Original Message-----
> From: bioperl-l-bounces at lists.open-bio.org
> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos
> Sent: Friday, 15 August 2008 11:52 a.m.
> To: bioperl-l at lists.open-bio.org
> Subject: [Bioperl-l] Fwd: Code to contribute
> 
> Sounds exciting! I bet the general advice you'll get is to i) check
> out the latest code from svn ii) see which bioperl objects/interfaces
> (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
> iii) write a class that performs the algorithm as some sort of
> analysis factory taking the sequence object (or ideally object
> interface) as an input iv) run that class by the mailing list v) check
> it into svn.
> 
> On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis <mjanis at chem.ucla.edu> wrote:
> > Hi,
> >
> >
> >
> > I've had some perl code lying around for what seems like forever and I'd
> > like to contribute it to bioperl, if such facilities don't already exist in
> > bioperl.  The code implements shuffling (DNA or RNA) keeping the
> > dinucleotide composition (and codon usage) intact through an Eulerian path
> > approach as described in Altschul and Erickson (1985).  The code seeds the
> > Eulerian paths by keeping the first and last nucleotide invariant in the
> > shuffle - which has minimal detrimental effects to the purpose of the
> > algorithm, in my experience.
> >
> >
> >
> > A quick search on the bioperl website shows that there is a mutation.pls
> > script, and facilities for using Sean Eddy's SQUID C library, which
> > implements the same function (I wrote this particular function before I knew
> > how to use C).  As such, it's probably not as elegant as Sean Eddy's
> > implementation, but it works - and it's entirely in perl.
> >
> >
> >
> > The bioperl developer pages suggest a post to the mailing list as the best
> > place to start contributing to bioperl.  Is this a useful function to add to
> > the project?
> >
> >
> >
> > Best Regards,
> >
> >
> >
> > Michael
> >
> >
> >
> > -------------------------------
> >
> > Michael Janis
> >
> >  <mailto:mjanis at chem.ucla.edu> mjanis at chem.ucla.edu
> >
> > -------------------------------
> >
> >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> >
> 
> 
> 
> --
> Dr. Rutger A. Vos
> Department of zoology
> University of British Columbia
> http://www.nexml.org
> http://rutgervos.blogspot.com
> 
> 
> 
> --
> Dr. Rutger A. Vos
> Department of zoology
> University of British Columbia
> http://www.nexml.org
> http://rutgervos.blogspot.com
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From mirhan at indiana.edu  Fri Aug 15 06:11:46 2008
From: mirhan at indiana.edu (Han, Mira)
Date: Fri, 15 Aug 2008 02:11:46 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4C6831C.5287%mirhan@indiana.edu>
Message-ID: <C4CA9662.52BD%mirhan@indiana.edu>


Hi,
I've changed get_deep_Annotations() to get_nested_Annotations().
It takes the arguments -keys and -recursive, and behaves exactly like get_Annotations() when -recursive is not set (tested by replacing the get_Annotations() calls in Annotation.t).
I made it a new function instead of modifying get_Annotations() because I wasn't sure how to modify that in a backwards-compatible way.
I thought of adding the function to the AnnotationCollectionI interface, but get_all_Annotations() seemed to be missing from the interface as well, so I decided to ask whether it should be added to the interface at all.
Isn't it possible for certain implementations of the interface to have functions that are specific to that implementation?

Mira


On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:


Oh yes,
I meant get_Annotations,
I want a get_Annotations that is recursive and passes the keys to the recursive calls.


On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:

Hi Mira -

On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:

> Yes it is true that it's similar to get_all_Annotations, it's
> basically a recursive version of it.


I suppose you mean get_Annotations(), right? (get_all_Annotations() is
already recursive)

        -hilmar
--
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From cjfields at illinois.edu  Fri Aug 15 13:59:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Aug 2008 08:59:42 -0500
Subject: [Bioperl-l] Fwd:  Code to contribute
In-Reply-To: <D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>
References: <008201c8fe66$aa21f2d0$fe65d870$@ucla.edu><2bb9b24a0808141651n20fa102eh735f6a9d07409edd@mail.gmail.com>
	<2bb9b24a0808141651x46239ad5o1d8790eabd922453@mail.gmail.com>
	<D5DBA313349A4B458528BE63B387F36C07597F2F@imail.agresearch.co.nz>
Message-ID: <BF55B9C8-B43A-404A-8085-4C8200B9B4AA@illinois.edu>

Agreed!

We're hoping to move to a more structured core after 1.6 which will
require decent documentation and tests for inclusion.  My feeling is
one should feel free to add code samples to relevant pages/sections in
the BioPerl wiki, or write up your own HOWTO (it's not terribly hard to
do, and it adds to your karma).

chris

On Aug 14, 2008, at 8:17 PM, Smithies, Russell wrote:

> You forgot 2 points,
>
> vi) write documentation/examples, and  vii) write tests
>
> ;-)
>
>
>
> Russell
>
>> -----Original Message-----
>> From: bioperl-l-bounces at lists.open-bio.org
>> [mailto:bioperl-l-bounces at lists.open-bio.org] On Behalf Of Rutger Vos
>> Sent: Friday, 15 August 2008 11:52 a.m.
>> To: bioperl-l at lists.open-bio.org
>> Subject: [Bioperl-l] Fwd: Code to contribute
>>
>> Sounds exciting! I bet the general advice you'll get is to i) check
>> out the latest code from svn ii) see which bioperl objects/interfaces
>> (e.g. Bio::Seq) you'd use to integrate your algorithm into bioperl
>> iii) write a class that performs the algorithm as some sort of
>> analysis factory taking the sequence object (or ideally object
>> interface) as an input iv) run that class by the mailing list v) check
>> it into svn.
>>
>> On Thu, Aug 14, 2008 at 4:37 PM, Michael Janis <mjanis at chem.ucla.edu> wrote:
>>> Hi,
>>>
>>>
>>>
>>> I've had some perl code lying around for what seems like forever and I'd
>>> like to contribute it to bioperl, if such facilities don't already exist in
>>> bioperl.  The code implements shuffling (DNA or RNA) keeping the
>>> dinucleotide composition (and codon usage) intact through an Eulerian path
>>> approach as described in Altschul and Erickson (1985).  The code seeds the
>>> Eulerian paths by keeping the first and last nucleotide invariant in the
>>> shuffle - which has minimal detrimental effects to the purpose of the
>>> algorithm, in my experience.
>>>
>>>
>>>
>>> A quick search on the bioperl website shows that there is a mutation.pls
>>> script, and facilities for using Sean Eddy's SQUID C library, which
>>> implements the same function (I wrote this particular function before I knew
>>> how to use C).  As such, it's probably not as elegant as Sean Eddy's
>>> implementation, but it works - and it's entirely in perl.
>>>
>>>
>>>
>>> The bioperl developer pages suggest a post to the mailing list as the best
>>> place to start contributing to bioperl.  Is this a useful function to add to
>>> the project?
>>>
>>>
>>>
>>> Best Regards,
>>>
>>>
>>>
>>> Michael
>>>
>>>
>>>
>>> -------------------------------
>>>
>>> Michael Janis
>>>
>>> <mailto:mjanis at chem.ucla.edu> mjanis at chem.ucla.edu
>>>
>>> -------------------------------
>>>
>>>
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>>
>>
>>
>>
>> --
>> Dr. Rutger A. Vos
>> Department of zoology
>> University of British Columbia
>> http://www.nexml.org
>> http://rutgervos.blogspot.com
>>
>>
>>
>> --
>> Dr. Rutger A. Vos
>> Department of zoology
>> University of British Columbia
>> http://www.nexml.org
>> http://rutgervos.blogspot.com
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From cjfields at illinois.edu  Fri Aug 15 14:12:10 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 15 Aug 2008 09:12:10 -0500
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <C4CA9662.52BD%mirhan@indiana.edu>
References: <C4CA9662.52BD%mirhan@indiana.edu>
Message-ID: <3371D481-1416-4976-9846-83CF83395039@illinois.edu>

The method get_all_annotation_keys() is present in  
AnnotationCollectionI but not get_all_Annotations(), though I doubt it  
is set up for recursive retrieval (something that might be worth  
testing).  I don't have a problem adding this in.  Hilmar, thoughts?

chris

On Aug 15, 2008, at 1:11 AM, Han, Mira wrote:

>
> Hi,
> I've fixed the get_deep_Annotations() to get_nested_Annotations()
> It has arguments -keys and -recursive,
> And behaves exactly like get_Annotations() when recursive is not set  
> (tested by replacing the get_Annotations() in the Annotation.t)
> I made it a new function instead of modifying get_Annotations()  
> because I wasn't sure how to modify it to be backwards compatible.
> I thought of adding the function to the interface  
> AnnotationCollectionI,
> But it seemed like get_all_Annotations() was missing from the  
> interface as well,
> So decided to ask if it should be added to the interface at all.
> Isn't it possible that certain implementations of the interface has  
> functions that are only specific to that implementation?
>
> Mira
>
>
> On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:
>
>
> Oh yes,
> I meant get_Annotations,
> I want a get_Annotations that is recursive and passes the keys to  
> the recursive calls.
>
>
>
> On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
>
> Hi Mira -
>
> On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:
>
>> Yes it is true that it's similar to get_all_Annotations, it's
>> basically a recursive version of it.
>
>
> I suppose you mean get_Annotations(), right? (get_all_Annotations() is
> already recursive)
>
>        -hilmar
> --
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
> ===========================================================
>
>
>
>
>
>
>

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From jorvis at gmail.com  Fri Aug 15 19:45:23 2008
From: jorvis at gmail.com (Joshua Orvis)
Date: Fri, 15 Aug 2008 15:45:23 -0400
Subject: [Bioperl-l] help creating de novo GFF3
Message-ID: <a58187640808151245o48af6311hb8afee40341b7ef2@mail.gmail.com>

I don't have a lot of experience with Bioperl and have used it mostly for
simple format conversions or parsing Genbank files.  I need to create a
quick script to create GFF3 and decided to give bioperl a try again instead
of just printing the columns myself but have had a few problems.  My
apologies for the narrative here but I know it can sometimes be informative
to hear 'how' a user arrived at a problem rather than just knowing the
problem itself.

Is there a documented explicit mapping between the GFF3 columns and the
predefined tags (ID, Name, etc.) and their Bioperl object attribute
equivalents?  Is it preferable to create Bio::SeqFeature::Generic objects
and pass them to Bio::Tools::GFF->write_feature or rather to create
Bio::SeqFeature::Annotated and pass them to Bio::FeatureIO::gff ?   I may be
overlooking it, but a simple tutorial showing how to create and define a new
sequence object, attach annotations to it and dump in GFF format seems to be
missing.  This seems like a basic thing to do - most of the documentation I
find is about converting between formats rather than creating new
annotation.

Here are some of the problems I (a typical naive user?) ran into when
adventuring with bioperl here.   My first attempt resulted in the string
"SEQ" as column 0 in all my GFF output.  I thought that maybe this was
because my features weren't 'attached' to a sequence, so I created a
Bio::Seq::RichSeq object and tried both (separately):

$seq->add_SeqFeature( $feat );

and

$feat->attach_seq( $seq );

Neither changed the first column of output.  Looking at the
docs.bioperl.org methods for Bio::SeqFeature::Generic I found the seq_id
attribute, which came with the warning: "This attribute should *not* be used
in GFF dumping" - but since it's the only thing I did that worked, I used it
anyway.

Next I wanted to have ID tags within my last column.  I first tried setting
all the relevant attributes I could see on my features (id, primary_tag,
display_name, display_id, etc.) but none of these caused ID=? to be
created.  Next, I tried something like this:


my $feat = Bio::SeqFeature::Annotated->new(
        -start        => $start,
        -end          => $end,
        -strand       => $strand,
        -primary      => 'gene',
        -seq_id       => $asmbl_id,  ## this works but is discouraged
        -tag          => { ID => $transcript->{pub_locus},
                           product_name => $transcript->{com_name},
                           ec_number => $transcript->{'ec#'},
                           gene_symbol => $transcript->{gene_sym}
                         }
);

My hopes that passing it via the -tag option would do the trick failed, as
it created a line like this instead:

10263   .       gene    58512   56983   .       +       .       iD=AN9220.4;

Notice the 'ID' -> 'iD' transformation (without any command-line warnings).

I'm still stuck on this one (Parent would be next) but overall guidance or
pointers to a tutorial/documentation I'm overlooking would be great.
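
For comparison, the fallback mentioned at the top (printing the columns directly) is short. This is a minimal sketch, not a BioPerl API: gff3_escape() and gff3_line() are names made up here, the product_name value is a placeholder, and the coordinates from the output line above are given in ascending order because GFF3 expects start <= end.

```perl
use strict;
use warnings;

# Percent-encode the characters that are special in GFF3 column 9.
sub gff3_escape {
    my ($value) = @_;
    $value =~ s/([%;=&,\t\n\r])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Build one GFF3 line from named fields; unset optional columns become '.'.
sub gff3_line {
    my %f = (source => '.', score => '.', strand => '.', phase => '.', @_);
    my %attr  = %{ $f{attributes} || {} };
    # Reserved tags first, then the remaining tags in sorted order.
    my @first = grep { exists $attr{$_} } qw(ID Name Parent);
    my @rest  = grep { !/^(?:ID|Name|Parent)$/ } keys %attr;
    my $col9  = join ';',
        map { $_ . '=' . gff3_escape($attr{$_}) } @first, sort @rest;
    return join("\t",
        @f{qw(seqid source type start end score strand phase)},
        $col9 || '.') . "\n";
}

print "##gff-version 3\n";
print gff3_line(
    seqid      => '10263',
    type       => 'gene',
    start      => 56983,
    end        => 58512,
    strand     => '+',
    attributes => {
        ID           => 'AN9220.4',
        product_name => 'placeholder product',   # made-up value
    },
);
```

Doing it this way keeps the case of the ID tag under the script's control, at the cost of losing whatever validation the BioPerl writers provide.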

JO


From jason at bioperl.org  Fri Aug 15 23:00:04 2008
From: jason at bioperl.org (Jason Stajich)
Date: Fri, 15 Aug 2008 16:00:04 -0700
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
In-Reply-To: <200808141514.49124.heikki@sanbi.ac.za>
References: <200808141514.49124.heikki@sanbi.ac.za>
Message-ID: <A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>

I think D.

There should probably only be one find_node function.

findnode_by_id was written by Ramiro for the re-rooting code and I  
guess it wasn't checked to reduce unneeded functions.  I don't have  
any problems removing/deprecating it but will need to update the code  
that depends on it to use find_node properly.
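
Roughly, the deprecation could look like the sketch below. Toy::Tree is a hypothetical stand-in written for this example, not the code in Bio::Tree::TreeFunctionsI, and its find_node() only imitates the calling convention Heikki describes (search 'id' by default, or a named field).

```perl
use strict;
use warnings;
use Carp ();

package Toy::Tree;   # hypothetical stand-in, not a BioPerl class

sub new {
    my ($class, @nodes) = @_;
    return bless { nodes => [@nodes] }, $class;
}

# Search the 'id' field by default, or a named field given as -field => value.
sub find_node {
    my ($self, @args) = @_;
    my ($field, $value) = @args == 1 ? ('id', $args[0]) : @args;
    $field =~ s/^-//;
    return grep { defined $_->{$field} && $_->{$field} eq $value }
           @{ $self->{nodes} };
}

# Deprecated wrapper: warn from the caller's point of view, then delegate.
sub findnode_by_id {
    my ($self, $id) = @_;
    Carp::carp('findnode_by_id() is deprecated, use find_node() instead');
    return $self->find_node($id);
}

package main;

my $tree = Toy::Tree->new(
    { id => 'A', internal_id => 1 },
    { id => 'B', internal_id => 2 },
);
my ($by_id)       = $tree->find_node('B');
my ($by_internal) = $tree->find_node(-internal_id => 1);
print "$by_id->{internal_id} $by_internal->{id}\n";   # prints "2 A"
```

Callers of the old method keep working but see a warning pointing at their own line, which makes the dependent code easy to find and update.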

-jason
On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote:

> A generic method for retrieving nodes from a Bio::Tree::TreeI  
> objects is
> Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the  
> 'id'
> attribute unless a field is given. I can retrieve nodes based on  
> internal id
> like this:
>
>   $tree->find_node(-internal_id => $internal_id);
>
> I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that  
> retrieves by id.
> However, the POD documentation claims that it retrieves by internal  
> id.
>
> What needs to be done?
>
> A. Fix the doc to speak about id
> B. Fix to code to retrieve by internal_id
> C. Fix the doc and create findnode_by_internal_id()
> C. Remove findnode_by_id() as redundant and confusing
> D. Deprecate findnode_by_id() as redundant and confusing
>
> There are no tests for findnode_by_id() which to me tilts selection  
> to D and A
> for now.
>
> Any other opinions?
>
> 	-Heikki
>
>
> -- 
> ______ _/      _/_____________________________________________________
>      _/      _/
>     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
>    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
>   _/  _/  _/  SANBI, South African National Bioinformatics Institute
>  _/  _/  _/  University of Western Cape, South Africa
>     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> ___ _/_/_/_/_/________________________________________________________
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Jason Stajich
jason at bioperl.org


From schmidtc at udel.edu  Fri Aug 15 23:42:44 2008
From: schmidtc at udel.edu (Carl Schmidt)
Date: Fri, 15 Aug 2008 19:42:44 -0400
Subject: [Bioperl-l] lazy symbol binding
Message-ID: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu>

When I attempt to run bp_load_gff.pl
I get the following error:

dyld: lazy symbol binding failed: Symbol not found: _mysql_init
   Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
   Expected in: dynamic lookup

dyld: Symbol not found: _mysql_init
   Referenced from: /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
   Expected in: dynamic lookup

Trace/BPT trap

Any suggestions?  I apologize if this is the wrong place for posting  
such a question.

Thanks
Carl

Carl J. Schmidt
Department of Animal & Food Sciences
University of Delaware
Newark, DE 19716
schmidtc at udel.edu
http://copland.udel.edu/~schmidtc


From rvos at interchange.ubc.ca  Sat Aug 16 00:11:48 2008
From: rvos at interchange.ubc.ca (Rutger Vos)
Date: Fri, 15 Aug 2008 17:11:48 -0700
Subject: [Bioperl-l] Fwd:  lazy symbol binding
In-Reply-To: <2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com>
References: <770D6835-9BFA-40EE-BA9B-2009577D6371@udel.edu>
	<2bb9b24a0808151711q1e2b2703k56e4abe8ad4549ad@mail.gmail.com>
Message-ID: <2bb9b24a0808151711m5eba5500k468a0effc711f3c@mail.gmail.com>

---------- Forwarded message ----------
From: Rutger Vos <rutgeraldo at gmail.com>
Date: Fri, Aug 15, 2008 at 5:11 PM
Subject: Re: [Bioperl-l] lazy symbol binding
To: Carl Schmidt <schmidtc at udel.edu>


This looks like a faulty install of the combo of DBI/DBD::mysql

Specifically, the perl driver (DBD::mysql) for the mysql database
builds a dynamic library on installation, which the DBI interface
tries to load dynamically, but fails.

Unfortunately, this is by no means bioperl related. Try a reinstall as
per the instructions at:

http://search.cpan.org/~capttofu/DBD-mysql-4.008/lib/DBD/mysql/INSTALL.pod

On Fri, Aug 15, 2008 at 4:42 PM, Carl Schmidt <schmidtc at udel.edu> wrote:
> When I attempt  bp_load_gff.pl
> I get the following error:
>
> dyld: lazy symbol binding failed: Symbol not found: _mysql_init
>  Referenced from:
> /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
>  Expected in: dynamic lookup
>
> dyld: Symbol not found: _mysql_init
>  Referenced from:
> /Library/Perl/5.8.8/darwin-thread-multi-2level/auto/DBD/mysql/mysql.bundle
>  Expected in: dynamic lookup
>
> Trace/BPT trap
>
> Any suggestions?  I apologize if this is the wrong place for posting such a
> question.
>
> Thanks
> Carl
>
> Carl J. Schmidt
> Department of Animal & Food Sciences
> University of Delaware
> Newark, DE 19716
> schmidtc at udel.edu
> http://copland.udel.edu/~schmidtc
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>


--
Dr. Rutger A. Vos
Department of zoology
University of British Columbia
http://www.nexml.org
http://rutgervos.blogspot.com




From hlapp at duke.edu  Sat Aug 16 17:43:46 2008
From: hlapp at duke.edu (Hilmar Lapp)
Date: Sat, 16 Aug 2008 13:43:46 -0400
Subject: [Bioperl-l] [Wg-phyloinformatics] Re: phyloXML weekly report
In-Reply-To: <3371D481-1416-4976-9846-83CF83395039@illinois.edu>
References: <C4CA9662.52BD%mirhan@indiana.edu>
	<3371D481-1416-4976-9846-83CF83395039@illinois.edu>
Message-ID: <D096DA37-D36A-4A3E-A6C3-98254CAC6B8B@duke.edu>


On Aug 15, 2008, at 10:12 AM, Chris Fields wrote:

> The method get_all_annotation_keys() is present in  
> AnnotationCollectionI but not get_all_Annotations()

Interesting. I wonder whether that was a result of the forward and  
reverse surgeries done to the Annotation* interfaces. (I'm off-line as  
I write this so can't check.)

> , though I doubt it is set up for recursive retrieval (something  
> that might be worth testing). I don't have a problem adding this  
> in.  Hilmar, thoughts?

get_all_Annotations() has always been recursive (just as
get_all_SeqFeatures() is for SeqI). However, its recursive behavior is
different from the behavior that Mira wants.

Specifically, if get_all_Annotations() finds a nested collection under
a matching tag, it treats the entire nested collection as a match and
returns the recursively flattened annotation objects it contains. What
Mira needs (if I am understanding her implementation correctly) is to
recursively retrieve only those annotations whose own tag matches the
query key or set of keys. That is, a nested collection would be
searched for matching tags even if the tag of the collection itself
does not match, and if it does match, only the contained annotations
with matching tags would be returned.
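
To make the difference concrete, here is a minimal sketch on a toy data structure. It assumes nothing about the real Bio::Annotation classes: the [tag, value] pairs and the names flatten(), whole_collection_match() and per_tag_match() are made up for illustration, and the first function follows the description above, not the actual get_all_Annotations() code.

```perl
use strict;
use warnings;

# Toy model: a collection is an array of [tag, value] pairs, and a value is
# either a plain string or a nested collection (another array reference).
my $collection = [
    [ comment => 'top-level comment' ],
    [ dblink  => 'top-level dblink' ],
    [ nested  => [
        [ comment => 'inner comment' ],
        [ dblink  => 'inner dblink' ],
    ] ],
];

# Every leaf value below a collection, whatever its tag.
sub flatten {
    my ($coll) = @_;
    return map { ref($_->[1]) ? flatten($_->[1]) : $_->[1] } @$coll;
}

# First behavior: a nested collection under a matching tag matches as a whole.
sub whole_collection_match {
    my ($coll, %want) = @_;
    return map  { ref($_->[1]) ? flatten($_->[1]) : $_->[1] }
           grep { $want{ $_->[0] } } @$coll;
}

# Second behavior: always descend, and keep only leaves whose own tag matches.
sub per_tag_match {
    my ($coll, %want) = @_;
    return map {
        ref($_->[1])     ? per_tag_match($_->[1], %want)
      : $want{ $_->[0] } ? $_->[1]
      :                    ();
    } @$coll;
}

# Same query, different answers:
print join(', ', whole_collection_match($collection, comment => 1)), "\n";
# top-level comment
print join(', ', per_tag_match($collection, comment => 1)), "\n";
# top-level comment, inner comment

# Querying the tag of the nested collection itself:
print join(', ', whole_collection_match($collection, nested => 1)), "\n";
# inner comment, inner dblink
```

With the second behavior a query for 'nested' returns nothing in this toy model, because no leaf carries that tag; that asymmetry is one argument for keeping the two behaviors in separate methods.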

I'm not sure whether it's better to fold both behaviors into one  
method which has an optional argument to control which one is desired,  
or to have two methods. I'm leaning towards having two methods,  
because support for an added optional argument in external  
implementations of the interface is hard to test for, as opposed to  
testing for the presence/absence of a new method. On the other hand,  
if the existing method wasn't even on the interface to begin with  
(which I'm not yet convinced about) then that shouldn't really be an  
issue.

It seems Chris you are also for adding a second method (and putting  
get_all_Annotations() (back) into the interface)? Does anyone else  
have thoughts or preferences on this?

	-hilmar

>
>
> chris
>
> On Aug 15, 2008, at 1:11 AM, Han, Mira wrote:
>
>>
>> Hi,
>> I've fixed the get_deep_Annotations() to get_nested_Annotations()
>> It has arguments -keys and -recursive,
>> And behaves exactly like get_Annotations() when recursive is not  
>> set (tested by replacing the get_Annotations() in the Annotation.t)
>> I made it a new function instead of modifying get_Annotations()  
>> because I wasn't sure how to modify it to be backwards compatible.
>> I thought of adding the function to the interface  
>> AnnotationCollectionI,
>> But it seemed like get_all_Annotations() was missing from the  
>> interface as well,
>> So decided to ask if it should be added to the interface at all.
>> Isn't it possible that certain implementations of the interface has  
>> functions that are only specific to that implementation?
>>
>> Mira
>>
>>
>> On 8/12/08 12:00 AM, "Mira Han" <mirhan at indiana.edu> wrote:
>>
>>
>> Oh yes,
>> I meant get_Annotations,
>> I want a get_Annotations that is recursive and passes the keys to  
>> the recursive calls.
>>
>>
>>
>> On 8/11/08 11:54 PM, "Hilmar Lapp" <hlapp at duke.edu> wrote:
>>
>> Hi Mira -
>>
>> On Aug 11, 2008, at 11:46 PM, Han, Mira wrote:
>>
>>> Yes it is true that it's similar to get_all_Annotations, it's
>>> basically a recursive version of it.
>>
>>
>> I suppose you mean get_Annotations(), right? (get_all_Annotations()  
>> is
>> already recursive)
>>
>>       -hilmar
>> --
>> ===========================================================
>> : Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
>> ===========================================================
>>
>>
>>
>>
>>
>>
>>
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Marie-Claude Hofmann
> College of Veterinary Medicine
> University of Illinois Urbana-Champaign
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:- hlapp at duke dot edu :
===========================================================


From heikki at sanbi.ac.za  Sun Aug 17 07:02:31 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 17 Aug 2008 09:02:31 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
In-Reply-To: <A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
References: <200808141514.49124.heikki@sanbi.ac.za>
	<A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
Message-ID: <200808170902.32485.heikki@sanbi.ac.za>


Done.

findnode_by_id() was not and is not used anywhere in BioPerl core code.

Thanks for input,

   -Heikki

On Saturday 16 August 2008 01:00:04 Jason Stajich wrote:
> I think D.
>
> There should probably only be one find_node function.
>
> findnode_by_id was written by Ramiro for the re-rooting code and I
> guess it wasn't checked to reduce unneeded functions.  I don't have
> any problems removing/deprecating it but will need to update the code
> that depends on it to use find_node properly.
>
> -jason
>
> On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote:
> > A generic method for retrieving nodes from a Bio::Tree::TreeI
> > objects is
> > Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the
> > 'id'
> > attribute unless a field is given. I can retrieve nodes based on
> > internal id
> > like this:
> >
> >   $tree->find_node(-internal_id => $internal_id);
> >
> > I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that
> > retrieves by id.
> > However, the POD documentation claims that it retrieves by internal
> > id.
> >
> > What needs to be done?
> >
> > A. Fix the doc to speak about id
> > B. Fix to code to retrieve by internal_id
> > C. Fix the doc and create findnode_by_internal_id()
> > C. Remove findnode_by_id() as redundant and confusing
> > D. Deprecate findnode_by_id() as redundant and confusing
> >
> > There are no tests for findnode_by_id() which to me tilts selection
> > to D and A
> > for now.
> >
> > Any other opinions?
> >
> > 	-Heikki
> >
> >
> > --
> > ______ _/      _/_____________________________________________________
> >      _/      _/
> >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >  _/  _/  _/  University of Western Cape, South Africa
> >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason at bioperl.org
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From heikki at sanbi.ac.za  Sun Aug 17 07:02:31 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Sun, 17 Aug 2008 09:02:31 +0200
Subject: [Bioperl-l] TreeFunctionsI::findnode_by_id ?
In-Reply-To: <A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
References: <200808141514.49124.heikki@sanbi.ac.za>
	<A1F72168-A2B9-457F-BF13-25C201411F1F@bioperl.org>
Message-ID: <200808170902.32485.heikki@sanbi.ac.za>


Done.

findnode_by_id() was not and is not used anywhere in BioPerl core code.

Thanks for input,

   -Heikki

On Saturday 16 August 2008 01:00:04 Jason Stajich wrote:
> I think D.
>
> There should probably only be one find_node function.
>
> findnode_by_id was written by Ramiro for the re-rooting code and I
> guess it wasn't checked to reduce unneeded functions.  I don't have
> any problems removing/deprecating it but will need to update the code
> that depends on it to use find_node properly.
>
> -jason
>
> On Aug 14, 2008, at 6:14 AM, Heikki Lehvaslaiho wrote:
> > A generic method for retrieving nodes from a Bio::Tree::TreeI
> > objects is
> > Bio::Tree::TreeFunctionsI::find_node. It defaults to searching the
> > 'id'
> > attribute unless a field is given. I can retrieve nodes based on
> > internal id
> > like this:
> >
> >   $tree->find_node(-internal_id => $internal_id);
> >
> > I now found Bio::Tree::TreeFunctionsI::findnode_by_id() that
> > retrieves by id.
> > However, the POD documentation claims that it retrieves by internal
> > id.
> >
> > What needs to be done?
> >
> > A. Fix the doc to speak about id
> > B. Fix to code to retrieve by internal_id
> > C. Fix the doc and create findnode_by_internal_id()
> > C. Remove findnode_by_id() as redundant and confusing
> > D. Deprecate findnode_by_id() as redundant and confusing
> >
> > There are no tests for findnode_by_id() which to me tilts selection
> > to D and A
> > for now.
> >
> > Any other opinions?
> >
> > 	-Heikki
> >
> >
> > --
> > ______ _/      _/_____________________________________________________
> >      _/      _/
> >     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
> >    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
> >   _/  _/  _/  SANBI, South African National Bioinformatics Institute
> >  _/  _/  _/  University of Western Cape, South Africa
> >     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
> > ___ _/_/_/_/_/________________________________________________________
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
> Jason Stajich
> jason at bioperl.org
>


-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From mike.thon at gmail.com  Mon Aug 18 05:01:43 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Mon, 18 Aug 2008 07:01:43 +0200
Subject: [Bioperl-l] Build.PL options?
Message-ID: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>

Hi all - I am trying to write a port of bioperl 1.5.2 to enable its  
installation with the macports system (www.macports.org).  I'm not too  
familiar with the Build.PL build system - is there any way to disable  
the dependency resolution that the build script does (i.e. without  
patching the script)?
Thanks
Mike


From David.Messina at sbc.su.se  Mon Aug 18 07:37:53 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Mon, 18 Aug 2008 09:37:53 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
Message-ID: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>

Hi Mike,

Great to hear you're planning to add BioPerl to MacPorts. Thanks!

I'm afraid I don't know how to disable the dependency resolution, but while
you're waiting for others to chime in, here are a couple of things you
might try if you haven't already:

- The Build.PL for BioPerl is configured to automatically write out a
Makefile.PL for you. If you're more familiar with Makefile.PL, then you
could work with that.

Module::Build tries to maintain a certain level of cross-compatibility with
ExtUtils::MakeMaker; how do you disable dependency resolution with the
Makefile.PL system?


- The Build.PL API is pretty flexible, and it's easy to change just about
any behavior by passing parameters to Build.PL on the command line. See


http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/API.pod

and


http://search.cpan.org/~kwilliams/Module-Build-0.2808/lib/Module/Build/Cookbook.pm


I'm curious -- it seems to me that a major purpose of Build.PL and
Makefile.PL is specifying dependencies (and installing them where
necessary). Does MacPorts override that and do its own dependency-checking
for Perl modules?


Dave


From neetisomaiya at gmail.com  Mon Aug 18 11:45:38 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Mon, 18 Aug 2008 17:15:38 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
Message-ID: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>

Hi,

I am fetching data from the ent gene file of KEGG which is available here :
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent

I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
gene names and pathways in which they participate. I am getting the gene
names fine. But this method

"for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
}"

doesn't seem to be working. I am not able to get the data of the pathways in
which the gene is involved.

Can someone please suggest how I can get the pathway data of genes from the
KEGG ent file??

Thanks.

-- 
-Neeti
Even my blood says, B positive


From johnsonm at gmail.com  Mon Aug 18 13:26:26 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 08:26:26 -0500
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
Message-ID: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>

On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya <neetisomaiya at gmail.com> wrote:

> I am fetching data from the ent gene file of KEGG which is available here :
> ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent
>
> I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
> gene names and pathways in which they participate. I am getting the gene
> names fine. But this method
>
> "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
> }"
>
> doesn't seem to be working. I am not able to get the data of the pathways in
> which the gene is involved.
>
> Can someone please suggest how I can get the pathway data of genes from the
> KEGG ent file??

What exactly do you mean by "doesn't seem to be working" and what
version of BioPerl are you using?  The code below seems to function as
expected with BioPerl 1.5.2, producing output like this:

hsa04612  Antigen processing and presentation
hsa01430  Cell Communication
hsa04020  Calcium signaling pathway
hsa04080  Neuroactive ligand-receptor interaction
hsa04540  Gap junction
...
...
...

#!/wherever/bin/perl

use strict;
use warnings;

use Bio::SeqIO;


my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]);

while (my $seq = $seqio->next_seq()) {

    foreach my $pathway ($seq->annotation->get_Annotations('pathway')) {

        ## $pathway should be a Bio::Annotation::Comment
        print $pathway->text(), "\n";

    }

}


From johnsonm at gmail.com  Mon Aug 18 13:29:38 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 08:29:38 -0500
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
	<ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
Message-ID: <ebf5eb170808180629j7813a2d7w2a9b238bd455b962@mail.gmail.com>

On Mon, Aug 18, 2008 at 8:26 AM, Mark Johnson <johnsonm at gmail.com> wrote:

> What exactly do you mean by "doesn't seem to be working" and what
> version of BioPerl are you using?  The code below seems to function as
> expected with BioPerl 1.5.2, producing output like this:

Note that I downloaded
ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent and
provided the path to it as the argument to the script.


From mike.thon at gmail.com  Mon Aug 18 19:48:02 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Mon, 18 Aug 2008 21:48:02 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
	<628aabb70808180037p24ec8bd9n960e6b7702dcc183@mail.gmail.com>
Message-ID: <5BE575B1-67E1-41B1-AB2F-728558377DDA@gmail.com>


On Aug 18, 2008, at 9:37 AM, Dave Messina wrote:

>
> I'm curious -- it seems to me that a major purpose of Build.PL and  
> Makefile.PL is specifying dependencies (and installing them where  
> necessary). Does MacPorts override that and do its own dependency- 
> checking for Perl modules?
>
Hi Dave - Thanks for the links; I will check them out.  MacPorts can
resolve dependencies that are specified for each package, much like
rpm and other Linux packaging systems, so it's probably better to
disable the dependency resolution in the BioPerl build script and let
MacPorts handle them.  It looks like I can patch the Build.PL script
pretty easily, unless I can find a better way.

Mike


From johnsonm at gmail.com  Mon Aug 18 20:53:48 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Mon, 18 Aug 2008 15:53:48 -0500
Subject: [Bioperl-l] Bio::Annotation issues with BioSQL
Message-ID: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>

    I'm presently refactoring an in-house protein annotation pipeline
and converting it to use BioSQL as a data store.  I've noticed some
slightly screwy behavior with regard to how some of the
Bio::Annotation classes are handled:

-Instances of Bio::Annotation::SimpleValue and
Bio::Annotation::StructuredValue attached to the annotation collection
for a sequence feature (Bio::SeqFeature::Generic) are converted to
tags/values on the feature.
-Instances of Bio::Annotation::DBLink with attached comments lose the comment.

    I'm storing and retrieving things thusly:

use Bio::DB::BioDB;
use Bio::Seq;
use Bio::SeqFeature::Generic;
use Bio::Annotation::DBLink;

## $user, $pass and $ora_instance are set elsewhere.
my $dbadp = Bio::DB::BioDB->new(
                                -database => 'biosql',
                                -user     => $user,
                                -pass     => $pass,
                                -dbname   => $ora_instance,
                                -driver   => 'Oracle',
                               );

my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

my $seq = Bio::Seq->new(
                        -id               => 'DEBUG001',
                        -accession_number => 'DBG001',
                        -desc             => 'Debug Sequence',
                        -seq              => 'GATTACA',
                        -namespace        => 'DEBUG',
                       );

my $feature = Bio::SeqFeature::Generic->new(
                                            -seq_id       => 'DEBUG001',
                                            -display_name => 'FEAT0001',
                                            -primary      => 'debug',
                                            -source       => 'test',
                                            -start        => 3,
                                            -end          => 5,
                                            -strand       => 1,
                                           );

my $dblink = Bio::Annotation::DBLink->new(
                                          -database   => 'FAKE001',
                                          -primary_id => 'FK1234567890',
                                          -comment    => 'This is a fake comment',
                                         );

## Attach the DBLink to the feature's annotation collection under a key.
$feature->annotation->add_Annotation('ANNO0001', $dblink);
$seq->add_SeqFeature($feature);

my $pseq = $dbadp->create_persistent($seq);

$pseq->store();
$adp->commit();


## Retrieving:
use Bio::DB::Query::BioQuery;

my $dbadp = Bio::DB::BioDB->new(
                                ...
                               );

my $adp = $dbadp->get_object_adaptor("Bio::SeqI");

my $query = Bio::DB::Query::BioQuery->new();

$query->datacollections([
                         "Bio::PrimarySeqI s",
                        ]);

$query->where(["s.display_id like 'DEBUG%'"]);

my $result = $adp->find_by_query($query);

while (my $seq = $result->next_object()) {

    my @features = $seq->get_SeqFeatures();

    foreach my $feature (@features) {

        ## Contents of Bio::Annotation::SimpleValue and
        ## Bio::Annotation::StructuredValue have migrated to tag/value
        ## pairs on $feature and are missing from $annotation_collection.
        ##
        ## Comments have gone missing from Bio::Annotation::DBLink,
        ## but DBLinks are otherwise intact and present.
        my $annotation_collection = $feature->annotation();
        ...
        ...

    }

}

    Is bioperl-db / BioSQL trying to tell me that I shouldn't be using
Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue?  Is
there even a place in the BioSQL schema for a comment to be attached
to a DBLink?


From neetisomaiya at gmail.com  Tue Aug 19 04:31:28 2008
From: neetisomaiya at gmail.com (neeti somaiya)
Date: Tue, 19 Aug 2008 10:01:28 +0530
Subject: [Bioperl-l] need help in parsing KEGG data
In-Reply-To: <ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
References: <764978cf0808180445l1a0899cbp3fff911690490d9f@mail.gmail.com>
	<ebf5eb170808180626u70028580k6ef546fe3d1566dc@mail.gmail.com>
Message-ID: <764978cf0808182131p620a2dedu40b651be50be5b3c@mail.gmail.com>

Thanks a lot for the reply. It was a problem of the bioperl version.

~Neeti.

On Mon, Aug 18, 2008 at 6:56 PM, Mark Johnson <johnsonm at gmail.com> wrote:

> On Mon, Aug 18, 2008 at 6:45 AM, neeti somaiya <neetisomaiya at gmail.com> wrote:
>
> > I am fetching data from the ent gene file of KEGG which is available here :
> > ftp://ftp.genome.jp/pub/kegg/genes/organisms/hsa/H.sapiens.ent
> >
> > I am using Bio::SeqIO with file format of type KEGG. I am trying to fetch
> > gene names and pathways in which they participate. I am getting the gene
> > names fine. But this method
> >
> > "for my $pathway ( $seq->annotation->get_Annotations('pathway') ){
> > }"
> >
> > doesn't seem to be working. I am not able to get the data of the pathways in
> > which the gene is involved.
> >
> > Can someone please suggest how I can get the pathway data of genes from the
> > KEGG ent file??
>
> What exactly do you mean by "doesn't seem to be working" and what
> version of BioPerl are you using?  The code below seems to function as
> expected with BioPerl 1.5.2, producing output like this:
>
> hsa04612  Antigen processing and presentation
> hsa01430  Cell Communication
> hsa04020  Calcium signaling pathway
> hsa04080  Neuroactive ligand-receptor interaction
> hsa04540  Gap junction
> ...
> ...
> ...
>
> #!/wherever/bin/perl
>
> use strict;
> use warnings;
>
> use Bio::SeqIO;
>
>
> my $seqio = Bio::SeqIO->new(-format => 'kegg', -file => $ARGV[0]);
>
> while (my $seq = $seqio->next_seq()) {
>
>    foreach my $pathway ($seq->annotation->get_Annotations('pathway')) {
>
>        ## $pathway should be a Bio::Annotation::Comment
>        print $pathway->text(), "\n";
>
>    }
>
> }
>


-- 
-Neeti
Even my blood says, B positive


From wgallin at ualberta.ca  Tue Aug 19 06:25:27 2008
From: wgallin at ualberta.ca (Warren Gallin)
Date: Tue, 19 Aug 2008 00:25:27 -0600
Subject: [Bioperl-l] EUtilities help
Message-ID: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>

Hi,

	Is there a cogent document on using Bio::DB::EUtilities with Bioperl  
1.5.2 around somewhere?

	We upgraded and now my scripts are broken when invoking it.

	Any pointers appreciated.

Thanks,

Warren Gallin


From David.Messina at sbc.su.se  Tue Aug 19 07:30:26 2008
From: David.Messina at sbc.su.se (Dave Messina)
Date: Tue, 19 Aug 2008 09:30:26 +0200
Subject: [Bioperl-l] EUtilities help
In-Reply-To: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
References: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
Message-ID: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>

Hi Warren,

Are you upgrading to 1.5.2 or downgrading from bioperl-live?

If the former, you might consider going all the way to bioperl-live, whose
EUtilities support is improved significantly and documented extensively
here:

http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook

If the latter, I don't believe there is, but Chris Fields will know for sure
and will probably chime in.


Dave

To get bioperl-live
as a tarball: http://www.bioperl.org/DIST/nightly_builds/
or via Subversion: http://www.bioperl.org/wiki/Using_Subversion


From bix at sendu.me.uk  Tue Aug 19 08:34:50 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Aug 2008 09:34:50 +0100
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
Message-ID: <48AA85AA.7010405@sendu.me.uk>

Michael Thon wrote:
> Hi all - I am trying to write a port of bioperl 1.5.2 to enable its 
> installation with the macports system (www.macports.org).  I'm not too 
> familiar with the Build.PL build system - is there any way to disable 
> the dependency resolution that the build script does (i.e. without 
> patching the script)?

How else will you be doing the dependency resolution? If your system 
just installs all dependencies itself beforehand, then dependency 
resolution won't be invoked in Build.PL :)

Otherwise, don't run Build.PL at all (why do you need to?), or have your
system accept the default answer to every question:

echo | perl Build.PL


From mike.thon at gmail.com  Tue Aug 19 15:16:03 2008
From: mike.thon at gmail.com (Michael Thon)
Date: Tue, 19 Aug 2008 17:16:03 +0200
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <48AA85AA.7010405@sendu.me.uk>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>
	<48AA85AA.7010405@sendu.me.uk>
Message-ID: <E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>

> How else will you be doing the dependency resolution? If your system  
> just installs all dependencies itself beforehand, then dependency  
> resolution won't be invoked in Build.PL :)
> Otherwise, don't run Build.PL at all (why do you need to?),
Good point.  Now, why didn't I think of that? :)
Does Build.PL do anything other than copy Perl modules (and install  
dependencies)?
> or have your system answer default to all questions:
> echo | perl Build.PL
>


From downloadondemand at gmail.com  Thu Aug 14 23:06:04 2008
From: downloadondemand at gmail.com (N)
Date: Fri, 15 Aug 2008 02:06:04 +0300
Subject: [Bioperl-l] HOWTO:Graphics/BLAST output
Message-ID: <923c9ce30808141606k61d9cc23nb18e55dec5112ac4@mail.gmail.com>

Hello again!
I followed HOWTO:Graphics and ran into a problem.
While parsing BLAST output I found one part of the file that is clearly
causing trouble. In this hit there are two HSPs, but the second one is in
the "wrong" strand orientation (Plus/Plus vs. Plus/Minus). How can I tell
BioPerl to use only the HSPs oriented the same way as the best-scoring HSP?
Or better, although not strictly related to this mailing list: how can I
tell BLAST to put this second HSP into a separate hit? What am I doing or
thinking wrong?
I have attached a small PNG showing the problem. The second HSP is drawn in
white, but it makes no biological sense, does it?
Thanks.


BLASTN 2.2.18 [Mar-02-2008]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.

Query= gi|145226209
         (471 letters)

Database: ../database/1000
           24 sequences; 19,482 total letters

Searching..................................................done


                                                                 Score    E
Sequences producing significant alignments:                      (bits)  Value

gi|145226176 hypothetical_protein                                     599   e-174
gi|145226174 hypothetical_protein                                      30   0.007
gi|145226175 ATP-dependent_exoDNAse_(exonuclease_V)_alpha_subuni...    26   0.11
gi|145226190 hypothetical_protein                                      24   0.43
gi|145226195 protein_of_unknown_function_DUF1526                       22   1.7
gi|145226193 hypothetical_protein                                      22   1.7
gi|145226187 aminoglycoside_nucleotidyltransferase                     22   1.7
gi|145226181 diguanylate_cyclase                                       22   1.7
gi|145226179 hypothetical_protein                                      22   1.7
gi|145226178 hypothetical_protein                                      22   1.7
gi|145226177 putative_methyl-accepting_chemotaxis_sensory_transd...    22   1.7

>gi|145226176 hypothetical_protein
          Length = 477

 Score =  599 bits (302), Expect = e-174
 Identities = 428/470 (91%)
 Strand = Plus / Plus


Query: 1   atgaatgcagacgtgtcagtcaaccagtggaatccgttagaggtagccgctgaggcgaca 60
           |||||||| ||||||||||||||||||||||||||||||||||| |||||||||||||||
Sbjct: 1   atgaatgcggacgtgtcagtcaaccagtggaatccgttagaggtggccgctgaggcgaca 60


Query: 61  atcgctgccgccacagccgcgctggtgtgggaaggcccagacagctacggggtgctggaa 120
           || ||||||||||| |||||  |||||||||||||||||||||||||||||||| ||||
Sbjct: 61  attgctgccgccacggccgcattggtgtgggaaggcccagacagctacggggtgttggag 120


Query: 121 cgggtcgccggggccacagcgaaaggcatagcaacagctcggatagccgccgaaatcatg 180
           ||||| ||||||||||||||||||||||| ||||||||||||||| ||||||||||||||
Sbjct: 121 cgggtagccggggccacagcgaaaggcatggcaacagctcggataaccgccgaaatcatg 180


Query: 181 gctgacgtcaccacctcagttcagttcactgcggccaccgaacatgcgcgcggcggcgct 240
           ||||||||||||||||||||||||||||||||||||  |||| |||||||||||||||||
Sbjct: 181 gctgacgtcaccacctcagttcagttcactgcggccgacgaagatgcgcgcggcggcgct 240


Query: 241 gtagcgggacttccggggtggctggcgccgcggtgggcggcgtccgtgcgtgccgcactg 300
           |||||||| ||||||||||||||||||||||||||||||||||||||||| | |||||||
Sbjct: 241 gtagcggggcttccggggtggctggcgccgcggtgggcggcgtccgtgcgcggcgcactg 300


Query: 301 gacgaactcgaagccgccgggcggcccggctacgccatggtcaaggcgatcacctggcct 360
           ||||| ||||||||||||||||| |||||| ||  |||| | || ||    ||  ||||
Sbjct: 301 gacgagctcgaagccgccgggcgccccggcgacatcatgatgaaagcccggacacggccg 360


Query: 361 gccttgcgcagcgtcgcggggtggacccaagacgggccgctgcaaacatggcagacggct 420
           ||  |||||||||| |||| ||||||||||||||| |||||||| |||||||||||||||
Sbjct: 361 gcactgcgcagcgtggcggtgtggacccaagacggaccgctgcagacatggcagacggct 420


Query: 421 ctaattgtgagcgaagcacggactgctctggctcaccgcgtaggcgtctg 470
           || || |||  |||||||||||||||||||||||||||||||||||||||
Sbjct: 421 ctgatcgtggacgaagcacggactgctctggctcaccgcgtaggcgtctg 470


 Score = 22.3 bits (11), Expect = 1.7
 Identities = 11/11 (100%)
 Strand = Plus / Minus


Query: 124 gtcgccggggc 134
           |||||||||||
Sbjct: 333 gtcgccggggc 323


The rest of output truncated...
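
One possible way to do the first part (keeping only HSPs that share the
strand of the best-scoring HSP) is sketched below. It is only a sketch under
stated assumptions: the hash layout and the name same_strand_as_best are
hypothetical and stand in for real HSP objects. With Bio::SearchIO, the
values would instead come from $hsp->bits and $hsp->strand('hit') on each
HSP returned by $hit->hsps. The numbers are taken from the two HSPs of hit
gi|145226176 shown above.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two HSPs of hit gi|145226176.
# hit_strand is 1 for "Plus / Plus" and -1 for "Plus / Minus".
my @hsps = (
    { bits => 599,  hit_strand =>  1, query_start => 1,   query_end => 470 },
    { bits => 22.3, hit_strand => -1, query_start => 124, query_end => 134 },
);

# Keep only the HSPs whose subject strand agrees with the best-scoring HSP.
sub same_strand_as_best {
    my @sorted = sort { $b->{bits} <=> $a->{bits} } @_;
    return () unless @sorted;
    my $best_strand = $sorted[0]{hit_strand};
    return grep { $_->{hit_strand} == $best_strand } @sorted;
}

my @kept = same_strand_as_best(@hsps);
print "kept HSP $_->{query_start}-$_->{query_end} ($_->{bits} bits)\n" for @kept;
```

Only the HSPs that survive this filter would then be added as sub-features
when building the graphics track; the short Plus/Minus HSP is dropped. This
does not change how BLAST itself groups HSPs into hits.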
-------------- next part --------------
A non-text attachment was scrubbed...
Name: tst.png
Type: image/png
Size: 1668 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080815/24a3f555/attachment-0004.png>

From UKaraoz at lbl.gov  Fri Aug 15 00:03:51 2008
From: UKaraoz at lbl.gov (Ulas Karaoz)
Date: Thu, 14 Aug 2008 17:03:51 -0700
Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for
	blasttable
Message-ID: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>

Hi,

I found out that the save_output routine in RemoteBlast.pm doesn't
save the output properly to a file when the BLAST output is a
hit table (blasttable). Could the reason be that the tabular
output has a line that starts with a # while the parser is looking for
a line starting with just BLASTN, as in the section pasted below?

if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i ||
    $l =~ /^RPS-BLAST\s*.+$/i ) {
    $seentop = 1;
}
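
The suspicion can be checked in isolation with the two patterns quoted
above. This is a minimal sketch, not the RemoteBlast.pm code itself: the
helper name looks_like_report_top is hypothetical, and the sample first
lines are assumptions (a plain-text report header, and the same header
prefixed with "# " as described for the tabular output). It only shows what
the patterns match; it does not establish that this is what actually goes
wrong in save_output.

```perl
use strict;
use warnings;

# The two patterns from the snippet above, wrapped in a hypothetical helper.
sub looks_like_report_top {
    my ($line) = @_;
    return ( $line =~ /^(?:[T]?BLAST[NPX])\s*.+$/i
          || $line =~ /^RPS-BLAST\s*.+$/i ) ? 1 : 0;
}

# Assumed first lines: a plain-text report and a "#"-prefixed tabular one.
my @samples = (
    'BLASTN 2.2.18 [Mar-02-2008]',
    '# BLASTN 2.2.18 [Mar-02-2008]',
);

my %seen = map { $_ => looks_like_report_top($_) } @samples;

printf "%-32s => %s\n", $_, $seen{$_} ? 'match' : 'no match' for @samples;
```

Because both patterns are anchored with ^, a line that begins with "# "
cannot match either of them, so $seentop would never be set by such a line.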


From erant at psb.ugent.be  Fri Aug 15 12:25:59 2008
From: erant at psb.ugent.be (Erick Antezana)
Date: Fri, 15 Aug 2008 14:25:59 +0200
Subject: [Bioperl-l] [Obo-discuss] software developer resources, OBO API?
In-Reply-To: <1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>
References: <6caff30c0808140627ucdfc25cj7c11a7ffb255c06a@mail.gmail.com>
	<48A448DD.4000206@psb.ugent.be>
	<1CFC1BF0-7718-4641-82DB-C094E4C56A53@gmx.net>
Message-ID: <48A575D7.7010709@psb.ugent.be>

Hi Hilmar,

Hilmar Lapp wrote:
> Hi Erick,
>
> how did you determine that go-perl is specific to GO? I've found it to 
> work quite well for any kind of OBO-formatted ontology.
We used go-perl during the gestation (2005) of the ontologies we
built and handled, in particular while dealing with GO (as go-perl was
originally conceived for that purpose). It is extremely useful indeed.
However, as our team needed new extensions, new modules were
developed on top of the existing ones, and that code effectively
became a new development branch which ended up as something called
onto-perl. Onto-perl, as you might have realized, is strongly influenced
by go-perl. We also had some communication with
Chris Mungall at that time, and he indicated that he might drop further
development of go-perl. Some time later he told me he would
continue, which is good since many systems are based on it. But by then
we already had a fairly independent package, which has proven useful.
>
> Also, you note that BioPerl doesn't have the ability to write in 
> certain formats, and to intersect and "unify" (would you mind 
> explaining what you mean by that?) ontologies.
While working with several OBO ontologies, we needed to have them (or
parts of them) merged, intersected, or joined (= get one ontology = unify).
It can of course be a bit subjective, since you can "unify" ontologies
based on different features, approaches, etc., and while building
application ontologies (such as CCO) you may have to identify
identical terms coming from different ontologies and keep only one in
your integrated resource.
> It seems that your implementation of RDF etc export isn't really 
> reusable or modular in any way, 
The exports (RDF, OWL, ...) are part of the Ontology module's
functionality. Have you had any particular problems while exporting an
ontology? Please let us know so that it can be fixed or improved. That
module is currently undergoing a major overhaul (not
released yet) to accommodate a huge set of "ontologized" resources in
an RDF repository. I would also be interested in discussing it so
that we could improve it.
> but I'd love to bring the intersection function over to BioPerl 
Excellent! I think this is an appropriate time to make the various
developments converge so that the user community can have a standard
set of tools. We are interested in following up on these discussions.
> (BTW when you decided to roll your own ontology API, did you get the 
> impression that BioPerl isn't receptive to you adding to it?).
As I mentioned, we originally took go-perl since it offered much more
functionality than Bio::Ontology::*.
> Would you mind pointing me to the place in the code where I would find 
> that, as I can't seem to find it.
I forwarded your request to the developer who might give you more 
details about it.
>
>     -hilmar
>


cheers,
Erick


From bix at sendu.me.uk  Tue Aug 19 15:56:32 2008
From: bix at sendu.me.uk (Sendu Bala)
Date: Tue, 19 Aug 2008 16:56:32 +0100
Subject: [Bioperl-l] Build.PL options?
In-Reply-To: <E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>
References: <C603705E-06A2-4F3F-8B0C-AEF53887D1DE@gmail.com>	<48AA85AA.7010405@sendu.me.uk>
	<E3CE5212-4A4D-4F26-8DFB-B75E96B44A1D@gmail.com>
Message-ID: <48AAED30.1090103@sendu.me.uk>

Michael Thon wrote:
>> How else will you be doing the dependency resolution? If your system 
>> just installs all dependencies itself beforehand, then dependency 
>> resolution won't be invoked in Build.PL :)
>> Otherwise, don't run Build.PL at all (why do you need to?),
>
> Good point.  Now, why didn't I think of that? :)
> Does Build.PL do anything other than copy Perl modules (and install 
> dependencies)?

It generates the 'Build' script, which provides a whole host of functions:

./Build help

The only one other than 'install' that might be relevant to you is

./Build test

to run the test suite, if testing is part of your own system. There are
other standard ways to run the tests though, that don't need you to
create the Build script.

Somewhere along the line, probably during the install function, it also
creates man files and other forms of documentation iirc, and installs
those too.

If you handle dependencies, installation and testing (or don't do 
testing) yourself, and don't care about man files (perldoc is good 
enough?) you can ignore Build.PL.


From xxq.t.xu at gmail.com  Tue Aug 19 16:57:41 2008
From: xxq.t.xu at gmail.com (XQ Xu)
Date: Tue, 19 Aug 2008 09:57:41 -0700
Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined primer
Message-ID: <3fde82050808190957y271aa52eh30e39a438cc8a8e3@mail.gmail.com>

Hi all,
I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need to use
Primer3 to calculate Tm for some pre-defined primers; however, there is no
direct way to calculate Tm with Primer3. I have to call Primer3 and supply a
pre-defined primer, a template, etc., let it run and hope that Primer3
finds a pair of primers for me, then open the output and find out
what the Tm is for my pre-defined primer. Am I missing a function that can do
this quickly for me?
I know there's another module (Bio::SeqFeature::Primer) that can do this
quickly, but it calculates Tm with different parameters, so it's
not good to use it while I use Primer3 to design primers.
Any input?
Thanks!
-Tony


From cjfields at illinois.edu  Tue Aug 19 17:39:03 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 19 Aug 2008 12:39:03 -0500
Subject: [Bioperl-l] EUtilities help
In-Reply-To: <628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>
References: <B0BBB0A9-56E0-469C-AC4D-838A1C185564@ualberta.ca>
	<628aabb70808190030w4061c218jf9cb75fb32786811@mail.gmail.com>
Message-ID: <E67F3EE3-7D3B-497A-926F-101A06D34956@illinois.edu>

As mentioned in previous posts and in the original POD, the original  
Bio::DB::EUtilities was experimental (unstable API).  It is deprecated  
in favor of the bioperl-live interface, which splits the user agent  
and parameter handling (Bio::DB::EUtilities, EUtilParameters) from the  
parsers (Bio::Tools::EUtilities).  Unfortunately, the original design  
was too rushed so any XML changes broke the tools; I basically had to  
start from the ground up again.  Any changes to eutil output should  
now be easier to deal with (famous last words).

I am still planning on adding a few things to it (including tests for  
the parser and user agent) but it shouldn't change substantially from  
what is in the cookbook.

chris

On Aug 19, 2008, at 2:30 AM, Dave Messina wrote:

> Hi Warren,
>
> Are you upgrading to 1.5.2 or downgrading from bioperl-live?
>
> If the former, you might consider going all the way to bioperl-live,  
> whose
> EUtilities support is improved significantly and documented  
> extensively
> here:
>
> http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
>
> If the latter, I don't believe there is, but Chris Fields will know  
> for sure
> and will probably chime in.
>
>
> Dave
>
> To get bioperl-live
> as a tarball: http://www.bioperl.org/DIST/nightly_builds/
> or via Subversion: http://www.bioperl.org/wiki/Using_Subversion

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From cjfields at illinois.edu  Tue Aug 19 18:00:42 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Tue, 19 Aug 2008 13:00:42 -0500
Subject: [Bioperl-l] RemoteBlast's save_output not saving properly for
	blasttable
In-Reply-To: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>
References: <E8353FBC-9FEC-49CA-ACCE-39FA46A8F57F@lbl.gov>
Message-ID: <4B8B416E-2DA7-4B1E-9DE2-E1301B467637@illinois.edu>

Saving tabular BLAST to a file is working for me using bioperl-live.   
NCBI recently changed tabular BLAST output which broke parsing (there  
is an extra column now, can't remember what), but it is now fixed.
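
For anyone who wants to try the header check quoted below in isolation, here is a minimal sketch. The sample header lines and the relaxed pattern (an optional leading '#') are assumptions made up for illustration; this is not the change that went into RemoteBlast.pm.

```perl
use strict;
use warnings;

# Pattern quoted from save_output: the program name must start the line.
my $strict  = qr/^(?:[T]?BLAST[NPX])\s*.+$/i;
# Hypothetical relaxed pattern: also accept a leading '#' comment marker.
my $relaxed = qr/^(?:#\s*)?(?:[T]?BLAST[NPX])\s*.+$/i;

# Illustrative header lines, not captured from a real NCBI response.
my %sample = (
    plain   => 'BLASTN 2.2.18 [Mar-02-2008]',
    tabular => '# BLASTN 2.2.18 [Mar-02-2008]',
);

# Return 1 if the line would be recognised as the top of a report.
sub seen_top {
    my ($line, $pattern) = @_;
    return $line =~ $pattern ? 1 : 0;
}

for my $kind (sort keys %sample) {
    printf "%-8s strict=%d relaxed=%d\n", $kind,
        seen_top($sample{$kind}, $strict),
        seen_top($sample{$kind}, $relaxed);
}
```

With these sample lines the strict pattern only recognises the plain header, which is the behaviour described in the report.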

chris

On Aug 14, 2008, at 7:03 PM, Ulas Karaoz wrote:

> Hi,
>
> I found out that the save_output routine in RemoteBlast.pm doesn't  
> save the output properly into a file when the Blast output is a  
> hittable(blasttable). Might the reason be the fact that the tabular  
> output has a line that starts with a # while the parser is looking  
> for a line starting with just BLASTN, as in the section pasted below:
>
> if( $l =~ /^(?:[T]?BLAST[NPX])\s*.+$/i ||
> 			 $l =~/^RPS-BLAST\s*.+$/i ) {
> 			$seentop=1;
> }
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From hlapp at gmx.net  Tue Aug 19 17:56:42 2008
From: hlapp at gmx.net (Hilmar Lapp)
Date: Tue, 19 Aug 2008 13:56:42 -0400
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
Message-ID: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>


On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote:

>    I'm presently refactoring an in-house protein annotation pipeline
> and converting it to use BioSQL as a data store.  I've noticed some
> slightly screwy behavior with regard to how some of the
> Bio::Annotation classes are handled:
>
> -Instances of Bio::Annotation::SimpleValue and
> Bio::Annotation::StructuredValue attached to the annotation collection
> for a sequence feature (Bio::SeqFeature::Generic) are converted to
> tags/values on the feature.
>
> -Instances of Bio::Annotation::DBLink with attached comments lose
> the comment.
> [...]
> $query->where(["s.display_id like DEBUG%'"]);

There's a single quote missing here, but I'm assuming that's a result  
of copy/paste editing?

> [...]
>    Is bioperl-db / BioSQL trying to tell me that I shouldn't be using
> Bio::Annotation::SimpleValue and Bio::Annotation::StructuredValue?

Your example code doesn't contain an example for where you are getting  
the B::A::StructuredValue object from. If you didn't create that  
yourself, it would be good to know what you did to end up with that.  
Chris Fields has written B::A::Tagtree, which would be the way forward, and  
if you created the object yourself, can you take a look at that and  
see whether that class wouldn't serve your purpose as well or even  
better?

In order to be stored in BioSQL structured (hierarchical, nested)  
annotation is flattened into a string representation, because BioSQL  
can't store nested annotation collections natively. Right now if I am  
not mistaken upon retrieval this is not converted back into a  
B::A::Tagtree object but rather left flat. This is being worked on  
though, we've just discussed some issues connected with that.

I could make B::A::StructuredValue work the same way, but I'm not sure  
what it provides that B::A::Tagtree doesn't. The latter uses  
Data::Stag under the hood, which is much cleaner, and more extensible  
in the future.

As for SimpleValue annotation versus tag/value annotation for  
seqfeatures, yes right now these are treated interchangeably for the  
purposes of BioSQL and Bioperl-db. You can do this easily too on your  
end by using Bio::SeqFeature::AnnotationAdaptor.

> Is there even a place in the BioSQL schema for a comment to be  
> attached
> to a DBLink?

No, there isn't. I thought it was, but it turns out that this isn't yet  
listed as one of the desirable extensions to BioSQL from 1.1.x onwards,  
as documented on the wiki:

http://www.biosql.org/wiki/Enhancement_Requests

I'll add it (but feel free to do so yourself, especially if you have  
other enhancements).

	-hilmar
-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================


From johnsonm at gmail.com  Wed Aug 20 18:43:25 2008
From: johnsonm at gmail.com (Mark Johnson)
Date: Wed, 20 Aug 2008 13:43:25 -0500
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
	<2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
Message-ID: <ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>

On Tue, Aug 19, 2008 at 12:56 PM, Hilmar Lapp <hlapp at gmx.net> wrote:
> On Aug 18, 2008, at 4:53 PM, Mark Johnson wrote:
> There's a single quote missing here, but I'm assuming that's a result of
> copy/paste editing?

Yes, I was a bit sloppy with the example.

> Your example code doesn't contain an example for where you are getting the
> B::A::StructuredValue object from. If you didn't create that yourself, it
> would be good to know what you did to end up with that. Chris Fields has
> written B::A::Tagtree, which would be the way forward, and if you created the
> object yourself, can you take a look at that and see whether that class
> wouldn't serve your purpose as well or even better?

I created the B::A::StructuredValue myself.  I'm using it to store the
output from PSORTb, which gives a cellular localization and a score
for a protein sequence (gene), which I'm trying to keep paired
together, if possible.  I'll take a look at B::A::Tagtree, that's
probably a better fit.

> In order to be stored in BioSQL structured (hierarchical, nested) annotation
> is flattened into a string representation, because BioSQL can't store nested
> annotation collections natively. Right now if I am not mistaken upon
> retrieval this is not converted back into a B::A::Tagtree object but rather
> left flat. This is being worked on though, we've just discussed some issues
> connected with that.

The data I have isn't really deeply nested.  I just like to keep
related annotation in one object, if possible.
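
A minimal sketch of what flattening a shallow, paired annotation like this into one string could look like. The key names, the '/' and ';' separators, and the sample values are all made up for illustration; this is not the representation bioperl-db actually writes to BioSQL.

```perl
use strict;
use warnings;

# Flatten a nested hash into "path=value" pairs joined by ';'.
# Separators and layout are hypothetical, chosen only for readability.
sub flatten {
    my ($node, $prefix) = @_;
    return "$prefix=$node" unless ref $node eq 'HASH';

    my @parts;
    for my $key (sort keys %$node) {
        my $path = defined $prefix ? "$prefix/$key" : $key;
        push @parts, flatten($node->{$key}, $path);
    }
    return join ';', @parts;
}

# Hypothetical PSORTb-style result: a localization kept with its score.
my $annotation = {
    psortb => { localization => 'Cytoplasmic', score => '9.97' },
};

print flatten($annotation), "\n";
```

Because both values share the 'psortb' prefix, they stay recognisably paired even after being reduced to a flat string.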

> I could make B::A::StructuredValue work the same way, but I'm not sure what
> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag under the
> hood, which is much cleaner, and more extensible in the future.

Perhaps B::A::StructuredValue should be deprecated?

> As for SimpleValue annotation versus tag/value annotation for seqfeatures,
> yes right now these are treated interchangeably for the purposes of BioSQL
> and Bioperl-db. You can do this easily too on your end by using
> Bio::SeqFeature::AnnotationAdaptor.

I'll check out the AnnotationAdaptor, but I'll probably just end using
seqfeature tags/values.  They're functionally equivalent to
B::A::SimpleValue.

>> Is there even a place in the BioSQL schema for a comment to be attached
>> to a DBLink?
>
> No, there isn't. I thought it was, but it turns out that this isn't yet
> listed as one of the desirable extensions to BioSQL from 1.1.x onwards, as
> documented on the wiki:
>
> http://www.biosql.org/wiki/Enhancement_Requests
>
> I'll add it (but feel free to do so yourself, especially if you have other
> enhancements).

I'll take a look at the wiki....I'll file that as a feature request if
I get there before you do it.


From cjfields at illinois.edu  Wed Aug 20 20:25:55 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Wed, 20 Aug 2008 15:25:55 -0500
Subject: [Bioperl-l] [BioSQL-l] Bio::Annotation issues with BioSQL
In-Reply-To: <ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>
References: <ebf5eb170808181353s38144ec5o7f069cc09809af15@mail.gmail.com>
	<2A06BA0B-100B-4A5E-8425-C6FEF6AD0C75@gmx.net>
	<ebf5eb170808201143i43afe8c8u394a3ec4f5d47aa8@mail.gmail.com>
Message-ID: <9872D07D-61AB-4F0A-A477-35AA87ABF72E@illinois.edu>


On Aug 20, 2008, at 1:43 PM, Mark Johnson wrote:

> ...
>
>> I could make B::A::StructuredValue work the same way, but I'm not  
>> sure what
>> it provides that B::A::Tagtree doesn't. The latter uses Data::Stag  
>> under the
>> hood, which is much cleaner, and more extensible in the future.
>
> Perhaps B::A::StructuredValue should be deprecated?

Probably.  The only place it was used in core was SeqIO::swiss (and  
now that uses Tagtree in bioperl-live).

Let me know if you have any problems with Bio::Annotation::Tagtree.  I  
am planning on doing some more work with it soon.

chris


From cjfields at illinois.edu  Thu Aug 21 14:26:22 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Aug 2008 09:26:22 -0500
Subject: [Bioperl-l] Annotations issue (GenBank)
Message-ID: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>

I'm working on a GenBank patch and noticed a few cases where  
annotations are being stored as all uppercase strings (CONTIG, WGS,  
etc).  I'm planning on converting these to lowercase (e.g. 'wgs',  
'contig') for consistency with other annotation tag values.  I'm  
making sure output is consistent as well.

These are used fairly infrequently so I don't think it should cause  
problems, but just in case, does anyone have a problem with this change?

chris

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign


From awitney at sgul.ac.uk  Thu Aug 21 14:41:10 2008
From: awitney at sgul.ac.uk (Adam Witney)
Date: Thu, 21 Aug 2008 15:41:10 +0100
Subject: [Bioperl-l] adding HSP information to BLAST output graphic
	(Bio::Graphics)
Message-ID: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>


Hi,

I am going through the Bio::Graphics HOWTO on the wiki.

Looking at render_blast4.pl, the description text describes the whole  
hit and is set for the whole track, but i would like to be able to add  
HSP information such as the identity matches onto the picture, this is  
stored in the $hsp object. How would i go about adding that to the  
picture?

the relevant piece of code is:

       my $track = $panel->add_track(
                                     -glyph       => 'graded_segments',
                                     -label       => 1,
                                     -connector   => 'dashed',
                                     -bgcolor     => 'blue',
                                     -font2color  => 'red',
                                     -sort_order  => 'high_score',
                                     -description => sub {
                                       my $feature = shift;
                                       return unless $feature->has_tag('description');
                                       my ($description) = $feature->each_tag_value('description');
                                       my $score = $feature->score;
                                       "$description, score=$score";
                           #            "score=$score";
                                      },
                                    );

       next unless $hit->significance < 1E-20;

       my $feature = Bio::SeqFeature::Generic->new(
                                                   -score        => $hit->raw_score,
                                                   -display_name => $hit->name,
                                                   -tag          => {
                                                                      description => $hit->description
                                                                    },
                                                  );

       while( my $hsp = $hit->next_hsp )
         {
          $feature->add_sub_SeqFeature($hsp,'EXPAND');
         }

       $track->add_feature($feature);

thanks for any help

adam


From cjfields at illinois.edu  Thu Aug 21 16:01:11 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Thu, 21 Aug 2008 11:01:11 -0500
Subject: [Bioperl-l] Annotations issue (GenBank)
In-Reply-To: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
References: <2E286949-0824-4458-A217-A6D94F6DD409@illinois.edu>
Message-ID: <0ACE20F0-43E5-4E12-9437-094871766083@illinois.edu>

I went ahead and committed this; if there are any disagreements about  
it I can back it out or modify as needed.

chris

On Aug 21, 2008, at 9:26 AM, Chris Fields wrote:

> I'm working on a GenBank patch and noticed a few cases where  
> annotations are being stored as all uppercase strings (CONTIG, WGS,  
> etc).  I'm planning on converting these to lowercase (e.g. 'wgs',  
> 'contig') for consistency with other annotation tag values.  I'm  
> making sure output is consistent as well.
>
> These are used fairly infrequently so I don't think it should cause  
> problems, but just in case, does anyone have a problem with this  
> change?
>
> chris


From mshafiullah at mail.unomaha.edu  Thu Aug 21 19:35:54 2008
From: mshafiullah at mail.unomaha.edu (Mohammad Shafiullah)
Date: Thu, 21 Aug 2008 14:35:54 -0500
Subject: [Bioperl-l] bioperl-network test error
Message-ID: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>

To whom it may concern: 

Encountered the following error while running ./Build test on 
bioperl-network-1.5.2_100

Can't stat scripts: No such file or directory
 at /usr/share/perl5/Module/Build/Base.pm line 3836
t/Edge...........ok 
t/Graph-MD5......1/19 Not an ARRAY reference at 
/usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
 All 19 subtests passed 
t/Graph-Seq......1/16 Not an ARRAY reference at 
/usr/share/perl5/Heap/Elem.pm line 31.
t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
 All 16 subtests passed 
t/IO_dip_tab.....ok 
t/IO_psi.........ok 
t/Interaction....ok 
t/Node...........ok 
t/ProteinNet.....ok 

Test Summary Report
-------------------
t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
  Non-zero exit status: 9
t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
  Non-zero exit status: 255
Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33 cusr 
0.37 csys =  1.72 CPU)
Result: FAIL
Failed 2/8 test programs. 0/292 subtests failed.

Please advise on the issue. 

Sincerely,

- Mohammad


From bosborne11 at verizon.net  Thu Aug 21 20:43:35 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 21 Aug 2008 16:43:35 -0400
Subject: [Bioperl-l] bioperl-network test error
In-Reply-To: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
References: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
Message-ID: <83EB2D5B-37F2-4163-8A14-F93ECE740197@verizon.net>

Mohammad,

Try replacing the ModuleBuildBioperl.pm file that you have with the  
one that's attached. Then do this inside the bioperl-network-1.5.2_100  
directory:

./Build clean
perl Build.PL
./Build test

What do you see? Also, what's your version of Perl?


Brian O.


-------------- next part --------------
A non-text attachment was scrubbed...
Name: ModuleBuildBioperl.pm
Type: text/x-perl-script
Size: 40360 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20080821/17d19dd5/attachment-0004.bin>


On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote:

> To whom it may concern:
>
> Encountered the following error while running ./Build test on
> bioperl-network-1.5.2_100
>
> Can't stat scripts: No such file or directory
> at /usr/share/perl5/Module/Build/Base.pm line 3836
> t/Edge...........ok
> t/Graph-MD5......1/19 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
> All 19 subtests passed
> t/Graph-Seq......1/16 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
> All 16 subtests passed
> t/IO_dip_tab.....ok
> t/IO_psi.........ok
> t/Interaction....ok
> t/Node...........ok
> t/ProteinNet.....ok
>
> Test Summary Report
> -------------------
> t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
>  Non-zero exit status: 9
> t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
>  Non-zero exit status: 255
> Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33  
> cusr
> 0.37 csys =  1.72 CPU)
> Result: FAIL
> Failed 2/8 test programs. 0/292 subtests failed.
>
> Please advise on the issue.
>
> Sincerely,
>
> - Mohammad
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From bosborne11 at verizon.net  Thu Aug 21 21:57:17 2008
From: bosborne11 at verizon.net (Brian Osborne)
Date: Thu, 21 Aug 2008 17:57:17 -0400
Subject: [Bioperl-l] bioperl-network test error
In-Reply-To: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
References: <OF7CE4AAA8.EF9CD1EE-ON862574AC.006B247A-862574AC.006BA2F3@UNOMAHA.EDU>
Message-ID: <AC71DAAF-C324-4D6E-8D56-EE19B1957A27@verizon.net>

Mohammad,

Take a look at this:

http://coding.derkeiler.com/Archive/Perl/comp.lang.perl.misc/2007-06/msg00365.html

It discusses that Heap::Elem error when using Graph.

Brian O.


On Aug 21, 2008, at 3:35 PM, Mohammad Shafiullah wrote:

> To whom it may concern:
>
> Encountered the following error while running ./Build test on
> bioperl-network-1.5.2_100
>
> Can't stat scripts: No such file or directory
> at /usr/share/perl5/Module/Build/Base.pm line 3836
> t/Edge...........ok
> t/Graph-MD5......1/19 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-MD5...... Dubious, test returned 9 (wstat 2304, 0x900)
> All 19 subtests passed
> t/Graph-Seq......1/16 Not an ARRAY reference at
> /usr/share/perl5/Heap/Elem.pm line 31.
> t/Graph-Seq...... Dubious, test returned 255 (wstat 65280, 0xff00)
> All 16 subtests passed
> t/IO_dip_tab.....ok
> t/IO_psi.........ok
> t/Interaction....ok
> t/Node...........ok
> t/ProteinNet.....ok
>
> Test Summary Report
> -------------------
> t/Graph-MD5  (Wstat: 2304 Tests: 19 Failed: 0)
>  Non-zero exit status: 9
> t/Graph-Seq  (Wstat: 65280 Tests: 16 Failed: 0)
>  Non-zero exit status: 255
> Files=8, Tests=292,  2 wallclock secs ( 0.00 usr  0.02 sys +  1.33  
> cusr
> 0.37 csys =  1.72 CPU)
> Result: FAIL
> Failed 2/8 test programs. 0/292 subtests failed.
>
> Please advise on the issue.
>
> Sincerely,
>
> - Mohammad
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l


From dominic at bioinf.uni-leipzig.de  Fri Aug 22 13:35:21 2008
From: dominic at bioinf.uni-leipzig.de (Dominic Rose)
Date: Fri, 22 Aug 2008 15:35:21 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
Message-ID: <48AEC099.20106@bioinf.uni-leipzig.de>

Hi,

just a short suggestion to improve the code:

in function

sub _build_nt_matrix()

one finds the following lines:

         my $ti_index = $NucleotideIndexes{$ti};
         my $tj_index = $NucleotideIndexes{$tj};

         if( ! defined $ti_index ) {
             print "ti_index not defined for $ti\n";
             next;
         }

However, it should be possible to stop/silence the printing of that 
error message. Many alignments contain N's, which causes a great many
"ti_index not defined for N" messages. That should be avoidable.

Thanks,
Dominic

-- 
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig
WWW    http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax:   +49 341 97-16679


From heikki at sanbi.ac.za  Wed Aug 27 06:23:39 2008
From: heikki at sanbi.ac.za (Heikki Lehvaslaiho)
Date: Wed, 27 Aug 2008 08:23:39 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
In-Reply-To: <48AEC099.20106@bioinf.uni-leipzig.de>
References: <48AEC099.20106@bioinf.uni-leipzig.de>
Message-ID: <200808270823.39852.heikki@sanbi.ac.za>

Dominic,

You are absolutely right. 

I've changed 'print' into '$self->warn' in the SVN. Now it is possible to set 
$object->verbose(-1) to silence the warning or, if deemed necessary, set 
$object->verbose(2) and catch the error with an eval statement.
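
To make the two settings concrete, here is a minimal sketch using a stand-in class. Demo::Verbose is hypothetical and is NOT BioPerl code; it only mimics the behaviour described above (a negative verbosity silences the warning, a verbosity of 2 turns it into an exception that eval can catch).

```perl
use strict;
use warnings;

# Stand-in class, not part of BioPerl: mimics verbosity-controlled warnings.
package Demo::Verbose;

sub new {
    my ($class, %arg) = @_;
    return bless { verbose => defined $arg{verbose} ? $arg{verbose} : 0 }, $class;
}

sub verbose {
    my $self = shift;
    $self->{verbose} = shift if @_;
    return $self->{verbose};
}

sub warn {
    my ($self, $msg) = @_;
    return if $self->verbose() < 0;                      # silenced
    die "EXCEPTION: $msg\n" if $self->verbose() >= 2;    # escalated to an error
    CORE::warn "WARNING: $msg\n";                        # ordinary warning
    return;
}

package main;

my $obj = Demo::Verbose->new;

$obj->verbose(-1);
$obj->warn('ti_index not defined for N');    # prints nothing

$obj->verbose(2);
my $ok = eval { $obj->warn('ti_index not defined for N'); 1 };
print "caught: $@" unless $ok;
```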

Thanks for reporting this,

    -Heikki

On Friday 22 August 2008 15:35:21 Dominic Rose wrote:
> Hi,
>
> just a short suggestion to improve the code:
>
> in function
>
> sub _build_nt_matrix()
>
> one finds the following lines:
>
>          my $ti_index = $NucleotideIndexes{$ti};
>          my $tj_index = $NucleotideIndexes{$tj};
>
>          if( ! defined $ti_index ) {
>              print "ti_index not defined for $ti\n";
>              next;
>          }
>
> However, it should be possible to stop/silence the printing of that
> error message. Many alignments contain N's, which causes a great many
> "ti_index not defined for N" messages. That should be avoidable.
>
> Thanks,
> Dominic

-- 
______ _/      _/_____________________________________________________
      _/      _/
     _/  _/  _/  Heikki Lehvaslaiho    heikki at_sanbi _ac _za
    _/_/_/_/_/  Senior Scientist    skype: heikki_lehvaslaiho
   _/  _/  _/  SANBI, South African National Bioinformatics Institute
  _/  _/  _/  University of Western Cape, South Africa
     _/      Phone: +27 21 959 2096   FAX: +27 21 959 2512
___ _/_/_/_/_/________________________________________________________


From giles.weaver at googlemail.com  Wed Aug 27 10:39:19 2008
From: giles.weaver at googlemail.com (Giles Weaver)
Date: Wed, 27 Aug 2008 11:39:19 +0100
Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined
	primer
In-Reply-To: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
References: <1d06cd5d0808220229x6bd6feaasa00f6a5fd7241c8c@mail.gmail.com>
Message-ID: <1d06cd5d0808270339p4432f91bo72f48364696caf5a@mail.gmail.com>

Hi Tony,

It isn't well documented, but Primer3 includes a program called oligotm,
which is used to calculate the Tm of short sequences (up to 32bp). You can
run it directly by typing something like "oligotm ACGTACGTACGTACGT" in the
terminal. Just typing oligotm will give you the options.

If you are using Linux, these snippets of code may help you call oligotm
from within a perl script:

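use strict;
use warnings;
use IPC::Open3;
use Symbol 'gensym';

sub _run_oligotm
{
    my ($class, $sequence) = @_;

    # List form bypasses the shell, so $sequence is never interpolated
    # into a command line.
    my @cmd = ('oligotm', '-tp', 1, '-sc', 1, $sequence);
    my $err = gensym;    # open3 needs a real glob for a separate STDERR
    my $pid = open3(my $in, my $out, $err, @cmd);
    close $in;

    # oligotm prints very little, so reading STDOUT and then STDERR is safe.
    my $tm     = do { local $/; <$out> };
    my $errors = do { local $/; <$err> };
    waitpid($pid, 0);    # reap the child process
    chomp $tm if defined $tm;
    return ($tm, $errors);
}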

You'll need to put this in a package or edit out the $class bit for it to
work.

This is my first post to this list. I'm receiving the digest so replying to
posts is a bit of a faff. Can anyone recommend a better way of replying to
posts than replying to the digest, editing it and pasting the subject into
the subject field?

Giles Weaver
Unilever R&D


>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 19 Aug 2008 09:57:41 -0700
> From: "XQ Xu" <xxq.t.xu at gmail.com>
> Subject: [Bioperl-l] Bioperl Primer3 Tm calculation of a pre-defined
>        primer
> To: bioperl-l at lists.open-bio.org
> Message-ID:
>        <3fde82050808190957y271aa52eh30e39a438cc8a8e3 at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi all,
> I'm using Primer3 to design primers (Bio::Tools::Primer3). I also need to use
> Primer3 to calculate Tm for some pre-defined primers; however, there is no
> direct way to calculate Tm with Primer3. I have to call Primer3 and supply a
> pre-defined primer, a template, etc. to let it run and hopefully Primer3
> finds a pair of primers for me, then I have to open the output and find out
> what the Tm is for my pre-defined primer. Am I missing a function that can do
> this quickly for me?
> I know there's another module (Bio::SeqFeature::Primer) that can do this
> quickly, but the Tm is calculated with different parameters; therefore it's
> not good to use it while I use Primer3 to design primers.
> Any input?
> Thanks!
> -Tony
>


From awitney at sgul.ac.uk  Wed Aug 27 16:21:12 2008
From: awitney at sgul.ac.uk (Adam Witney)
Date: Wed, 27 Aug 2008 17:21:12 +0100
Subject: [Bioperl-l] adding HSP information to BLAST output graphic
	(Bio::Graphics)
In-Reply-To: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>
References: <07295109-85BA-4C23-9699-9904EC9E3E1B@sgul.ac.uk>
Message-ID: <8939216F-7AFC-40BB-A416-8508E3E5D871@sgul.ac.uk>


after some digging around myself, this seems to do the trick, although  
i don't know if it will always work, as i am having to add a separate  
sort function.

On 21 Aug 2008, at 15:41, Adam Witney wrote:

>
> Hi,
>
> I am going through the Bio::Graphics HOWTO on the wiki.
>
> Looking at render_blast4.pl, the description text describes the  
> whole hit and is set for the whole track, but i would like to be  
> able to add HSP information such as the identity matches onto the  
> picture, this is stored in the $hsp object. How would i go about  
> adding that to the picture?
>
> the relevant piece of code is:
>
>      my $track = $panel->add_track(
>                                    -glyph       => 'graded_segments',
>                                    -label       => 1,
>                                    -connector   => 'dashed',
>                                    -bgcolor     => 'blue',
>                                    -font2color  => 'red',
>                                    -sort_order  => 'high_score',

                                     -part_labels => sub {
                                       my ($feature,undef,$partno) = @_;
                                       my @features = sort_features($feature->get_SeqFeatures());
                                       my $part = $features[$partno] or return;
                                       return $part->num_identical.'/'.$part->length.' (score='.$part->score.')';
                                      },

>                                    -description => sub {
>                                      my $feature = shift;
>                                      return unless $feature->has_tag('description');
>                                      my ($description) = $feature->each_tag_value('description');
>                                      my $score = $feature->score;
>                                      "$description, score=$score";
>                          #            "score=$score";
>                                     },
>                                   );


sub sort_features {
   my @array = @_;

   if(@array < 2){return @array}

   my @sorted = sort {$a->start <=> $b->start} @array;

   return @sorted;
}


is this the best way to achieve this?

thanks

adam


From mauricio at open-bio.org  Thu Aug 28 17:43:04 2008
From: mauricio at open-bio.org (Mauricio Herrera Cuadra)
Date: Thu, 28 Aug 2008 12:43:04 -0500
Subject: [Bioperl-l] Pdoc updates
Message-ID: <48B6E3A8.3050305@open-bio.org>

For those who use the online Pdoc documentation (doc.bioperl.org), this 
is to let you know that the CvsWeb links at the top of any module page 
now link to the proper place in the SVN web interface.

Cheers,
Mauricio.


From jaudall at gmail.com  Fri Aug 29 06:46:20 2008
From: jaudall at gmail.com (Joshua Udall)
Date: Thu, 28 Aug 2008 23:46:20 -0700
Subject: [Bioperl-l] DB_File and assembly IO
Message-ID: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>

Bioperl -

I'm trying to read/parse a single cap3 ace file with several thousand
contigs.  I get a DB_File error at Contig247.  Here's the error:

------------- EXCEPTION -------------
MSG: Unable to tie DB_File handle
STACK Bio::SeqFeature::Collection::new
/Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
STACK Bio::Assembly::Contig::new
/Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
STACK Bio::Assembly::IO::ace::next_assembly
/Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
STACK toplevel /Users/jaudall/bin/read_ace.pl:214
-------------------------------------

Looking at the Collection::new, the error is on the middle line:

  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
$self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die "Cannot open
file: $!\n" ;
  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
  return $self;

If I uncomment the $! die statement that I inserted, I get this:

'Cannot open file tree: Too many open files'

Apparently the Collection constructor is creating a new index file for each
one and the handles for each are sticking around?  That confuses me because
reading more about the Collection.pm and DB_File, it appeared to me that no
files were written by default (as I'm doing), rather the Collection objects
are all stored in memory.  I'm pretty sure the error is not a permission
error, and if it is not the open file-handles, what else should I look for?


If I 'warn' the error instead of throwing it, I get:

Can't call method "get_dup" on an undefined value at
/Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360

This kind of makes sense because the index appears not to be created and it
can't look stuff up in an undefined tied hash.  I'm stuck.

Thanks for any help and suggestions.

OSX, perl 5.8.8, bioperl-live (svn last week)

-- 
Joshua Udall
Assistant Professor
295 WIDB
Plant and Wildlife Science Dept.
Brigham Young University
Provo, UT 84602
801-422-9307
Fax: 801-422-0008
USA


From florent.angly at gmail.com  Fri Aug 29 08:40:25 2008
From: florent.angly at gmail.com (Florent Angly)
Date: Fri, 29 Aug 2008 18:40:25 +1000
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
Message-ID: <48B7B5F9.1050608@gmail.com>

Hi Joshua,

I don't know the specifics of DB_File, but the 'Cannot open file tree: 
Too many open files' is pretty explicit.
If you're on Unix/Linux you can check the files that are open by your 
program by typing:
    lsof | grep name_of_program
There is probably a filehandle that is not closed somewhere in your code 
or the BioPerl code.
Best,

Florent


Joshua Udall wrote:
> Bioperl -
>
> I'm trying to read/parse a single cap3 ace file with several thousand
> contigs.  I get a DB_File error at Contig247.  Here's the error:
>
> ------------- EXCEPTION -------------
> MSG: Unable to tie DB_File handle
> STACK Bio::SeqFeature::Collection::new
> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
> STACK Bio::Assembly::Contig::new
> /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
> STACK Bio::Assembly::IO::ace::next_assembly
> /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
> STACK toplevel /Users/jaudall/bin/read_ace.pl:214
> -------------------------------------
>
> Looking at the Collection::new, the error is on the middle line:
>
>   $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
> $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die "Cannot open
> file: $!\n" ;
>   $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
>   return $self;
>
> If I uncomment the $! die statement that I inserted, I get this:
>
> 'Cannot open file tree: Too many open files'
>
> Apparently the Collection constructor is creating a new index file for each
> one and the handles for each are sticking around?  That confuses me because
> reading more about the Collection.pm and DB_File, it appeared to me that no
> files were written by default (as I'm doing), rather the Collection objects
> are all stored in memory.  I'm pretty sure the error is not a permission
> error, and if it is not the open file-handles, what else should I look for?
>
>
> If I 'warn' the error instead of throwing it, I get:
>
> Can't call method "get_dup" on an undefined value at
> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm line 360
>
> This kind of makes sense because the index appears not to be created and it
> can't look stuff up in an undefined tied hash.  I'm stuck.
>
> Thanks for any help and suggestions.
>
> OSX, perl 5.8.8, bioperl-live (svn last week)
>
>   


From cjfields at illinois.edu  Fri Aug 29 14:30:49 2008
From: cjfields at illinois.edu (Chris Fields)
Date: Fri, 29 Aug 2008 09:30:49 -0500
Subject: [Bioperl-l] DB_File and assembly IO
In-Reply-To: <48B7B5F9.1050608@gmail.com>
References: <52cea20c0808282346y177ba011x446d586079929f17@mail.gmail.com>
	<48B7B5F9.1050608@gmail.com>
Message-ID: <5717CE96-EC24-46D9-A922-88702B1647A1@illinois.edu>

This is a known problem with Bio::Assembly and stems from having a  
DB_File tied (opened) for each Bio::Assembly::Contig (via a retained  
Bio::SeqFeature::Collection).  You can extend the number of open  
filehandles on UNIX'y flavors using ulimit (see following link), but  
I'm not sure about Win32.

http://bugzilla.open-bio.org/show_bug.cgi?id=2320
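
As a rough sketch of the ulimit approach, assuming a bash-like shell (the target of 4096 is an arbitrary value chosen for illustration, not a recommendation):

```shell
# Current soft and hard limits on open file descriptors for this shell.
soft=$(ulimit -n)
hard=$(ulimit -Hn)
echo "soft limit: $soft, hard limit: $hard"

# Illustrative target only; pick something above the number of contigs.
target=4096

# Raise the soft limit only if it is below the target and the hard limit allows it.
if [ "$soft" != "unlimited" ] && [ "$soft" -lt "$target" ]; then
    if [ "$hard" = "unlimited" ] || [ "$hard" -ge "$target" ]; then
        ulimit -n "$target"
    fi
fi

echo "soft limit now: $(ulimit -n)"
```

The change applies only to the current shell and to processes started from it, so the Perl script has to be launched from that same session.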

The general bug is reproducible using the following simple script.  If  
needed, adjust the range end in the for loop to exceed the limit reported  
by 'ulimit -n'; Mac OS X 10.5 is set to 2560.

---------------------------
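use strict;
use warnings;
use Bio::Assembly::Contig;

# Each Contig retains a Bio::SeqFeature::Collection, which ties its own
# DB_File, so keeping the objects in @contigs keeps every handle open.
my @contigs;

push @contigs, Bio::Assembly::Contig->new() for (1..10000);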
---------------------------

I'll open a bug report on this for tracking (for release 1.7, along  
with any other Bio::Assembly issues).  That doesn't mean it won't get  
fixed sooner, just that we aren't under pressure to fix it for the next  
release, which already has a full plate.  IMO there doesn't need to be  
one SF::Collection per contig; one instance should do for the entire  
assembly, with the same SF::Collection passed in to each contig and the  
contig distinguished by the SeqFeature seq_id.  It would also be nice to  
allow other SeqFeature::CollectionI implementations (e.g.  
Bio::DB::SeqFeature::Store and the like).

chris

On Aug 29, 2008, at 3:40 AM, Florent Angly wrote:

> Hi Joshua,
>
> I don't know the specifics of DB_File, but the 'Cannot open file  
> tree: Too many open files' is pretty explicit.
> If you're on Unix/Linux you can check the files that are open by  
> your program by typing:
>   lsof | grep name_of_program
> There is probably a filehandle that is not closed somewhere in your  
> code or the BioPerl code.
> Best,
>
> Florent
>
>
>
> Joshua Udall wrote:
>> Bioperl -
>>
>> I'm trying to read/parse a single cap3 ace file with several thousand
>> contigs.  I get a DB_File error at Contig247.  Here's the error:
>>
>> ------------- EXCEPTION -------------
>> MSG: Unable to tie DB_File handle
>> STACK Bio::SeqFeature::Collection::new
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm:195
>> STACK Bio::Assembly::Contig::new
>> /Users/jaudall/bin/bioperl-live/Bio/Assembly/Contig.pm:256
>> STACK Bio::Assembly::IO::ace::next_assembly
>> /Users/jaudall/bin/src/bioperl-live/Bio/Assembly/IO/ace.pm:148
>> STACK toplevel /Users/jaudall/bin/read_ace.pl:214
>> -------------------------------------
>>
>> Looking at the Collection::new, the error is on the middle line:
>>
>>  $self->{'_btree'} = tie %{$self->{'_btreehash'}}, 'DB_File',
>> $self->indexfile, O_RDWR|O_CREAT, 0640, $DB_BTREE;  # or die  
>> "Cannot open
>> file: $!\n" ;
>>  $self->{'_btree'} || $self->throw("Unable to tie DB_File handle");
>>  return $self;
>>
>> If I uncomment the $! die statement that I inserted, I get this:
>>
>> 'Cannot open file tree: Too many open files'
>>
>> Apparently the Collection constructor is creating a new index file  
>> for each
>> one and the handles for each are sticking around?  That confuses me  
>> because
>> reading more about the Collection.pm and DB_File, it appeared to me  
>> that no
>> files were written by default (as I'm doing), rather the Collection  
>> objects
>> are all stored in memory.  I'm pretty sure the error is not a  
>> permission
>> error, and if it is not the open file-handles, what else should I  
>> look for?
>>
>>
>> If I 'warn' the error instead of throwing it, I get:
>>
>> Can't call method "get_dup" on an undefined value at
>> /Users/jaudall/bin/src/bioperl-live/Bio/SeqFeature/Collection.pm  
>> line 360
>>
>> This kind of makes sense because the index appears not to be  
>> created and it
>> can't look stuff up in an undefined tied hash.  I'm stuck.
>>
>> Thanks for any help and suggestions.
>>
>> OSX, perl 5.8.8, bioperl-live (svn last week)
>>


From milan.gilic at st.t-com.hr  Sat Aug 23 21:15:39 2008
From: milan.gilic at st.t-com.hr (Milan)
Date: Sat, 23 Aug 2008 23:15:39 +0200
Subject: [Bioperl-l]  Count or weight matrix in bioperl?
Message-ID: <48B07DFB.3040201@st.t-com.hr>


From dominic at bioinf.uni-leipzig.de  Mon Aug 25 10:40:42 2008
From: dominic at bioinf.uni-leipzig.de (Dominic Rose)
Date: Mon, 25 Aug 2008 12:40:42 +0200
Subject: [Bioperl-l] Bio::Align::DNAStatistics.pm
Message-ID: <48B28C2A.8050803@bioinf.uni-leipzig.de>

Hi,

just a short suggestion to improve the code:

in function

sub _build_nt_matrix()

one finds the following lines:

         my $ti_index = $NucleotideIndexes{$ti};
         my $tj_index = $NucleotideIndexes{$tj};

         if( ! defined $ti_index ) {
             print "ti_index not defined for $ti\n";
             next;
         }

However, it should be possible to switch that message off. Many 
alignments contain N's, which produces a flood of
"ti_index not defined for N" messages.

Thanks,
Dominic

-- 
Dominic Rose
Professur für Bioinformatik
Institut für Informatik
Universität Leipzig
Härtelstr. 16-18
D-04107 Leipzig
WWW    http://www.bioinf.uni-leipzig.de
Phone: +49 341 97-16698
Fax:   +49 341 97-16679